arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Learning Optimal Distributionally Robust Individualized Treatment Rules Integrating Multi-Source Data

著者: Wenhai Cui, Wen Su, Xingqiu Zhao

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.05568

要約:
Integrative analysis of multiple datasets for estimating optimal individualized treatment rules (ITRs) can enhance decision efficiency. A central challenge is posterior shift, wherein the conditional distribution of potential outcomes given covariates differs between source and target populations. We propose a prior information-based distributionally robust ITR (PDRO-ITR) that maximizes the worst-case policy value over a covariate-dependent distributional uncertainty set, ensuring robust performance under posterior shift. The uncertainty set is constructed as an individualized combination of source distributions, with weights combining prior source-membership probabilities and deviation terms constrained to the probability simplex to accommodate posterior shift. We derive a closed-form solution for the PDRO-ITR and develop an adaptive procedure to tune the uncertainty level. We establish risk bounds for the PDRO-ITR estimator, which guarantees robust performance under the worst case. Extensive simulations and two real-data applications demonstrate that the proposed method achieves superior performance compared to existing approaches.

#2 Prediction-Powered Conditional Inference

著者: Yang Sui, Jin Zhou, Hua Zhou, Xiaowu Dai

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.05575

要約:
We study prediction-powered conditional inference in the setting where labeled data are scarce, unlabeled covariates are abundant, and a black-box machine-learning predictor is available. The goal is to perform statistical inference on conditional functionals evaluated at a fixed test point, such as conditional means, without imposing a parametric model for the conditional relationship. Our approach combines localization with prediction-based variance reduction. First, we introduce a reproducing kernel-based localization method that learns a data-adaptive weight function from covariates and reformulates the target conditional moment at the test point as a weighted unconditional moment. Second, we incorporate machine-learning predictions through a correction-based decomposition of this localized moment, yielding a prediction-powered estimator and confidence interval that reduce variance when the predictor is informative while preserving validity regardless of predictor accuracy. We establish nonasymptotic error bounds and minimax-optimal convergence rates for the resulting estimator, prove pointwise asymptotic normality with consistent variance estimation, and provide an explicit variance decomposition that characterizes how machine-learning predictions and unlabeled covariates improve statistical efficiency. Numerical experiments on simulated and real datasets demonstrate valid conditional coverage and substantially sharper confidence intervals than alternative methods.

#3 SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data

著者: Ying Hu, Hu Yang

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06251

要約:
With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimation and reduced predictive accuracy. This work proposes the Single-Parametric Principal Component Selection Operator (SPPCSO), an innovative penalized estimation method that integrates single-parametric principal component regression and $L_{1}$ regularization to adaptively adjust the shrinkage factor by incorporating principal component information. This approach achieves a balance between variable selection and coefficient estimation, ensuring model stability and robust estimation even in high-dimensional, high-noise environments. The primary contribution lies in addressing the instability of traditional variable selection methods when applied to high-noise, high-dimensional correlated data. Theoretically, our method exhibits selection consistency and achieves a smaller estimation error bound compared to traditional penalized estimation approaches. Extensive numerical experiments demonstrate that SPPCSO not only delivers stable and reliable estimation in high-noise settings but also accurately distinguishes signal variables from noise variables in group-effect structured data with highly correlated noise variables, effectively eliminating redundant variables and achieving more stable variable selection. Furthermore, SPPCSO successfully identifies disease-associated genes in gene expression data analysis, showcasing strong practical value. The results indicate that SPPCSO serves as an ideal tool for high-dimensional variable selection, offering an efficient and interpretable solution for modeling correlated data.

#4 Robust support vector model based on bounded asymmetric elastic net loss for binary classification

著者: Haiyan Du, Hu Yang

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06257

要約:
In this paper, we propose a novel bounded asymmetric elastic net ($L_{baen}$) loss function and combine it with the support vector machine (SVM), resulting in the BAEN-SVM. The $L_{baen}$ is bounded and asymmetric and can degrade to the asymmetric elastic net hinge loss, pinball loss, and asymmetric least squares loss. BAEN-SVM not only effectively handles noise-contaminated data but also addresses the geometric irrationalities in the traditional SVM. By proving the violation tolerance upper bound (VTUB) of BAEN-SVM, we show that the model is geometrically well-defined. Furthermore, we derive that the influence function of BAEN-SVM is bounded, providing a theoretical guarantee of its robustness to noise. The Fisher consistency of the model further ensures its generalization capability. Since the $ L_{\text{baen}} $ loss is non-convex, we designed a clipping dual coordinate descent-based half-quadratic algorithm to solve the non-convex optimization problem efficiently. Experimental results on artificial and benchmark datasets indicate that the proposed method outperforms classical and advanced SVMs, particularly in noisy environments.

#5 Semantics-Aware Caching for Concept Learning

著者: Louis Mozart Kamdem Teyou, Caglar Demir, Axel-Cyrille Ngonga Ngomo

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06506

要約:
Concept learning is a form of supervised machine learning that operates on knowledge bases in description logics. State-of-the-art concept learners often rely on an iterative search through a countably infinite concept space. In each iteration, they retrieve instances of candidate solutions to select the best concept for the next iteration. While simple learning problems might require a few dozen instance retrieval calls to find a fitting solution, complex learning problems might necessitate thousands of calls. We alleviate the resulting runtime challenge by presenting a semantics-aware caching approach. Our cache is essentially a subsumption-aware map that links concepts to a set of instances via crisp set operations. Our experiments on 5 datasets with 4 symbolic reasoners, a neuro-symbolic reasoner, and 5 popular pagination policies demonstrate that our cache can reduce the runtime of concept retrieval and concept learning by an order of magnitude while being effective for both symbolic and neuro-symbolic reasoners.

#6 Behavior-dLDS: A decomposed linear dynamical systems model for neural activity partially constrained by behavior

著者: Eva Yezerets, En Yang, Misha B. Ahrens, Adam S. Charles

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.05612

要約:
Brain-wide recordings of large-scale networks of neurons now provide an unprecedented view into how the brain drives behavior. However, brain activity contains both information directly related to behavior as well as the potential for many internal computations. Moreover, observable behavior is executed not only by the brain, but also by the spinal cord and peripheral nervous system. Behavior is a coarse-grained product of neural activity, and we thus take the view that it can be best represented by lower-dimensional latent neural dynamics. Capturing this indirect relationship while disambiguating behavior-generating networks from internal computations running in parallel requires new modeling approaches that can embody the parallel and distributed nature of large-scale neural populations. We thus present behavior-decomposed linear dynamical systems (b-dLDS) to disentangle simultaneously recorded subsystems and identify how the latent neural subsystems relate to behavior. We demonstrate the ability of b-dLDS to decouple behavioral vs. internal computations on controlled, simulated data, showing improvements over a state-of-the-art model that uses behavior to supervise all dynamics based on behavior. We then show that b-dLDS can further scale up to tens of thousands of neurons by applying our model to large-scale recording of a zebrafish hindbrain during the complex positional homeostasis behavior, wherein b-dLDS highlights behavior-related dynamic connectivity networks.

#7 Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression

著者: Diyuan Wu, Lehan Chen, Theodor Misiakiewicz, Marco Mondelli

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.05691

要約:
It is increasingly common in machine learning to use learned models to label data and then employ such data to train more capable models. The phenomenon of weak-to-strong generalization exemplifies the advantage of this two-stage procedure: a strong student is trained on imperfect labels obtained from a weak teacher, and yet the strong student outperforms the weak teacher. In this paper, we show that the potential improvement is substantial, in the sense that it affects the scaling law followed by the test error. Specifically, we consider students and teachers trained via random feature ridge regression (RFRR). Our main technical contribution is to derive a deterministic equivalent for the excess test error of the student trained on labels obtained via the teacher. Via this deterministic equivalent, we then identify regimes in which the scaling law of the student improves upon that of the teacher, unveiling that the improvement can be achieved both in bias-dominated and variance-dominated settings. Strikingly, the student may attain the minimax optimal rate regardless of the scaling law of the teacher -- in fact, when the test error of the teacher does not even decay with the sample size.

#8 Design Experiments to Compare Multi-armed Bandit Algorithms

著者: Huiling Meng, Ningyuan Chen, Xuefeng Gao

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.05919

要約:
Online platforms routinely compare multi-armed bandit algorithms, such as UCB and Thompson Sampling, to select the best-performing policy. Unlike standard A/B tests for static treatments, each run of a bandit algorithm over $T$ users produces only one dependent trajectory, because the algorithm's decisions depend on all past interactions. Reliable inference therefore demands many independent restarts of the algorithm, making experimentation costly and delaying deployment decisions. We propose Artificial Replay (AR) as a new experimental design for this problem. AR first runs one policy and records its trajectory. When the second policy is executed, it reuses a recorded reward whenever it selects an action the first policy already took, and queries the real environment only otherwise. We develop a new analytical framework for this design and prove three key properties of the resulting estimator: it is unbiased; it requires only $T + o(T)$ user interactions instead of $2T$ for a run of the treatment and control policies, nearly halving the experimental cost when both policies have sub-linear regret; and its variance grows sub-linearly in $T$, whereas the estimator from a na\"ive design has a linearly-growing variance. Numerical experiments with UCB, Thompson Sampling, and $\epsilon$-greedy policies confirm these theoretical gains.

#9 Large deviation principles for convolutional Bayesian neural networks

著者: Federico Bassetti, Vassili De Palma, Lucia Ladelli

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06023

要約:
While suitably scaled CNNs with Gaussian initialization are known to converge to Gaussian processes as the number of channels diverges, little is known beyond this Gaussian limit. We establish a large deviation principle (LDP) for convolutional neural networks in the infinite-channel regime. We consider a broad class of multidimensional CNN architectures characterized by general receptive fields encoded through a patch-extractor function satisfying mild structural assumptions. Our main result establishes a large deviation principle (LDP) for the sequence of conditional covariance matrices under Gaussian prior distribution on the weights. We further derive an LDP for the posterior distribution obtained by conditioning on a finite number of observations. In addition, we provide a streamlined proof of the concentration of the conditional covariances and of the Gaussian equivalence of the network. To the best of our knowledge, this is the first large deviation principle established for convolutional neural networks.

#10 Agnostic learning in (almost) optimal time via Gaussian surface area

著者: Lucas Pesenti, Lucas Slot, Manuel Wiedmer

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06027

要約:
The complexity of learning a concept class under Gaussian marginals in the difficult agnostic model is closely related to its $L_1$-approximability by low-degree polynomials. For any concept class with Gaussian surface area at most $\Gamma$, Klivans et al. (2008) show that degree $d = O(\Gamma^2 / \varepsilon^4)$ suffices to achieve an $\varepsilon$-approximation. This leads to the best-known bounds on the complexity of learning a variety of concept classes. In this note, we improve their analysis by showing that degree $d = \tilde O (\Gamma^2 / \varepsilon^2)$ is enough. In light of lower bounds due to Diakonikolas et al. (2021), this yields (near) optimal bounds on the complexity of agnostically learning polynomial threshold functions in the statistical query model. Our proof relies on a direct analogue of a construction of Feldman et al. (2020), who considered $L_1$-approximation on the Boolean hypercube.

#11 Predictive Coding Graphs are a Superset of Feedforward Neural Networks

著者: Bj\"orn van Zwol

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06142

要約:
Predictive coding graphs (PCGs) are a recently introduced generalization to predictive coding networks, a neuroscience-inspired probabilistic latent variable model. Here, we prove how PCGs define a mathematical superset of feedforward artificial neural networks (multilayer perceptrons). This positions PCNs more strongly within contemporary machine learning (ML), and reinforces earlier proposals to study the use of non-hierarchical neural networks for ML tasks, and more generally the notion of topology in neural networks.

#12 Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions

著者: Aditya Varre, Mark Rofin, Nicolas Flammarion

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06248

要約:
Understanding the intricate non-convex training dynamics of softmax-based models is crucial for explaining the empirical success of transformers. In this article, we analyze the gradient flow dynamics of the value-softmax model, defined as ${L}(\mathbf{V} \sigma(\mathbf{a}))$, where $\mathbf{V}$ and $\mathbf{a}$ are a learnable value matrix and attention vector, respectively. As the matrix times softmax vector parameterization constitutes the core building block of self-attention, our analysis provides direct insight into transformer's training dynamics. We reveal that gradient flow on this structure inherently drives the optimization toward solutions characterized by low-entropy outputs. We demonstrate the universality of this polarizing effect across various objectives, including logistic and square loss. Furthermore, we discuss the practical implications of these theoretical results, offering a formal mechanism for empirical phenomena such as attention sinks and massive activations.

#13 Synthetic Monitoring Environments for Reinforcement Learning

著者: Leonard Pleiss, Carolin Schmidt, Maximilian Schiffer

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06252

要約:
Reinforcement Learning (RL) lacks benchmarks that enable precise, white-box diagnostics of agent behavior. Current environments often entangle complexity factors and lack ground-truth optimality metrics, making it difficult to isolate why algorithms fail. We introduce Synthetic Monitoring Environments (SMEs), an infinite suite of continuous control tasks. SMEs provide fully configurable task characteristics and known optimal policies. As such, SMEs allow for the exact calculation of instantaneous regret. Their rigorous geometric state space bounds allow for systematic within-distribution (WD) and out-of-distribution (OOD) evaluation. We demonstrate the framework's benefit through multidimensional ablations of PPO, TD3, and SAC, revealing how specific environmental properties - such as action or state space size, reward sparsity and complexity of the optimal policy - impact WD and OOD performance. We thereby show that SMEs offer a standardized, transparent testbed for transitioning RL evaluation from empirical benchmarking toward rigorous scientific analysis.

#14 Certified and accurate computation of function space norms of deep neural networks

著者: Johannes Gr\"undler, Moritz Maibaum, Philipp Petersen

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06431

要約:
Neural network methods for PDEs require reliable error control in function space norms. However, trained neural networks can typically only be probed at a finite number of point values. Without strong assumptions, point evaluations alone do not provide enough information to derive tight deterministic and guaranteed bounds on function space norms. In this work, we move beyond a purely black-box setting and exploit the neural network structure directly. We present a framework for the certified and accurate computation of integral quantities of neural networks, including Lebesgue and Sobolev norms, by combining interval arithmetic enclosures on axis-aligned boxes with adaptive marking/refinement and quadrature-based aggregation. On each box, we compute guaranteed lower and upper bounds for function values and derivatives, and propagate these local certificates to global lower and upper bounds for the target integrals. Our analysis provides a general convergence theorem for such certified adaptive quadrature procedures and instantiates it for function values, Jacobians, and Hessians, yielding certified computation of $L^p$, $W^{1,p}$, and $W^{2,p}$ norms. We further show how these ingredients lead to practical certified bounds for PINN interior residuals. Numerical experiments illustrate the accuracy and practical behavior of the proposed methods.

#15 Bayesian Additive Distribution Regression

著者: Antonio R. Linero, Soumyabrata Bose, Jared Murray

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06462

要約:
Distribution regression, where the goal is to predict a scalar response from a distribution-valued predictor, arises naturally in settings where observations are grouped and outcomes depend on group-level characteristics rather than on individual measurements. We introduce DistBART, a Bayesian nonparametric approach to distribution regression that models the regression function as a linear functional with the Riesz representer assigned a Bayesian additive regression trees (BART) prior. We argue that shallow decision tree ensembles encode reasonable inductive biases for tabular data, making them appropriate in settings where the functional depends primarily on low-dimensional marginals of the distributions. We show this both empirically on synthetic and real data and theoretically through an adaptive posterior concentration result. We also establish connections to kernel methods, and use this connection to motivate variants of DistBART that can learn nonlinear functionals. To enable scalability to large datasets, we develop a random-feature approximation that samples trees from the BART prior and reduces inference to sparse Bayesian linear regression, achieving computational efficiency while retaining uncertainty quantification.

#16 Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space

著者: M. Hadi Sepanj, Benyamin Ghojogh, Saed Moradi, Paul Fieguth

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.07289

要約:
Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives, such as invariance to augmentations, variance preservation, and feature decorrelation, without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that pulls the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). By kernelizing each term of the loss, variance, invariance, and covariance, we obtain a general formulation that operates on double-centered kernel matrices and Hilbert--Schmidt norms, enabling nonlinear feature learning without explicit mappings. We demonstrate that Kernel VICReg mitigates the risk of representational collapse under challenging conditions and improves performance on datasets exhibiting nonlinear structure or limited sample regimes. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structures are prominent. UMAP visualizations are provided only as a qualitative illustration of embedding geometry and are not used as a calibration or statistical validation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning.

#17 Spectral/Spatial Tensor Atomic Cluster Expansion with Universal Embeddings in Cartesian Space

著者: Zemin Xu, Wenbo Xie, P. Hu

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.14961

要約:
Equivariant atomistic machine learning models have largely been built on spherical-tensor representations, where explicit angular-momentum coupling introduces substantial complexity and systematic extensions beyond energies and forces remain challenging, often requires problem-specific architectural choices. Here we introduce the Tensor Atomic Cluster Expansion (TACE), which unifies scalar and tensorial modeling in Cartesian and space by decomposing local environments into irreducible Cartesian tensors (ICT) constructing a controlled many-body hierarchy with atomic cluster expansion (ACE). In addition to performing ACE in the frequency domain, we propose an efficient Clebsch-Gordan-free alternative in the spatial domain. TACE provides universal invariant (e.g., fidelity tags and charges) and equivariant (e.g., external electric fields and non-collinear magnetic moments) embeddings and predicted tensorial observables are handled on equal footing and enabling explicit control at inference. We demonstrate the accuracy, stability, and efficiency across finite molecules and extended materials, including in-domain and out-of-domain benchmarks, spectra, Hessian, external-field responses, charged systems, and multi-fidelity/head training. We further show its robustness on nonequilibrium/reactive datasets and controlled scaling when extending to large foundation model datasets.

#18 Self-Speculative Masked Diffusions

diffusion

著者: Andrew Campbell, Valentin De Bortoli, Jiaxin Shi, Arnaud Doucet

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.03929

要約:
We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over currently masked positions. A number of masked positions are then sampled, however, the factorization approximation means that sampling too many positions in one go leads to poor sample quality. As a result, many simulation steps and therefore neural network function evaluations are required to generate high-quality data. We reduce the computational burden by generating non-factorized predictions over masked positions. This is achieved by modifying the final transformer attention mask from non-causal to causal, enabling draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. This results in a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2 scale text modelling and protein sequence generation, finding that we can achieve a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.

#19 Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence

synthetic data

著者: Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.16657

要約:
Synthetic data has been increasingly used to train frontier generative models. However, recent studies raise key concerns that iteratively retraining a generative model on its self-generated synthetic data may keep deteriorating model performance, a phenomenon often coined model collapse. In this paper, we investigate ways to modify the synthetic retraining process to avoid model collapse, and even possibly help reverse the trend from collapse to improvement. Our key finding is that by injecting information through an external synthetic data verifier, whether a human or a better model, synthetic retraining will not cause model collapse. Specifically, we situate our theoretical analysis in the fundamental linear regression setting, showing that verifier-guided retraining can yield near-term improvements, but ultimately drives the parameter estimate to the verifier's "knowledge center" in the long run. Our theory further predicts that, unless the verifier is perfectly reliable, these early gains will plateau and may even reverse. Indeed, our experiments across linear regression, Variational Autoencoders (VAEs) trained on MNIST, and fining-tuning SmolLM2-135M on the XSUM task confirm these theoretical insights.

#20 DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants

著者: Martin Andrae, Erik Larsson, So Takao, Tomas Landelius, Fredrik Lindsten

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.00252

要約:
Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical high-dimensional DA methods, such as the ensemble Kalman filter, rely on Gaussian approximations that are violated for complex dynamics or observation operators. To address this limitation, we introduce DAISI, a scalable filtering algorithm built on flow-based generative models that enables flexible probabilistic inference using data-driven priors. The core idea is to use a stationary, pre-trained generative prior that first incorporates forecast information through a novel inverse-sampling step, before assimilating observations via guidance-based conditional sampling. This allows us to leverage any forecasting model as part of the DA pipeline without having to retrain or fine-tune the generative prior at each assimilation step. Experiments on challenging nonlinear systems show that DAISI achieves accurate filtering results in regimes with sparse, noisy, and nonlinear observations where traditional methods struggle.

#21 Expert-Aided Causal Discovery of Ancestral Graphs

著者: Tiago da Silva, Bruna Bazaluk, Eliezer de Souza da Silva, Ant\'onio G\'ois, Salem Lahlou, Dominik Heider, Samuel Kaski, Diego Mesquita, Ad\`ele Helena Ribeiro

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2309.12032

要約:
Causal discovery (CD) is an important component of many scientific applications, yet most techniques produce unreliable point estimates that often contradict expert knowledge. To mitigate this, recent research has focused on ex-ante incorporation of background knowledge into the CD process, typically under an unrealistic causal sufficiency assumption. When probing experts is costly (e.g., hidden behind expensive LLM APIs), however, ex-post model refinement that maximizes query utility is preferable. Also, when independent experts provide conflicting but better-than-random feedback, a principled aggregation method is required. In this context, we introduce the first CD algorithm that enables (i) distributional inference over ancestral graphs (AGs), which represent causal systems under latent confounding, and (ii) integration of both ex-ante and uncertain ex-post expert knowledge. Briefly, our method is a diversity-seeking reinforcement learning algorithm, termed Ancestral GFlowNet (AGFN), whose policy we iteratively refine based on a Bayesian model of the noisy expert feedback. Importantly, we prove convergence to the true AG given sufficiently accurate responses. Through validation on synthetic and realistic datasets using simulated humans and LLMs, we show AGFN is competitive with or superior to strong baselines in terms of structural Hamming distance and Bayesian Information Criterion.

#22 Predictive Coding Networks and Inference Learning: Tutorial and Survey

著者: Bj\"orn van Zwol, Ro Jefferson, Egon L. van den Broek

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2407.04117

要約:
Recent years have witnessed a growing call for renewed emphasis on neuroscience-inspired approaches in artificial intelligence research, under the banner of NeuroAI. A prime example of this is predictive coding networks (PCNs), based on the neuroscientific framework of predictive coding. This framework views the brain as a hierarchical Bayesian inference model that minimizes prediction errors through feedback connections. Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm that explains patterns of neural activity that BP cannot. Historically, IL has been more computationally intensive, but recent advancements have demonstrated that it can achieve higher efficiency than BP with sufficient parallelization. Furthermore, PCNs can be mathematically considered a superset of traditional feedforward neural networks (FNNs), significantly extending the range of trainable architectures. As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling that goes beyond traditional artificial neural networks. This work provides a comprehensive review and detailed formal specification of PCNs, particularly situating them within the context of modern ML methods. This positions PC as a promising framework for future ML innovations.

#23 Theoretical Foundations of Conformal Prediction

著者: Anastasios N. Angelopoulos, Rina Foygel Barber, Stephen Bates

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2411.11824

要約:
This book is about conformal prediction and related inferential techniques that build on permutation tests and exchangeability. These techniques are useful in a diverse array of tasks, including hypothesis testing and providing uncertainty quantification guarantees for machine learning systems. Much of the current interest in conformal prediction is due to its ability to integrate into complex machine learning workflows, solving the problem of forming prediction sets without any assumptions on the form of the data generating distribution. Since contemporary machine learning algorithms have generally proven difficult to analyze directly, conformal prediction's main appeal is its ability to provide formal, finite-sample guarantees when paired with such methods. The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these proof strategies, especially the more recent ones, are scattered among research papers, making it difficult for researchers to understand where to look, which results are important, and how exactly the proofs work. We hope to bridge this gap by curating what we believe to be some of the most important results in the literature and presenting their proofs in a unified language, with illustrations, and with an eye towards pedagogy.

#24 L0-Regularized Quadratic Surface Support Vector Machines

著者: Ahmad Mousavi, Ramin Zandvakili, Zheming Gao

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2501.11268

要約:
Kernel-free quadratic surface support vector machines (QSVM) have recently gained traction due to their flexibility in modeling nonlinear decision boundaries without relying on kernel functions. However, the introduction of a full quadratic classifier significantly increases the number of model parameters, scaling quadratically with data dimensionality, which often leads to overfitting and makes interpretation difficult. To address these challenges, we propose sparse variants of the QSVM by enforcing a cardinality constraint on the model parameters. While enhancing generalization and promoting sparsity, leveraging the $\ell_0$-norm inevitably incurs additional computational complexity. To tackle this, we develop a penalty decomposition algorithm capable of producing solutions that provably satisfy the first-order Lu-Zhang optimality conditions. We show that the subproblems arising within the algorithm either admit closed-form solutions or can be solved efficiently through dual formulations, which contributes to the method's overall effectiveness. Besides, we analyze the convergence behavior of the algorithm under both loss settings. In addition, the numerical experiments on public benchmark datasets indicate that the proposed model is competitive with commonly used SVM variants and produces sparse solutions as expected. Moreover, its strong performance on real-world credit datasets demonstrates its potential for credit scoring applications.

#25 Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias

著者: Yura Malitsky, Alexander Posch

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.02614

要約:
This paper focuses on applying entropic mirror descent to solve linear systems, where the main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes. Along the way, we strengthen the bound for $\ell_1$-norm implicit bias, obtain sublinear and linear convergence results, and generalize the convergence result to arbitrary convex $L$-smooth functions. We also propose an alternative method that avoids exponentiation, resembling the original Hadamard descent, but with provable convergence.

#26 ContextBench: Modifying Contexts for Targeted Latent Activation

著者: Robert Graham, Edward Stevinson, Leo Richter, Alexander Chia, Joseph Miller, Joseph Isaac Bloom

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.15735

要約:
Identifying inputs that trigger specific behaviours or latent features in language models could have a wide range of safety use cases. We investigate a class of methods capable of generating targeted, linguistically fluent inputs that activate specific latent features or elicit model behaviours. We formalise this approach as context modification and present ContextBench -- a benchmark with tasks assessing core method capabilities and potential safety applications. Our evaluation framework measures both elicitation strength (activation of latent features or behaviours) and linguistic fluency, highlighting how current state-of-the-art methods struggle to balance these objectives. We enhance Evolutionary Prompt Optimisation (EPO) with LLM-assistance and diffusion model inpainting, and demonstrate that these variants achieve state-of-the-art performance in balancing elicitation effectiveness and fluency.

#27 Iterative Quantum Feature Maps

著者: Nasa Matsumoto, Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.19461

要約:
Quantum machine learning models that leverage quantum circuits as quantum feature maps (QFMs) are recognized for their enhanced expressive power in learning tasks. Such models have demonstrated rigorous end-to-end quantum speedups for specific families of classification problems. However, deploying deep QFMs on real quantum hardware remains challenging due to circuit noise and hardware constraints. Additionally, variational quantum algorithms often suffer from computational bottlenecks, particularly in accurate gradient estimation, which significantly increases quantum resource demands during training. We propose Iterative Quantum Feature Maps (IQFMs), a hybrid quantum-classical framework that constructs a deep architecture by iteratively connecting shallow QFMs with classically computed augmentation weights. By incorporating contrastive learning and a layer-wise training mechanism, the IQFMs framework effectively reduces quantum runtime and mitigates noise-induced degradation. In tasks involving noisy quantum data, numerical experiments show that the IQFMs framework outperforms quantum convolutional neural networks, without requiring the optimization of variational quantum parameters. Even for a typical classical image classification benchmark, a carefully designed IQFMs framework achieves performance comparable to that of classical neural networks. This framework presents a promising path to address current limitations and harness the full potential of quantum-enhanced machine learning.

#28 Learning the action for long-time-step simulations of molecular dynamics

著者: Filippo Bigi, Johannes Spies, Michele Ceriotti

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.01068

要約:
The equations of classical mechanics can be used to model the time evolution of countless physical systems, from the astrophysical to the atomic scale. Accurate numerical integration requires small time steps, which limits the computational efficiency -- especially in cases such as molecular dynamics that span wildly different time scales. Using machine-learning (ML) algorithms to predict trajectories allows one to greatly extend the integration time step, at the cost of introducing artifacts such as lack of energy conservation and loss of equipartition between different degrees of freedom of a system. We propose learning data-driven structure-preserving (symplectic and time-reversible) maps to generate long time-step classical dynamics and show that this method is equivalent to learning the mechanical action of the system of interest. These models can be learned based on short reference trajectories, and be transferred across thermodynamic conditions and chemical composition. We show that an action-derived ML integrator eliminates the pathological behavior of non-structure-preserving ML predictors, and that the method can be applied iteratively, serving as a correction to computationally cheaper direct predictors.

#29 Learning Centre Partitions from Summaries

著者: Zinsou Max Debaly, Jean-Francois Ethier, Michael H. Neumann, F\'elix Camirand-Lemyre

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.16337

要約:
Multi-centre studies increasingly rely on distributed inference, where sites share only centre-level summaries. Homogeneity of parameters across centres is often violated, motivating methods that both \emph{test} for equality and \emph{learn} centre groupings before estimation. We develop multivariate Cochran-type tests that operate on summary statistics and embed them in a sequential, test-driven \emph{Clusters-of-Centres (CoC)} algorithm that merges centres (or blocks) only when equality is not rejected. We derive the asymptotic $\chi^2$-mixture distributions of the test statistics and provide plug-in estimators for implementation. To improve finite-sample integration, we introduce a multi-round bootstrap CoC that re-evaluates merges across independently resampled summary sets; under mild regularity and a separation condition, we prove a \emph{golden-partition recovery} result: as the number of rounds grows with $n$, the true partition is recovered with probability tending to one. We also give simple numerical guidelines, including a plateau-based stopping rule, to make the multi-round procedure reproducible. Simulations and a real-data analysis of U.S.\ airline on-time performance (2007) show accurate heterogeneity detection and partitions that change little with the choice of resampling scheme.

#30 CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering

著者: Taixi Chen, Yiu-ming Cheung, Yiqun Zhang

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.05826

要約:
An appropriate distance metric is crucial for categorical data clustering, as the distance between categorical data cannot be directly calculated. However, the distances between attribute values usually vary in different clusters induced by their different distributions, which has not been taken into account, thus leading to unreasonable distance measurement. Therefore, we propose a cluster-customized distance metric for categorical data clustering, which can competitively update distances based on different distributions of attributes in each cluster. In addition, we extend the proposed distance metric to the mixed data that contains both numerical and categorical attributes. Experiments demonstrate the efficacy of the proposed method, i.e., achieving an average ranking of around first in fourteen datasets. The source code is available at https://anonymous.4open.science/r/CADM-47D8

#31 Data-Driven Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations

著者: Pramudita Satria Palar, Paul Saves, Rommel G. Regis, Koji Shimoyama, Shigeru Obayashi, Nicolas Verstaevel, Joseph Morlier

公開日: Mon, 09 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.11946

要約:
Explainable machine learning techniques have gained increasing attention in engineering applications, especially in aerospace design and analysis, where understanding how input variables influence data-driven models is essential. Partial Dependence Plots (PDPs) are widely used for interpreting black-box models by showing the average effect of an input variable on the prediction. However, their global sensitivity metric can be misleading when strong interactions are present, as averaging tends to obscure interaction effects. To address this limitation, we propose a global sensitivity metric based on Individual Conditional Expectation (ICE) curves. The method computes the expected feature importance across ICE curves, along with their standard deviation, to more effectively capture the influence of interactions. We provide a mathematical proof demonstrating that the PDP-based sensitivity is a lower bound of the proposed ICE-based metric under truncated orthogonal polynomial expansion. In addition, we introduce an ICE-based correlation value to quantify how interactions modify the relationship between inputs and the output. Comparative evaluations were performed on three cases: a 5-variable analytical function, a 5-variable wind-turbine fatigue problem, and a 9-variable airfoil aerodynamics case, where ICE-based sensitivity was benchmarked against PDP, SHapley Additive exPlanations (SHAP), and Sobol' indices. The results show that ICE-based feature importance provides richer insights than the traditional PDP-based approach, while visual interpretations from PDP, ICE, and SHAP complement one another by offering multiple perspectives.

stat.ML updates on arXiv.org

📋 論文タイトル一覧