arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Sparse Network Inference under Imperfect Detection and its Application to Ecological Networks

著者: Aoran Zhang, Tianyao Wei, Maria J. Guerrero, C\'esar A. Uribe

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.18820

要約:
Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which is widely used in ecology research. Such networks are often sparse and inherently imperfect in their detection. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled, and scale is unbalanced, leading to oversparse or poorly rescaled estimates with degrading structural recovery. To address these issues, we propose a framework for structured sparse nonnegative low-rank factorization with detection probability estimation. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity within-group similarity and cross-group connectivity with better relative scale. The resulting optimization problem is nonconvex and nonsmooth. To solve it, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization and establish its asymptotic feasibility and KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.

#2 Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation

著者: Yaowei Zheng, Richong Zhang, Shenxi Wu, Shirui Bian, Haosong Zhang, Li Zeng, Xingjian Ma, Yichi Zhang

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.18972

要約:
We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from one-step recursion is only first-order in the grid width. We estimate the time-dependent generator from multi-step transitions using moment-matching coefficients that cancel lower-order truncation terms, and combine the resulting surrogate with backward regression. The main theory gives an end-to-end decomposition into generator misspecification, projection error, pooling bias, finite-sample error, and start-up error, together with a decision-frequency regime map explaining when higher-order gains should be visible. Across calibration studies, four-scale benchmarks, feature and start-up ablations, and gain-mismatch stress tests, the second-order estimator consistently improves on the Bellman baseline and remains stable in the regime where the theory predicts visible gains. These results position high-order generator regression as an interpretable continuous-time policy-evaluation method with a clear operating region.

#3 Fast estimation of Gaussian mixture components via centering and singular value thresholding

著者: Huan Qing

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19091

要約:
Estimating the number of components is a fundamental challenge in unsupervised learning, particularly when dealing with high-dimensional data with many components or severely imbalanced component sizes. This paper addresses this challenge for classical Gaussian mixture models. The proposed estimator is simple: center the data, compute the singular values of the centered matrix, and count those above a threshold. No iterative fitting, no likelihood calculation, and no prior knowledge of the number of components are required. We prove that, under a mild separation condition on the component centers, the estimator consistently recovers the true number of components. The result holds in high-dimensional settings where the dimension can be much larger than the sample size. It also holds when the number of components grows to the smaller of the dimension and the sample size, even under severe imbalance among component sizes. Computationally, the method is extremely fast: for example, it processes ten million samples in one hundred dimensions within one minute. Extensive experimental studies confirm its accuracy in challenging settings such as high dimensionality, many components, and severe class imbalance.

#4 Analytical Extraction of Conditional Sobol' Indices via Basis Decomposition of Polynomial Chaos Expansions

著者: Shijie Zhong, Jiangfeng Fu

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19165

要約:
In uncertainty quantification, evaluating sensitivity measures under specific conditions (i.e., conditional Sobol' indices) is essential for systems with parameterized responses, such as spatial fields or varying operating conditions. Traditional approaches often rely on point-wise modeling, which is computationally expensive and may lack consistency across the parameter space. This paper demonstrates that for a pre-trained global Polynomial Chaos Expansion (PCE) model, the analytical conditional Sobol' indices are inherently embedded within its basis functions. By leveraging the tensor-product property of PCE bases, we reformulate the global expansion into a set of analytical coefficient fields that depend on the conditioning variables. Based on the preservation of orthogonality under conditional probability measures, we derive closed-form expressions for conditional variances and Sobol' indices. This framework bypasses the need for repetitive modeling or additional sampling, transforming conditional sensitivity analysis into a purely algebraic post-processing step. Numerical benchmarks indicate that the proposed method ensures physical coherence and offers superior numerical robustness and computational efficiency compared to conventional point-wise approaches.

#5 Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

著者: Vin Bhaskara, Haicheng Wang

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.18701

要約:
Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.

#6 Discrete Tilt Matching

著者: Yuyuan Chen, Shiyi Wang, Peter Potaptchik, Jaeyeon Kim, Michael S. Albergo

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.18739

要約:
Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.

#7 ParamBoost: Gradient Boosted Piecewise Cubic Polynomials

著者: Nicolas Salvad\'e, Tim Hillel

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.18864

要約:
Generalized Additive Models (GAMs) can be used to create non-linear glass-box (i.e. explicitly interpretable) models, where the predictive function is fully observable over the complete input space. However, glass-box interpretability itself does not allow for the incorporation of expert knowledge from the modeller. In this paper, we present ParamBoost, a novel GAM whose shape functions (i.e. mappings from individual input features to the output) are learnt using a Gradient Boosting algorithm that fits cubic polynomial functions at leaf nodes. ParamBoost incorporates several constraints commonly used in parametric analysis to ensure well-refined shape functions. These constraints include: (i) continuity of the shape functions and their derivatives (up to C2); (ii) monotonicity; (iii) convexity; (iv) feature interaction constraints; and (v) model specification constraints. Empirical results show that the unconstrained ParamBoost model consistently outperforms state-of-the-art GAMs across several real-world datasets. We further demonstrate that modellers can selectively impose required constraints at a modest trade-off in predictive performance, allowing the model to be fully tailored to application-specific interpretability and parametric-analysis requirements.

#8 Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

著者: Julian Skifstad, Xinyue Annie Yang, Glen Chou

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19018

要約:
Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically that, despite the nonlinear structure of transformer blocks, layer-wise dynamics across multiple LLM architectures and scales are well-approximated by locally-linear models. Exploiting this property, we model LLM inference as a linear time-varying dynamical system and adapt the classical linear quadratic regulator to compute feedback controllers using layer-wise Jacobians, steering activations toward desired semantic setpoints in closed-loop with minimal computational overhead and no offline training. We also derive theoretical bounds on setpoint tracking error, enabling formal guarantees on steering performance. Using a novel adaptive semantic feature setpoint signal, our method yields robust, fine-grained behavior control across models, scales, and tasks, including state-of-the-art modulation of toxicity, truthfulness, refusal, and arbitrary concepts, surpassing baseline steering methods. Our code is available at: https://github.com/trustworthyrobotics/lqr-activation-steering

#9 Last-Iterate Guarantees for Learning in Co-coercive Games

著者: Siddharth Chandak, Ramanan Tamizholi, Nicholas Bambos

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19065

要約:
We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.

#10 S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

著者: Xuelin Zhang, Hong Chen, Yingjie Wang, Tieliang Gong, Bin Gu

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19072

要約:
Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data and its corresponding graph Laplacian matrix. However, the graph Laplacian matrix depends heavily on the prespecified similarity metric and may lead to inappropriate penalties when dealing with redundant or noisy input variables. To address the above issues, this paper proposes a new \textit{Semi-Supervised Meta Additive Model (S$^2$MAM) based on a bilevel optimization scheme that automatically identifies informative variables, updates the similarity matrix, and simultaneously achieves interpretable predictions. Theoretical guarantees are provided for S$^2$MAM, including the computing convergence and the statistical generalization bound. Experimental assessments across 4 synthetic and 12 real-world datasets, with varying levels and categories of corruption, validate the robustness and interpretability of the proposed approach.

#11 Heterogeneity-Aware Personalized Federated Learning for Industrial Predictive Analytics

著者: Yuhan Hu, Xiaolei Fang

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19451

要約:
Federated prognostics enable clients (e.g., companies, factories, and production lines) to collaboratively develop a failure time prediction model while keeping each client's data local and confidential. However, traditional federated models often assume homogeneity in the degradation processes across clients, an assumption that may not hold in many industrial settings. To overcome this, this paper proposes a personalized federated prognostic model designed to accommodate clients with heterogeneous degradation processes, allowing them to build tailored prognostic models. The prognostic model iteratively facilitates the underlying pairwise collaborations between clients with similar degradation patterns, which enhances the performance of personalized federated learning. To estimate parameters jointly using decentralized datasets, we develop a federated parameter estimation algorithm based on proximal gradient descent. The proposed approach addresses the limitations of existing federated prognostic models by simultaneously achieving model personalization, preserving data privacy, and providing comprehensive failure time distributions. The superiority of the proposed model is validated through extensive simulation studies and a case study using the turbofan engine degradation dataset from the NASA repository.

#12 Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

著者: Akash Yadav, Taiwo A. Adebiyi, Ruda Zhang

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19530

要約:
Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, and produces predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on two scientific foundation models for weather and timeseries forecasting along with an additional regression task. Across benchmarks against uncertainty-aware baselines, we find that Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.

#13 Separating Geometry from Probability in the Analysis of Generalization

著者: Maxim Raginsky, Benjamin Recht

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19560

要約:
The goal of machine learning is to find models that minimize prediction error on data that has not yet been seen. Its operational paradigm assumes access to a dataset $S$ and articulates a scheme for evaluating how well a given model performs on an arbitrary sample. The sample can be $S$ (in which case we speak of ``in-sample'' performance) or some entirely new $S'$ (in which case we speak of ``out-of-sample'' performance). Traditional analysis of generalization assumes that both in- and out-of-sample data are i.i.d.\ draws from an infinite population. However, these probabilistic assumptions cannot be verified even in principle. This paper presents an alternative view of generalization through the lens of sensitivity analysis of solutions of optimization problems to perturbations in the problem data. Under this framework, generalization bounds are obtained by purely deterministic means and take the form of variational principles that relate in-sample and out-of-sample evaluations through an error term that quantifies how close out-of-sample data are to in-sample data. Statistical assumptions can then be used \textit{ex post} to characterize the situations when this error term is small (either on average or with high probability).

#14 Budgeted Online Influence Maximization

著者: Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19672

要約:
We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set. Our approach better models the real-world setting where the cost of influencers varies and advertisers want to find the best value for their overall social advertising budget. We propose an algorithm assuming an independent cascade diffusion model and edge level semi-bandit feedback, and provide both theoretical and experimental results. Our analysis is also valid for the cardinality constraint setting and improves the state of the art regret bound in this case.

#15 Ultrametric OGP - parametric RDT \emph{symmetric} binary perceptron connection

著者: Mihailo Stojnic

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19712

要約:
In [97,99,100], an fl-RDT framework is introduced to characterize \emph{statistical computational gaps} (SCGs). Studying \emph{symmetric binary perceptrons} (SBPs), [100] obtained an \emph{algorithmic} threshold estimate $\alpha_a\approx \alpha_c^{(7)}\approx 1.6093$ at the 7th lifting level (for $\kappa=1$ margin), closely approaching $1.58$ local entropy (LE) prediction [18]. In this paper, we further connect parametric RDT to overlap gap properties (OGPs), another key geometric feature of the solution space. Specifically, for any positive integer $s$, we consider $s$-level ultrametric OGPs ($ult_s$-OGPs) and rigorously upper-bound the associated constraint densities $\alpha_{ult_s}$. To achieve this, we develop an analytical union-bounding program consisting of combinatorial and probabilistic components. By casting the combinatorial part as a convex problem and the probabilistic part as a nested integration, we conduct numerical evaluations and obtain that the tightest bounds at the first two levels, $\bar{\alpha}_{ult_1} \approx 1.6578$ and $\bar{\alpha}_{ult_2} \approx 1.6219$, closely approach the 3rd and 4th lifting level parametric RDT estimates, $\alpha_c^{(3)} \approx 1.6576$ and $\alpha_c^{(4)} \approx 1.6218$. We also observe excellent agreement across other key parameters, including overlap values and the relative sizes of ultrametric clusters. Based on these observations, we propose several conjectures linking $ult$-OGP and parametric RDT. Specifically, we conjecture that algorithmic threshold $\alpha_a=\lim_{s\rightarrow\infty} \alpha_{ult_s} = \lim_{s\rightarrow\infty} \bar{\alpha}{ult_s} = \lim_{r\rightarrow\infty} \alpha_{c}^{(r)}$, and $\alpha_{ult_s} \leq \alpha_{c}^{(s+2)}$ (with possible equality for some (maybe even all) $s$). Finally, we discuss the potential existence of a full isomorphism connecting all key parameters of $ult$-OGP and parametric RDT.

#16 Phase Transitions in the Fluctuations of Functionals of Random Neural Networks

著者: Simmaco Di Lillo, Leonardo Maini, Domenico Marinucci

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19738

要約:
We establish central and non-central limit theorems for sequences of functionals of the Gaussian output of an infinitely-wide random neural network on the d-dimensional sphere . We show that the asymptotic behaviour of these functionals as the depth of the network increases depends crucially on the fixed points of the covariance function, resulting in three distinct limiting regimes: convergence to the same functional of a limiting Gaussian field, convergence to a Gaussian distribution, convergence to a distribution in the Qth Wiener chaos. Our proofs exploit tools that are now classical (Hermite expansions, Diagram Formula, Stein-Malliavin techniques), but also ideas which have never been used in similar contexts: in particular, the asymptotic behaviour is determined by the fixed-point structure of the iterative operator associated with the covariance, whose nature and stability governs the different limiting regimes.

#17 Generalization at the Edge of Stability

著者: Mario Tuci, Caner Korkmaz, Umut \c{S}im\c{s}ekli, Tolga Birdal

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19740

要約:
Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly understood. In this work, we represent stochastic optimizers as random dynamical systems, which often converge to a fractal attractor set (rather than a point) with a smaller intrinsic dimension. Building on this connection and inspired by Lyapunov dimension theory, we introduce a novel notion of dimension, coined the `sharpness dimension', and prove a generalization bound based on this dimension. Our results show that generalization in the chaotic regime depends on the complete Hessian spectrum and the structure of its partial determinants, highlighting a complexity that cannot be captured by the trace or spectral norm considered in prior work. Experiments across various MLPs and transformers validate our theory while also providing new insights into the recently observed phenomenon of grokking.

#18 LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data

著者: Antony Sikorski, Michael Ivanitskiy, Nathan Lenssen, Douglas Nychka, Daniel McKenzie

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.09803

要約:
In many applications, we wish to fit a parametric statistical model to a small ensemble of spatially distributed random variables ('fields'). However, parameter inference using maximum likelihood estimation (MLE) is computationally prohibitive, especially for large, non-stationary fields. Thus, many recent works train neural networks to estimate parameters given spatial fields as input, sidestepping MLE completely. In this work we focus on a popular class of parametric, spatially autoregressive (SAR) models. We make a simple yet impactful observation; because the SAR parameters can be arranged on a regular grid, both inputs (spatial fields) and outputs (model parameters) can be viewed as images. Using this insight, we demonstrate that image-to-image (I2I) networks enable faster and more accurate parameter estimation for a class of non-stationary SAR models with unprecedented complexity.

#19 Highly Efficient and Effective LLMs with Multi-Boolean Architectures

著者: Ba-Hien Tran, Van Minh Nguyen

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.22811

要約:
Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct finetuning LMMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both finetuning and inference. Extensive experiments across diverse LLMs show our method outperforms recent ultra low-bit quantization and binarization techniques.

#20 Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

著者: Niclas Dern, Lennart Redl, Sebastian Pfister, Marcel Kollovieh, David L\"udke, Stephan G\"unnemann

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.03726

要約:
Sampling from unnormalized target distributions, e.g.\ Boltzmann distributions $\mu_{\text{target}}(x) \propto \exp(-E(x)/T)$, is fundamental to many scientific applications yet computationally challenging due to complex, high-dimensional energy landscapes. Existing approaches applying modern generative models to Boltzmann distributions either require large datasets of samples drawn from the target distribution or, when using only energy evaluations for training, cannot efficiently leverage the expressivity of advanced architectures like continuous normalizing flows that have shown promise for molecular sampling. To address these shortcomings, we introduce Energy-Weighted Flow Matching (EWFM), a novel training objective enabling continuous normalizing flows to model Boltzmann distributions using only energy function evaluations. Our objective reformulates conditional flow matching via importance sampling, allowing training with samples from arbitrary proposal distributions. Based on this objective, we develop two algorithms: iterative EWFM (iEWFM), which progressively refines proposals through iterative training, and annealed EWFM (aEWFM), which additionally incorporates temperature annealing for challenging energy landscapes. On benchmark systems, including challenging 55-particle Lennard-Jones clusters, our algorithms demonstrate sample quality competitive with established energy-only methods while requiring up to three orders of magnitude fewer energy evaluations.

#21 When Langevin Monte Carlo Meets Randomization: Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness

著者: Xiaojie Wang, Bin Yang

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.25630

要約:
Efficient sampling from complex and high dimensional target distributions turns out to be a fundamental task in diverse disciplines such as scientific computing, statistics and machine learning. In this paper, we revisit the randomized Langevin Monte Carlo (RLMC) for sampling from high dimensional distributions without log-concavity. Under the gradient Lipschitz condition and the log-Sobolev inequality, we prove a uniform-in-time error bound in $\mathcal{W}_2$-distance of order $O(\sqrt{d}h)$ for the RLMC sampling algorithm, which matches the best one in the literature under the log-concavity condition. Moreover, when the gradient of the potential $U$ is non-globally Lipschitz with superlinear growth, modified RLMC algorithms are proposed and analyzed, with non-asymptotic error bounds established. To the best of our knowledge, the modified RLMC algorithms and their non-asymptotic error bounds are new in the non-globally Lipschitz setting.

#22 Efficient Autoregressive Inference for Transformer Probabilistic Models

著者: Conor Hassan, Nasrulloh Loka, Cen-You Li, Daolang Huang, Paul E. Chang, Yang Yang, Francesco Silvestrin, Samuel Kaski, Luigi Acerbi

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.09477

要約:
Set-based transformer models for amortized probabilistic inference and meta-learning, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many applications require joint distributions over multiple predictions. Purely autoregressive architectures generate these efficiently but sacrifice flexible set-conditioning. Obtaining joint distributions from set-based models requires re-encoding the entire context at each autoregressive step, which scales poorly. We introduce a causal autoregressive buffer that combines the strengths of both paradigms. The model encodes the context once and caches it; a lightweight causal buffer captures dependencies among generated targets, with each new prediction attending to both the cached context and all previously predicted targets added to the buffer. This enables efficient batched autoregressive sampling and joint predictive density evaluation. Training integrates set-based and autoregressive modes through masked attention at minimal overhead. Across synthetic functions, EEG time series, a Bayesian model comparison task, and tabular regression, our method closely matches the performance of full context re-encoding while delivering up to $20\times$ faster joint sampling and density evaluation, and up to $7\times$ lower memory usage.

#23 Quantifying Data Similarity Using Cross Learning

著者: Shudong Sun, Hao Helen Zhang, Joseph C Watkins

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.10866

要約:
Measuring dataset similarity is fundamental in machine learning, particularly for transfer learning and domain adaptation. In the context of supervised learning, most existing approaches quantify similarity of two data sets based on their input feature distributions, neglecting label information and feature-response alignment. To address this, we propose the Cross-Learning Score (CLS), which measures dataset similarity through bidirectional generalization performance of decision rules. We establish its theoretical foundation by linking CLS to cosine similarity between decision boundaries under canonical linear models, providing a geometric interpretation. A robust ensemble-based estimator is developed that is easy to implement and bypasses high-dimensional density estimation entirely. For transfer learning applications, we introduce a "transferable zones" framework that categorizes source datasets into positive, ambiguous, and negative transfer regions. To accommodate deep learning, we extend CLS to encoder-head architectures, aligning with modern representation-based pipelines. Extensive experiments on synthetic and real-world datasets validate the effectiveness of CLS for similarity measurement and transfer assessment.

#24 PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference

著者: Yang Yang, Severi Rissanen, Paul E. Chang, Nasrulloh Loka, Daolang Huang, Arno Solin, Markus Heinonen, Luigi Acerbi

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.13763

要約:
Amortized simulator-based inference offers a powerful framework for tackling Bayesian inference in computational fields such as engineering or neuroscience, increasingly leveraging modern generative methods like diffusion models to map observed data to model parameters or future predictions. These approaches yield posterior or posterior-predictive samples for new datasets without requiring further simulator calls after training on simulated parameter-data pairs. However, their applicability is often limited by the prior distribution(s) used to generate model parameters during this training phase. To overcome this constraint, we introduce PriorGuide, a technique specifically designed for diffusion-based amortized inference methods. PriorGuide leverages a novel guidance approximation that enables flexible adaptation of the trained diffusion model to new priors at test time, crucially without costly retraining. This allows users to readily incorporate updated information or expert knowledge post-training, enhancing the versatility of pre-trained inference models.

#25 Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting

著者: Lars van der Laan, Nathan Kallus

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.23805

要約:
Fitted Q-evaluation (FQE) is a foundational method for off-policy evaluation in reinforcement learning, but existing theory typically relies on Bellman completeness of the function class, a condition often violated in practice. This reliance is due to a fundamental norm mismatch: the Bellman operator is gamma-contractive in the L^2 norm induced by the target policy's stationary distribution, whereas standard FQE fits Bellman regressions under the behavior distribution. To resolve this mismatch, we reweight each Bellman regression step by an estimate of the stationary density ratio, inspired by emphatic weighting in temporal-difference learning. This makes the update behave as if it were performed under the target stationary distribution, restoring contraction without Bellman completeness while preserving the simplicity of regression-based evaluation. Illustrative experiments, including Baird's classical counterexample, show that stationary weighting can stabilize FQE under off-policy sampling.

#26 Byzantine-tolerant distributed learning of finite mixture models

著者: Qiong Zhang, Yan Shuo Tan, Jiahua Chen

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2407.13980

要約:
Traditional statistical methods need to be updated to work with modern distributed data storage paradigms. A common approach is the split-and-conquer framework, which involves learning models on local machines and averaging their parameter estimates. However, this does not work for the important problem of learning finite mixture models, because subpopulation indices on each local machine may be arbitrarily permuted (the "label switching problem"). Zhang and Chen (2022) proposed Mixture Reduction (MR) to address this issue, but MR remains vulnerable to Byzantine failure, whereby a fraction of local machines may transmit arbitrarily erroneous information. This paper introduces Distance Filtered Mixture Reduction (DFMR), a Byzantine tolerant adaptation of MR that is both computationally efficient and statistically sound. DFMR leverages the densities of local estimates to construct a robust filtering mechanism. By analysing the pairwise L2 distances between local estimates, DFMR identifies and removes severely corrupted local estimates while retaining the majority of uncorrupted ones. We provide theoretical justification for DFMR, proving its optimal convergence rate and asymptotic equivalence to the global maximum likelihood estimate under standard assumptions. Numerical experiments on simulated and real-world data validate the effectiveness of DFMR in achieving robust and accurate aggregation in the presence of Byzantine failure.

#27 The Data-Driven Censored Newsvendor Problem

著者: Chamsi Hssaine, Sean R. Sinclair

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2412.01763

要約:
We study a censored variant of the data-driven newsvendor problem, where the decision-maker must select an ordering quantity that minimizes expected overage and underage costs based only on offline censored sales data, rather than historical demand realizations. Our goal is to understand how the degree of historical demand censoring affects the performance of any learning algorithm for this problem. To isolate this impact, we adopt a distributionally robust optimization framework, evaluating policies according to their worst-case regret over an ambiguity set of distributions. This set is defined by the largest historical order quantity (the observable boundary of the dataset), and contains all distributions matching the true demand distribution up to this boundary, while allowing them to be arbitrary afterwards. We demonstrate a spectrum of achievability under demand censoring by deriving a natural necessary and sufficient condition under which vanishing regret is an achievable goal. In regimes in which it is not, we exactly characterize the information loss due to censoring: an insurmountable lower bound on the performance of any policy, even when the decision-maker has access to infinitely many demand samples. We then leverage these sharp characterizations to propose a natural robust algorithm that adapts to the historical level of demand censoring. We derive finite-sample guarantees for this algorithm across all possible censoring regimes and show its near-optimality with matching lower bounds (up to polylogarithmic factors). We moreover demonstrate its robust performance via extensive numerical experiments on both synthetic and real-world datasets.

#28 Private Minimum Hellinger Distance Estimation via Hellinger Distance Differential Privacy

privacy

著者: Fengnan Deng, Anand N. Vidyashankar

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2501.14974

要約:
Objective functions based on Hellinger distance yield robust and efficient estimators of model parameters. Motivated by privacy and regulatory requirements encountered in contemporary applications, we derive in this paper \emph{private minimum Hellinger distance estimators}. The estimators satisfy a new privacy constraint, namely, Hellinger differential privacy, while retaining the robustness and efficiency properties. We demonstrate that Hellinger differential privacy shares several features of standard differential privacy while allowing for sharper inference. Additionally, for computational purposes, we also develop Hellinger differentially private gradient descent and Newton-Raphson algorithms. We illustrate the behavior of our estimators in finite samples using numerical experiments and verify that they retain robustness properties under gross-error contamination.

#29 Online Learning of Whittle Indices for Restless Bandits with Non-Stationary Transition Kernels

著者: Md Kamran Chowdhury Shisher, Vishrant Tripathi, Mung Chiang, Christopher G. Brinton

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.18186

要約:
The restless multi-armed bandit (RMAB) framework is a popular approach to solving resource allocation problems in networked systems. In this paper, we study optimal resource allocation in RMABs facing unknown and non-stationary dynamics. Solving RMABs optimally is known to be PSPACE-hard even with full knowledge of model parameters. While Whittle index policies offer asymptotic optimality with low computational cost, they require access to stationary transition kernels, an unrealistic assumption in many modern networking applications. To address this challenge, we propose a Sliding-Window Online Whittle (SW-Whittle) policy that remains computationally efficient while adapting to time-varying kernels. Through theoretical analysis, we show that our algorithm achieves sub-linear dynamic regret with respect to the number of episodes. We further address the important case where the variation budget is unknown in advance by combining a Bandit-over-Bandit framework with our sliding-window design. In our scheme, window lengths are tuned online as a function of the estimated variation, while Whittle indices are computed via an upper-confidence-bound of the estimated transition kernels and a bilinear optimization routine. Numerical experiments demonstrate that our algorithm consistently outperforms baselines, achieving the lowest cumulative regret across a range of non-stationary environments.

#30 Best Agent Identification for General Game Playing

agent

著者: Matthew Stephenson, Alex Newcombe, Eric Piette, Dennis Soemers

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.00451

要約:
We present an efficient and generalised procedure to accurately identify the best (or near best) performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process based on a chosen confidence interval, that ranks each arm across all bandits in terms of their potential to influence our overall simple regret. We evaluate the performance of our approach on two of the most popular general game playing domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of selecting a high-performing agent for each game using a limited number of available trials. Compared to previous best arm identification algorithms for multi-armed bandits, our results demonstrate a substantial performance improvement in terms of average simple regret and average probability of error. This novel approach can be used to significantly improve the quality and accuracy of agent evaluation procedures for general game frameworks, as well as other multi-task domains with high algorithm runtimes.

#31 End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning

著者: Christian Bongiorno, Efstratios Manolakis, Rosario Nunzio Mantegna

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.01918

要約:
We develop a rotation-invariant neural network that provides the global minimum-variance portfolio by jointly learning how to lag-transform historical returns and marginal volatilities and how to regularise the eigenvalues of large equity covariance matrices. This explicit mathematical mapping offers clear interpretability of each module's role, so the model cannot be regarded as a pure black box. The architecture mirrors the analytical form of the global minimum-variance solution yet remains agnostic to dimension, so a single model can be calibrated on panels of a few hundred stocks and applied, without retraining, to one thousand US equities, a cross-sectional jump that indicates robust generalization capability. The loss function is the future short-term realized minimum variance and is optimized end-to-end on real returns. In out-of-sample tests from January 2000 to December 2024, the estimator delivers systematically lower realized volatility, smaller maximum drawdowns, and higher Sharpe ratios than the best competitors, including state-of-the-art non-linear shrinkage, and these advantages persist across both short and long evaluation horizons despite the model's training focus is short-term. Furthermore, although the model is trained end-to-end to produce an unconstrained minimum-variance portfolio, we show that its learned covariance representation can be used in general optimizers under long-only constraints with virtually no loss in its performance advantage over competing estimators. These advantages persist when the strategy is executed under a highly realistic implementation framework that models market orders at the auctions, empirical slippage, exchange fees, and financing charges for leverage, and they remain stable during episodes of acute market stress.

#32 IMPACT: Importance-Aware Activation Space Reconstruction

privacy

著者: Md Mokarram Chowdhury, Daniel Agyei Asante, Ernie Chang, Yang Li

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.03828

要約:
Large language models (LLMs) achieve strong performance across diverse domains but remain difficult to deploy in resource-constrained environments due to their size. Low-rank compression is a common remedy, typically minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. In contrast, LLM activations exhibit a more pronounced low-rank structure, motivating approaches that minimize activation reconstruction error. This shift alone, however, is not sufficient: different activation dimensions contribute unequally to model performance, and treating them uniformly can lead to accuracy loss. We introduce IMPACT, an importance-aware activation reconstruction framework that links compression to its effect on model performance. IMPACT formulates compression as an optimization problem that integrates activation structure with gradient-based importance, deriving a closed-form solution where reconstruction bases arise from an importance-weighted activation covariance matrix. This yields low-rank compression explicitly optimized for accuracy preservation. Experiments across multiple models and tasks demonstrate that IMPACT achieves up to 55.4% greater model size reduction while maintaining accuracy comparable to or better than state-of-the-art baselines.

#33 BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

著者: Deepro Choudhury, Sinead Williamson, Adam Goli\'nski, Ning Miao, Freddie Bickford Smith, Michael Kirchhof, Yizhe Zhang, Tom Rainforth

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.21184

要約:
We propose a general-purpose approach for improving the ability of large language models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian experimental design with large language models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) with respect to a variable of interest given the responses gathered previously. We show how this EIG can be formulated (and then estimated) in a principled way using a probabilistic model derived from the LLM's predictive distributions and provide detailed insights into key decisions in its construction and updating procedure. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20 Questions game and using the LLM to actively infer user preferences, compared to purely prompting-based design generation and other adaptive design strategies.

#34 All Emulators are Wrong, Many are Useful, and Some are More Useful Than Others: A Reproducible Comparison of Computer Model Surrogates

著者: Kellin N. Rumsey, Graham C. Gibson, Devin Francom, Reid Morris

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.09060

要約:
Accurate and efficient surrogate modeling is essential for modern computational science, and there are a staggering number of emulation methods to choose from. With new methods being developed all the time, comparing the relative strengths and weaknesses of different methods remains a challenge due to inconsistent benchmarking practices and (sometimes) limited reproducibility and transparency. In this work, we present a large-scale, fully reproducible comparison of $29$ distinct emulators across $60$ canonical test functions and $40$ real emulation datasets. To facilitate rigorous, apples-to-apples comparisons, we introduce the R package \texttt{duqling}, which streamlines reproducible simulation studies using a consistent, simple syntax, and automatic internal scaling of inputs. This framework allows researchers to compare emulators in a unified environment and makes it possible to replicate or extend previous studies with minimal effort, even across different publications. Our results provide detailed empirical insight into the strengths and weaknesses of state-of-the-art emulators and offer guidance for both method developers and practitioners selecting a surrogate for new data. We discuss best practices for emulator comparison and highlight how \texttt{duqling} can accelerate research in emulator design and application.

#35 Optimized Architectures for Kolmogorov-Arnold Networks

著者: James Bagrow, Josh Bongard

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.12448

要約:
Efforts to improve Kolmogorov--Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification, deep supervision, and depth selection, to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable mechanisms under a principled minimum description length objective, jointly optimizing activations, structure, and depth end-to-end. Experiments across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks demonstrate that sparsification alone is insufficient, but the combination with depth selection achieves competitive or superior accuracy while discovering substantially smaller models. The result is a principled path toward models that are both more expressive and more interpretable, addressing a key tension in scientific machine learning.

#36 Drift Localization using Conformal Predictions

著者: Fabian Hinder, Valerie Vaquet, Johannes Brinkrolf, Barbara Hammer

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.19790

要約:
Concept drift -- the change of the distribution over time -- poses significant challenges for learning systems and is of central interest for monitoring. Understanding drift is thus paramount, and drift localization -- determining which samples are affected by the drift -- is essential. While several approaches exist, most rely on local testing schemes, which tend to fail in high-dimensional, low-signal settings. In this work, we consider a fundamentally different approach based on conformal predictions. We discuss and show the shortcomings of common approaches and demonstrate the performance of our approach on state-of-the-art image datasets.

#37 Neyman-Pearson multiclass classification under label noise via empirical likelihood

著者: Qiong Zhang, Qinglong Tian, Pengfei Li

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.21623

要約:
In many classification problems, misclassification costs are highly asymmetric, while training labels are often corrupted due to measurement error, annotator variability, or adversarial noise. The Neyman-Pearson multiclass classification (NPMC) framework addresses such asymmetry by controlling class-specific errors, but existing methods assume that training labels are correctly observed. To our knowledge, no existing approach handles NPMC under label noise in the multiclass setting, and the only binary method requires prior knowledge of the noise mechanism. A fundamental difficulty is that, without structural assumptions, noisy-label models are non-identifiable: distinct combinations of class-conditional distributions and noise mechanisms can induce the same observed distribution, preventing recovery of the quantities required for error control. We show that the exponential tilting density ratio model restores identifiability, and leverage this structure to develop an empirical likelihood approach for NPMC with noisy labels. The proposed method jointly estimates clean-label class proportions, posterior probabilities, and the noise mechanism from noisy data, without requiring prior knowledge of the confusion matrix. An expectation-maximization algorithm enables efficient computation. The resulting estimators are root n consistent and asymptotically normal, and the induced classifiers satisfy Neyman-Pearson oracle inequalities in both binary and multiclass settings. Simulation and real-data experiments demonstrate near-oracle performance.

#38 Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization

著者: Simon Zhang, Ryan P. DeMilt, Kun Jin, Cathy H. Xia

公開日: Wed, 22 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08404

要約:
Out-of-distribution (OoD) generalization occurs when representation learning encounters a distribution shift. This occurs frequently in practice when training and testing data come from different environments. Covariate shift is a type of distribution shift that occurs only in the input data, while the concept distribution stays invariant. We propose RIA - Regularization for Invariance with Adversarial training, a new method for OoD generalization under convariate shift. Motivated by an analogy to $Q$-learning, it performs an adversarial exploration for counterfactual data environments. These new environments are induced by adversarial label invariant data augmentations that prevent a collapse to an in-distribution trained learner. It works with many existing OoD generalization methods for covariate shift that can be formulated as constrained optimization problems. We develop an alternating gradient descent-ascent algorithm to solve the problem in the context of causally generated graph data, and perform extensive experiments on OoD graph classification for various kinds of synthetic and natural distribution shifts. We demonstrate that our method can achieve high accuracy compared with OoD baselines.

stat.ML updates on arXiv.org

📋 論文タイトル一覧