arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models

著者: Krishnakumar Balasubramanian, Aleksandr Podkopaev, Shiva Prasad Kasiviswanathan

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22336

要約:
Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y\in\{0,1\}$, an assumption often violated by LLM judges due to shared data, architectures, prompts, and failure modes. Ignoring such dependencies can yield miscalibrated posteriors and even confidently incorrect predictions. We study label aggregation through a hierarchy of dependence-aware models based on Ising graphical models and latent factors. For class-dependent Ising models, the Bayes log-odds is generally quadratic in votes; for class-independent couplings, it reduces to a linear weighted vote with correlation-adjusted parameters. We present finite-$K$ examples showing that methods based on conditional independence can flip the Bayes label despite matching per-annotator marginals. We prove separation results demonstrating that these methods remain strictly suboptimal as the number of judges grows, incurring nonvanishing excess risk under latent factors. Finally, we evaluate the proposed method on three real-world datasets, demonstrating improved performance over the classical baselines.

#2 Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation

著者: Shiyi Sun, Geoff K. Nicholls, Jeong Eun Lee

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22367

要約:
Generalized Bayesian Inference (GBI) tempers a loss with a temperature $\beta>0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset and each $\beta$ value. We give the first fully amortized variational approximation to the tempered posterior family $p_\beta(\theta \mid x) \propto \pi(\theta)\,p(x \mid \theta)^\beta$ by training a single $(x,\beta)$-conditioned neural posterior estimator $q_\phi(\theta \mid x,\beta)$ that enables sampling in a single forward pass, without simulator calls or inference-time MCMC. We introduce two complementary training routes: (i) synthesize off-manifold samples $(\theta,x) \sim \pi(\theta)\,p(x \mid \theta)^\beta$ and (ii) reweight a fixed base dataset $\pi(\theta)\,p(x \mid \theta)$ using self-normalized importance sampling (SNIS). We show that the SNIS-weighted objective provides a consistent forward-KL fit to the tempered posterior with finite weight variance. Across four standard simulation-based inference (SBI) benchmarks, including the chaotic Lorenz-96 system, our $\beta$-amortized estimator achieves competitive posterior approximations in standard two-sample metrics, matching non-amortized MCMC-based power-posterior samplers over a wide range of temperatures.

#3 It's all the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms

著者: Keegan Kang, Kerong Wang, Ding Zhang, Rameshwar Pratap, Bhisham Dev Verma, Benedict H. W. Wong

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22378

要約:
Maximum likelihood estimators (MLE) and control variate estimators (CVE) have been used in conjunction with known information across sketching algorithms and applications in machine learning. We prove that under certain conditions in an exponential family, an optimal CVE will achieve the same asymptotic variance as the MLE, giving an Expectation-Maximization (EM) algorithm for the MLE. Experiments show the EM algorithm is faster and numerically stable compared to other root finding algorithms for the MLE for the bivariate Normal distribution, and we expect this to hold across distributions satisfying these conditions. We show how the EM algorithm leads to reproducibility for algorithms using MLE / CVE, and demonstrate how the EM algorithm leads to finding the MLE when the CV weights are known.

#4 Simulation-based Bayesian inference with ameliorative learned summary statistics -- Part I

著者: Getachew K. Befekadu

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22441

要約:
This paper, which is Part 1 of a two-part paper series, considers a simulation-based inference with learned summary statistics, in which such a learned summary statistic serves as an empirical-likelihood with ameliorative effects in the Bayesian setting, when the exact likelihood function associated with the observation data and the simulation model is difficult to obtain in a closed form or computationally intractable. In particular, a transformation technique which leverages the Cressie-Read discrepancy criterion under moment restrictions is used for summarizing the learned statistics between the observation data and the simulation outputs, while preserving the statistical power of the inference. Here, such a transformation of data-to-learned summary statistics also allows the simulation outputs to be conditioned on the observation data, so that the inference task can be performed over certain sample sets of the observation data that are considered as an empirical relevance or believed to be particular importance. Moreover, the simulation-based inference framework discussed in this paper can be extended further, and thus handling weakly dependent observation data. Finally, we remark that such an inference framework is suitable for implementation in distributed computing, i.e., computational tasks involving both the data-to-learned summary statistics and the Bayesian inferencing problem can be posed as a unified distributed inference problem that will exploit distributed optimization and MCMC algorithms for supporting large datasets associated with complex simulation models.

#5 Corrected Samplers for Discrete Flow Models

著者: Zhengyan Wan, Yidong Ouyang, Liyan Xie, Fang Fang, Hongyuan Zha, Guang Cheng

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22519

要約:
Discrete flow models (DFMs) have been proposed to learn the data distribution on a finite state space, offering a flexible framework as an alternative to discrete diffusion models. A line of recent work has studied samplers for discrete diffusion models, such as tau-leaping and Euler solver. However, these samplers require a large number of iterations to control discretization error, since the transition rates are frozen in time and evaluated at the initial state within each time interval. Moreover, theoretical results for these samplers often require boundedness conditions of the transition rate or they focus on a specific type of source distributions. To address those limitations, we establish non-asymptotic discretization error bounds for those samplers without any restriction on transition rates and source distributions, under the framework of discrete flow models. Furthermore, by analyzing a one-step lower bound of the Euler sampler, we propose two corrected samplers: \textit{time-corrected sampler} and \textit{location-corrected sampler}, which can reduce the discretization error of tau-leaping and Euler solver with almost no additional computational cost. We rigorously show that the location-corrected sampler has a lower iteration complexity than existing parallel samplers. We validate the effectiveness of the proposed method by demonstrating improved generation quality and reduced inference time on both simulation and text-to-image generation tasks. Code can be found in https://github.com/WanZhengyan/Corrected-Samplers-for-Discrete-Flow-Models.

#6 An Efficient Algorithm for Thresholding Monte Carlo Tree Search

著者: Shoma Nameki (Graduate School of Information Science and Technology, Hokkaido University), Atsuyoshi Nakamura (Faculty of Information Science and Technology, Hokkaido University), Junpei Komiyama (Mohamed bin Zayed University of Artificial Intelligence, RIKEN AIP), Koji Tabata (Research Institute for Electronic Science, Hokkaido University)

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22600

要約:
We introduce the Thresholding Monte Carlo Tree Search problem, in which, given a tree $\mathcal{T}$ and a threshold $\theta$, a player must answer whether the root node value of $\mathcal{T}$ is at least $\theta$ or not. In the given tree, `MAX' or `MIN' is labeled on each internal node, and the value of a `MAX'-labeled (`MIN'-labeled) internal node is the maximum (minimum) of its child values. The value of a leaf node is the mean reward of an unknown distribution, from which the player can sample rewards. For this problem, we develop a $\delta$-correct sequential sampling algorithm based on the Track-and-Stop strategy that has asymptotically optimal sample complexity. We show that a ratio-based modification of the D-Tracking arm-pulling strategy leads to a substantial improvement in empirical sample complexity, as well as reducing the per-round computational cost from linear to logarithmic in the number of arms.

#7 RPWithPrior: Label Differential Privacy in Regression

privacy

著者: Haixia Liu, Ruifan Huang

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22625

要約:
With the wide application of machine learning techniques in practice, privacy preservation has gained increasing attention. Protecting user privacy with minimal accuracy loss is a fundamental task in the data analysis and mining community. In this paper, we focus on regression tasks under $\epsilon$-label differential privacy guarantees. Some existing methods for regression with $\epsilon$-label differential privacy, such as the RR-On-Bins mechanism, discretized the output space into finite bins and then applied RR algorithm. To efficiently determine these finite bins, the authors rounded the original responses down to integer values. However, such operations does not align well with real-world scenarios. To overcome these limitations, we model both original and randomized responses as continuous random variables, avoiding discretization entirely. Our novel approach estimates an optimal interval for randomized responses and introduces new algorithms designed for scenarios where a prior is either known or unknown. Additionally, we prove that our algorithm, RPWithPrior, guarantees $\epsilon$-label differential privacy. Numerical results demonstrate that our approach gets better performance compared with the Gaussian, Laplace, Staircase, and RRonBins, Unbiased mechanisms on the Communities and Crime, Criteo Sponsored Search Conversion Log, California Housing datasets.

#8 Generative and Nonparametric Approaches for Conditional Distribution Estimation: Methods, Perspectives, and Comparative Evaluations

著者: Yen-Shiu Chin, Zhi-Yu Jou, Toshinari Morimoto, Chia-Tse Wang, Ming-Chung Chang, Tso-Jung Yen, Su-Yun Huang, Tailen Hsing

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22650

要約:
The inference of conditional distributions is a fundamental problem in statistics, essential for prediction, uncertainty quantification, and probabilistic modeling. A wide range of methodologies have been developed for this task. This article reviews and compares several representative approaches spanning classical nonparametric methods and modern generative models. We begin with the single-index method of Hall and Yao (2005), which estimates the conditional distribution through a dimension-reducing index and nonparametric smoothing of the resulting one-dimensional cumulative conditional distribution function. We then examine the basis-expansion approaches, including FlexCode (Izbicki and Lee, 2017) and DeepCDE (Dalmasso et al., 2020), which convert conditional density estimation into a set of nonparametric regression problems. In addition, we discuss two recent generative simulation-based methods that leverage modern deep generative architectures: the generative conditional distribution sampler (Zhou et al., 2023) and the conditional denoising diffusion probabilistic model (Fu et al., 2024; Yang et al., 2025). A systematic numerical comparison of these approaches is provided using a unified evaluation framework that ensures fairness and reproducibility. The performance metrics used for the estimated conditional distribution include the mean-squared errors of conditional mean and standard deviation, as well as the Wasserstein distance. We also discuss their flexibility and computational costs, highlighting the distinct advantages and limitations of each approach.

#9 Spectral Gradient Descent Mitigates Anisotropy-Driven Misalignment: A Case Study in Phase Retrieval

著者: Guillaume Braun, Han Bao, Wei Huang, Masaaki Imaizumi

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22652

要約:
Spectral gradient methods, such as the Muon optimizer, modify gradient updates by preserving directional information while discarding scale, and have shown strong empirical performance in deep learning. We investigate the mechanisms underlying these gains through a dynamical analysis of a nonlinear phase retrieval model with anisotropic Gaussian inputs, equivalent to training a two-layer neural network with the quadratic activation and fixed second-layer weights. Focusing on a spiked covariance setting where the dominant variance direction is orthogonal to the signal, we show that gradient descent (GD) suffers from a variance-induced misalignment: during the early escaping stage, the high-variance but uninformative spike direction is multiplicatively amplified, degrading alignment with the true signal under strong anisotropy. In contrast, spectral gradient descent (SpecGD) removes this spike amplification effect, leading to stable alignment and accelerated noise contraction. Numerical experiments confirm the theory and show that these phenomena persist under broader anisotropic covariances.

#10 GRANITE: A Generalized Regional Framework for Identifying Agreement in Feature-Based Explanations

著者: Julia Herbinger, Gabriel Laberge, Maximilian Muschalik, Yann Pequignot, Marvin N. Wright, Fabian Fumagalli

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22771

要約:
Feature-based explanation methods aim to quantify how features influence the model's behavior, either locally or globally, but different methods often disagree, producing conflicting explanations. This disagreement arises primarily from two sources: how feature interactions are handled and how feature dependencies are incorporated. We propose GRANITE, a generalized regional explanation framework that partitions the feature space into regions where interaction and distribution influences are minimized. This approach aligns different explanation methods, yielding more consistent and interpretable explanations. GRANITE unifies existing regional approaches, extends them to feature groups, and introduces a recursive partitioning algorithm to estimate such regions. We demonstrate its effectiveness on real-world datasets, providing a practical tool for consistent and interpretable feature explanations.

#11 Approximating $f$-Divergences with Rank Statistics

著者: Viktor Stein, Jos\'e Manuel de Frutos

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22784

要約:
We introduce a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the distribution of ranks. For a resolution parameter $K$, we map the mismatch between two univariate distributions $\mu$ and $\nu$ to a rank histogram on $\{ 0, \ldots, K\}$ and measure its deviation from uniformity via a discrete $f$-divergence, yielding a rank-statistic divergence estimator. We prove that the resulting estimator of the divergence is monotone in $K$, is always a lower bound of the true $f$-divergence, and we establish quantitative convergence rates for $K\to\infty$ under mild regularity of the quantile-domain density ratio. To handle high-dimensional data, we define the sliced rank-statistic $f$-divergence by averaging the univariate construction over random projections, and we provide convergence results for the sliced limit as well. We also derive finite-sample deviation bounds along with asymptotic normality results for the estimator. Finally, we empirically validate the approach by benchmarking against neural baselines and illustrating its use as a learning objective in generative modelling experiments.

#12 OneFlowSBI: One Model, Many Queries for Simulation-Based Inference

著者: Mayank Nautiyal, Li Ju, Melker Ernfors, Klara Hagland, Ville Holma, Maximilian Werk\"o S\"oderholm, Andreas Hellander, Prashant Singh

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22951

要約:
We introduce \textit{OneFlowSBI}, a unified framework for simulation-based inference that learns a single flow-matching generative model over the joint distribution of parameters and observations. Leveraging a query-aware masking distribution during training, the same model supports multiple inference tasks, including posterior sampling, likelihood estimation, and arbitrary conditional distributions, without task-specific retraining. We evaluate \textit{OneFlowSBI} on ten benchmark inference problems and two high-dimensional real-world inverse problems across multiple simulation budgets. \textit{OneFlowSBI} is shown to deliver competitive performance against state-of-the-art generalized inference solvers and specialized posterior estimators, while enabling efficient sampling with few ODE integration steps and remaining robust under noisy and partially observed data.

#13 Neural Backward Filtering Forward Guiding

著者: Gefan Yang, Frank van der Meulen, Stefan Sommer

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.23030

要約:
Inference in non-linear continuous stochastic processes on trees is challenging, particularly when observations are sparse (leaf-only) and the topology is complex. Exact smoothing via Doob's $h$-transform is intractable for general non-linear dynamics, while particle-based methods degrade in high dimensions. We propose Neural Backward Filtering Forward Guiding (NBFFG), a unified framework for both discrete transitions and continuous diffusions. Our method constructs a variational posterior by leveraging an auxiliary linear-Gaussian process. This auxiliary process yields a closed-form backward filter that serves as a ``guide'', steering the generative path toward high-likelihood regions. We then learn a neural residual--parameterized as a normalizing flow or a controlled SDE--to capture the non-linear discrepancies. This formulation allows for an unbiased path-wise subsampling scheme, reducing the training complexity from tree-size dependent to path-length dependent. Empirical results show that NBFFG outperforms baselines on synthetic benchmarks, and we demonstrate the method on a high-dimensional inference task in phylogenetic analysis with reconstruction of ancestral butterfly wing shapes.

#14 Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning

著者: Hugo Cui, Yue M. Lu

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.23031

要約:
We study a class of iterated empirical risk minimization (ERM) procedures in which two successive ERMs are performed on the same dataset, and the predictions of the first estimator enter as an argument in the loss function of the second. This setting, which arises naturally in active learning and reweighting schemes, introduces intricate statistical dependencies across samples and fundamentally distinguishes the problem from classical single-stage ERM analyses. For linear models trained with a broad class of convex losses on Gaussian mixture data, we derive a sharp asymptotic characterization of the test error in the high-dimensional regime where the sample size and ambient dimension scale proportionally. Our results provide explicit, fully asymptotic predictions for the performance of the second-stage estimator despite the reuse of data and the presence of prediction-dependent losses. We apply this theory to revisit a well-studied pool-based active learning problem, removing oracle and sample-splitting assumptions made in prior work. We uncover a fundamental tradeoff in how the labeling budget should be allocated across stages, and demonstrate a double-descent behavior of the test error driven purely by data selection, rather than model size or sample count.

#15 A Random Matrix Theory of Masked Self-Supervised Regression

著者: Arie Wortsman Zurich, Federica Gerace, Bruno Loureiro, Yue M. Lu

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.23208

要約:
In the era of transformer models, masked self-supervised learning (SSL) has become a foundational training paradigm. A defining feature of masked SSL is that training aggregates predictions across many masking patterns, giving rise to a joint, matrix-valued predictor rather than a single vector-valued estimator. This object encodes how coordinates condition on one another and poses new analytical challenges. We develop a precise high-dimensional analysis of masked modeling objectives in the proportional regime where the number of samples scales with the ambient dimension. Our results provide explicit expressions for the generalization error and characterize the spectral structure of the learned predictor, revealing how masked modeling extracts structure from data. For spiked covariance models, we show that the joint predictor undergoes a Baik--Ben Arous--P\'ech\'e (BBP)-type phase transition, identifying when masked SSL begins to recover latent signals. Finally, we identify structured regimes in which masked self-supervised learning provably outperforms PCA, highlighting potential advantages of SSL objectives over classical unsupervised methods

#16 Graph Attention Network for Node Regression on Random Geometric Graphs with Erd\H{o}s--R\'enyi contamination

著者: Somak Laha, Suqi Liu, Morgane Austern

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.23239

要約:
Graph attention networks (GATs) are widely used and often appear robust to noise in node covariates and edges, yet rigorous statistical guarantees demonstrating a provable advantage of GATs over non-attention graph neural networks~(GNNs) are scarce. We partially address this gap for node regression with graph-based errors-in-variables models under simultaneous covariate and edge corruption: responses are generated from latent node-level covariates, but only noise-perturbed versions of the latent covariates are observed; and the sample graph is a random geometric graph created from the node covariates but contaminated by independent Erd\H{o}s--R\'enyi edges. We propose and analyze a carefully designed, task-specific GAT that constructs denoised proxy features for regression. We prove that regressing the response variables on the proxies achieves lower error asymptotically in (a) estimating the regression coefficient compared to the ordinary least squares (OLS) estimator on the noisy node covariates, and (b) predicting the response for an unlabelled node compared to a vanilla graph convolutional network~(GCN) -- under mild growth conditions. Our analysis leverages high-dimensional geometric tail bounds and concentration for neighbourhood counts and sample covariances. We verify our theoretical findings through experiments on synthetically generated data. We also perform experiments on real-world graphs and demonstrate the effectiveness of the attention mechanism in several node regression tasks.

#17 Variational Tail Bounds for Norms of Random Vectors and Matrices

著者: Sohail Bahmani

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.17300

要約:
We propose a variational tail bound for norms of random vectors under moment assumptions on their one-dimensional marginals. A simplified version of the bound that parametrizes the ``aggregating distribution'' using a certain pushforward of the Gaussian distribution is also provided. We apply the proposed method to reproduce some of the well-known bounds on norms of Gaussian random vectors, and also obtain dimension-free tail bounds for the Euclidean norm of random vectors with arbitrary moment profiles. Furthermore, we reproduce a dimension-free concentration inequality for sum of independent and identically distributed positive semidefinite matrices with sub-exponential marginals, and obtain a concentration inequality for the sample covariance matrix of sub-exponential random vectors. We also obtain a tail bound for the operator norm of a random matrix series whose random coefficients may have arbitrary moment profiles. Furthermore, we use coupling to formulate an abstraction of the proposed approach that applies more broadly.

#18 Large Language Models: A Mathematical Formulation

著者: Ricardo Baptista, Andrew Stuart, Son Tran

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22170

要約:
Large language models (LLMs) process and predict sequences containing text to answer questions, and address tasks including document summarization, providing recommendations, writing software and solving quantitative problems. We provide a mathematical framework for LLMs by describing the encoding of text sequences into sequences of tokens, defining the architecture for next-token prediction models, explaining how these models are learned from data, and demonstrating how they are deployed to address a variety of tasks. The mathematical sophistication required to understand this material is not high, and relies on straightforward ideas from information theory, probability and optimization. Nonetheless, the combination of ideas resting on these different components from the mathematical sciences yields a complex algorithmic structure; and this algorithmic structure has demonstrated remarkable empirical successes. The mathematical framework established here provides a platform from which it is possible to formulate and address questions concerning the accuracy, efficiency and robustness of the algorithms that constitute LLMs. The framework also suggests directions for development of modified and new methodologies.

#19 Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non-stationary Time-series

著者: Luis Ontaneda Mijares, Nick Firoozye

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22200

要約:
Overparameterized models have recently challenged conventional learning theory by exhibiting improved generalization beyond the interpolation limit, a phenomenon known as benign overfitting. This work introduces Adaptive Benign Overfitting (ABO), extending the recursive least-squares (RLS) framework to this regime through a numerically stable formulation based on orthogonal-triangular updates. A QR-based exponentially weighted RLS (QR-EWRLS) algorithm is introduced, combining random Fourier feature mappings with forgetting-factor regularization to enable online adaptation under non-stationary conditions. The orthogonal decomposition prevents the numerical divergence associated with covariance-form RLS while retaining adaptability to evolving data distributions. Experiments on nonlinear synthetic time series confirm that the proposed approach maintains bounded residuals and stable condition numbers while reproducing the double-descent behavior characteristic of overparameterized models. Applications to forecasting foreign exchange and electricity demand show that ABO is highly accurate (comparable to baseline kernel methods) while achieving speed improvements of between 20 and 40 percent. The results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework.

#20 Causal Imitation Learning Under Measurement Error and Distribution Shift

著者: Shi Bo, AmirEmad Ghassami

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22206

要約:
We study offline imitation learning (IL) when part of the decision-relevant state is observed only through noisy measurements and the distribution may change between training and deployment. Such settings induce spurious state-action correlations, so standard behavioral cloning (BC) -- whether conditioning on raw measurements or ignoring them -- can converge to systematically biased policies under distribution shift. We propose a general framework for IL under measurement error, inspired by explicitly modeling the causal relationships among the variables, yielding a target that retains a causal interpretation and is robust to distribution shift. Building on ideas from proximal causal inference, we introduce \texttt{CausIL}, which treats noisy state observations as proxy variables, and we provide identification conditions under which the target policy is recoverable from demonstrations without rewards or interactive expert queries. We develop estimators for both discrete and continuous state spaces; for continuous settings, we use an adversarial procedure over RKHS function classes to learn the required parameters. We evaluate \texttt{CausIL} on semi-simulated longitudinal data from the PhysioNet/Computing in Cardiology Challenge 2019 cohort and demonstrate improved robustness to distribution shift compared to BC baselines.

#21 Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy

privacy

著者: Nikita P. Kalinin, Ali Najar, Valentin Roth, Christoph H. Lampert

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22320

要約:
We study continual mean estimation, where data vectors arrive sequentially and the goal is to maintain accurate estimates of the running mean. We address this problem under user-level differential privacy, which protects each user's entire dataset even when they contribute multiple data points. Previous work on this problem has focused on pure differential privacy. While important, this approach limits applicability, as it leads to overly noisy estimates. In contrast, we analyze the problem under approximate differential privacy, adopting recent advances in the Matrix Factorization mechanism. We introduce a novel mean estimation specific factorization, which is both efficient and accurate, achieving asymptotically lower mean-squared error bounds in continual mean estimation under user-level differential privacy.

#22 Knowledge Gradient for Preference Learning

著者: Kaiwen Wu, Jacob R. Gardner

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22335

要約:
The knowledge gradient is a popular acquisition function in Bayesian optimization (BO) for optimizing black-box objectives with noisy function evaluations. Many practical settings, however, allow only pairwise comparison queries, yielding a preferential BO problem where direct function evaluations are unavailable. Extending the knowledge gradient to preferential BO is hindered by its computational challenge. At its core, the look-ahead step in the preferential setting requires computing a non-Gaussian posterior, which was previously considered intractable. In this paper, we address this challenge by deriving an exact and analytical knowledge gradient for preferential BO. We show that the exact knowledge gradient performs strongly on a suite of benchmark problems, often outperforming existing acquisition functions. In addition, we also present a case study illustrating the limitation of the knowledge gradient in certain scenarios.

#23 Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks

privacy

著者: Puyu Wang, Junyu Zhou, Philipp Liznerski, Marius Kloft

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22409

要約:
Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(\epsilon,\delta)$-DP and obtain a utility bound of order $\sqrt{d}/(n\epsilon)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.

#24 Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance

diffusion

著者: Jing Jia, Wei Yuan, Sifan Liu, Liyue Shen, Guanyang Wang

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22443

要約:
Can a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained on data that closely match the unknown signal. In practice, one often must use a mismatched or low-fidelity diffusion prior. Surprisingly, these weak priors often perform nearly as well as full-strength, in-domain baselines. We study when and why inverse solvers are robust to weak diffusion priors. Through extensive experiments, we find that weak priors succeed when measurements are highly informative (e.g., many observed pixels), and we identify regimes where they fail. Our theory, based on Bayesian consistency, gives conditions under which high-dimensional measurements make the posterior concentrate near the true signal. These results provide a principled justification on when weak diffusion priors can be used reliably.

#25 Changepoint Detection As Model Selection: A General Framework

著者: Michael Grantham, Xueheng Shi, Bertrand Clarke

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22481

要約:
This dissertation presents a general framework for changepoint detection based on L0 model selection. The core method, Iteratively Reweighted Fused Lasso (IRFL), improves upon the generalized lasso by adaptively reweighting penalties to enhance support recovery and minimize criteria such as the Bayesian Information Criterion (BIC). The approach allows for flexible modeling of seasonal patterns, linear and quadratic trends, and autoregressive dependence in the presence of changepoints. Simulation studies demonstrate that IRFL achieves accurate changepoint detection across a wide range of challenging scenarios, including those involving nuisance factors such as trends, seasonal patterns, and serially correlated errors. The framework is further extended to image data, where it enables edge-preserving denoising and segmentation, with applications spanning medical imaging and high-throughput plant phenotyping. Applications to real-world data demonstrate IRFL's utility. In particular, analysis of the Mauna Loa CO2 time series reveals changepoints that align with volcanic eruptions and ENSO events, yielding a more accurate trend decomposition than ordinary least squares. Overall, IRFL provides a robust, extensible tool for detecting structural change in complex data.

#26 Neural-Inspired Posterior Approximation (NIPA)

著者: Babak Shahbaba, Zahra Moslemi

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22539

要約:
Humans learn efficiently from their environment by engaging multiple interacting neural systems that support distinct yet complementary forms of control, including model-based (goal-directed) planning, model-free (habitual) responding, and episodic memory-based learning. Model-based mechanisms compute prospective action values using an internal model of the environment, supporting flexible but computationally costly planning; model-free mechanisms cache value estimates and build heuristics that enable fast, efficient habitual responding; and memory-based mechanisms allow rapid adaptation from individual experience. In this work, we aim to elucidate the computational principles underlying this biological efficiency and translate them into a sampling algorithm for scalable Bayesian inference through effective exploration of the posterior distribution. More specifically, our proposed algorithm comprises three components: a model-based module that uses the target distribution for guided but computationally slow sampling; a model-free module that uses previous samples to learn patterns in the parameter space, enabling fast, reflexive sampling without directly evaluating the expensive target distribution; and an episodic-control module that supports rapid sampling by recalling specific past events (i.e., samples). We show that this approach advances Bayesian methods and facilitates their application to large-scale statistical machine learning problems. In particular, we apply our proposed framework to Bayesian deep learning, with an emphasis on proper and principled uncertainty quantification.

#27 Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features

著者: Markus Mueller, Kathrin Gruber, Dennis Fok

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22816

要約:
Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a single feature remains challenging. We advance the state-of-the-art in diffusion models for tabular data with a cascaded approach. We first generate a low-resolution version of a tabular data row, that is, the collection of the purely categorical features and a coarse categorical representation of numerical features. Next, this information is leveraged in the high-resolution flow matching model via a novel guided conditional probability path and data-dependent coupling. The low-resolution representation of numerical features explicitly accounts for discrete outcomes, such as missing or inflated values, and therewith enables a more faithful generation of mixed-type features. We formally prove that this cascade tightens the transport cost bound. The results indicate that our model generates significantly more realistic samples and captures distributional details more accurately, for example, the detection score increases by 40%.

#28 Perplexity Cannot Always Tell Right from Wrong

著者: Petar Veli\v{c}kovi\'c, Federico Barbero, Christos Perivolaropoulos, Simon Osindero, Razvan Pascanu

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22950

要約:
Perplexity -- a function measuring a model's overall level of "surprise" when encountering a particular output -- has gained significant traction in recent years, both as a loss function and as a simple-to-compute metric of model quality. Prior studies have pointed out several limitations of perplexity, often from an empirical manner. Here we leverage recent results on Transformer continuity to show in a rigorous manner how perplexity may be an unsuitable metric for model selection. Specifically, we prove that, if there is any sequence that a compact decoder-only Transformer model predicts accurately and confidently -- a necessary pre-requisite for strong generalisation -- it must imply existence of another sequence with very low perplexity, but not predicted correctly by that same model. Further, by analytically studying iso-perplexity plots, we find that perplexity will not always select for the more accurate model -- rather, any increase in model confidence must be accompanied by a commensurate rise in accuracy for the new model to be selected.

#29 Value-at-Risk Constrained Policy Optimization

著者: Rohan Tangri, Jan-Peter Calliess

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.22993

要約:
We introduce the Value-at-Risk Constrained Policy Optimization algorithm (VaR-CPO), a sample efficient and conservative method designed to optimize Value-at-Risk (VaR) constraints directly. Empirically, we demonstrate that VaR-CPO is capable of safe exploration, achieving zero constraint violations during training in feasible environments, a critical property that baseline methods fail to uphold. To overcome the inherent non-differentiability of the VaR constraint, we employ the one-sided Chebyshev inequality to obtain a tractable surrogate based on the first two moments of the cost return. Additionally, by extending the trust-region framework of the Constrained Policy Optimization (CPO) method, we provide rigorous worst-case bounds for both policy improvement and constraint violation during the training process.

#30 YuriiFormer: A Suite of Nesterov-Accelerated Transformers

著者: Aleksandr Zimin, Yury Polyanskiy, Philippe Rigollet

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.23236

要約:
We propose a variational framework that interprets transformer layers as iterations of an optimization algorithm acting on token embeddings. In this view, self-attention implements a gradient step of an interaction energy, while MLP layers correspond to gradient updates of a potential energy. Standard GPT-style transformers emerge as vanilla gradient descent on the resulting composite objective, implemented via Lie--Trotter splitting between these two energy functionals. This perspective enables principled architectural design using classical optimization ideas. As a proof of concept, we introduce a Nesterov-style accelerated transformer that preserves the same attention and MLP oracles. The resulting architecture consistently outperforms a nanoGPT baseline on TinyStories and OpenWebText, demonstrating that optimization-theoretic insights can translate into practical gains.

#31 Nested Slice Sampling: Vectorized Nested Sampling for GPU-Accelerated Inference

著者: David Yallup, Namu Kroupa, Will Handley

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.23252

要約:
Model comparison and calibrated uncertainty quantification often require integrating over parameters, but scalable inference can be challenging for complex, multimodal targets. Nested Sampling is a robust alternative to standard MCMC, yet its typically sequential structure and hard constraints make efficient accelerator implementations difficult. This paper introduces Nested Slice Sampling (NSS), a GPU-friendly, vectorized formulation of Nested Sampling that uses Hit-and-Run Slice Sampling for constrained updates. A tuning analysis yields a simple near-optimal rule for setting the slice width, improving high-dimensional behavior and making per-step compute more predictable for parallel execution. Experiments on challenging synthetic targets, high dimensional Bayesian inference, and Gaussian process hyperparameter marginalization show that NSS maintains accurate evidence estimates and high-quality posterior samples, and is particularly robust on difficult multimodal problems where current state-of-the-art methods such as tempered SMC baselines can struggle. An open-source implementation is released to facilitate adoption and reproducibility.

#32 A VAE Approach to Sample Multivariate Extremes

著者: Nicolas Lafon, Philippe Naveau, Ronan Fablet

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2306.10987

要約:
Generating accurate extremes from an observational data set is crucial when seeking to estimate risks associated with the occurrence of future extremes which could be larger than those already observed. Applications range from the occurrence of natural disasters to financial crashes. Generative approaches from the machine learning community do not apply to extreme samples without careful adaptation. Besides, asymptotic results from extreme value theory (EVT) give a theoretical framework to model multivariate extreme events, especially through the notion of multivariate regular variation. Bridging these two fields, this paper details a variational autoencoder (VAE) approach for sampling multivariate heavy-tailed distributions, i.e., distributions likely to have extremes of particularly large intensities. We illustrate the relevance of our approach on a synthetic data set and on a real data set of discharge measurements along the Danube river network. The latter shows the potential of our approach for flood risks' assessment. In addition to outperforming the standard VAE for the tested data sets, we also provide a comparison with a competing EVT-based generative approach. On the tested cases, our approach improves the learning of the dependency structure between extremes.

#33 Extending Mean-Field Variational Inference via Entropic Regularization: Theory and Computation

著者: Bohan Wu, David Blei

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2404.09113

要約:
Variational inference (VI) has emerged as a popular method for approximate inference for high-dimensional Bayesian models. In this paper, we propose a novel VI method that extends the naive mean field via entropic regularization, referred to as $\Xi$-variational inference ($\Xi$-VI). $\Xi$-VI has a close connection to the entropic optimal transport problem and benefits from the computationally efficient Sinkhorn algorithm. We show that $\Xi$-variational posteriors effectively recover the true posterior dependency, where the dependence is downweighted by the regularization parameter. We analyze the role of dimensionality of the parameter space on the accuracy of $\Xi$-variational approximation and how it affects computational considerations, providing a rough characterization of the statistical-computational trade-off in $\Xi$-VI. We also investigate the frequentist properties of $\Xi$-VI and establish results on consistency, asymptotic normality, high-dimensional asymptotics, and algorithmic stability. We provide sufficient criteria for achieving polynomial-time approximate inference using the method. Finally, we demonstrate the practical advantage of $\Xi$-VI over mean-field variational inference on simulated and real data.

#34 Multivariate Bayesian Last Layer for Regression with Uncertainty Quantification and Decomposition

著者: Han Wang, Eiji Kawasaki, Guillaume Damblin, Geoffrey Daniel

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2405.01761

要約:
We present new Bayesian Last Layer neural network models in the setting of multivariate regression under heteroscedastic noise, and propose EM algorithms for parameter learning. Bayesian modeling of a neural network's final layer has the attractive property of uncertainty quantification with a single forward pass. The proposed framework is capable of disentangling the aleatoric and epistemic uncertainty, and can be used to enhance a canonically trained deep neural network with uncertainty-aware capabilities.

#35 Stein's method for marginals on large graphical models

著者: Tiangang Cui, Shuigen Liu, Xin T. Tong

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.11771

要約:
Many spatial models exhibit locality structures that effectively reduce their intrinsic dimensionality, enabling efficient approximation and sampling of high-dimensional distributions. However, existing approximation techniques primarily focus on joint distributions and do not provide precise accuracy control for low-dimensional marginals, which are of primary interest in many practical scenarios. By leveraging the locality structures, we establish a dimension independent uniform error bound for the marginals of approximate distributions. Inspired by the Stein's method, we introduce a novel $\delta$-locality condition that quantifies the locality in distributions, and link it to the structural assumptions such as the sparse graphical models. The theoretical guarantee motivates the localization of existing sampling methods, as we illustrate through the localized likelihood-informed subspace method and localized score matching. We show that by leveraging the locality structure, these methods greatly reduce the sample complexity and computational cost via localized and parallel implementations.

#36 Representation Learning for Extrapolation in Perturbation Modeling

著者: Julius von K\"ugelgen, Jakob Ketterer, Xinwei Shen, Nicolai Meinshausen, Jonas Peters

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.18522

要約:
We consider the problem of modeling the effects of perturbations, such as gene knockdowns or drugs, on measurements, such as single-cell RNA or protein counts. Given data for some perturbations, we aim to predict the distribution of measurements for new combinations of perturbations. To address this challenging extrapolation task, we posit that perturbations act additively in a suitable, unknown embedding space. We formulate the data-generating process as a latent variable model, in which perturbations amount to mean shifts in latent space and can be combined additively. We then prove that, given sufficiently diverse training perturbations, the representation and perturbation effects are identifiable up to orthogonal transformation and use this to characterize the class of unseen perturbations for which we obtain extrapolation guarantees. We establish a link between our model class and shift interventions in linear latent causal models. To estimate the model from data, we propose a new method, the perturbation distribution autoencoder (PDAE), which is trained by maximizing the distributional similarity between true and simulated perturbation distributions. The trained model can then be used to predict previously unseen perturbation distributions. Through simulations, we demonstrate that PDAE can accurately predict the effects of unseen but identifiable perturbations, supporting our theoretical results.

#37 Generalization Dynamics of Linear Diffusion Models

diffusion

著者: Claudia Merger, Sebastian Goldt

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.24769

要約:
Diffusion models are powerful generative models that produce high-quality samples from complex data. While their infinite-data behavior is well understood, their generalization with finite data remains less clear. Classical learning theory predicts that generalization occurs at a sample complexity that is exponential in the dimension, far exceeding practical needs. We address this gap by analyzing diffusion models through the lens of data covariance spectra, which often follow power-law decays, reflecting the hierarchical structure of real data. To understand whether such a hierarchical structure can benefit learning in diffusion models, we develop a theoretical framework based on linear neural networks, congruent with a Gaussian hypothesis on the data. We quantify how the hierarchical organization of variance in the data and regularization impacts generalization. We find two regimes: When $N d$, we find that the sampling distributions of linear diffusion models approach their optimum (measured by the Kullback-Leibler divergence) linearly with $d/N$, independent of the specifics of the data distribution. Our work clarifies how sample complexity governs generalization in a simple model of diffusion-based generative models.

#38 Calibrating Decision Robustness via Inverse Conformal Risk Control

著者: Wenbin Zhou, Shixiang Zhu

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.07750

要約:
Robust optimization safeguards decisions against uncertainty by optimizing against worst-case scenarios, yet their effectiveness hinges on a prespecified robustness level that is often chosen ad hoc, leading to either insufficient protection or overly conservative and costly solutions. Recent approaches using conformal prediction construct data-driven uncertainty sets with finite-sample coverage guarantees, but they still fix coverage targets a priori and offer little guidance for selecting robustness levels. We propose a new framework that provides distribution-free, finite-sample guarantees on both miscoverage and regret for any family of robust predict-then-optimize policies. Our method constructs valid estimators that trace out the miscoverage--regret Pareto frontier, enabling decision-makers to reliably evaluate and calibrate robustness levels according to their cost--risk preferences. The framework is simple to implement, broadly applicable across classical optimization formulations, and achieves sharper finite-sample performance. This paper offers a principled data-driven methodology for guiding robustness selection and empowers practitioners to balance robustness and conservativeness in high-stakes decision-making.

#39 Deep Ensembles for Epistemic Uncertainty: A Frequentist Perspective

著者: Anchit Jain, Stephen Bates

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.22063

要約:
Decomposing prediction uncertainty into aleatoric (irreducible) and epistemic (reducible) components is critical for the reliable deployment of machine learning systems. While the mutual information between the response variable and model parameters is a principled measure for epistemic uncertainty, it requires access to the parameter posterior, which is computationally challenging to approximate. Consequently, practitioners often rely on probabilistic predictions from deep ensembles to quantify uncertainty, which have demonstrated strong empirical performance. However, a theoretical understanding of their success from a frequentist perspective remains limited. We address this gap by first considering a bootstrap-based estimator for epistemic uncertainty, which we prove is asymptotically correct. Next, we connect deep ensembles to the bootstrap estimator by decomposing it into data variability and training stochasticity; specifically, we show that deep ensembles capture the training stochasticity component. Through empirical studies, we show that this stochasticity component constitutes the majority of epistemic uncertainty, thereby explaining the effectiveness of deep ensembles.

#40 Physics-Informed Neural Networks and Neural Operators for Parametric PDEs

著者: Zhuo Zhang, Xiong Xiong, Sen Zhang, Yuan Zhao, Xi Yang

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.04576

要約:
PDEs arise ubiquitously in science and engineering, where solutions depend on parameters (physical properties, boundary conditions, geometry). Traditional numerical methods require re-solving the PDE for each parameter, making parameter space exploration prohibitively expensive. Recent machine learning advances, particularly physics-informed neural networks (PINNs) and neural operators, have revolutionized parametric PDE solving by learning solution operators that generalize across parameter spaces. We critically analyze two main paradigms: (1) PINNs, which embed physical laws as soft constraints and excel at inverse problems with sparse data, and (2) neural operators (e.g., DeepONet, Fourier Neural Operator), which learn mappings between infinite-dimensional function spaces and achieve unprecedented generalization. Through comparisons across fluid dynamics, solid mechanics, heat transfer, and electromagnetics, we show neural operators can achieve computational speedups of $10^3$ to $10^5$ times faster than traditional solvers for multi-query scenarios, while maintaining comparable accuracy. We provide practical guidance for method selection, discuss theoretical foundations (universal approximation, convergence), and identify critical open challenges: high-dimensional parameters, complex geometries, and out-of-distribution generalization. This work establishes a unified framework for understanding parametric PDE solvers via operator learning, offering a comprehensive, incrementally updated resource for this rapidly evolving field

#41 CAOS: Conformal Aggregation of One-Shot Predictors

著者: Maja Waldron

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.05219

要約:
One-shot prediction enables rapid adaptation of pretrained foundation models to new tasks using only one labeled example, but lacks principled uncertainty quantification. While conformal prediction provides finite-sample coverage guarantees, standard split conformal methods are inefficient in the one-shot setting due to data splitting and reliance on a single predictor. We propose Conformal Aggregation of One-Shot Predictors (CAOS), a conformal framework that adaptively aggregates multiple one-shot predictors and uses a leave-one-out calibration scheme to fully exploit scarce labeled data. Despite violating classical exchangeability assumptions, we prove that CAOS achieves valid marginal coverage using a monotonicity-based argument. Experiments on one-shot facial landmarking and RAFT text classification tasks show that CAOS produces substantially smaller prediction sets than split conformal baselines while maintaining reliable coverage.

#42 Optimal Transport under Group Fairness Constraints

著者: Linus Bleistein, Mathieu Dagr\'eou, Francisco Andrade, Thomas Boudou, Aur\'elien Bellet

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07144

要約:
Ensuring fairness in matching algorithms is a key challenge in allocating scarce resources and positions. Focusing on Optimal Transport (OT), we introduce a novel notion of group fairness requiring that the probability of matching two individuals from any two given groups in the OT plan satisfies a predefined target. We first propose a modified Sinkhorn algorithm to compute perfectly fair transport plans efficiently. Since exact fairness can significantly degrade matching quality in practice, we then develop two relaxation strategies. The first one involves solving a penalized OT problem, for which we derive novel finite-sample complexity guarantees. Our second strategy leverages bilevel optimization to learn a ground cost that induces a fair OT solution, and we establish a bound on the deviation of fairness when matching unseen data. Finally, we present empirical results illustrating the performance of our approaches and the trade-off between fairness and transport cost.

#43 Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors

著者: Thomas Nagler, David R\"ugamer

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.11325

要約:
Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient sampling procedure to construct Bayesian posteriors for such estimates based on Martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the uncertainty quantification of our method in inference applications.

#44 Bias-Optimal Bounds for SGD: A Computer-Aided Lyapunov Analysis

著者: Daniel Cortild, Lucas Ketels, Juan Peypouquet, Guillaume Garrigos

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.17965

要約:
The non-asymptotic analysis of Stochastic Gradient Descent (SGD) typically yields bounds that decompose into a bias term and a variance term. In this work, we focus on the bias component and study the extent to which SGD can match the optimal convergence behavior of deterministic gradient descent. Assuming only (strong) convexity and smoothness of the objective, we derive new bounds that are bias-optimal, in the sense that the bias term coincides with the worst-case rate of gradient descent. Our results hold for the full range of constant step-sizes $\gamma L \in (0,2)$, including critical and large step-size regimes that were previously unexplored without additional variance assumptions. The bounds are obtained through the construction of a simple Lyapunov energy whose monotonicity yields sharp convergence guarantees. To design the parameters of this energy, we employ the Performance Estimation Problem framework, which we also use to provide numerical evidence for the optimality of the associated variance terms.

#45 Model Agnostic Differentially Private Causal Inference

privacy

著者: Christian Janos Lebeda, Mathieu Even, Aur\'elien Bellet, Julie Josse

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.19589

要約:
Estimating causal effects from observational data is essential in fields such as medicine, economics and social sciences, where privacy concerns are paramount. We propose a general, model-agnostic framework for differentially private estimation of average treatment effects (ATE) that avoids strong structural assumptions on the data-generating process or the models used to estimate propensity scores and conditional outcomes. In contrast to prior work, which enforces differential privacy by directly privatizing these nuisance components, our approach decouples nuisance estimation from privacy protection. This separation allows the use of flexible, state-of-the-art black-box models, while differential privacy is achieved by perturbing only predictions and aggregation steps within a fold-splitting scheme with ensemble techniques. We instantiate the framework for three classical estimators -- the G-Formula, inverse propensity weighting (IPW), and augmented IPW (AIPW) -- and provide formal utility and privacy guarantees, together with privatized confidence intervals. Empirical results on synthetic and real data show that our methods maintain competitive performance under realistic privacy budgets.

#46 Antithetic Noise in Diffusion Models

diffusion

著者: Jing Jia, Sifan Liu, Bowen Song, Wei Yuan, Liyue Shen, Guanyang Wang

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.06185

要約:
We systematically study antithetic initial noise in diffusion models, discovering that pairing each noise sample with its negation consistently produces strong negative correlation. This universal phenomenon holds across datasets, model architectures, conditional and unconditional sampling, and even other generative models such as VAEs and Normalizing Flows. To explain it, we combine experiments and theory and propose a \textit{symmetry conjecture} that the learned score function is approximately affine antisymmetric (odd symmetry up to a constant shift), supported by empirical evidence. This negative correlation leads to substantially more reliable uncertainty quantification with up to $90\%$ narrower confidence intervals. We demonstrate these gains on tasks including estimating pixel-wise statistics and evaluating diffusion inverse solvers. We also provide extensions with randomized quasi-Monte Carlo noise designs for uncertainty quantification, and explore additional applications of the antithetic noise design to improve image editing and generation diversity. Our framework is training-free, model-agnostic, and adds no runtime overhead. Code is available at https://github.com/jjia131/Antithetic-Noise-in-Diffusion-Models-page.

#47 Tokenization Multiplicity Leads to Arbitrary Price Variation in LLM-as-a-service

著者: Ivi Chatzi, Nina Corvelo Benz, Stratis Tsirtsis, Manuel Gomez-Rodriguez

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.06446

要約:
Providers of LLM-as-a-service have predominantly adopted a simple pricing model: users pay a fixed price per token. Consequently, one may think that the price two different users would pay for the same output string under the same input prompt is the same. In our work, we show that, surprisingly, this is not (always) true. We find empirical evidence that, particularly for non-english outputs, both proprietary and open-weights LLMs often generate the same (output) string with multiple different tokenizations, even under the same input prompt, and this in turn leads to arbitrary price variation. To address the problem of tokenization multiplicity, we introduce canonical generation, a type of constrained generation that restricts LLMs to only generate canonical tokenizations -- the unique tokenization in which each string is tokenized during the training process of an LLM. Further, we introduce an efficient sampling algorithm for canonical generation based on the Gumbel-Max trick. Experiments on a variety of natural language tasks demonstrate that our sampling algorithm for canonical generation is comparable to standard sampling in terms of performance and runtime, and it solves the problem of tokenization multiplicity.

#48 Diffusion Models under Alternative Noise: Simplified Analysis and Sensitivity

diffusion

著者: Juhyeok Choi, Chenglin Fan

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.08337

要約:
Diffusion models, typically formulated as discretizations of stochastic differential equations (SDEs), have achieved state-of-the-art performance in generative tasks. However, their theoretical analysis often involves complex proofs. In this work, we present a simplified framework for analyzing the Euler--Maruyama discretization of variance-preserving SDEs (VP-SDEs). Using Gr\"onwall's inequality, we derive a convergence rate of $O(T^{-1/2})$ under standard Lipschitz assumptions, streamlining prior analyses. We then demonstrate that the standard Gaussian noise can be replaced by computationally cheaper discrete random variables (e.g., Rademacher) without sacrificing this convergence guarantee, provided the mean and variance are matched. Our experiments validate this theory, showing that (i) discrete noise achieves sample quality comparable to Gaussian noise provided the variance is matched correctly, and (ii) performance degrades if the noise variance is scaled incorrectly.

#49 Error Analysis of Discrete Flow with Generator Matching

著者: Zhengyan Wan, Yidong Ouyang, Qiang Yao, Liyan Xie, Fang Fang, Hongyuan Zha, Guang Cheng

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.21906

要約:
Discrete flow models offer a powerful framework for learning distributions over discrete state spaces and have demonstrated superior performance compared to the discrete diffusion models. However, their convergence properties and error analysis remain largely unexplored. In this work, we develop a unified framework grounded in stochastic calculus theory to systematically investigate the theoretical properties of discrete flow models. Specifically, by leveraging a Girsanov-type theorem for the path measures of two continuous-time Markov chains (CTMCs), we present a comprehensive error analysis that accounts for both transition rate estimation error and early stopping error. In fact, the estimation error of transition rates has received little attention in existing works. Unlike discrete diffusion models, discrete flow incurs no initialization error caused by truncating the time horizon in the noising process. Building on generator matching and uniformization, we establish non-asymptotic error bounds for distribution estimation without the boundedness condition on oracle transition rates. Furthermore, we derive a faster rate of total variation convergence for the estimated distribution with the boundedness condition, yielding a nearly optimal rate in terms of sample size. Our results provide the first error analysis for discrete flow models. We also investigate model performance under different settings based on simulation results.

#50 Direct Bias-Correction Term Estimation for Average Treatment Effect Estimation

著者: Masahiro Kato

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.22122

要約:
This study considers the estimation of the direct bias-correction term for estimating the average treatment effect (ATE). Let $\{(X_i, D_i, Y_i)\}_{i=1}^{n}$ be the observations, where $X_i$ denotes $K$-dimensional covariates, $D_i \in \{0, 1\}$ denotes a binary treatment assignment indicator, and $Y_i$ denotes an outcome. In ATE estimation, $h_0(D_i, X_i) = \frac{1[D_i = 1]}{e_0(X_i)} - \frac{1[D_i = 0]}{1 - e_0(X_i)}$ is called the bias-correction term, where $e_0(X_i)$ is the propensity score. The bias-correction term is also referred to as the Riesz representer or clever covariates, depending on the literature, and plays an important role in construction of efficient ATE estimators. In this study, we propose estimating $h_0$ by directly minimizing the Bregman divergence between its model and $h_0$, which includes squared error and Kullback--Leibler divergence as special cases. Our proposed method is inspired by direct density ratio estimation methods and generalizes existing bias-correction term estimation methods, such as covariate balancing weights, Riesz regression, and nearest neighbor matching. Importantly, under specific choices of bias-correction term models and Bregman divergence, we can automatically ensure the covariate balancing property. Thus, our study provides a practical modeling and estimation approach through a generalization of existing methods.

#51 Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting

著者: Zhijian Xu, Wanxu Cai, Xilin Dai, Zhaorong Deng, Qiang Xu

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.24789

要約:
The evaluation of time series forecasting models is hindered by a critical lack of high-quality benchmarks, leading to a potential illusion of progress. Existing datasets suffer from issues ranging from pre-training data contamination in the age of LLMs to the temporal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free and causally sound design, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built from the ground up on these principles by sourcing data from live APIs. Our experiments reveal the flaws of the previous benchmarks and the biases in model evaluation, providing new insights into multiple existing forecasting models and LLMs across various evaluation tasks.

#52 Test-Time Anchoring for Discrete Diffusion Posterior Sampling

diffusion

著者: Litu Rout, Andreas Lugmayr, Yasamin Jafarian, Srivatsan Varadharajan, Constantine Caramanis, Sanjay Shakkottai, Ira Kemelmacher-Shlizerman

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.02291

要約:
While continuous diffusion models have achieved remarkable success, discrete diffusion offers a unified framework for jointly modeling text and images. Beyond unification, discrete diffusion provides faster inference, finer control, and principled training-free guidance, making it well-suited for posterior sampling. Existing approaches to posterior sampling using discrete diffusion face severe challenges: derivative-free guidance yields sparse signals, continuous relaxations limit applicability, and split Gibbs samplers suffer from the curse of dimensionality. To overcome these limitations, we introduce Anchored Posterior Sampling (APS), built on two key innovations: quantized expectation for gradient-like guidance in discrete embedding space, and anchored remasking for adaptive decoding. APS achieves state-of-the-art performance among discrete diffusion samplers on both linear and nonlinear inverse problems across the standard image benchmarks. We demonstrate the generality of APS through training-free stylization and text-guided editing. We further apply APS to a large-scale diffusion language model, showing consistent improvement in question answering.

#53 A General Framework for Joint Multi-State Models

著者: F\'elix Laplante, Christophe Ambroise

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.07128

要約:
Conventional joint modeling approaches generally characterize the relationship between longitudinal biomarkers and discrete event occurrences within terminal, recurring or competing risk settings, thereby offering a limited representation of complex, multi-state trajectories. We propose a general multi-state joint modeling framework that unifies longitudinal biomarker dynamics with multi-state time-to-event processes defined on arbitrary directed graphs. The proposed framework also accomodates nonlinear longitudinal submodels and scalable inference via stochastic gradient descent. This formulation encompasses both Markovian and semi-Markovian transition structures, allowing recurrent cycles and terminal absorptions to be naturally represented. The longitudinal and event processes are linked through shared latent structures within nonlinear mixed-effects models, extending classical joint modeling formulations. We derive the complete likelihood, model selection criteria, and develop scalable inference procedures based on stochastic gradient descent to enable high-dimensional and large-scale applications. In addition, we formulate a dynamic prediction framework that provides individualized state-transition probabilities and personalized risk assessments along complex event trajectories. Through simulation and application to the PAQUID cohort, we demonstrate accurate parameter recovery and individualized prediction.

#54 MARS-M: When Variance Reduction Meets Matrices

著者: Yifeng Liu, Angela Yuan, Quanquan Gu

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.21800

要約:
Matrix-based preconditioned optimizers, such as Muon, have recently been shown to be more efficient than scalar-based optimizers for training large-scale neural networks, including large language models (LLMs). Recent benchmark studies of LLM pretraining optimizers have demonstrated that variance-reduction techniques such as MARS can substantially speed up training compared with standard optimizers that do not employ variance reduction. In this paper, we introduce MARS-M, a new optimizer that integrates MARS-style variance reduction with Muon. Under standard regularity conditions, we prove that MARS-M converges to a first-order stationary point at a rate of $\tilde{\mathcal{O}}(T^{-1/3})$, improving upon the $\tilde{\mathcal{O}}(T^{-1/4})$ rate attained by Muon. Empirical results on language modeling and computer vision tasks demonstrate that MARS-M consistently yields lower losses and improved performance across various downstream benchmarks. The implementation of MARS-M is available at https://github.com/AGI-Arena/MARS/tree/main/MARS_M.

#55 An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation

著者: Uzair Akbar, Niki Kilbertus, Hao Shen, Krikamol Muandet, Bo Dai

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.25128

要約:
The technique of data augmentation (DA) is often used in machine learning for regularization purposes to better generalize under i.i.d. settings. In this work, we present a unifying framework with topics in causal inference to make a case for the use of DA beyond just the i.i.d. setting, but for generalization across interventions as well. Specifically, we argue that when the outcome generating mechanism is invariant to our choice of DA, then such augmentations can effectively be thought of as interventions on the treatment generating mechanism itself. This can potentially help to reduce bias in causal effect estimation arising from hidden confounders. In the presence of such unobserved confounding we typically make use of instrumental variables (IVs) -- sources of treatment randomization that are conditionally independent of the outcome. However, IVs may not be as readily available as DA for many applications, which is the main motivation behind this work. By appropriately regularizing IV based estimators, we introduce the concept of IV-like (IVL) regression for mitigating confounding bias and improving predictive performance across interventions even when certain IV properties are relaxed. Finally, we cast parameterized DA as an IVL regression problem and show that when used in composition can simulate a worst-case application of such DA, further improving performance on causal estimation and generalization tasks beyond what simple DA may offer. This is shown both theoretically for the population case and via simulation experiments for the finite sample case using a simple linear example. We also present real data experiments to support our case.

#56 Optimal Fairness under Local Differential Privacy

privacy

著者: Hrad Ghoukasian, Shahab Asoodeh

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.16377

要約:
We investigate how to optimally design local differential privacy (LDP) mechanisms that reduce data unfairness and thereby improve fairness in downstream classification. We first derive a closed-form optimal mechanism for binary sensitive attributes and then develop a tractable optimization framework that yields the corresponding optimal mechanism for multi-valued attributes. As a theoretical contribution, we establish that for discrimination-accuracy optimal classifiers, reducing data unfairness necessarily leads to lower classification unfairness, thus providing a direct link between privacy-aware pre-processing and classification fairness. Empirically, we demonstrate that our approach consistently outperforms existing LDP mechanisms in reducing data unfairness across diverse datasets and fairness metrics, while maintaining accuracy close to that of non-private models. Moreover, compared with leading pre-processing and post-processing fairness methods, our mechanism achieves a more favorable accuracy-fairness trade-off while simultaneously preserving the privacy of sensitive attributes. Taken together, these results highlight LDP as a principled and effective pre-processing fairness intervention technique.

#57 Tuning-Free Structured Sparse Recovery of Multiple Measurement Vectors using Implicit Regularization

著者: Lakshmi Jayalal, Sheetal Kalyani

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.03393

要約:
Recovering jointly sparse signals in the multiple measurement vectors (MMV) setting is a fundamental problem in machine learning, but traditional methods often require careful parameter tuning or prior knowledge of the sparsity of the signal and/or noise variance. We propose a tuning-free framework that leverages implicit regularization (IR) from overparameterization to overcome this limitation. Our approach reparameterizes the estimation matrix into factors that decouple the shared row-support from individual vector entries and applies gradient descent to a standard least-squares objective. We prove that with a sufficiently small and balanced initialization, the optimization dynamics exhibit a "momentum-like" effect where the true support grows significantly faster. Leveraging a Lyapunov-based analysis of the gradient flow, we further establish formal guarantees that the solution trajectory converges towards an idealized row-sparse solution. Empirical results demonstrate that our tuning-free approach achieves performance comparable to optimally tuned established methods. Furthermore, our framework significantly outperforms these baselines in scenarios where accurate priors are unavailable to the baselines.

#58 Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

著者: Aaron Defazio, Konstantin Mishchenko, Parameswaran Raman, Hao-Jun Michael Shi, Lin Xiao

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.17131

要約:
We propose Generalized Primal Averaging (GPA), an extension of Nesterov's method that unifies and generalizes recent averaging-based optimizers like single-worker DiLoCo and Schedule-Free, within a non-distributed setting. While DiLoCo relies on a memory-intensive two-loop structure to periodically aggregate pseudo-gradients using Nesterov momentum, GPA eliminates this complexity by decoupling Nesterov's interpolation constants to enable smooth iterate averaging at every step. Structurally, GPA resembles Schedule-Free but replaces uniform averaging with exponential moving averaging. Empirically, GPA consistently outperforms single-worker DiLoCo and AdamW with reduced memory overhead. GPA achieves speedups of 8.71%, 10.13%, and 9.58% over the AdamW baseline in terms of steps to reach target validation loss for Llama-160M, 1B, and 8B models, respectively. Similarly, on the ImageNet ViT workload, GPA achieves speedups of 7% and 25.5% in the small and large batch settings respectively. Furthermore, we prove that for any base optimizer with $O(\sqrt{T})$ regret, where $T$ is the number of iterations, GPA matches or exceeds the original convergence guarantees depending on the interpolation constants.

#59 ScoreMatchingRiesz: Score Matching for Debiased Machine Learning and Policy Path Estimation

著者: Masahiro Kato

公開日: Mon, 02 Feb 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.20523

要約:
We propose ScoreMatchingRiesz, a family of Riesz representer estimators based on score matching. The Riesz representer is a key nuisance component in debiased machine learning, enabling $\sqrt{n}$-consistent and asymptotically efficient estimation of causal and structural targets via Neyman-orthogonal scores. We formulate Riesz representer estimation as a score estimation problem. This perspective stabilizes representer estimation by allowing us to leverage denoising score matching and telescoping density ratio estimation. We also introduce the policy path, a parameter that captures how policy effects evolve under continuous treatments. We show that the policy path can be estimated via score matching by smoothly connecting average marginal effect (AME) and average policy effect (APE) estimation, which improves the interpretability of policy effects.

stat.ML updates on arXiv.org

📋 論文タイトル一覧