arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Deep Neural Networks as Iterated Function Systems and a Generalization Bound

著者: Jonathan Vacher (MAP5 - UMR 8145)

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19958

要約:
Deep neural networks (DNNs) achieve remarkable performance on a wide range of tasks, yet their mathematical analysis remains fragmented: stability and generalization are typically studied in disparate frameworks and on a case-by-case basis. Architecturally, DNNs rely on the recursive application of parametrized functions, a mechanism that can be unstable and difficult to train, making stability a primary concern. Even when training succeeds, there are few rigorous results on how well such models generalize beyond the observed data, especially in the generative setting. In this work, we leverage the theory of stochastic Iterated Function Systems (IFS) and show that two important deep architectures can be viewed as, or canonically associated with, place-dependent IFS. This connection allows us to import results from random dynamical systems to (i) establish the existence and uniqueness of invariant measures under suitable contractivity assumptions, and (ii) derive a Wasserstein generalization bound for generative modeling. The bound naturally leads to a new training objective that directly controls the collage-type approximation error between the data distribution and its image under the learned transfer operator. We illustrate the theory on a controlled 2D example and empirically evaluate the proposed objective on standard image datasets (MNIST, CelebA, CIFAR-10).

#2 Minimax Rates for Hyperbolic Hierarchical Learning

著者: Divit Rawal, Sriram Vishwanath

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20047

要約:
We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as $\exp(\Omega(R))$ to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit $O(1)$-Lipschitz realizability, enabling learning with $n = O(mR \log m)$ samples. A matching $\Omega(mR \log m)$ lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. We also show a geometry-independent bottleneck: any rank-$k$ prediction space captures only $O(k)$ canonical hierarchical contrasts.

#3 Efficient Evaluation of LLM Performance with Statistical Guarantees

著者: Skyler Wu, Yash Nair, Emmanuel J. Cand\'es

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20251

要約:
Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized Active Querying (FAQ), which (a) leverages historical information through a Bayesian factor model; (b) adaptively selects questions using a hybrid variance-reduction/active-learning sampling policy; and (c) maintains validity through Proactive Active Inference -- a finite-population extension of active inference (Zrnic & Candes, 2024) that enables direct question selection while preserving coverage. With negligible overhead cost, FAQ delivers up to $5\times$ effective sample size gains over strong baselines on two benchmark suites, across varying historical-data missingness levels: this means that it matches the CI width of uniform sampling while using up to $5\times$ fewer queries. We release our source code and our curated datasets to support reproducible evaluation and future research.

#4 Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging

著者: Jie Tang, Chuanlong Xie, Xianli Zeng, Lixing Zhu

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20269

要約:
Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to fairness constraints; and flagging, which isolates specific demographic groups experiencing disparate treatment. However, existing auditing techniques are frequently limited by restrictive distributional assumptions or prohibitive computational overhead. We propose a novel empirical likelihood-based (EL) framework that constructs robust statistical measures for model performance disparities. Unlike traditional methods, our approach is non-parametric; the proposed disparity statistics follow asymptotically chi-square or mixed chi-square distributions, ensuring valid inference without assuming underlying data distributions. This framework uses a constrained optimization profile that admits stable numerical solutions, facilitating both large-scale certification and efficient subpopulation discovery. Empirically, the EL methods outperform bootstrap-based approaches, yielding coverage rates closer to nominal levels while reducing computational latency by several orders of magnitude. We demonstrate the practical utility of this framework on the COMPAS dataset, where it successfully flags intersectional biases, specifically identifying a significantly higher positive prediction rate for African-American males under 25 and a systemic under-prediction for Caucasian females relative to the population mean.

#5 Physics-informed Blind Reconstruction of Dense Fields from Sparse Measurements using Neural Networks with a Differentiable Simulator

privacy

著者: Ofek Aloni, Barak Fishbain

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20496

要約:
Generating dense physical fields from sparse measurements is a fundamental question in sampling, signal processing, and many other applications. State-of-the-art methods either use spatial statistics or rely on examples of dense fields in the training phase, which often are not available, and thus rely on synthetic data. Here, we present a reconstruction method that generates dense fields from sparse measurements, without assuming availability of the spatial statistics, nor of examples of the dense fields. This is made possible through the introduction of an automatically differentiable numerical simulator into the training phase of the method. The method is shown to have superior results over statistical and neural network based methods on a set of three standard problems from fluid mechanics.

#6 Incorporating data drift to perform survival analysis on credit risk

著者: Jianwei Peng (Humboldt-Universit\"at zu Berlin), Stefan Lessmann (Humboldt-Universit\"at zu Berlin, Bucharest University of Economic Studies)

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20533

要約:
Survival analysis has become a standard approach for modelling time to default by time-varying covariates in credit risk. Unlike most existing methods that implicitly assume a stationary data-generating process, in practise, mortgage portfolios are exposed to various forms of data drift caused by changing borrower behaviour, macroeconomic conditions, policy regimes and so on. This study investigates the impact of data drift on survival-based credit risk models and proposes a dynamic joint modelling framework to improve robustness under non-stationary environments. The proposed model integrates a longitudinal behavioural marker derived from balance dynamics with a discrete-time hazard formulation, combined with landmark one-hot encoding and isotonic calibration. Three types of data drift (sudden, incremental and recurring) are simulated and analysed on mortgage loan datasets from Freddie Mac. Experiments and corresponding evidence show that the proposed landmark-based joint model consistently outperforms classical survival models, tree-based drift-adaptive learners and gradient boosting methods in terms of discrimination and calibration across all drift scenarios, which confirms the superiority of our model design.

#7 Sparse clustering via the Deterministic Information Bottleneck algorithm

著者: Efthymios Costa, Ioanna Papatsouma, Angelos Markos

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20628

要約:
Cluster analysis relates to the task of assigning objects into groups which ideally present some desirable characteristics. When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face unprecedented challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering. Our proposal constitutes a competitive alternative to existing clustering algorithms for sparse data, as demonstrated through simulations on synthetic data. The effectiveness of our method is established by an application on a real-world genomics data set.

#8 Demystifying Prediction Powered Inference

著者: Yilin Song, Dan M. Kluger, Harsh Parikh, Tian Gu

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20819

要約:
Machine learning predictions are increasingly used to supplement incomplete or costly-to-measure outcomes in fields such as biomedical research, environmental science, and social science. However, treating predictions as ground truth introduces bias while ignoring them wastes valuable information. Prediction-Powered Inference (PPI) offers a principled framework that leverages predictions from large unlabeled datasets to improve statistical efficiency while maintaining valid inference through explicit bias correction using a smaller labeled subset. Despite its potential, the growing PPI variants and the subtle distinctions between them have made it challenging for practitioners to determine when and how to apply these methods responsibly. This paper demystifies PPI by synthesizing its theoretical foundations, methodological extensions, connections to existing statistics literature, and diagnostic tools into a unified practical workflow. Using the Mosaiks housing price data, we show that PPI variants produce tighter confidence intervals than complete-case analysis, but that double-dipping, i.e. reusing training data for inference, leads to anti-conservative confidence intervals and coverages. Under missing-not-at-random mechanisms, all methods, including classical inference using only labeled data, yield biased estimates. We provide a decision flowchart linking assumption violations to appropriate PPI variants, a summary table of selective methods, and practical diagnostic strategies for evaluating core assumptions. By framing PPI as a general recipe rather than a single estimator, this work bridges methodological innovation and applied practice, helping researchers responsibly integrate predictions into valid inference.

#9 VSCOUT: A Hybrid Variational Autoencoder Approach to Outlier Detection in High-Dimensional Retrospective Monitoring

著者: Waldyn G. Martinez

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20830

要約:
Modern industrial and service processes generate high-dimensional, non-Gaussian, and contamination-prone data that challenge the foundational assumptions of classical Statistical Process Control (SPC). Heavy tails, multimodality, nonlinear dependencies, and sparse special-cause observations can distort baseline estimation, mask true anomalies, and prevent reliable identification of an in-control (IC) reference set. To address these challenges, we introduce VSCOUT, a distribution-free framework designed specifically for retrospective (Phase I) monitoring in high-dimensional settings. VSCOUT combines an Automatic Relevance Determination Variational Autoencoder (ARD-VAE) architecture with ensemble-based latent outlier filtering and changepoint detection. The ARD prior isolates the most informative latent dimensions, while the ensemble and changepoint filters identify pointwise and structural contamination within the determined latent space. A second-stage retraining step removes flagged observations and re-estimates the latent structure using only the retained inliers, mitigating masking and stabilizing the IC latent manifold. This two-stage refinement produces a clean and reliable IC baseline suitable for subsequent Phase II deployment. Extensive experiments across benchmark datasets demonstrate that VSCOUT achieves superior sensitivity to special-cause structure while maintaining controlled false alarms, outperforming classical SPC procedures, robust estimators, and modern machine-learning baselines. Its scalability, distributional flexibility, and resilience to complex contamination patterns position VSCOUT as a practical and effective method for retrospective modeling and anomaly detection in AI-enabled environments.

#10 Classifier Calibration at Scale: An Empirical Study of Model-Agnostic Post-Hoc Methods

著者: Valery Manokhin, Daniel Gr{\o}nhaug

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19944

要約:
We study model-agnostic post-hoc calibration methods intended to improve probabilistic predictions in supervised binary classification on real i.i.d. tabular data, with particular emphasis on conformal and Venn-based approaches that provide distribution-free validity guarantees under exchangeability. We benchmark 21 widely used classifiers, including linear models, SVMs, tree ensembles (CatBoost, XGBoost, LightGBM), and modern tabular neural and foundation models, on binary tasks from the TabArena-v0.1 suite using randomized, stratified five-fold cross-validation with a held-out test fold. Five calibrators; Isotonic regression, Platt scaling, Beta calibration, Venn-Abers predictors, and Pearsonify are trained on a separate calibration split and applied to test predictions. Calibration is evaluated using proper scoring rules (log-loss and Brier score) and diagnostic measures (Spiegelhalter's Z, ECE, and ECI), alongside discrimination (AUC-ROC) and standard classification metrics. Across tasks and architectures, Venn-Abers predictors achieve the largest average reductions in log-loss, followed closely by Beta calibration, while Platt scaling exhibits weaker and less consistent effects. Beta calibration improves log-loss most frequently across tasks, whereas Venn-Abers displays fewer instances of extreme degradation and slightly more instances of extreme improvement. Importantly, we find that commonly used calibration procedures, most notably Platt scaling and isotonic regression, can systematically degrade proper scoring performance for strong modern tabular models. Overall classification performance is often preserved, but calibration effects vary substantially across datasets and architectures, and no method dominates uniformly. In expectation, all methods except Pearsonify slightly increase accuracy, but the effect is marginal, with the largest expected gain about 0.008%.

#11 BayPrAnoMeta: Bayesian Proto-MAML for Few-Shot Industrial Image Anomaly Detection

著者: Soham Sarkar, Tanmay Sen, Sayantan Banerjee

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19992

要約:
Industrial image anomaly detection is a challenging problem owing to extreme class imbalance and the scarcity of labeled defective samples, particularly in few-shot settings. We propose BayPrAnoMeta, a Bayesian generalization of Proto-MAML for few-shot industrial image anomaly detection. Unlike existing Proto-MAML approaches that rely on deterministic class prototypes and distance-based adaptation, BayPrAnoMeta replaces prototypes with task-specific probabilistic normality models and performs inner-loop adaptation via a Bayesian posterior predictive likelihood. We model normal support embeddings with a Normal-Inverse-Wishart (NIW) prior, producing a Student-$t$ predictive distribution that enables uncertainty-aware, heavy-tailed anomaly scoring and is essential for robustness in extreme few-shot settings. We further extend BayPrAnoMeta to a federated meta-learning framework with supervised contrastive regularization for heterogeneous industrial clients and prove convergence to stationary points of the resulting nonconvex objective. Experiments on the MVTec AD benchmark demonstrate consistent and significant AUROC improvements over MAML, Proto-MAML, and PatchCore-based methods in few-shot anomaly detection settings.

#12 Matching and mixing: Matchability of graphs under Markovian error

著者: Zhirui Li, Keith D. Levin, Zhiang Zhao, Vince Lyzinski

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20020

要約:
We consider the problem of graph matching for a sequence of graphs generated under a time-dependent Markov chain noise model. Our edgelighter error model, a variant of the classical lamplighter random walk, iteratively corrupts the graph $G_0$ with edge-dependent noise, creating a sequence of noisy graph copies $(G_t)$. Much of the graph matching literature is focused on anonymization thresholds in edge-independent noise settings, and we establish novel anonymization thresholds in this edge-dependent noise setting when matching $G_0$ and $G_t$. Moreover, we also compare this anonymization threshold with the mixing properties of the Markov chain noise model. We show that when $G_0$ is drawn from an Erd\H{o}s-R\'enyi model, the graph matching anonymization threshold and the mixing time of the edgelighter walk are both of order $\Theta(n^2\log n)$. We further demonstrate that for more structured model for $G_0$ (e.g., the Stochastic Block Model), graph matching anonymization can occur in $O(n^\alpha\log n)$ time for some $\alpha<2$, indicating that anonymization can occur before the Markov chain noise model globally mixes. Through extensive simulations, we verify our theoretical bounds in the settings of Erd\H{o}s-R\'enyi random graphs and stochastic block model random graphs, and explore our findings on real-world datasets derived from a Facebook friendship network and a European research institution email communication network.

#13 Regime-Adaptive Bayesian Optimization via Dirichlet Process Mixtures of Gaussian Processes

著者: Yan Zhang, Xuefeng Liu, Sipeng Chen, Sascha Ranftl, Chong Liu, Shibo Li

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20043

要約:
Standard Bayesian Optimization (BO) assumes uniform smoothness across the search space an assumption violated in multi-regime problems such as molecular conformation search through distinct energy basins or drug discovery across heterogeneous molecular scaffolds. A single GP either oversmooths sharp transitions or hallucinates noise in smooth regions, yielding miscalibrated uncertainty. We propose RAMBO, a Dirichlet Process Mixture of Gaussian Processes that automatically discovers latent regimes during optimization, each modeled by an independent GP with locally-optimized hyperparameters. We derive collapsed Gibbs sampling that analytically marginalizes latent functions for efficient inference, and introduce adaptive concentration parameter scheduling for coarse-to-fine regime discovery. Our acquisition functions decompose uncertainty into intra-regime and inter-regime components. Experiments on synthetic benchmarks and real-world applications, including molecular conformer optimization, virtual screening for drug discovery, and fusion reactor design, demonstrate consistent improvements over state-of-the-art baselines on multi-regime objectives.

#14 Concentration Inequalities for Exchangeable Tensors and Matrix-valued Data

著者: Chen Cheng, Rina Foygel Barber

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20152

要約:
We study concentration inequalities for structured weighted sums of random data, including (i) tensor inner products and (ii) sequential matrix sums. We are interested in tail bounds and concentration inequalities for those structured weighted sums under exchangeability, extending beyond the classical framework of independent terms. We develop Hoeffding and Bernstein bounds provided with structure-dependent exchangeability. Along the way, we recover known results in weighted sum of exchangeable random variables and i.i.d. sums of random matrices to the optimal constants. Notably, we develop a sharper concentration bound for combinatorial sum of matrix arrays than the results previously derived from Chatterjee's method of exchangeable pairs. For applications, the richer structures provide us with novel analytical tools for estimating the average effect of multi-factor response models and studying fixed-design sketching methods in federated averaging. We apply our results to these problems, and find that our theoretical predictions are corroborated by numerical evidence.

#15 Order-Optimal Sample Complexity of Rectified Flows

著者: Hari Krishna Sahoo, Mudit Gaur, Vaneet Aggarwal

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20250

要約:
Recently, flow-based generative models have shown superior efficiency compared to diffusion models. In this paper, we study rectified flow models, which constrain transport trajectories to be linear from the base distribution to the data distribution. This structural restriction greatly accelerates sampling, often enabling high-quality generation with a single Euler step. Under standard assumptions on the neural network classes used to parameterize the velocity field and data distribution, we prove that rectified flows achieve sample complexity $\tilde{O}(\varepsilon^{-2})$. This improves on the best known $O(\varepsilon^{-4})$ bounds for flow matching model and matches the optimal rate for mean estimation. Our analysis exploits the particular structure of rectified flows: because the model is trained with a squared loss along linear paths, the associated hypothesis class admits a sharply controlled localized Rademacher complexity. This yields the improved, order-optimal sample complexity and provides a theoretical explanation for the strong empirical performance of rectified flow models.

#16 Convergence Analysis of Randomized Subspace Normalized SGD under Heavy-Tailed Noise

著者: Gaku Omiya, Pierre-Louis Poirion, Akiko Takeda

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20399

要約:
Randomized subspace methods reduce per-iteration cost; however, in nonconvex optimization, most analyses are expectation-based, and high-probability bounds remain scarce even under sub-Gaussian noise. We first prove that randomized subspace SGD (RS-SGD) admits a high-probability convergence bound under sub-Gaussian noise, achieving the same order of oracle complexity as prior in-expectation results. Motivated by the prevalence of heavy-tailed gradients in modern machine learning, we then propose randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates. Assuming the noise has bounded $p$-th moments, we establish both in-expectation and high-probability convergence guarantees, and show that RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.

#17 Spectral Diffusion Models on the Sphere

diffusion

著者: Pierpaolo Brutti, Claudio Durastanti, Francesco Mari

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20498

要約:
Diffusion models provide a principled framework for generative modeling via stochastic differential equations and time-reversed dynamics. Extending spectral diffusion approaches to spherical data, however, raises nontrivial geometric and stochastic issues that are absent in the Euclidean setting. In this work, we develop a diffusion modeling framework defined directly on finite-dimensional spherical harmonic representations of real-valued functions on the sphere. We show that the spherical discrete Fourier transform maps spatial Brownian motion to a constrained Gaussian process in the frequency domain with deterministic, generally non-isotropic covariance. This induces modified forward and reverse-time stochastic differential equations in the spectral domain. As a consequence, spatial and spectral score matching objectives are no longer equivalent, even in the band-limited setting, and the frequency-domain formulation introduces a geometry-dependent inductive bias. We derive the corresponding diffusion equations and characterize the induced noise covariance.

#18 Spectral Bayesian Regression on the Sphere

著者: Claudio Durastanti

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20528

要約:
We develop a fully intrinsic Bayesian framework for nonparametric regression on the unit sphere based on isotropic Gaussian field priors and the harmonic structure induced by the Laplace-Beltrami operator. Under uniform random design, the regression model admits an exact diagonalization in the spherical harmonic basis, yielding a Gaussian sequence representation with frequency-dependent multiplicities. Exploiting this structure, we derive closed-form posterior distributions, optimal spectral truncation schemes, and sharp posterior contraction rates under integrated squared loss. For Gaussian priors with polynomially decaying angular power spectra, including spherical Mat\'ern priors, we establish posterior contraction rates over Sobolev classes, which are minimax-optimal under correct prior calibration. We further show that the posterior mean admits an exact variational characterization as a geometrically intrinsic penalized least-squares estimator, equivalent to a Laplace-Beltrami smoothing spline.

#19 Robust Distributed Learning under Resource Constraints: Decentralized Quantile Estimation via (Asynchronous) ADMM

著者: Anna van Elst, Igor Colin, Stephan Cl\'emen\c{c}on

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20571

要約:
Specifications for decentralized learning on resource-constrained edge devices require algorithms that are communication-efficient, robust to data corruption, and lightweight in memory usage. While state-of-the-art gossip-based methods satisfy the first requirement, achieving robustness remains challenging. Asynchronous decentralized ADMM-based methods have been explored for estimating the median, a statistical centrality measure that is notoriously more robust than the mean. However, existing approaches require memory that scales with node degree, making them impractical when memory is limited. In this paper, we propose AsylADMM, a novel gossip algorithm for decentralized median and quantile estimation, primarily designed for asynchronous updates and requiring only two variables per node. We analyze a synchronous variant of AsylADMM to establish theoretical guarantees and empirically demonstrate fast convergence for the asynchronous algorithm. We then show that our algorithm enables quantile-based trimming, geometric median estimation, and depth-based trimming, with quantile-based trimming empirically outperforming existing rank-based methods. Finally, we provide a novel theoretical analysis of rank-based trimming via Markov chain theory.

#20 SA-PEF: Step-Ahead Partial Error Feedback for Efficient Federated Learning

著者: Dawit Kiros Redie, Reza Arablouei, Stefan Werner

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20738

要約:
Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data, the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient $\alpha=0$ and step-ahead EF (SAEF) when $\alpha=1$. For non-convex objectives and $\delta$-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. The resulting rates match standard non-convex Fed-SGD guarantees up to constant factors, achieving $O((\eta,\eta_0TR)^{-1})$ convergence to a variance/heterogeneity floor with a fixed inner step size. Our analysis reveals a step-ahead-controlled residual contraction $\rho_r$ that explains the observed acceleration in the early training phase. To balance SAEF's rapid warm-up with EF's long-term stability, we select $\alpha$ near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.

#21 Analyzing decision tree bias towards the minority class

著者: Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.04903

要約:
There is a widespread and longstanding belief that machine learning models are biased towards the majority class when learning from imbalanced binary response data, leading them to neglect or ignore the minority class. Motivated by a recent simulation study that found that decision trees can be biased towards the minority class, our paper aims to reconcile the conflict between that study and other published works. First, we critically evaluate past literature on this problem, finding that failing to consider the conditional distribution of the outcome given the predictors has led to incorrect conclusions about the bias in decision trees. We then show that, under specific conditions, decision trees fit to purity are biased towards the minority class, debunking the belief that decision trees are always biased towards the majority class. This bias can be reduced by adjusting the tree-fitting process to include regularization methods like pruning and setting a maximum tree depth, and/or by using post-hoc calibration methods. Our findings have implications on the use of popular tree-based models, such as random forests. Although random forests are often composed of decision trees fit to purity, our work adds to recent literature indicating that this may not be the best approach.

#22 Online Conformal Model Selection for Nonstationary Time Series

著者: Shibo Li, Yao Zheng

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.05544

要約:
This paper introduces the MPS (Model Prediction Set), a novel framework for online model selection for nonstationary time series. Classical model selection methods, such as information criteria and cross-validation, rely heavily on the stationarity assumption and often fail in dynamic environments which undergo gradual or abrupt changes over time. Yet real-world data are rarely stationary, and model selection under nonstationarity remains a largely open problem. To tackle this challenge, we combine conformal inference with model confidence sets to develop a procedure that adaptively selects models best suited to the evolving dynamics at any given time. Concretely, the MPS updates in real time a confidence set of candidate models that covers the best model for the next time period with a specified long-run probability, while adapting to nonstationarity of unknown forms. Through simulations and real-world data analysis, we demonstrate that MPS reliably and efficiently identifies optimal models under nonstationarity, an essential capability lacking in offline methods. Moreover, MPS frequently produces high-quality sets with small cardinality, whose evolution offers deeper insights into changing dynamics. As a generic framework, MPS accommodates any data-generating process, data structure, model class, training method, and evaluation metric, making it broadly applicable across diverse problem settings.

#23 Some Robustness Properties of Label Cleaning

著者: Chen Cheng, John Duchi

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.11379

要約:
We demonstrate that learning procedures that rely on aggregated labels, e.g., label information distilled from noisy responses, enjoy robustness properties impossible without data cleaning. This robustness appears in several ways. In the context of risk consistency -- when one takes the standard approach in machine learning of minimizing a surrogate (typically convex) loss in place of a desired task loss (such as the zero-one mis-classification error) -- procedures using label aggregation obtain stronger consistency guarantees than those even possible using raw labels. And while classical statistical scenarios of fitting perfectly-specified models suggest that incorporating all possible information -- modeling uncertainty in labels -- is statistically efficient, consistency fails for ``standard'' approaches as soon as a loss to be minimized is even slightly mis-specified. Yet procedures leveraging aggregated information still converge to optimal classifiers, highlighting how incorporating a fuller view of the data analysis pipeline, from collection to model-fitting to prediction time, can yield a more robust methodology by refining noisy signals.

#24 Sharpness of Minima in Deep Matrix Factorization

著者: Anil Kamber, Rahul Parhi

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.25783

要約:
Understanding the geometry of the loss landscape near a minimum is key to explaining the implicit bias of gradient-based methods in non-convex optimization problems such as deep neural network training and deep matrix factorization. A central quantity to characterize this geometry is the maximum eigenvalue of the Hessian of the loss. Currently, its precise role has been obfuscated because no exact expressions for this sharpness measure were known in general settings. In this paper, we present the first exact expression for the maximum eigenvalue of the Hessian of the squared-error loss at any minimizer in deep matrix factorization/deep linear neural network training problems, resolving an open question posed by Mulayoff & Michaeli (2020). This expression reveals a fundamental property of the loss landscape in deep matrix factorization: Having a constant product of the spectral norms of the left and right intermediate factors across layers is a sufficient condition for flatness. Most notably, in both depth-$2$ matrix and deep overparameterized scalar factorization, we show that this condition is both necessary and sufficient for flatness, which implies that flat minima are spectral-norm balanced even though they are not necessarily Frobenius-norm balanced. To complement our theory, we provide the first empirical characterization of an escape phenomenon during gradient-based training near a minimizer of a deep matrix factorization problem.

#25 Efficient Group Lasso Regularized Rank Regression with Data-Driven Parameter Determination

著者: Meixia Lin, Meijiao Shi, Yunhai Xiao, Qian Zhang

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.11546

要約:
High-dimensional regression often suffers from heavy-tailed noise and outliers, which can severely undermine the reliability of least-squares based methods. To improve robustness, we adopt a non-smooth Wilcoxon score based rank objective and incorporate structured group sparsity regularization, a natural generalization of the lasso, yielding a group lasso regularized rank regression method. By extending the tuning-free parameter selection scheme originally developed for the lasso, we introduce a data-driven, simulation-based tuning rule and further establish a finite-sample error bound for the resulting estimator. On the computational side, we develop a proximal augmented Lagrangian method for solving the associated optimization problem, which eliminates the singularity issues encountered in existing methods, thereby enabling efficient semismooth Newton updates for the subproblems. Extensive numerical experiments demonstrate the robustness and effectiveness of our proposed estimator against alternatives, and showcase the scalability of the algorithm across both simulated and real-data settings.

#26 Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting

著者: Souhir Ben Amor, Florian Ziel

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.04690

要約:
We present a novel recurrent neural network architecture specifically designed for day-ahead electricity price forecasting, aimed at improving short-term decision-making and operational management in energy systems. Our combined forecasting model embeds linear structures, such as expert models and Kalman filters, into recurrent networks, enabling efficient computation and enhanced interpretability. The design leverages the strengths of both linear and non-linear model structures, allowing it to capture all relevant stylized price characteristics in power markets, including calendar and autoregressive effects, as well as influences from load, renewable energy, and related fuel and carbon markets. For empirical testing, we use hourly data from the largest European electricity market spanning 2018 to 2025 in a comprehensive forecasting study, comparing our model against state-of-the-art approaches, particularly high-dimensional linear and neural network models. In terms of RMSE, the proposed model achieves approximately 11% higher accuracy than the best-performing benchmark. We evaluate the contributions of the interpretable model components and conclude on the impact of combining linear and non-linear structures. We further evaluate the temporal robustness of the model by examining the stability of hyperparameters and the economic significance of key features. Additionally, we introduce a probabilistic extension to quantify forecast uncertainty.

#27 Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

著者: Naman Agarwal, Siddhartha R. Dalal, Vishal Misra

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.22473

要約:
Transformers empirically perform precise probabilistic reasoning in carefully constructed ``Bayesian wind tunnels'' and in large-scale language models, yet the mechanisms by which gradient-based learning creates the required internal geometry remain opaque. We provide a complete first-order analysis of how cross-entropy training reshapes attention scores and value vectors in a transformer attention head. Our core result is an \emph{advantage-based routing law} for attention scores, \[ \frac{\partial L}{\partial s_{ij}} = \alpha_{ij}\bigl(b_{ij}-\mathbb{E}_{\alpha_i}[b]\bigr), \qquad b_{ij} := u_i^\top v_j, \] coupled with a \emph{responsibility-weighted update} for values, \[ \Delta v_j = -\eta\sum_i \alpha_{ij} u_i, \] where $u_i$ is the upstream gradient at position $i$ and $\alpha_{ij}$ are attention weights. These equations induce a positive feedback loop in which routing and content specialize together: queries route more strongly to values that are above-average for their error signal, and those values are pulled toward the queries that use them. We show that this coupled specialization behaves like a two-timescale EM procedure: attention weights implement an E-step (soft responsibilities), while values implement an M-step (responsibility-weighted prototype updates), with queries and keys adjusting the hypothesis frame. Through controlled simulations, including a sticky Markov-chain task where we compare a closed-form EM-style update to standard SGD, we demonstrate that the same gradient dynamics that minimize cross-entropy also sculpt the low-dimensional manifolds identified in our companion work as implementing Bayesian inference. This yields a unified picture in which optimization (gradient flow) gives rise to geometry (Bayesian manifolds), which in turn supports function (in-context probabilistic reasoning).

#28 Universal Sequence Preconditioning

著者: Annie Marsden, Elad Hazan

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.06545

要約:
We study the problem of preconditioning in sequential prediction. From the theoretical lens of linear dynamical systems, we show that convolving the target sequence corresponds to applying a polynomial to the hidden transition matrix. Building on this insight, we propose a universal preconditioning method that convolves the target with coefficients from orthogonal polynomials such as Chebyshev or Legendre. We prove that this approach reduces regret for two distinct prediction algorithms and yields the first ever sublinear and hidden-dimension-independent regret bounds (up to logarithmic factors) that hold for systems with marginally table and asymmetric transition matrices. Finally, extensive synthetic and real-world experiments show that this simple preconditioning strategy improves the performance of a diverse range of algorithms, including recurrent neural networks, and generalizes to signals beyond linear dynamical systems.

#29 Summaries as Centroids for Interpretable and Scalable Text Clustering

著者: Jairo Diaz-Rodriguez

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.09667

要約:
We introduce k-NLPmeans and k-LLMmeans, text-clustering variants of k-means that periodically replace numeric centroids with textual summaries. The key idea, summary-as-centroid, retains k-means assignments in embedding space while producing human-readable, auditable cluster prototypes. The method is LLM-optional: k-NLPmeans uses lightweight, deterministic summarizers, enabling offline, low-cost, and stable operation; k-LLMmeans is a drop-in upgrade that uses an LLM for summaries under a fixed per-iteration budget whose cost does not grow with dataset size. We also present a mini-batch extension for real-time clustering of streaming text. Across diverse datasets, embedding models, and summarization strategies, our approach consistently outperforms classical baselines and approaches the accuracy of recent LLM-based clustering-without extensive LLM calls. Finally, we provide a case study on sequential text streams and release a StackExchange-derived benchmark for evaluating streaming text clustering.

#30 Fractal and Regular Geometry of Deep Neural Networks

著者: Simmaco Di Lillo, Domenico Marinucci, Michele Salvi, Stefano Vigogna

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.06250

要約:
We study the geometric properties of random neural networks by investigating the boundary volumes of their excursion sets for different activation functions, as the depth increases. More specifically, we show that, for activations which are not very regular (e.g., the Heaviside step function), the boundary volumes exhibit fractal behavior, with their Hausdorff dimension monotonically increasing with the depth. On the other hand, for activations which are more regular (e.g., ReLU, logistic and $\tanh$), as the depth increases, the expected boundary volumes can either converge to zero, remain constant or diverge exponentially, depending on a single spectral parameter which can be easily computed. Our theoretical results are confirmed in some numerical experiments based on Monte Carlo simulations.

#31 Taxonomy of reduction matrices for Graph Coarsening

著者: Antonin Joly, Nicolas Keriven, Aline Roumy

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.11743

要約:
Graph coarsening aims to diminish the size of a graph to lighten its memory footprint, and has numerous applications in graph signal processing and machine learning. It is usually defined using a reduction matrix and a lifting matrix, which, respectively, allows to project a graph signal from the original graph to the coarsened one and back. This results in a loss of information measured by the so-called Restricted Spectral Approximation (RSA). Most coarsening frameworks impose a fixed relationship between the reduction and lifting matrices, generally as pseudo-inverses of each other, and seek to define a coarsening that minimizes the RSA. In this paper, we remark that the roles of these two matrices are not entirely symmetric: indeed, putting constraints on the lifting matrix alone ensures the existence of important objects such as the coarsened graph's adjacency matrix or Laplacian. In light of this, in this paper, we introduce a more general notion of reduction matrix, that is not necessarily the pseudo-inverse of the lifting matrix. We establish a taxonomy of ``admissible'' families of reduction matrices, discuss the different properties that they must satisfy and whether they admit a closed-form description or not. We show that, for a fixed coarsening represented by a fixed lifting matrix, the RSA can be further reduced simply by modifying the reduction matrix. We explore different examples, including some based on a constrained optimization process of the RSA. Since this criterion has also been linked to the performance of Graph Neural Networks, we also illustrate the impact of this choices on different node classification tasks on coarsened graphs.

#32 Robust Deep Monte Carlo Counterfactual Regret Minimization: Addressing Theoretical Risks in Neural Fictitious Self-Play

著者: Zakaria El Jaafari

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.00923

要約:
Monte Carlo Counterfactual Regret Minimization (MCCFR) has emerged as a cornerstone algorithm for solving extensive-form games, but its integration with deep neural networks introduces scale-dependent challenges that manifest differently across game complexities. This paper presents a comprehensive analysis of how neural MCCFR component effectiveness varies with game scale and proposes an adaptive framework for selective component deployment. We identify that theoretical risks such as nonstationary target distribution shifts, action support collapse, variance explosion, and warm-starting bias have scale-dependent manifestation patterns, requiring different mitigation strategies for small versus large games. Our proposed Robust Deep MCCFR framework incorporates target networks with delayed updates, uniform exploration mixing, variance-aware training objectives, and comprehensive diagnostic monitoring. Through systematic ablation studies on Kuhn and Leduc Poker, we demonstrate scale-dependent component effectiveness and identify critical component interactions. The best configuration achieves final exploitability of 0.0628 on Kuhn Poker, representing a 60% improvement over the classical framework (0.156). On the more complex Leduc Poker domain, selective component usage achieves exploitability of 0.2386, a 23.5% improvement over the classical framework (0.3703) and highlighting the importance of careful component selection over comprehensive mitigation. Our contributions include: (1) a formal theoretical analysis of risks in neural MCCFR, (2) a principled mitigation framework with convergence guarantees, (3) comprehensive multi-scale experimental validation revealing scale-dependent component interactions, and (4) practical guidelines for deployment in larger games.

#33 Diagonal Linear Networks and the Lasso Regularization Path

著者: Rapha\"el Berthier

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.18766

要約:
Diagonal linear networks are neural networks with linear activation and diagonal weight matrices. Their theoretical interest is that their implicit regularization can be rigorously analyzed: from a small initialization, the training of diagonal linear networks converges to the linear predictor with minimal 1-norm among minimizers of the training loss. In this paper, we deepen this analysis showing that the full training trajectory of diagonal linear networks is closely related to the lasso regularization path. In this connection, the training time plays the role of an inverse regularization parameter. Both rigorous results and simulations are provided to illustrate this conclusion. Under a monotonicity assumption on the lasso regularization path, the connection is exact while in the general case, we show an approximate connection.

#34 Higher-Order Feature Attribution: Bridging Statistics, Explainable AI, and Topological Signal Processing

著者: Kurt Butler, Guanchao Feng, Petar Djuric

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.06165

要約:
Feature attributions are post-training analysis methods that assess how various input features of a machine learning model contribute to an output prediction. Their interpretation is straightforward when features act independently, but it becomes less clear when the predictive model involves interactions, such as multiplicative relationships or joint feature contributions. In this work, we propose a general theory of higher-order feature attribution, which we develop on the foundation of Integrated Gradients (IG). This work extends existing frameworks in the literature on explainable AI. When using IG as the method of feature attribution, we discover natural connections to statistics and topological signal processing. We provide several theoretical results that establish the theory, and we validate our theory on a few examples.

#35 LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

著者: Yuanchen Wu, Saurabh Verma, Justin Lee, Fangzhou Xiong, Poppy Zhang, Amel Awadelkarim, Xu Chen, Yubai Yuan, Shawndra Hill

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.13907

要約:
Large language models (LLMs) are highly sensitive to prompts, but most automatic prompt optimization (APO) methods assume access to ground-truth references (e.g., labeled validation data) that are costly to obtain. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization based on pairwise preference feedback from an LLM judge. PDO casts prompt selection as a dueling-bandit problem and combines (i) Double Thompson Sampling to prioritize informative comparisons under a fixed judge budget, with (ii) top-performer guided mutation to expand the candidate pool while pruning weak prompts. Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently identifies stronger prompts than label-free baselines, while offering favorable quality--cost trade-offs under constrained comparison budgets.

#36 Self-induced stochastic resonance: A physics-informed machine learning approach

著者: Divyesh Savaliya, Marius E. Yamakou

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.22848

要約:
Self-induced stochastic resonance (SISR) is the emergence of coherent oscillations in slow-fast excitable systems driven solely by noise, without external periodic forcing or proximity to a bifurcation. This work presents a physics-informed machine learning framework for modeling and predicting SISR in the stochastic FitzHugh-Nagumo neuron. We embed the governing stochastic differential equations and SISR-asymptotic timescale-matching constraints directly into a Physics-Informed Neural Network (PINN) based on a Noise-Augmented State Predictor architecture. The composite loss integrates data fidelity, dynamical residuals, and barrier-based physical constraints derived from Kramers' escape theory. The trained PINN accurately predicts the dependence of spike-train coherence on noise intensity, excitability, and timescale separation, matching results from direct stochastic simulations with substantial improvements in accuracy and generalization compared with purely data-driven methods, while requiring significantly less computation. The framework provides a data-efficient and interpretable surrogate model for simulating and analyzing noise-induced coherence in multiscale stochastic systems.

#37 The Bayesian Geometry of Transformer Attention

著者: Naman Agarwal, Siddhartha R. Dalal, Vishal Misra

公開日: Thu, 29 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.22471

要約:
Transformers often appear to perform Bayesian reasoning in context, but verifying this rigorously has been impossible: natural data lack analytic posteriors, and large models conflate reasoning with memorization. We address this by constructing \emph{Bayesian wind tunnels} -- controlled environments where the true posterior is known in closed form and memorization is provably impossible. In these settings, small transformers reproduce Bayesian posteriors with $10^{-3}$-$10^{-4}$ bit accuracy, while capacity-matched MLPs fail by orders of magnitude, establishing a clear architectural separation. Across two tasks -- bijection elimination and Hidden Markov Model (HMM) state tracking -- we find that transformers implement Bayesian inference through a consistent geometric mechanism: residual streams serve as the belief substrate, feed-forward networks perform the posterior update, and attention provides content-addressable routing. Geometric diagnostics reveal orthogonal key bases, progressive query-key alignment, and a low-dimensional value manifold parameterized by posterior entropy. During training this manifold unfurls while attention patterns remain stable, a \emph{frame-precision dissociation} predicted by recent gradient analyses. Taken together, these results demonstrate that hierarchical attention realizes Bayesian inference by geometric design, explaining both the necessity of attention and the failure of flat architectures. Bayesian wind tunnels provide a foundation for mechanistically connecting small, verifiable systems to reasoning phenomena observed in large language models.

stat.ML updates on arXiv.org

📋 論文タイトル一覧