arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions

著者: Nifei Lin, Heng Luo, L. Jeff Hong

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06270

要約:
In this work, we study contextual strongly convex simulation optimization and adopt an "optimize then predict" (OTP) approach for real-time decision making. In the offline stage, simulation optimization is conducted across a set of covariates to approximate the optimal-solution function; in the online stage, decisions are obtained by evaluating this approximation at the observed covariate. The central theoretical challenge is to understand how the inexactness of solutions generated by simulation-optimization algorithms affects the optimality gap, which is overlooked in existing studies. To address this, we develop a unified analysis framework that explicitly accounts for both solution bias and variance. Using Polyak-Ruppert averaging SGD as an illustrative simulation-optimization algorithm, we analyze the optimality gap of OTP under four representative smoothing techniques: $k$ nearest neighbor, kernel smoothing, linear regression, and kernel ridge regression. We establish convergence rates, derive the optimal allocation of the computational budget $\Gamma$ between the number of design covariates and the per-covariate simulation effort, and demonstrate the convergence rate can approximately achieve $\Gamma^{-1}$ under appropriate smoothing technique and sample-allocation rule. Finally, through a numerical study, we validate the theoretical findings and demonstrate the effectiveness and practical value of the proposed approach.

#2 Modeling Spatio-temporal Extremes via Conditional Variational Autoencoders

著者: Xiaoyu Ma, Likun Zhang, Christopher K. Wikle

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06348

要約:
Extreme weather events are widely studied in fields such as agriculture, ecology, and meteorology. The spatio-temporal co-occurrence of extreme events can strengthen or weaken under changing climate conditions. In this paper, we propose a novel approach to model spatio-temporal extremes by integrating climate indices via a conditional variational autoencoder (cXVAE). A convolutional neural network (CNN) is embedded in the decoder to convolve climatological indices with the spatial dependence within the latent space, thereby allowing the decoder to be dependent on the climate variables. There are three main contributions here. First, we demonstrate through extensive simulations that the proposed conditional XVAE accurately emulates spatial fields and recovers spatially and temporally varying extremal dependence with very low computational cost post training. Second, we provide a simple, scalable approach to detecting condition-driven shifts and whether the dependence structure is invariant to the conditioning variable. Third, when dependence is found to be condition-sensitive, the conditional XVAE supports counterfactual experiments allowing intervention on the climate covariate and propagating the associated change through the learned decoder to quantify differences in joint tail risk, co-occurrence ranges, and return metrics. To demonstrate the practical utility and performance of the model in real-world scenarios, we apply our method to analyze the monthly maximum Fire Weather Index (FWI) over eastern Australia from 2014 to 2024 conditioned on the El Ni\~{n}o/Southern Oscillation (ENSO) index.

#3 Canonical Tail Dependence for Soft Extremal Clustering of Multichannel Brain Signals

著者: Mara Sherlin Talento, Jordan Richards, Raphael Huser, Hernando Ombao

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06435

要約:
We develop a novel characterization of extremal dependence between two cortical regions of the brain when its signals display extremely large amplitudes. We show that connectivity in the tails of the distribution reveals unique features of extreme events (e.g., seizures) that can help to identify their occurrence. Numerous studies have established that connectivity-based features are effective for discriminating brain states. Here, we demonstrate the advantage of the proposed approach: that tail connectivity provides additional discriminatory power, enabling more accurate identification of extreme-related events and improved seizure risk management. Common approaches in tail dependence modeling use pairwise summary measures or parametric models. However, these approaches do not identify channels that drive the maximal tail dependence between two groups of signals -- an information that is useful when analyzing electroencephalography of epileptic patients where specific channels are responsible for seizure occurrences. A familiar approach in traditional signal processing is canonical correlation, which we extend to the tails to develop a visualization of extremal channel-contributions. Through the tail pairwise dependence matrix (TPDM), we develop a computationally-efficient estimator for our canonical tail dependence measure. Our method is then used for accurate frequency-based soft clustering of neonates, distinguishing those with seizures from those without.

#4 Latent Nonlinear Denoising Score Matching for Enhanced Learning of Structured Distributions

著者: Kaichen Shen, Wei Zhu

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06615

要約:
We present latent nonlinear denoising score matching (LNDSM), a novel training objective for score-based generative models that integrates nonlinear forward dynamics with the VAE-based latent SGM framework. This combination is achieved by reformulating the cross-entropy term using the approximate Gaussian transition induced by the Euler-Maruyama scheme. To ensure numerical stability, we identify and remove two zero-mean but variance exploding terms arising from small time steps. Experiments on variants of the MNIST dataset demonstrate that the proposed method achieves faster synthesis and enhanced learning of inherently structured distributions. Compared to benchmark structure-agnostic latent SGMs, LNDSM consistently attains superior sample quality and variability.

#5 ADAM Optimization with Adaptive Batch Selection

著者: Gyu Yeol Kim, Min-hwan Oh

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06795

要約:
Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating them equally can lead to inefficient convergence. To address this, a prior work proposed adapting the sampling distribution using a bandit framework to select samples adaptively. While promising, the bandit-based variant of Adam suffers from limited theoretical guarantees. In this paper, we introduce Adam with Combinatorial Bandit Sampling (AdamCB), which integrates combinatorial bandit techniques into Adam to resolve these issues. AdamCB is able to fully utilize feedback from multiple samples at once, enhancing both theoretical guarantees and practical performance. Our regret analysis shows that AdamCB achieves faster convergence than Adam-based methods including the previous bandit-based variant. Numerical experiments demonstrate that AdamCB consistently outperforms existing methods.

#6 Symmetric Aggregation of Conformity Scores for Efficient Uncertainty Sets

著者: Nabil Alami, Jad Zakharia, Souhaib Ben Taieb

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06945

要約:
Access to multiple predictive models trained for the same task, whether in regression or classification, is increasingly common in many applications. Aggregating their predictive uncertainties to produce reliable and efficient uncertainty quantification is therefore a critical but still underexplored challenge, especially within the framework of conformal prediction (CP). While CP methods can generate individual prediction sets from each model, combining them into a single, more informative set remains a challenging problem. To address this, we propose SACP (Symmetric Aggregated Conformal Prediction), a novel method that aggregates nonconformity scores from multiple predictors. SACP transforms these scores into e-values and combines them using any symmetric aggregation function. This flexible design enables a robust, data-driven framework for selecting aggregation strategies that yield sharper prediction sets. We also provide theoretical insights that help justify the validity and performance of the SACP approach. Extensive experiments on diverse datasets show that SACP consistently improves efficiency and often outperforms state-of-the-art model aggregation baselines.

#7 PARIS: Pruning Algorithm via the Representer theorem for Imbalanced Scenarios

著者: Enrico Camporeale

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06950

要約:
The challenge of \textbf{imbalanced regression} arises when standard Empirical Risk Minimization (ERM) biases models toward high-frequency regions of the data distribution, causing severe degradation on rare but high-impact ``tail'' events. Existing strategies uch as loss re-weighting or synthetic over-sampling often introduce noise, distort the underlying distribution, or add substantial algorithmic complexity. We introduce \textbf{PARIS} (Pruning Algorithm via the Representer theorem for Imbalanced Scenarios), a principled framework that mitigates imbalance by \emph{optimizing the training set itself}. PARIS leverages the representer theorem for neural networks to compute a \textbf{closed-form representer deletion residual}, which quantifies the exact change in validation loss caused by removing a single training point \emph{without retraining}. Combined with an efficient Cholesky rank-one downdating scheme, PARIS performs fast, iterative pruning that eliminates uninformative or performance-degrading samples. We use a real-world space weather example, where PARIS reduces the training set by up to 75\% while preserving or improving overall RMSE, outperforming re-weighting, synthetic oversampling, and boosting baselines. Our results demonstrate that representer-guided dataset pruning is a powerful, interpretable, and computationally efficient approach to rare-event regression.

#8 Statistical analysis of Inverse Entropy-regularized Reinforcement Learning

著者: Denis Belomestny, Alexey Naumov, Sergey Samsonov

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06956

要約:
Inverse reinforcement learning aims to infer the reward function that explains expert behavior observed through trajectories of state--action pairs. A long-standing difficulty in classical IRL is the non-uniqueness of the recovered reward: many reward functions can induce the same optimal policy, rendering the inverse problem ill-posed. In this paper, we develop a statistical framework for Inverse Entropy-regularized Reinforcement Learning that resolves this ambiguity by combining entropy regularization with a least-squares reconstruction of the reward from the soft Bellman residual. This combination yields a unique and well-defined so-called least-squares reward consistent with the expert policy. We model the expert demonstrations as a Markov chain with the invariant distribution defined by an unknown expert policy $\pi^\star$ and estimate the policy by a penalized maximum-likelihood procedure over a class of conditional distributions on the action space. We establish high-probability bounds for the excess Kullback--Leibler divergence between the estimated policy and the expert policy, accounting for statistical complexity through covering numbers of the policy class. These results lead to non-asymptotic minimax optimal convergence rates for the least-squares reward function, revealing the interplay between smoothing (entropy regularization), model complexity, and sample size. Our analysis bridges the gap between behavior cloning, inverse reinforcement learning, and modern statistical learning theory.

#9 Learning Conditional Independence Differential Graphs From Time-Dependent Data

privacy

著者: Jitendra K Tugnait

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06960

要約:
Estimation of differences in conditional independence graphs (CIGs) of two time series Gaussian graphical models (TSGGMs) is investigated where the two TSGGMs are known to have similar structure. The TSGGM structure is encoded in the inverse power spectral density (IPSD) of the time series. In several existing works, one is interested in estimating the difference in two precision matrices to characterize underlying changes in conditional dependencies of two sets of data consisting of independent and identically distributed (i.i.d.) observations. In this paper we consider estimation of the difference in two IPSDs to characterize the underlying changes in conditional dependencies of two sets of time-dependent data. Our approach accounts for data time dependencies unlike past work. We analyze a penalized D-trace loss function approach in the frequency domain for differential graph learning, using Wirtinger calculus. We consider both convex (group lasso) and non-convex (log-sum and SCAD group penalties) penalty/regularization functions. An alternating direction method of multipliers (ADMM) algorithm is presented to optimize the objective function. We establish sufficient conditions in a high-dimensional setting for consistency (convergence of the inverse power spectral density to true value in the Frobenius norm) and graph recovery. Both synthetic and real data examples are presented in support of the proposed approaches. In synthetic data examples, our log-sum-penalized differential time-series graph estimator significantly outperformed our lasso based differential time-series graph estimator which, in turn, significantly outperformed an existing lasso-penalized i.i.d. modeling approach, with $F_1$ score as the performance metric.

#10 Exact Synthetic Populations for Scalable Societal and Market Modeling

著者: Thierry Petit, Arnault Pachot

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07306

要約:
We introduce a constraint-programming framework for generating synthetic populations that reproduce target statistics with high precision while enforcing full individual consistency. Unlike data-driven approaches that infer distributions from samples, our method directly encodes aggregated statistics and structural relations, enabling exact control of demographic profiles without requiring any microdata. We validate the approach on official demographic sources and study the impact of distributional deviations on downstream analyses. This work is conducted within the Pollitics project developed by Emotia, where synthetic populations can be queried through large language models to model societal behaviors, explore market and policy scenarios, and provide reproducible decision-grade insights without personal data.

#11 Machine learning in an expectation-maximisation framework for nowcasting

著者: Paul Wilsens, Katrien Antonio, Gerda Claeskens

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07335

要約:
Decision making often occurs in the presence of incomplete information, leading to the under- or overestimation of risk. Leveraging the observable information to learn the complete information is called nowcasting. In practice, incomplete information is often a consequence of reporting or observation delays. In this paper, we propose an expectation-maximisation (EM) framework for nowcasting that uses machine learning techniques to model both the occurrence as well as the reporting process of events. We allow for the inclusion of covariate information specific to the occurrence and reporting periods as well as characteristics related to the entity for which events occurred. We demonstrate how the maximisation step and the information flow between EM iterations can be tailored to leverage the predictive power of neural networks and (extreme) gradient boosting machines (XGBoost). With simulation experiments, we show that we can effectively model both the occurrence and reporting of events when dealing with high-dimensional covariate information. In the presence of non-linear effects, we show that our methodology outperforms existing EM-based nowcasting frameworks that use generalised linear models in the maximisation step. Finally, we apply the framework to the reporting of Argentinian Covid-19 cases, where the XGBoost-based approach again is most performant.

#12 High-Dimensional Change Point Detection using Graph Spanning Ratio

著者: Youngwen Sun, Katerina Papagiannouli, Vladimir Spokoiny

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07541

要約:
Inspired by graph-based methodologies, we introduce a novel graph-spanning algorithm designed to identify changes in both offline and online data across low to high dimensions. This versatile approach is applicable to Euclidean and graph-structured data with unknown distributions, while maintaining control over error probabilities. Theoretically, we demonstrate that the algorithm achieves high detection power when the magnitude of the change surpasses the lower bound of the minimax separation rate, which scales on the order of $\sqrt{nd}$. Our method outperforms other techniques in terms of accuracy for both Gaussian and non-Gaussian data. Notably, it maintains strong detection power even with small observation windows, making it particularly effective for online environments where timely and precise change detection is critical.

#13 On Conditional Independence Graph Learning From Multi-Attribute Gaussian Dependent Time Series

著者: Jitendra K. Tugnait

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07557

要約:
Estimation of the conditional independence graph (CIG) of high-dimensional multivariate Gaussian time series from multi-attribute data is considered. Existing methods for graph estimation for such data are based on single-attribute models where one associates a scalar time series with each node. In multi-attribute graphical models, each node represents a random vector or vector time series. In this paper we provide a unified theoretical analysis of multi-attribute graph learning for dependent time series using a penalized log-likelihood objective function formulated in the frequency domain using the discrete Fourier transform of the time-domain data. We consider both convex (sparse-group lasso) and non-convex (log-sum and SCAD group penalties) penalty/regularization functions. We establish sufficient conditions in a high-dimensional setting for consistency (convergence of the inverse power spectral density to true value in the Frobenius norm), local convexity when using non-convex penalties, and graph recovery. We do not impose any incoherence or irrepresentability condition for our convergence results. We also empirically investigate selection of the tuning parameters based on the Bayesian information criterion, and illustrate our approach using numerical examples utilizing both synthetic and real data.

#14 $\phi$-test: Global Feature Selection and Inference for Shapley Additive Explanations

著者: Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07578

要約:
We propose $\phi$-test, a global feature-selection and significance procedure for black-box predictors that combines Shapley attributions with selective inference. Given a trained model and an evaluation dataset, $\phi$-test performs SHAP-guided screening and fits a linear surrogate on the screened features via a selection rule with a tractable selective-inference form. For each retained feature, it outputs a Shapley-based global score, a surrogate coefficient, and post-selection $p$-values and confidence intervals in a global feature-importance table. Experiments on real tabular regression tasks with tree-based and neural backbones suggest that $\phi$-test can retain much of the predictive ability of the original model while using only a few features and producing feature sets that remain fairly stable across resamples and backbone classes. In these settings, $\phi$-test acts as a practical global explanation layer linking Shapley-based importance summaries with classical statistical inference.

#15 Physics-Informed Neural Networks for Source Inversion and Parameters Estimation in Atmospheric Dispersion

著者: Brenda Anague, Bamdad Hosseini, Issa Karambal, Jean Medard Ngnotchouye

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07755

要約:
Recent studies have shown the success of deep learning in solving forward and inverse problems in engineering and scientific computing domains, such as physics-informed neural networks (PINNs). In the fields of atmospheric science and environmental monitoring, estimating emission source locations is a central task that further relies on multiple model parameters that dictate velocity profiles and diffusion parameters. Estimating these parameters at the same time as emission sources from scarce data is a difficult task. In this work, we achieve this by leveraging the flexibility and generality of PINNs. We use a weighted adaptive method based on the neural tangent kernels to solve a source inversion problem with parameter estimation on the 2D and 3D advection-diffusion equations with unknown velocity and diffusion coefficients that may vary in space and time. Our proposed weighted adaptive method is presented as an extension of PINNs for forward PDE problems to a highly ill-posed source inversion and parameter estimation problem. The key idea behind our methodology is to attempt the joint recovery of the solution, the sources along with the unknown parameters, thereby using the underlying partial differential equation as a constraint that couples multiple unknown functional parameters, leading to more efficient use of the limited information in the measurements. We present various numerical experiments, using different types of measurements that model practical engineering systems, to show that our proposed method is indeed successful and robust to additional noise in the measurements.

#16 Distribution-informed Online Conformal Prediction

著者: Dongjian Hu, Junxi Wu, Shu-Tao Xia, Changliang Zou

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07770

要約:
Conformal prediction provides a pivotal and flexible technique for uncertainty quantification by constructing prediction sets with a predefined coverage rate. Many online conformal prediction methods have been developed to address data distribution shifts in fully adversarial environments, resulting in overly conservative prediction sets. We propose Conformal Optimistic Prediction (COP), an online conformal prediction algorithm incorporating underlying data pattern into the update rule. Through estimated cumulative distribution function of non-conformity scores, COP produces tighter prediction sets when predictable pattern exists, while retaining valid coverage guarantees even when estimates are inaccurate. We establish a joint bound on coverage and regret, which further confirms the validity of our approach. We also prove that COP achieves distribution-free, finite-sample coverage under arbitrary learning rates and can converge when scores are $i.i.d.$. The experimental results also show that COP can achieve valid coverage and construct shorter prediction intervals than other baselines.

#17 From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference

著者: Sean Plummer

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06107

要約:
Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q(\theta) = \int k(\theta | z) r(dz)$, where the conditional kernel is parameterized and the mixing base is fixed and tractable. This paper develops a unified "approximation-optimization-statistics'' theory for such families. On the approximation side, we show that under compact L1-universality and a mild tail-dominance condition, semi-implicit families are dense in L1 and can achieve arbitrarily small forward Kullback-Leibler (KL) error. We also identify two sharp obstructions to global approximation: (i) an Orlicz tail-mismatch condition that induces a strictly positive forward-KL gap, and (ii) structural restrictions, such as non-autoregressive Gaussian kernels, that force "branch collapse'' in conditional distributions. For each obstruction we give a minimal structural modification that restores approximability. On the optimization side, we establish finite-sample oracle inequalities and prove that the empirical SIVI objectives L(K,n) $\Gamma$-converge to their population limit as n and K tend to infinity. These results give consistency of empirical maximizers, quantitative control of finite-K surrogate bias, and stability of the resulting variational posteriors. Combining the approximation and optimization analyses yields the first general end-to-end statistical theory for SIVI: we characterize precisely when SIVI can recover the target distribution, when it cannot, and how architectural and algorithmic choices govern the attainable asymptotic behavior.

#18 Interpretable Neural Approximation of Stochastic Reaction Dynamics with Guaranteed Reliability

著者: Quentin Badolle, Arthur Theuer, Zhou Fang, Ankit Gupta, Mustafa Khammash

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06294

要約:
Stochastic Reaction Networks (SRNs) are a fundamental modeling framework for systems ranging from chemical kinetics and epidemiology to ecological and synthetic biological processes. A central computational challenge is the estimation of expected outputs across initial conditions and times, a task that is rarely solvable analytically and becomes computationally prohibitive with current methods such as Finite State Projection or the Stochastic Simulation Algorithm. Existing deep learning approaches offer empirical scalability, but provide neither interpretability nor reliability guarantees, limiting their use in scientific analysis and in applications where model outputs inform real-world decisions. Here we introduce DeepSKA, a neural framework that jointly achieves interpretability, guaranteed reliability, and substantial computational gains. DeepSKA yields mathematically transparent representations that generalise across states, times, and output functions, and it integrates this structure with a small number of stochastic simulations to produce unbiased, provably convergent, and dramatically lower-variance estimates than classical Monte Carlo. We demonstrate these capabilities across nine SRNs, including nonlinear and non-mass-action models with up to ten species, where DeepSKA delivers accurate predictions and orders-of-magnitude efficiency improvements. This interpretable and reliable neural framework offers a principled foundation for developing analogous methods for other Markovian systems, including stochastic differential equations.

#19 Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks

著者: Luca Di Carlo, Chase Goddard, David J. Schwab

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06297

要約:
Modern neural networks exhibit a striking property: basins of attraction in the loss landscape are often connected by low-loss paths, yet optimization dynamics generally remain confined to a single convex basin and rarely explore intermediate points. We resolve this paradox by identifying entropic barriers arising from the interplay between curvature variations along these paths and noise in optimization dynamics. Empirically, we find that curvature systematically rises away from minima, producing effective forces that bias noisy dynamics back toward the endpoints - even when the loss remains nearly flat. These barriers persist longer than energetic barriers, shaping the late-time localization of solutions in parameter space. Our results highlight the role of curvature-induced entropic forces in governing both connectivity and confinement in deep learning landscapes.

#20 Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry

著者: Naoki Yoshida, Isao Ishikawa, Masaaki Imaizumi

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06347

要約:
We theoretically demonstrate that the generalization error of interpolators for machine learning models under teacher-student settings becomes 0 once the number of training samples exceeds a certain threshold. Understanding the high generalization ability of large-scale models such as deep neural networks (DNNs) remains one of the central open problems in machine learning theory. While recent theoretical studies have attributed this phenomenon to the implicit bias of stochastic gradient descent (SGD) toward well-generalizing solutions, empirical evidences indicate that it primarily stems from properties of the model itself. Specifically, even randomly sampled interpolators, which are parameters that achieve zero training error, have been observed to generalize effectively. In this study, under a teacher-student framework, we prove that the generalization error of randomly sampled interpolators becomes exactly zero once the number of training samples exceeds a threshold determined by the geometric structure of the interpolator set in parameter space. As a proof technique, we leverage tools from algebraic geometry to mathematically characterize this geometric structure.

#21 Optimizing Optimizers for Fast Gradient-Based Learning

著者: Jaerin Lee, Kyoung Mu Lee

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06370

要約:
We lay the theoretical foundation for automating optimizer design in gradient-based learning. Based on the greedy principle, we formulate the problem of designing optimizers as maximizing the instantaneous decrease in loss. By treating an optimizer as a function that translates loss gradient signals into parameter motions, the problem reduces to a family of convex optimization problems over the space of optimizers. Solving these problems under various constraints not only recovers a wide range of popular optimizers as closed-form solutions, but also produces the optimal hyperparameters of these optimizers with respect to the problems at hand. This enables a systematic approach to design optimizers and tune their hyperparameters according to the gradient statistics that are collected during the training process. Furthermore, this optimization of optimization can be performed dynamically during training.

#22 Simultaneous Heterogeneity and Reduced-rank Learning for Multivariate Response Regression

著者: Jie Wu, Bo Zhang, Daoji Li, Zemin Zheng

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06514

要約:
Heterogeneous data are now ubiquitous in many applications in which correctly identifying the subgroups from a heterogeneous population is critical. Although there is an increasing body of literature on subgroup detection, existing methods mainly focus on the univariate response setting. In this paper, we propose a joint heterogeneity and reduced-rank learning framework to simultaneously identify the subgroup structure and estimate the covariate effects for heterogeneous multivariate response regression. In particular, our approach uses rank-constrained pairwise fusion penalization and conducts the subgroup analysis without requiring prior knowledge regarding the individual subgroup memberships. We implement the proposed approach by an alternating direction method of multipliers (ADMM) algorithm and show its convergence. We also establish the asymptotic properties for the resulting estimators under mild and interpretable conditions. A predictive information criterion is proposed to select the rank of the coefficient matrix with theoretical support. The effectiveness of the proposed approach is demonstrated through simulation studies and a real data application.

#23 Hierarchical Clustering With Confidence

著者: Di Wu, Jacob Bien, Snigdha Panigrahi

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06522

要約:
Agglomerative hierarchical clustering is one of the most widely used approaches for exploring how observations in a dataset relate to each other. However, its greedy nature makes it highly sensitive to small perturbations in the data, often producing different clustering results and making it difficult to separate genuine structure from spurious patterns. In this paper, we show how randomizing hierarchical clustering can be useful not just for measuring stability but also for designing valid hypothesis testing procedures based on the clustering results. We propose a simple randomization scheme together with a method for constructing a valid p-value at each node of the hierarchical clustering dendrogram that quantifies evidence against performing the greedy merge. Our test controls the Type I error rate, works with any hierarchical linkage without case-specific derivations, and simulations show it is substantially more powerful than existing selective inference approaches. To demonstrate the practical utility of our p-values, we develop an adaptive $\alpha$-spending procedure that estimates the number of clusters, with a probabilistic guarantee on overestimation. Experiments on simulated and real data show that this estimate yields powerful clustering and can be used, for example, to assess clustering stability across multiple runs of the randomized algorithm.

#24 Optimal and Diffusion Transports in Machine Learning

diffusion

著者: Gabriel Peyr\'e

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06797

要約:
Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. This includes sampling via diffusion methods, optimizing the weights of neural networks, and analyzing the evolution of token distributions across layers of large language models. While the targeted applications differ (samples, weights, tokens), their mathematical descriptions share a common structure. A key idea is to switch from the Eulerian representation of densities to their Lagrangian counterpart through vector fields that advect particles. This dual view introduces challenges, notably the non-uniqueness of Lagrangian vector fields, but also opportunities to craft density evolutions and flows with favorable properties in terms of regularity, stability, and computational tractability. This survey presents an overview of these methods, with emphasis on two complementary approaches: diffusion methods, which rely on stochastic interpolation processes and underpin modern generative AI, and optimal transport, which defines interpolation by minimizing displacement cost. We illustrate how both approaches appear in applications ranging from sampling, neural network optimization, to modeling the dynamics of transformers for large language models.

#25 Prediction with Expert Advice under Local Differential Privacy

privacy

著者: Ben Jacobsen, Kassem Fawaz

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.06971

要約:
We study the classic problem of prediction with expert advice under the constraint of local differential privacy (LDP). In this context, we first show that a classical algorithm naturally satisfies LDP and then design two new algorithms that improve it: RW-AdaBatch and RW-Meta. For RW-AdaBatch, we exploit the limited-switching behavior induced by LDP to provide a novel form of privacy amplification that grows stronger on easier data, analogous to the shuffle model in offline learning. Drawing on the theory of random walks, we prove that this improvement carries essentially no utility cost. For RW-Meta, we develop a general method for privately selecting between experts that are themselves non-trivial learning algorithms, and we show that in the context of LDP this carries no extra privacy cost. In contrast, prior work has only considered data-independent experts. We also derive formal regret bounds that scale inversely with the degree of independence between experts. Our analysis is supplemented by evaluation on real-world data reported by hospitals during the COVID-19 pandemic; RW-Meta outperforms both the classical baseline and a state-of-the-art \textit{central} DP algorithm by 1.5-3$\times$ on the task of predicting which hospital will report the highest density of COVID patients each week.

#26 Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length

著者: Zhiyu Xu, Jia Liu, Yixin Wang, Yuqi Gu

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07019

要約:
The proliferation of Large Language Models (LLMs) necessitates valid evaluation methods to provide guidance for both downstream applications and actionable future improvements. The Item Response Theory (IRT) model with Computerized Adaptive Testing has recently emerged as a promising framework for evaluating LLMs via their response accuracy. Beyond simple response accuracy, LLMs' chain of thought (CoT) lengths serve as a vital indicator of their reasoning ability. To leverage the CoT length information to assist the evaluation of LLMs, we propose the Latency-Response Theory (LaRT) model, which jointly models both the response accuracy and CoT length by introducing a key correlation parameter between the latent ability and the latent speed. We derive an efficient stochastic approximation Expectation-Maximization algorithm for parameter estimation. We establish rigorous identifiability results for the latent ability and latent speed parameters to ensure the statistical validity of their estimation. Through both theoretical asymptotic analyses and simulation studies, we demonstrate LaRT's advantages over IRT in terms of superior estimation accuracy and shorter confidence intervals for latent trait estimation. To evaluate LaRT in real data, we collect responses from diverse LLMs on popular benchmark datasets. We find that LaRT yields different LLM rankings than IRT and outperforms IRT across multiple key evaluation metrics including predictive power, item efficiency, ranking validity, and LLM evaluation efficiency. Code and data are available at https://github.com/Toby-X/Latency-Response-Theory-Model.

#27 Ideal Attribution and Faithful Watermarks for Language Models

intellectual property

著者: Min Jae Song, Kameron Shahabi

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07038

要約:
We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of the prompt-response interaction history between a model and its user. Each mechanism produces deterministic decisions based on the ledger and an explicit selection criterion, making it well-suited to serve as a ground truth for attribution. We frame the design goal of watermarking schemes as faithful representation of ideal attribution mechanisms. This novel perspective brings conceptual clarity, replacing piecemeal probabilistic statements with a unified language for stating the guarantees of each scheme. It also enables precise reasoning about desiderata for future watermarking schemes, even when no current construction achieves them, since the ideal functionalities are specified first. In this way, the framework provides a roadmap that clarifies which guarantees are attainable in an idealized setting and worth pursuing in practice.

#28 Machine Learning-based Unfolding for Cross Section Measurements in the Presence of Nuisance Parameters

著者: Huanbiao Zhu, Krish Desai, Mikael Kuusela, Vinicius Mikuni, Benjamin Nachman, Larry Wasserman

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07074

要約:
Statistically correcting measured cross sections for detector effects is an important step across many applications. In particle physics, this inverse problem is known as \textit{unfolding}. In cases with complex instruments, the distortions they introduce are often known only implicitly through simulations of the detector. Modern machine learning has enabled efficient simulation-based approaches for unfolding high-dimensional data. Among these, one of the first methods successfully deployed on experimental data is the \textsc{OmniFold} algorithm, a classifier-based Expectation-Maximization procedure. In practice, however, the forward model is only approximately specified, and the corresponding uncertainty is encoded through nuisance parameters. Building on the well-studied \textsc{OmniFold} algorithm, we show how to extend machine learning-based unfolding to incorporate nuisance parameters. Our new algorithm, called Profile \textsc{OmniFold}, is demonstrated using a Gaussian example as well as a particle physics case study using simulated data from the CMS Experiment at the Large Hadron Collider.

#29 DeepSVM: Learning Stochastic Volatility Models with Physics-Informed Deep Operator Networks

著者: Kieran A. Malandain, Selim Kalici, Hakob Chakhoyan

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07162

要約:
Real-time calibration of stochastic volatility models (SVMs) is computationally bottlenecked by the need to repeatedly solve coupled partial differential equations (PDEs). In this work, we propose DeepSVM, a physics-informed Deep Operator Network (PI-DeepONet) designed to learn the solution operator of the Heston model across its entire parameter space. Unlike standard data-driven deep learning (DL) approaches, DeepSVM requires no labelled training data. Rather, we employ a hard-constrained ansatz that enforces terminal payoffs and static no-arbitrage conditions by design. Furthermore, we use Residual-based Adaptive Refinement (RAR) to stabilize training in difficult regions subject to high gradients. Overall, DeepSVM achieves a final training loss of $10^{-5}$ and predicts highly accurate option prices across a range of typical market dynamics. While pricing accuracy is high, we find that the model's derivatives (Greeks) exhibit noise in the at-the-money (ATM) regime, highlighting the specific need for higher-order regularization in physics-informed operator learning.

#30 A Bootstrap Perspective on Stochastic Gradient Descent

著者: Hongjian Lan, Yucong Liu, Florian Sch\"afer

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07676

要約:
Machine learning models trained with \emph{stochastic} gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's impact on generalization through the lens of the statistical bootstrap: SGD uses gradient variability under batch sampling as a proxy for solution variability under the randomness of the data collection process. We use empirical results and theoretical analysis to substantiate this claim. In idealized experiments on empirical risk minimization, we show that SGD is drawn to parameter choices that are robust under resampling and thus avoids spurious solutions even if they lie in wider and deeper minima of the training loss. We prove rigorously that by implicitly regularizing the trace of the gradient covariance matrix, SGD controls the algorithmic variability. This regularization leads to solutions that are less sensitive to sampling noise, thereby improving generalization. Numerical experiments on neural network training show that explicitly incorporating the estimate of the algorithmic variability as a regularizer improves test performance. This fact supports our claim that bootstrap estimation underpins SGD's generalization advantages.

#31 Provable Long-Range Benefits of Next-Token Prediction

著者: Xinyuan Cao, Santosh S. Vempala

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.07818

要約:
Why do modern language models, trained to do well on next-word prediction, appear to generate coherent documents and capture long-range structure? Here we show that next-token prediction is provably powerful for learning longer-range structure, even with common neural network architectures. Specifically, we prove that optimizing next-token prediction over a Recurrent Neural Network (RNN) yields a model that closely approximates the training distribution: for held-out documents sampled from the training distribution, no algorithm of bounded description length limited to examining the next $k$ tokens, for any $k$, can distinguish between $k$ consecutive tokens of such documents and $k$ tokens generated by the learned language model following the same prefix. We provide polynomial bounds (in $k$, independent of the document length) on the model size needed to achieve such $k$-token indistinguishability, offering a complexity-theoretic explanation for the long-range coherence observed in practice.

#32 Alpha-VI DeepONet: A prior-robust variational Bayesian approach for enhancing DeepONets with uncertainty quantification

著者: Soban Nasir Lone, Subhayan De, Rajdip Nayek

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2408.00681

要約:
We introduce a novel deep operator network (DeepONet) framework that incorporates generalised variational inference (GVI) using R\'enyi's $\alpha$-divergence to learn complex operators while quantifying uncertainty. By incorporating Bayesian neural networks as the building blocks for the branch and trunk networks, our framework endows DeepONet with uncertainty quantification. The use of R\'enyi's $\alpha$-divergence, instead of the Kullback-Leibler divergence (KLD), commonly used in standard variational inference, mitigates issues related to prior misspecification that are prevalent in Variational Bayesian DeepONets. This approach offers enhanced flexibility and robustness. We demonstrate that modifying the variational objective function yields superior results in terms of minimising the mean squared error and improving the negative log-likelihood on the test set. Our framework's efficacy is validated across various mechanical systems, where it outperforms both deterministic and standard KLD-based VI DeepONets in predictive accuracy and uncertainty quantification. The hyperparameter $\alpha$, which controls the degree of robustness, can be tuned to optimise performance for specific problems. We apply this approach to a range of mechanics problems, including gravity pendulum, advection-diffusion, and diffusion-reaction systems. Our findings underscore the potential of $\alpha$-VI DeepONet to advance the field of data-driven operator learning and its applications in engineering and scientific domains.

#33 Learning Enhanced Ensemble Filters

著者: Eviatar Bach, Ricardo Baptista, Edoardo Calvello, Bohan Chen, Andrew Stuart

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.17836

要約:
The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. Here this shortcoming is addressed by using machine learning to map the joint predicted state and observation to the updated state estimate. The derivation of methods from a mean field formulation of the true filtering distribution suggests a single parametrization of the algorithm that can be deployed at different ensemble sizes. And we use a mean field formulation of the ensemble Kalman filter as an inductive bias for our architecture. To develop this perspective, in which the mean-field limit of the algorithm and finite interacting ensemble particle approximations share a common set of parameters, a novel form of neural operator is introduced, taking probability distributions as input: a measure neural mapping (MNM). A MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined in both the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. In practice fine-tuning of a small number of parameters, for specific ensemble sizes, further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root-mean-square-error performance relative to leading methods in filtering the Lorenz '96 and Kuramoto-Sivashinsky models.

#34 Generative Machine Learning for Multivariate Angular Simulation

著者: Jakob Benjamin Wessel, Callum J. R. Murphy-Barltrop, Emma S. Simpson

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.21505

要約:
With the recent development of new geometric and angular-radial frameworks for multivariate extremes, reliably simulating from angular variables in moderate-to-high dimensions is of increasing importance. Empirical approaches have the benefit of simplicity, and work reasonably well in low dimensions, but as the number of variables increases, they can lack the required flexibility and scalability. Classical parametric models for angular variables, such as the von Mises--Fisher distribution (vMF), provide an alternative. Exploiting finite mixtures of vMF distributions increases their flexibility, but there are cases where, without letting the number of mixture components grow considerably, a mixture model with a fixed number of components is not sufficient to capture the intricate features that can arise in data. Owing to their flexibility, generative deep learning methods are able to capture complex data structures; they therefore have the potential to be useful in the simulation of multivariate angular variables. In this paper, we introduce a range of deep learning approaches for this task, including generative adversarial networks, normalizing flows and flow matching. We assess their performance via a range of metrics, and make comparisons to the more classical approach of using a finite mixture of vMF distributions. The methods are also applied to a metocean data set, with diagnostics indicating strong performance, demonstrating the applicability of such techniques to real-world, complex data structures.

#35 SADA: Safe and Adaptive Aggregation of Multiple Black-Box Predictions in Semi-Supervised Learning

著者: Jiawei Shan, Zhifeng Chen, Yiming Dong, Yazhen Wang, Jiwei Zhao

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.21707

要約:
Semi-supervised learning (SSL) arises in practice when labeled data are scarce or expensive to obtain, while large quantities of unlabeled data are readily available. With the growing adoption of machine learning techniques, it has become increasingly feasible to generate multiple predicted labels using a variety of models and algorithms, including deep learning, large language models, and generative AI. In this paper, we propose a novel approach that safely and adaptively aggregates multiple black-box predictions of uncertain quality for both inference and prediction tasks. Our method provides two key guarantees: (i) it never performs worse than using the labeled data alone, regardless of the quality of the predictions; and (ii) if any one of the predictions (without knowing which one) perfectly fits the ground truth, the algorithm adaptively exploits this to achieve either a faster convergence rate or the semiparametric efficiency bound. We demonstrate the effectiveness of the proposed algorithm through small-scale simulations and two real-data analyses with distinct scientific goals. A user-friendly R package, sada, is provided to facilitate practical implementation.

#36 Deep Hedging Under Non-Convexity: Limitations and a Case for AlphaZero

著者: Matteo Maggiolo, Giuseppe Nuti, Miroslav \v{S}trupl, Oleg Szehr

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.01874

要約:
This paper examines replication portfolio construction in incomplete markets - a key problem in financial engineering with applications in pricing, hedging, balance sheet management, and energy storage planning. We model this as a two-player game between an investor and the market, where the investor makes strategic bets on future states while the market reveals outcomes. Inspired by the success of Monte Carlo Tree Search in stochastic games, we introduce an AlphaZero-based system and compare its performance to deep hedging - a widely used industry method based on gradient descent. Through theoretical analysis and experiments, we show that deep hedging struggles in environments where the optimal action-value function is not subject to convexity constraints - such as those involving non-convex transaction costs, capital constraints, or regulatory limitations - converging to local optima. We construct specific market environments to highlight these limitations and demonstrate that AlphaZero consistently finds near-optimal replication strategies. On the theoretical side, we establish a connection between deep hedging and convex optimization, suggesting that its effectiveness is contingent on convexity assumptions. Our experiments further suggest that AlphaZero is more sample-efficient - an important advantage in data-scarce, overfitting-prone derivative markets.

#37 Model Monitoring: A General Framework with an Application to Non-life Insurance Pricing

著者: Alexej Brauer, Paul Menzel, Mario V. W\"uthrich

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.04556

要約:
Maintaining the predictive performance of pricing models is challenging when insurance portfolios and data-generating mechanisms evolve over time. Focusing on non-life insurance, we adopt the concept-drift terminology from machine learning and distinguish virtual drift from real concept drift in an actuarial setting. Methodologically, we (i) formalize deviance loss and Murphy's score decomposition to assess global and local auto-calibration; (ii) study the Gini score as a rank-based performance measure, derive its asymptotic distribution, and develop a consistent bootstrap estimator of its asymptotic variance; and (iii) combine these results into a statistically grounded, model-agnostic monitoring framework that integrates a Gini-based ranking drift test with global and local auto-calibration tests. An application to a modified motor insurance portfolio with controlled concept-drift scenarios illustrates how the framework guides decisions on refitting or recalibrating pricing models.

#38 In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning

著者: Tomoya Wakayama, Taiji Suzuki

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.10981

要約:
This paper develops a finite-sample statistical theory for in-context learning (ICL), analyzed within a meta-learning framework that accommodates mixtures of diverse task types. We introduce a principled risk decomposition that separates the total ICL risk into two orthogonal components: Bayes Gap and Posterior Variance. The Bayes Gap quantifies how well the trained model approximates the Bayes-optimal in-context predictor. For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts and their context length. The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty. Our key finding is that this term is determined solely by the difficulty of the true underlying task, while the uncertainty arising from the task mixture vanishes exponentially fast with only a few in-context examples. Together, these results provide a unified view of ICL: the Transformer selects the optimal meta-algorithm during pretraining and rapidly converges to the optimal algorithm for the true task at test time.

#39 The Optimal Approximation Factor in Density Estimation

著者: Olivier Bousquet, Daniel Kane, Shay Moran

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/1902.05876

要約:
Consider the following problem: given two arbitrary densities $q_1,q_2$ and a sample-access to an unknown target density $p$, find which of the $q_i$'s is closer to $p$ in total variation. A remarkable result due to Yatracos shows that this problem is tractable in the following sense: there exists an algorithm that uses $O(\epsilon^{-2})$ samples from $p$ and outputs~$q_i$ such that with high probability, $TV(q_i,p) \leq 3\cdot\mathsf{opt} + \epsilon$, where $\mathsf{opt}= \min\{TV(q_1,p),TV(q_2,p)\}$. Moreover, this result extends to any finite class of densities $\mathcal{Q}$: there exists an algorithm that outputs the best density in $\mathcal{Q}$ up to a multiplicative approximation factor of 3. We complement and extend this result by showing that: (i) the factor 3 can not be improved if one restricts the algorithm to output a density from $\mathcal{Q}$, and (ii) if one allows the algorithm to output arbitrary densities (e.g.\ a mixture of densities from $\mathcal{Q}$), then the approximation factor can be reduced to 2, which is optimal. In particular this demonstrates an advantage of improper learning over proper in this setup. We develop two approaches to achieve the optimal approximation factor of 2: an adaptive one and a static one. Both approaches are based on a geometric point of view of the problem and rely on estimating surrogate metrics to the total variation. Our sample complexity bounds exploit techniques from {\it Adaptive Data Analysis}.

#40 A Unified Framework for Estimation of High-dimensional Conditional Factor Models

著者: Qihui Chen

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2209.00391

要約:
This paper presents a general framework for estimating high-dimensional conditional latent factor models via constrained nuclear norm regularization. We establish large sample properties of the estimators and provide efficient algorithms for their computation. To improve practical applicability, we propose a cross-validation procedure for selecting the regularization parameter. Our framework unifies the estimation of various conditional factor models, enabling the derivation of new asymptotic results while addressing limitations of existing methods, which are often model-specific or restrictive. Empirical analyses of the cross section of individual US stock returns suggest that imposing homogeneity improves the model's out-of-sample predictability, with our new method outperforming existing alternatives.

#41 Hidden Minima in Two-Layer ReLU Networks

model extraction

著者: Yossi Arjevani

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2312.16819

要約:
We consider the optimization problem arising from fitting two-layer ReLU networks with $d$ inputs under the square loss, where labels are generated by a target network. Two infinite families of spurious minima have recently been identified: one whose loss vanishes as $d \to \infty$, and another whose loss remains bounded away from zero. The latter are nevertheless avoided by vanilla SGD, and thus hidden, motivating the search for analytic properties distinguishing the two types. Perhaps surprisingly, the Hessian spectra of hidden and non-hidden minima agree up to terms of order $O(d^{-1/2})$, providing limited explanatory power. Consequently, our analysis of hidden minima proceeds instead via curves along which the loss is minimized or maximized. The main result is that arcs emanating from hidden minima differ, characteristically, by their structure and symmetry, precisely on account of the $O(d^{-1/2})$-eigenvalue terms absent from previous analyses.

#42 Theoretical Guarantees for the Subspace-Constrained Tyler's Estimator

著者: Gilad Lerman, Teng Zhang

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2403.18658

要約:
This work analyzes the subspace-constrained Tyler's estimator (STE), a method designed to recover a low-dimensional subspace from a dataset that may be heavily corrupted by outliers. The STE has previously been shown to be competitive for fundamental computer vision problems. We assume a weak inlier-outlier model and allow the inlier fraction to fall below the threshold at which robust subspace recovery becomes computationally hard. We show that, in this setting, if the initialization of STE satisfies a certain condition, then STE-which is computationally efficient-can effectively recover the underlying subspace. Furthermore, we establish approximate recovery guarantees for STE in the presence of noisy inliers. Finally, under the asymptotic generalized haystack model, we demonstrate that STE initialized with Tyler's M-estimator (TME) recovers the subspace even when the inlier fraction is too small for TME to succeed on its own.

#43 Covariate-Elaborated Robust Partial Information Transfer with Conditional Spike-and-Slab Prior

著者: Ruqian Zhang, Yijiao Zhang, Annie Qu, Zhongyi Zhu, Juan Shen

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2404.03764

要約:
The popularity of transfer learning stems from the fact that it can borrow information from useful auxiliary datasets. Existing statistical transfer learning methods usually adopt a global similarity measure between the source data and the target data, which may lead to inefficiency when only partial information is shared. In this paper, we propose a novel Bayesian transfer learning method named ``CONCERT'' to allow robust partial information transfer for high-dimensional data analysis. A conditional spike-and-slab prior is introduced in the joint distribution of target and source parameters for information transfer. By incorporating covariate-specific priors, we can characterize partial similarities and integrate source information collaboratively to improve the performance on the target. In contrast to existing work, the CONCERT is a one-step procedure which achieves variable selection and information transfer simultaneously. We establish variable selection consistency, as well as estimation and prediction error bounds for CONCERT. Our theory demonstrates the covariate-specific benefit of transfer learning. To ensure the scalability of the algorithm, we adopt the variational Bayes framework to facilitate implementation. Extensive experiments and two real data applications showcase the validity and advantages of CONCERT over existing cutting-edge transfer learning methods.

#44 Evaluating Model Performance Under Worst-case Subpopulations

著者: Mike Li, Daksh Mittal, Hongseok Namkoong, Shangzhou Xia

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2407.01316

要約:
The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of Z only through the out-of-sample error in estimating the performance conditional on Z. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models.

#45 Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning

backdoor

著者: Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.09101

要約:
Dataset ownership verification, the process of determining if a dataset is used in a model's training data, is necessary for detecting unauthorized data usage and data contamination. Existing approaches, such as backdoor watermarking, rely on inducing a detectable behavior into the trained model on a part of the data distribution. However, these approaches have limitations, as they can be harmful to the model's performances or require unpractical access to the model's internals. Most importantly, previous approaches lack guarantee against false positives. This paper introduces data taggants, a novel non-backdoor dataset ownership verification technique. Our method uses pairs of out-of-distribution samples and random labels as secret keys, and leverages clean-label targeted data poisoning to subtly alter a dataset, so that models trained on it respond to the key samples with the corresponding key labels. The keys are built as to allow for statistical certificates with black-box access only to the model. We validate our approach through comprehensive and realistic experiments on ImageNet1k using ViT and ResNet models with state-of-the-art training recipes. Our findings demonstrate that data taggants can reliably detect models trained on the protected dataset with high confidence, without compromising validation accuracy, and show their superiority over backdoor watermarking. We demonstrate the stealthiness and robustness of our method against various defense mechanisms.

#46 FIT-GNN: Faster Inference Time for GNNs that 'FIT' in Memory Using Coarsening

著者: Shubhajit Roy, Hrriday Ruparel, Kishan Ved, Anirban Dasgupta

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.15001

要約:
Scalability of Graph Neural Networks (GNNs) remains a significant challenge. To tackle this, methods like coarsening, condensation, and computation trees are used to train on a smaller graph, resulting in faster computation. Nonetheless, prior research has not adequately addressed the computational costs during the inference phase. This paper presents a novel approach to improve the scalability of GNNs by reducing computational burden during the inference phase using graph coarsening. We demonstrate two different methods -- Extra Nodes and Cluster Nodes. Our study extends the application of graph coarsening for graph-level tasks, including graph classification and graph regression. We conduct extensive experiments on multiple benchmark datasets to evaluate the performance of our approach. Our results show that the proposed method achieves orders of magnitude improvements in single-node inference time compared to traditional approaches. Furthermore, it significantly reduces memory consumption for node and graph classification and regression tasks, enabling efficient training and inference on low-resource devices where conventional methods are impractical. Notably, these computational advantages are achieved while maintaining competitive performance relative to baseline models.

#47 A Particle Algorithm for Mean-Field Variational Inference

著者: Qiang Du, Kaizheng Wang, Edith Zhang, Chenyang Zhong

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2412.20385

要約:
Variational inference is a fast and scalable alternative to Markov chain Monte Carlo and has been widely applied to posterior inference tasks in statistics and machine learning. A traditional approach for implementing mean-field variational inference (MFVI) is coordinate ascent variational inference (CAVI), which relies crucially on parametric assumptions on complete conditionals. We introduce a novel particle-based algorithm for MFVI, named PArticle VI (PAVI), for nonparametric mean-field approximation. We obtain non-asymptotic error bounds for our algorithm. To our knowledge, this is the first end-to-end guarantee for particle-based MFVI.

#48 Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

著者: Dexiong Chen, Markus Krimmel, Karsten Borgwardt

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.02216

要約:
We introduce AutoGraph, a scalable autoregressive model for attributed graph generation using decoder-only transformers. By flattening graphs into random sequences of tokens through a reversible process, AutoGraph enables modeling graphs as sequences without relying on additional node features that are expensive to compute, in contrast to diffusion-based approaches. This results in sampling complexity and sequence lengths that scale optimally linearly with the number of edges, making it scalable and efficient for large, sparse graphs. A key success factor of AutoGraph is that its sequence prefixes represent induced subgraphs, creating a direct link to sub-sentences in language modeling. Empirically, AutoGraph achieves state-of-the-art performance on synthetic and molecular benchmarks, with up to 100x faster generation and 3x faster training than leading diffusion models. It also supports substructure-conditioned generation without fine-tuning and shows promising transferability, bridging language modeling and graph generation to lay the groundwork for graph foundation models. Our code is available at https://github.com/BorgwardtLab/AutoGraph.

#49 Stein Discrepancy for Unsupervised Domain Adaptation

著者: Anneke von Seeger, Dongmian Zou, Gilad Lerman

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.03587

要約:
Unsupervised domain adaptation (UDA) aims to improve model performance on an unlabeled target domain using a related, labeled source domain. A common approach aligns source and target feature distributions by minimizing a distance between them, often using symmetric measures such as maximum mean discrepancy (MMD). However, these methods struggle when target data is scarce. We propose a novel UDA framework that leverages Stein discrepancy, an asymmetric measure that depends on the target distribution only through its score function, making it particularly suitable for low-data target regimes. Our proposed method has kernelized and adversarial forms and supports flexible modeling of the target distribution via Gaussian, GMM, or VAE models. We derive a generalization bound on the target error and a convergence rate for the empirical Stein discrepancy in the two-sample setting. Empirically, our method consistently outperforms prior UDA approaches under limited target data across multiple benchmarks.

#50 Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

著者: Shenao Zhang, Yaqing Wang, Yinxiao Liu, Tianqi Liu, Peter Grabowski, Eugene Ie, Zhaoran Wang, Yunxuan Li

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.20561

要約:
Large Language Models (LLMs) trained via Reinforcement Learning (RL) have exhibited strong reasoning capabilities and emergent reflective behaviors, such as rethinking and error correction, as a form of in-context exploration. However, the Markovian policy obtained from conventional RL training does not give rise to reflective exploration behaviors since the policy depends on the history only through the state and therefore has no incentive to enrich identical states with additional context. Instead, RL exploration is only useful during training to learn the optimal policy in a trial-and-error manner. Therefore, it remains unclear whether reflective reasoning will emerge during RL, or why it is beneficial. To remedy this, we recast reflective exploration within a Bayesian RL framework, which optimizes the expected return under a posterior distribution over Markov decision processes induced by the training data. This Bayesian formulation admits uncertainty-adaptive policies that, through belief updates, naturally incentivize information-gathering actions and induce self-reflection behaviors. Our resulting algorithm, BARL, instructs the LLM to stitch and switch strategies based on the observed outcomes, offering principled guidance on when and how the model should reflectively explore. Empirical results on both synthetic and mathematical reasoning tasks demonstrate that BARL outperforms conventional RL approaches, achieving superior test-time performance and token efficiency. Our code is available at https://github.com/shenao-zhang/BARL.

#51 Learning where to learn: Training data distribution optimization for scientific machine learning

著者: Nicolas Guerra, Nicholas H. Nelsen, Yunan Yang

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.21626

要約:
In scientific machine learning, models are routinely deployed with parameter values or boundary conditions far from those used in training. This paper studies the learning-where-to-learn problem of designing a training data distribution that minimizes average prediction error across a family of deployment regimes. A theoretical analysis shows how the training distribution shapes deployment accuracy. This motivates two adaptive algorithms based on bilevel or alternating optimization in the space of probability measures. Discretized implementations using parametric distribution classes or nonparametric particle-based gradient flows deliver optimized training distributions that outperform nonadaptive designs. Once trained, the resulting models exhibit improved sample complexity and robustness to distribution shift. This framework unlocks the potential of principled data acquisition for learning functions and solution operators of partial differential equations.

#52 LLMs Judging LLMs: A Simplex Perspective

著者: Patrick Vossler, Fan Xia, Yifan Mai, Adarsh Subbaswamy, Jean Feng

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.21972

要約:
Given the challenge of automatically evaluating free-form outputs from large language models (LLMs), an increasingly common solution is to use LLMs themselves as the judging mechanism, without any gold-standard scores. Implicitly, this practice accounts for only sampling variability (aleatoric uncertainty) and ignores uncertainty about judge quality (epistemic uncertainty). While this is justified if judges are perfectly accurate, it is unclear when such an approach is theoretically valid and practically robust. We study these questions for the task of ranking LLM candidates from a novel geometric perspective: for $M$-level scoring systems, both LLM judges and candidates can be represented as points on an $(M-1)$-dimensional probability simplex, where geometric concepts (e.g., triangle areas) correspond to key ranking concepts. This perspective yields intuitive theoretical conditions and visual proofs for when rankings are identifiable; for instance, we provide a formal basis for the ``folk wisdom'' that LLM judges are more effective for two-level scoring ($M=2$) than multi-level scoring ($M>2$). Leveraging the simplex, we design geometric Bayesian priors that encode epistemic uncertainty about judge quality and vary the priors to conduct sensitivity analyses. Experiments on LLM benchmarks show that rankings based solely on LLM judges are robust in many but not all datasets, underscoring both their widespread success and the need for caution. Our Bayesian method achieves substantially higher coverage rates than existing procedures, highlighting the importance of modeling epistemic uncertainty.

#53 Emergent Granger Causality in Neural Networks: Can Prediction Alone Reveal Structure?

著者: Malik Shahid Sultan, Hernando Ombao, Maurizio Filippone

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.20347

要約:
Granger Causality (GC) offers an elegant statistical framework to study the association between multivariate time series data. Vector autoregressive models (VAR) are simple and easy to fit, but have limited application because of their inherent inability to capture more complex (e.g., non-linear) associations. Numerous attempts have already been made in the literature that exploit the functional approximation power of deep neural networks (DNNs) for GC. However, these methods treat GC as a variable selection problem. We present a novel paradigm for investigating the learned GC from a single neural network used for joint modeling of all components of multivariate time series data, which is essentially linked with prediction and assessing the distribution shift in residuals. A deep learning model, with proper regularization, may learn the true GC structure when jointly used for all components of the time series when there is sufficient training data. We propose to uncover the learned GC structure by comparing the model uncertainty or distribution of the residuals when the past of everything is used as compared to the one where a specific time series component is dropped from the model. We also compare the effect of input layer dropout on the ability of a neural network to learn GC. We show that a well-regularized model can learn the true GC structure from the data without explicitly adding terms in the loss function that guide the model to select variables or perform sparse regression under specific settings. We also provide a comparison of deep learning architectures such as CNN, LSTM and transformer models on their ability to discover Granger Causality. The numerical experiments demonstrate that, compared to sparse regression models, a simple joint model is a strong baseline for learning the true GC which has the advantage that it does not require tuning of many extra hyper-parameters.

#54 Stochastic Approximation with Block Coordinate Optimal Stepsizes

著者: Tao Jiang, Lin Xiao

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.08963

要約:
We consider stochastic approximation with block-coordinate stepsizes and propose adaptive stepsize rules that aim to minimize the expected distance from the next iterate to an (unknown) target point. These stepsize rules employ online estimates of the second moment of the search direction along each block coordinate. The popular Adam algorithm can be interpreted as a variant with a specific estimator. By leveraging a simple conditional estimator, we derive a new method that obtains competitive performance against Adam but requires less memory and fewer hyper-parameters. We prove that this family of methods converges almost surely to a small neighborhood of the target point, and the radius of the neighborhood depends on the bias and variance of the second-moment estimator. Our analysis relies on a simple aiming condition that assumes neither convexity nor smoothness, thus has broad applicability.

#55 A Simple Approximate Bayesian Inference Neural Surrogate for Stochastic Petri Net Models

著者: Bright Kwaku Manu, Trevor Reckell, Beckett Sterner, Petar Jevtic

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.10714

要約:
Stochastic Petri Nets (SPNs) are an increasingly popular tool of choice for modeling discrete-event dynamics in areas such as epidemiology and systems biology, yet their parameter estimation remains challenging in general and in particular when transition rates depend on external covariates and explicit likelihoods are unavailable. We introduce a neural-surrogate (neural-network-based approximation of the posterior distribution) framework that predicts the coefficients of known covariate-dependent rate functions directly from noisy, partially observed token trajectories. Our model employs a lightweight 1D Convolutional Residual Network trained end-to-end on Gillespie-simulated SPN realizations, learning to invert system dynamics under realistic conditions of event dropout. During inference, Monte Carlo dropout provides calibrated uncertainty bounds together with point estimates. On synthetic SPNs with $10\%$ missing events, our surrogate recovers rate-function coefficients with an $RMSE = 0.043$ and substantially runs faster than traditional Bayesian approaches. These results demonstrate that data-driven, likelihood-free surrogates can enable accurate, robust, and real-time parameter recovery in complex, partially observed discrete-event systems.

#56 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

著者: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander M\k{a}dry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Mart\'in Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.11473

要約:
AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.

#57 Density Operator Expectation Maximization

著者: Adit Vishnu, Abhay Shastry, Dhruva Kashyap, Chiranjib Bhattacharyya

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.22786

要約:
Machine learning with density operators, the mathematical foundation of quantum mechanics, is gaining prominence with rapid advances in quantum computing. Generative models based on density operators cannot yet handle tasks that are routinely handled by probabilistic models. The progress of latent variable models, a broad and influential class of probabilistic unsupervised models, was driven by the Expectation-Maximization framework. Deriving such a framework for density operators is challenging due to the non-commutativity of operators. To tackle this challenge, an inequality arising from the monotonicity of relative entropy is demonstrated to serve as an evidence lower bound for density operators. A minorant-maximization perspective on this bound leads to Density Operator Expectation Maximization (DO-EM), a general framework for training latent variable models defined through density operators. Through an information-geometric argument, the Expectation step in DO-EM is shown to be the Petz recovery map. The DO-EM algorithm is applied to Quantum Restricted Boltzmann Machines, adapting Contrastive Divergence to approximate the Maximization step gradient. Quantum interleaved Deep Boltzmann Machines and Quantum Gaussian-Bernoulli Restricted Boltzmann Machines, new models introduced in this work, outperform their probabilistic counterparts on generative tasks when trained with similar computational resources and identical hyperparameters.

#58 Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo

著者: Advait Parulekar, Litu Rout, Karthikeyan Shanmugam, Sanjay Shakkottai

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.07631

要約:
We study the problem of posterior sampling in the context of score based generative models. We have a trained score network for a prior $p(x)$, a measurement model $p(y|x)$, and are tasked with sampling from the posterior $p(x|y)$. Prior work has shown this to be intractable in KL (in the worst case) under well-accepted computational hardness assumptions. Despite this, popular algorithms for tasks such as image super-resolution, stylization, and reconstruction enjoy empirical success. Rather than establishing distributional assumptions or restricted settings under which exact posterior sampling is tractable, we view this as a more general "tilting" problem of biasing a distribution towards a measurement. Under minimal assumptions, we show that one can tractably sample from a distribution that is simultaneously close to the posterior of a noised prior in KL divergence and the true posterior in Fisher divergence. Intuitively, this combination ensures that the resulting sample is consistent with both the measurement and the prior. To the best of our knowledge these are the first formal results for (approximate) posterior sampling in polynomial time.

#59 Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent

著者: Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.08222

要約:
Transformers have demonstrated remarkable capabilities in multi-step reasoning tasks. However, understandings of the underlying mechanisms by which they acquire these abilities through training remain limited, particularly from a theoretical standpoint. This work investigates how transformers learn to solve symbolic multi-step reasoning problems through chain-of-thought processes, focusing on path-finding in trees. We analyze two intertwined tasks: a backward reasoning task, where the model outputs a path from a goal node to the root, and a more complex forward reasoning task, where the model implements two-stage reasoning by first identifying the goal-to-root path and then reversing it to produce the root-to-goal path. Our theoretical analysis, grounded in the dynamics of gradient descent, shows that trained one-layer transformers can provably solve both tasks with generalization guarantees to unseen trees. In particular, our multi-phase training dynamics for forward reasoning elucidate how different attention heads learn to specialize and coordinate autonomously to solve the two subtasks in a single autoregressive path. These results provide a mechanistic explanation of how trained transformers can implement sequential algorithmic procedures. Moreover, they offer insights into the emergence of reasoning abilities, suggesting that when tasks are structured to take intermediate chain-of-thought steps, even shallow multi-head transformers can effectively solve problems that would otherwise require deeper architectures.

#60 A self-supervised learning approach for denoising autoregressive models with additive noise: finite and infinite variance cases

著者: Sayantan Banerjee, Agnieszka Wylomanska, Sundar S

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.12970

要約:
The autoregressive time series model is a popular second-order stationary process, modeling a wide range of real phenomena. However, in applications, autoregressive signals are often corrupted by additive noise. Further, the autoregressive process and the corruptive noise may be highly impulsive, stemming from an infinite-variance distribution. The model estimation techniques that account for additional noise tend to show reduced efficacy when there is very strong noise present in the data, especially when the noise is heavy-tailed. In this paper, we propose a novel self-supervised learning method to denoise the additive noise-corrupted autoregressive model. Our approach is motivated by recent work in computer vision and does not require full knowledge of the noise distribution. We use the proposed method to recover exemplary finite- and infinite-variance autoregressive signals, namely, Gaussian and alpha-stable distributed signals, respectively, from their noise-corrupted versions. The simulation study conducted on both synthetic and semi-synthetic data demonstrates strong denoising performance of our method compared to several baseline methods, particularly when the corruption is significant and impulsive in nature. Finally, we apply the presented methodology to forecast the pure autoregressive signal from the noise-corrupted data.

#61 Staying on the Manifold: Geometry-Aware Noise Injection

著者: Albert Kj{\o}ller Jacobsen, Johanna Marie Gegenfurtner, Georgios Arvanitidis

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.20201

要約:
It has been shown that perturbing the input during training implicitly regularises the gradient of the learnt function, leading to smoother models and enhancing generalisation. However, previous research mostly considered the addition of ambient noise in the input space, without considering the underlying structure of the data. In this work, we propose several strategies of adding geometry-aware input noise that accounts for the lower dimensional manifold the input space inhabits. We start by projecting ambient Gaussian noise onto the tangent space of the manifold. In a second step, the noise sample is mapped on the manifold via the associated geodesic curve. We also consider Brownian motion noise, which moves in random steps along the manifold. We show that geometry-aware noise leads to improved generalisation and robustness to hyperparameter selection on highly curved manifolds, while performing at least as well as training without noise on simpler manifolds. Our proposed framework extends to data manifolds approximated by generative models and we observe similar trends on the MNIST digits dataset.

#62 Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias

著者: Shuofeng Zhang, Ard Louis

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.21181

要約:
For overparameterized linear regression with isotropic Gaussian design and minimum-$\ell_p$ interpolator $p\in(1,2]$, we give a unified, high-probability characterization for the scaling of the family of parameter norms $ \\{ \lVert \widehat{w_p} \rVert_r \\}_{r \in [1,p]} $ with sample size. We solve this basic, but unresolved question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star=2(p-1)$ that separates $\lVert \widehat{w_p} \rVert_r$'s which plateau from those that continue to grow with an explicit exponent. This unified solution resolves the scaling of *all* $\ell_r$ norms within the family $r\in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $\alpha$ to an effective $p_{\mathrm{eff}}(\alpha)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Given that many generalization proxies depend on $\lVert \widehat {w_p} \rVert_r$, our results suggest that their predictive power will depend sensitively on which $l_r$ norm is used.

#63 AuON: A Linear-time Alternative to Orthogonal Momentum Updates

著者: Dipan Maity

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.24320

要約:
Orthogonal momentum gradient updates have emerged to overcome the limitations of vector-based optimizers like Adam. The vector-based optimizer Adam suffers from high memory costs and ill-conditioned momentum gradient updates. However, traditional Orthogonal momentum approaches, such as SVD/QR decomposition, suffer from high computational and memory costs and underperform compared to well-tuned SGD with momentum.Recent advances, such as Muon, improve efficiency by applying momentum before orthogonalization and approximate orthogonal matrices via Newton-Schulz iterations, which gives better GPU utilization, active high TFLOPS, and reduces memory usage by up to 3x. Nevertheless, Muon(Vanilla) suffers from exploding attention logits and has cubic computation complexity. In this paper we , deep dive into orthogonal momentum gradient updates to find the main properties that help Muon to achieve remarkable performance.We propose \textbf{AuON} (Alternative Unit-norm momentum updates by Normalized nonlinear scaling), a linear-time optimizer that achieves strong performance without approximate orthogonal matrices, while preserving structural alignment and reconditioning ill-posed updates. AuON has an automatic (\textbf{"emergency brake"}) to handle exploding attention logits.. We further introduce a hybrid variant (\textbf{ Hybrid-AuON})that applies the linear transformations with Newton-Schulz iterations which out performs Muon in the language modeling tasks. Code is available at: https://github.com/ryyzn9/AuON

#64 EnScale: Temporally-consistent multivariate generative downscaling via proper scoring rules

著者: Maybritt Schillinger, Maxim Samarin, Xinwei Shen, Reto Knutti, Nicolai Meinshausen

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.26258

要約:
The practical use of future climate projections from global circulation models (GCMs) is often limited by their coarse spatial resolution, requiring downscaling to generate high-resolution data. Regional climate models (RCMs) provide this refinement, but are computationally expensive. To address this issue, machine learning models can learn the downscaling function, mapping coarse GCM outputs to high-resolution fields. Among these, generative approaches aim to capture the full conditional distribution of RCM data given coarse-scale GCM data, which is characterized by large variability and thus challenging to model accurately. We introduce EnScale, a generative machine learning framework that emulates the full GCM-to-RCM map by training on multiple pairs of GCM and corresponding RCM data. It first adjusts large-scale mismatches between GCM and coarsened RCM data, followed by a super-resolution step to generate high-resolution fields. Both steps employ generative models optimized with the energy score, a proper scoring rule. Compared to state-of-the-art ML downscaling approaches, our setup reduces computational cost by about one order of magnitude. EnScale jointly emulates multiple variables -- temperature, precipitation, solar radiation, and wind -- spatially consistent over an area in Central Europe. In addition, we propose a variant EnScale-t that enables temporally consistent downscaling. We establish a comprehensive evaluation framework across various categories including calibration, spatial structure, extremes, and multivariate dependencies. Comparison with diverse benchmarks demonstrates EnScale's strong performance and computational efficiency. EnScale offers a promising approach for accurate and temporally consistent RCM emulation.

#65 AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees

著者: Hongyi Zhou, Jin Zhu, Pingfan Su, Kai Ye, Ying Yang, Shakeel A O B Gavioli-Akilagun, Chengchun Shi

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.01268

要約:
We study the problem of determining whether a piece of text has been authored by a human or by a large language model (LLM). Existing state of the art logits-based detectors make use of statistics derived from the log-probability of the observed text evaluated using the distribution function of a given source LLM. However, relying solely on log probabilities can be sub-optimal. In response, we introduce AdaDetectGPT -- a novel classifier that adaptively learns a witness function from training data to enhance the performance of logits-based detectors. We provide statistical guarantees on its true positive rate, false positive rate, true negative rate and false negative rate. Extensive numerical studies show AdaDetectGPT nearly uniformly improves the state-of-the-art method in various combination of datasets and LLMs, and the improvement can reach up to 37\%. A python implementation of our method is available at https://github.com/Mamba413/AdaDetectGPT.

#66 K-DAREK: Distance Aware Error for Kurkova Kolmogorov Networks

著者: Masoud Ataei, Vikas Dhiman, Mohammad Javad Khojasteh

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.22021

要約:
Neural networks are powerful parametric function approximators, while Gaussian processes (GPs) are nonparametric probabilistic models that place distributions over functions via kernel-defined correlations but become computationally expensive for large-scale problems. Kolmogorov-Arnold networks (KANs), semi-parametric neural architectures, model complex functions efficiently using spline layers. Kurkova Kolmogorov-Arnold networks (KKANs) extend KANs by replacing the early spline layers with multi-layer perceptrons that map inputs into higher-dimensional spaces before applying spline-based transformations, which yield more stable training and provide robust architectures for system modeling. By enhancing the KKAN architecture, we develop a novel learning algorithm, distance-aware error for Kurkova-Kolmogorov networks (K-DAREK), for efficient and interpretable function approximation with uncertainty quantification. Our approach establishes robust error bounds that are distance-aware; this means they reflect the proximity of a test point to its nearest training points. In safe control case studies, we demonstrate that K-DAREK is about four times faster and ten times more computationally efficient than Ensemble of KANs, 8.6 times more scalable than GP as data size increases, and 7.2% safer than our previous work distance-aware error for Kolmogorov networks (DAREK). Moreover, on real data (e.g., Real Estate Valuation), K-DAREK's error bound achieves zero coverage violations.

#67 LAD-BNet: Lag-Aware Dual-Branch Networks for Real-Time Energy Forecasting on Edge Devices

著者: Jean-Philippe Lignier

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.10680

要約:
Real-time energy forecasting on edge devices represents a major challenge for smart grid optimization and intelligent buildings. We present LAD-BNet (Lag-Aware Dual-Branch Network), an innovative neural architecture optimized for edge inference with Google Coral TPU. Our hybrid approach combines a branch dedicated to explicit exploitation of temporal lags with a Temporal Convolutional Network (TCN) featuring dilated convolutions, enabling simultaneous capture of short and long-term dependencies. Tested on real energy consumption data with 10-minute temporal resolution, LAD-BNet achieves 14.49% MAPE at 1-hour horizon with only 18ms inference time on Edge TPU, representing an 8-12 x acceleration compared to CPU. The multi-scale architecture enables predictions up to 12 hours with controlled performance degradation. Our model demonstrates a 2.39% improvement over LSTM baselines and 3.04% over pure TCN architectures, while maintaining a 180MB memory footprint suitable for embedded device constraints. These results pave the way for industrial applications in real-time energy optimization, demand management, and operational planning.

#68 An Adaptive Resonance Theory-based Topological Clustering Algorithm with a Self-Adjusting Vigilance Parameter

著者: Naoki Masuyama, Yuichiro Toda, Yusuke Nojima, Hisao Ishibuchi

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.17983

要約:
Clustering in stationary and nonstationary settings, where data distributions remain static or evolve over time, requires models that can adapt to distributional shifts while preserving previously learned cluster structures. This paper proposes an Adaptive Resonance Theory (ART)-based topological clustering algorithm that autonomously adjusts its recalculation interval and vigilance threshold through a diversity-driven adaptation mechanism. This mechanism enables hyperparameter-free learning that maintains cluster stability and continuity in dynamic environments. Experiments on 24 real-world datasets demonstrate that the proposed algorithm outperforms state-of-the-art methods in both clustering performance and continual learning capability. These results highlight the effectiveness of the proposed parameter adaptation in mitigating catastrophic forgetting and maintaining consistent clustering in evolving data streams. Source code is available at https://github.com/Masuyama-lab/IDAT

#69 Comparing Two Proxy Methods for Causal Identification

著者: Helen Guo, Elizabeth L. Ogburn, Ilya Shpitser

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.00175

要約:
Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We contrast two major approaches in this framework: (1) bridge equation methods, which leverage solutions to integral equations to recover causal targets, and (2) array decomposition methods, which recover latent factors composing counterfactual quantities by exploiting unique determination of eigenspaces. We compare the model restrictions underlying these two approaches and provide insight into implications of the underlying assumptions, clarifying the scope of applicability for each method.

#70 Calibrating Geophysical Predictions under Constrained Probabilistic Distributions

著者: Zhewen Hou, Jiajin Sun, Subashree Venkatasubramanian, Peter Jin, Shuolin Li, Tian Zheng

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.03081

要約:
Machine learning (ML) has shown significant promise in studying complex geophysical dynamical systems, including turbulence and climate processes. Such systems often display sensitive dependence on initial conditions, reflected in positive Lyapunov exponents, where even small perturbations in short-term forecasts can lead to large deviations in long-term outcomes. Thus, meaningful inference requires not only accurate short-term predictions, but also consistency with the system's long-term attractor that is captured by the marginal distribution of state variables. Existing approaches attempt to address this challenge by incorporating spatial and temporal dependence, but these strategies become impractical when data are extremely sparse. In this work, we show that prior knowledge of marginal distributions offers valuable complementary information to short-term observations, motivating a distribution-informed learning framework. We introduce a calibration algorithm based on normalization and the Kernelized Stein Discrepancy (KSD) to enhance ML predictions. The method here employs KSD within a reproducing kernel Hilbert space to calibrate model outputs, improving their fidelity to known physical distributions. This not only sharpens pointwise predictions but also enforces consistency with non-local statistical structures rooted in physical principles. Through synthetic experiments-spanning offline climatological CO2 fluxes and online quasi-geostrophic flow simulations-we demonstrate the robustness and broad utility of the proposed framework.

#71 SSLfmm: An R Package for Semi-Supervised Learning with a Mixed-Missingness Mechanism in Finite Mixture Models

著者: Geoffrey J. McLachlan, Jinran Wu

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.03322

要約:
Semi-supervised learning (SSL) constructs classifiers from datasets in which only a subset of observations is labelled, a situation that naturally arises because obtaining labels often requires expert judgement or costly manual effort. This motivates methods that integrate labelled and unlabelled data within a learning framework. Most SSL approaches assume that label absence is harmless, typically treated as missing completely at random or ignored, but in practice, the missingness process can be informative, as the chances of an observation being unlabelled may depend on the ambiguity of its feature vector. In such cases, the missingness indicators themselves provide additional information that, if properly modelled, may improve estimation efficiency. The \textbf{SSLfmm} package for R is designed to capture this behaviour by estimating the Bayes' classifier under a finite mixture model in which each component corresponding to a class follows a multivariate normal distribution. It incorporates a mixed-missingness mechanism that combines a missing completely at random (MCAR) component with a (non-ignorable) missing at random (MAR) component, the latter modelling the probability of label missingness as a logistic function of the entropy based on the features. Parameters are estimated via an Expectation--Conditional Maximisation algorithm. In the two-class Gaussian setting with arbitrary covariance matrices, the resulting classifier trained on partially labelled data may, in some cases, achieve a lower misclassification rate than the supervised version in the case where all the labels are known. The package includes a practical tool for modelling and illustrates its performance through simulated examples.

#72 Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity

著者: Noa Rubin, Orit Davidovich, Zohar Ringel

公開日: Tue, 09 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.04165

要約:
Two pressing topics in the theory of deep learning are the interpretation of feature learning mechanisms and the determination of implicit bias of networks in the rich regime. Current theories of rich feature learning, often appear in the form of high-dimensional non-linear equations, which require computationally intensive numerical solutions. Furthermore, even under such limiting settings, predictions often appear in the form of high-dimensional non-linear equations, which require computationally intensive numerical solutions. Given the many details that go into defining a deep learning problem, this analytical complexity is a significant and often unavoidable challenge. Here, we propose a powerful heuristic route for predicting the data and width scales at which various patterns of feature learning emerge. This form of scale analysis is considerably simpler than such exact theories and reproduces the scaling exponents of various known results. In addition, we make novel predictions on complex toy architectures, such as three-layer non-linear networks and attention heads, thus extending the scope of first-principle theories of deep learning.

stat.ML updates on arXiv.org

📋 論文タイトル一覧