arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Bayesian Physics-Informed Neural Networks for Inverse Problems (BPINN-IP): Application in Infrared Image Processing

著者: Ali Mohammad-Djafari, Ning Chu, Li Wang

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02495

要約:
Inverse problems arise across scientific and engineering domains, where the goal is to infer hidden parameters or physical fields from indirect and noisy observations. Classical approaches, such as variational regularization and Bayesian inference, provide well established theoretical foundations for handling ill posedness. However, these methods often become computationally restrictive in high dimensional settings or when the forward model is governed by complex physics. Physics Informed Neural Networks (PINNs) have recently emerged as a promising framework for solving inverse problems by embedding physical laws directly into the training process of neural networks. In this paper, we introduce a new perspective on the Bayesian Physics Informed Neural Network (BPINN) framework, extending classical PINNs by explicitly incorporating training data generation, modeling and measurement uncertainties through Bayesian prior modeling and doing inference with the posterior laws. Also, as we focus on the inverse problems, we call this method BPINN-IP, and we show that the standard PINN formulation naturally appears as its special case corresponding to the Maximum A Posteriori (MAP) estimate. This unified formulation allows simultaneous exploitation of physical constraints, prior knowledge, and data-driven inference, while enabling uncertainty quantification through posterior distributions. To demonstrate the effectiveness of the proposed framework, we consider inverse problems arising in infrared image processing, including deconvolution and super-resolution, and present results on both simulated and real industrial data.

#2 Laplace Approximation For Tensor Train Kernel Machines In System Identification

著者: Albert Saiapin, Kim Batselier

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02532

要約:
To address the scalability limitations of Gaussian process (GP) regression, several approximation techniques have been proposed. One such method is based on tensor networks, which utilizes an exponential number of basis functions without incurring exponential computational cost. However, extending this model to a fully probabilistic formulation introduces several design challenges. In particular, for tensor train (TT) models, it is unclear which TT-core should be treated in a Bayesian manner. We introduce a Bayesian tensor train kernel machine that applies Laplace approximation to estimate the posterior distribution over a selected TT-core and employs variational inference (VI) for precision hyperparameters. Experiments show that core selection is largely independent of TT-ranks and feature structure, and that VI replaces cross-validation while offering up to 65x faster training. The method's effectiveness is demonstrated on an inverse dynamics problem.

#3 Revisiting Theory of Contrastive Learning for Domain Generalization

著者: Ali Alvandi, Mina Rezaei

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02831

要約:
Contrastive learning is among the most popular and powerful approaches for self-supervised representation learning, where the goal is to map semantically similar samples close together while separating dissimilar ones in the latent space. Existing theoretical methods assume that downstream task classes are drawn from the same latent class distribution used during the pretraining phase. However, in real-world settings, downstream tasks may not only exhibit distributional shifts within the same label space but also introduce new or broader label spaces, leading to domain generalization challenges. In this work, we introduce novel generalization bounds that explicitly account for both types of mismatch: domain shift and domain generalization. Specifically, we analyze scenarios where downstream tasks either (i) draw classes from the same latent class space but with shifted distributions, or (ii) involve new label spaces beyond those seen during pretraining. Our analysis reveals how the performance of contrastively learned representations depends on the statistical discrepancy between pretraining and downstream distributions. This extended perspective allows us to derive provable guarantees on the performance of learned representations on average classification tasks involving class distributions outside the pretraining latent class set.

#4 EcoCast: A Spatio-Temporal Model for Continual Biodiversity and Climate Risk Forecasting

著者: Hammed A. Akande, Abdulrauf A. Gidado

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02260

要約:
Increasing climate change and habitat loss are driving unprecedented shifts in species distributions. Conservation professionals urgently need timely, high-resolution predictions of biodiversity risks, especially in ecologically diverse regions like Africa. We propose EcoCast, a spatio-temporal model designed for continual biodiversity and climate risk forecasting. Utilizing multisource satellite imagery, climate data, and citizen science occurrence records, EcoCast predicts near-term (monthly to seasonal) shifts in species distributions through sequence-based transformers that model spatio-temporal environmental dependencies. The architecture is designed with support for continual learning to enable future operational deployment with new data streams. Our pilot study in Africa shows promising improvements in forecasting distributions of selected bird species compared to a Random Forest baseline, highlighting EcoCast's potential to inform targeted conservation policies. By demonstrating an end-to-end pipeline from multi-modal data ingestion to operational forecasting, EcoCast bridges the gap between cutting-edge machine learning and biodiversity management, ultimately guiding data-driven strategies for climate resilience and ecosystem conservation throughout Africa.

#5 Spatiotemporal Pyramid Flow Matching for Climate Emulation

著者: Jeremy Andrew Irvin, Jiaqi Han, Zikui Wang, Abdulaziz Alharbi, Yufei Zhao, Nomin-Erdene Bayarsaikhan, Daniele Visioni, Andrew Y. Ng, Duncan Watson-Parris

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02268

要約:
Generative models have the potential to transform the way we emulate Earth's changing climate. Previous generative approaches rely on weather-scale autoregression for climate emulation, but this is inherently slow for long climate horizons and has yet to demonstrate stable rollouts under nonstationary forcings. Here, we introduce Spatiotemporal Pyramid Flows (SPF), a new class of flow matching approaches that model data hierarchically across spatial and temporal scales. Inspired by cascaded video models, SPF partitions the generative trajectory into a spatiotemporal pyramid, progressively increasing spatial resolution to reduce computation and coupling each stage with an associated timescale to enable direct sampling at any temporal level in the pyramid. This design, together with conditioning each stage on prescribed physical forcings (e.g., greenhouse gases or aerosols), enables efficient, parallel climate emulation at multiple timescales. On ClimateBench, SPF outperforms strong flow matching baselines and pre-trained models at yearly and monthly timescales while offering fast sampling, especially at coarser temporal levels. To scale SPF, we curate ClimateSuite, the largest collection of Earth system simulations to date, comprising over 33,000 simulation-years across ten climate models and the first dataset to include simulations of climate interventions. We find that the scaled SPF model demonstrates good generalization to held-out scenarios across climate models. Together, SPF and ClimateSuite provide a foundation for accurate, efficient, probabilistic climate emulation across temporal scales and realistic future scenarios. Data and code is publicly available at https://github.com/stanfordmlgroup/spf .

#6 Unlocking the Power of Boltzmann Machines by Parallelizable Sampler and Efficient Temperature Estimation

著者: Kentaro Kubo, Hayato Goto

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02323

要約:
Boltzmann machines (BMs) are powerful energy-based generative models, but their heavy training cost has largely confined practical use to Restricted BMs (RBMs) trained with an efficient learning method called contrastive divergence. More accurate learning typically requires Markov chain Monte Carlo (MCMC) Boltzmann sampling, but it is time-consuming due to the difficulty of parallelization for more expressive models. To address this limitation, we first propose a new Boltzmann sampler inspired by a quantum-inspired combinatorial optimization called simulated bifurcation (SB). This SB-inspired approach, which we name Langevin SB (LSB), enables parallelized sampling while maintaining accuracy comparable to MCMC. Furthermore, this is applicable not only to RBMs but also to BMs with general couplings. However, LSB cannot control the inverse temperature of the output Boltzmann distribution, which hinders learning and degrades performance. To overcome this limitation, we also developed an efficient method for estimating the inverse temperature during the learning process, which we call conditional expectation matching (CEM). By combining LSB and CEM, we establish an efficient learning framework for BMs with greater expressive power than RBMs. We refer to this framework as sampler-adaptive learning (SAL). SAL opens new avenues for energy-based generative modeling beyond RBMs.

#7 Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

著者: Dimitris Oikonomou, Nicolas Loizou

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02342

要約:
The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. Comprehensive experiments on convex benchmarks and deep neural networks corroborate our theory: the proposed step size accelerates convergence, reduces variance, and consistently outperforms existing adaptive baselines. Finally, in the context of deep neural network training, our method demonstrates robust performance by addressing the vanishing gradient problem.

#8 On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection

著者: Tai Le-Gia

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02520

要約:
Zero-shot anomaly classification and segmentation (AC/AS) aim to detect anomalous samples and regions without any training data, a capability increasingly crucial in industrial inspection and medical imaging. This dissertation aims to investigate the core challenges of zero-shot AC/AS and presents principled solutions rooted in theory and algorithmic design. We first formalize the problem of consistent anomalies, a failure mode in which recurring similar anomalies systematically bias distance-based methods. By analyzing the statistical and geometric behavior of patch representations from pre-trained Vision Transformers, we identify two key phenomena - similarity scaling and neighbor-burnout - that describe how relationships among normal patches change with and without consistent anomalies in settings characterized by highly similar objects. We then introduce CoDeGraph, a graph-based framework for filtering consistent anomalies built on the similarity scaling and neighbor-burnout phenomena. Through multi-stage graph construction, community detection, and structured refinement, CoDeGraph effectively suppresses the influence of consistent anomalies. Next, we extend this framework to 3D medical imaging by proposing a training-free, computationally efficient volumetric tokenization strategy for MRI data. This enables a genuinely zero-shot 3D anomaly detection pipeline and shows that volumetric anomaly segmentation is achievable without any 3D training samples. Finally, we bridge batch-based and text-based zero-shot methods by demonstrating that CoDeGraph-derived pseudo-masks can supervise prompt-driven vision-language models. Together, this dissertation provides theoretical understanding and practical solutions for the zero-shot AC/AS problem.

#9 Tempering the Bayes Filter towards Improved Model-Based Estimation

著者: Menno van Zutphen, Domagoj Herceg, Giannis Delimpaltadakis, Duarte J. Antunes

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02823

要約:
Model-based filtering is often carried out while subject to an imperfect model, as learning partially-observable stochastic systems remains a challenge. Recent work on Bayesian inference found that tempering the likelihood or full posterior of an imperfect model can improve predictive accuracy, as measured by expected negative log likelihood. In this paper, we develop the tempered Bayes filter, improving estimation performance through both of the aforementioned, and one newly introduced, modalities. The result admits a recursive implementation with a computational complexity no higher than that of the original Bayes filter. Our analysis reveals that -- besides the well-known fact in the field of Bayesian inference that likelihood tempering affects the balance between prior and likelihood -- full-posterior tempering tunes the level of entropy in the final belief distribution. We further find that a region of the tempering space can be understood as interpolating between the Bayes- and MAP filters, recovering these as special cases. Analytical results further establish conditions under which a tempered Bayes filter achieves improved predictive performance. Specializing the results to the linear Gaussian case, we obtain the tempered Kalman filter. In this context, we interpret how the parameters affect the Kalman state estimate and covariance propagation. Empirical results confirm that our method consistently improves predictive accuracy over the Bayes filter baseline.

#10 HeteroJIVE: Joint Subspace Estimation for Heterogeneous Multi-View Data

著者: Jingyang Li, Zhongyuan Lyu

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02866

要約:
Many modern datasets consist of multiple related matrices measured on a common set of units, where the goal is to recover the shared low-dimensional subspace. While the Angle-based Joint and Individual Variation Explained (AJIVE) framework provides a solution, it relies on equal-weight aggregation, which can be strictly suboptimal when views exhibit significant statistical heterogeneity (arising from varying SNR and dimensions) and structural heterogeneity (arising from individual components). In this paper, we propose HeteroJIVE, a weighted two-stage spectral algorithm tailored to such heterogeneity. Theoretically, we first revisit the ``non-diminishing" error barrier with respect to the number of views $K$ identified in recent literature for the equal-weight case. We demonstrate that this barrier is not universal: under generic geometric conditions, the bias term vanishes and our estimator achieves the $O(K^{-1/2})$ rate without the need for iterative refinement. Extending this to the general-weight case, we establish error bounds that explicitly disentangle the two layers of heterogeneity. Based on this, we derive an oracle-optimal weighting scheme implemented via a data-driven procedure. Extensive simulations corroborate our theoretical findings, and an application to TCGA-BRCA multi-omics data validates the superiority of HeteroJIVE in practice.

#11 Hypothesis Testing for Generalized Thurstone Models

著者: Anuran Makur, Japneet Singh

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02912

要約:
In this work, we develop a hypothesis testing framework to determine whether pairwise comparison data is generated by an underlying \emph{generalized Thurstone model} $\mathcal{T}_F$ for a given choice function $F$. While prior work has predominantly focused on parameter estimation and uncertainty quantification for such models, we address the fundamental problem of minimax hypothesis testing for $\mathcal{T}_F$ models. We formulate this testing problem by introducing a notion of separation distance between general pairwise comparison models and the class of $\mathcal{T}_F$ models. We then derive upper and lower bounds on the critical threshold for testing that depend on the topology of the observation graph. For the special case of complete observation graphs, this threshold scales as $\Theta((nk)^{-1/2})$, where $n$ is the number of agents and $k$ is the number of comparisons per pair. Furthermore, we propose a hypothesis test based on our separation distance, construct confidence intervals, establish time-uniform bounds on the probabilities of type I and II errors using reverse martingale techniques, and derive minimax lower bounds using information-theoretic methods. Finally, we validate our results through experiments on synthetic and real-world datasets.

#12 Fast Gaussian Process Approximations for Autocorrelated Data

著者: Ahmadreza Chokhachian, Matthias Katzfuss, Yu Ding

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02925

要約:
This paper is concerned with the problem of how to speed up computation for Gaussian process models trained on autocorrelated data. The Gaussian process model is a powerful tool commonly used in nonlinear regression applications. Standard regression modeling assumes random samples and an independently, identically distributed noise. Various fast approximations that speed up Gaussian process regression work under this standard setting. But for autocorrelated data, failing to account for autocorrelation leads to a phenomenon known as temporal overfitting that deteriorates model performance on new test instances. To handle autocorrelated data, existing fast Gaussian process approximations have to be modified; one such approach is to segment the originally correlated data points into blocks in which the blocked data are de-correlated. This work explains how to make some of the existing Gaussian process approximations work with blocked data. Numerical experiments across diverse application datasets demonstrate that the proposed approaches can remarkably accelerate computation for Gaussian process regression on autocorrelated data without compromising model prediction performance.

#13 Identification of Multivariate Measurement Error Models

著者: Yingyao Hu

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.02970

要約:
This paper develops new identification results for multidimensional continuous measurement-error models where all observed measurements are contaminated by potentially correlated errors and none provides an injective mapping of the latent distribution. Using third order cross moments, the paper constructs a three way tensor whose unique decomposition, guaranteed by Kruskal theorem, identifies the factor loading matrices. Starting with a linear structure, the paper recovers the full distribution of latent factors by constructing suitable measurements and applying scalar or multivariate versions of Kotlarski identity. As a result, the joint distribution of the latent vector and measurement errors is fully identified without requiring injective measurements, showing that multivariate latent structure can be recovered in broader settings than previously believed. Under injectivity, the paper also provides user-friendly testable conditions for identification. Finally, this paper provides general identification results for nonlinear models using a newly-defined generalized Kruskal rank - signal rank - of intergral operators. These results have wide applicability in empirical work involving noisy or indirect measurements, including factor models, survey data with reporting errors, mismeasured regressors in econometrics, and multidimensional latent-trait models in psychology and marketing, potentially enabling more robust estimation and interpretation when clean measurements are unavailable.

#14 Anomalous Change Point Detection Using Probabilistic Predictive Coding

著者: Roelof G. Hup, Julian P. Merkofer, Alex A. Bhogal, Ruud J. G. van Sloun, Reinder Haakma, Rik Vullings

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2405.15727

要約:
Change point detection (CPD) and anomaly detection (AD) are essential techniques in various fields to identify abrupt changes or abnormal data instances. However, existing methods are often constrained to univariate data, face scalability challenges with large datasets due to computational demands, and experience reduced performance with high-dimensional or intricate data, as well as hidden anomalies. Furthermore, they often lack interpretability and adaptability to domain-specific knowledge, which limits their versatility across different fields. In this work, we propose a deep learning-based CPD/AD method called Probabilistic Predictive Coding (PPC) that jointly learns to encode sequential data to low-dimensional latent space representations and to predict the subsequent data representations as well as the corresponding prediction uncertainties. The model parameters are optimized with maximum likelihood estimation by comparing these predictions with the true encodings. At the time of application, the true and predicted encodings are used to determine the probability of conformance, an interpretable and meaningful anomaly score. Furthermore, our approach has linear time complexity, scalability issues are prevented, and the method can easily be adjusted to a wide range of data types and intricate applications. We demonstrate the effectiveness and adaptability of our proposed method across synthetic time series experiments, image data, and real-world magnetic resonance spectroscopic imaging data.

#15 Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees

著者: Sangwoo Park, Matteo Zecchin, Osvaldo Simeone

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.18659

要約:
Selecting artificial intelligence (AI) models, such as large language models (LLMs), from multiple candidates requires accurate performance estimation. This is ideally achieved through empirical evaluations involving abundant real-world data. However, such evaluations are costly and impractical at scale. To address this challenge, autoevaluation methods leverage synthetic data produced by automated evaluators, such as LLMs-as-judges, reducing variance but potentially introducing bias. Recent approaches have employed semi-supervised prediction-powered inference (PPI) to correct for the bias of autoevaluators. However, the use of autoevaluators may lead in practice to a degradation in sample efficiency compared to conventional methods using only real-world data. In this paper, we propose R-AutoEval+, a novel framework that provides finite-sample reliability guarantees on the model evaluation, while also ensuring an enhanced (or at least no worse) sample efficiency compared to conventional methods. The key innovation of R-AutoEval+ is an adaptive construction of the model evaluation variable, which dynamically tunes its reliance on synthetic data, reverting to conventional methods when the autoevaluator is insufficiently accurate. Experiments on the use of LLMs-as-judges for the optimization of quantization settings for the weights of an LLM, for prompt design in LLMs, and for test-time reasoning budget allocation in LLMs confirm the reliability and efficiency of R-AutoEval+.

#16 kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions

著者: Parastoo Pashmchi, J\'er\^ome Benoit, Motonobu Kanagawa

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.08366

要約:
We study a missing-value imputation method, termed kNNSampler, that imputes a given unit's missing response by randomly sampling from the observed responses of the $k$ most similar units to the given unit in terms of the observed covariates. This method can sample unknown missing values from their distributions, quantify the uncertainties of missing values, and be readily used for multiple imputation. Unlike popular kNNImputer, which estimates the conditional mean of a missing response given an observed covariate, kNNSampler is theoretically shown to estimate the conditional distribution of a missing response given an observed covariate. Experiments illustrate the performance of kNNSampler. The code for kNNSampler is made publicly available (https://github.com/SAP/knn-sampler).

#17 Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

著者: Theodore Jerome Tinker, Kenji Doya, Jun Tani

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.05013

要約:
Human infants acquire language and action gradually through development, achieving strong generalization from minimal experience, whereas large language models require exposure to billions of training tokens. What mechanisms underlie such efficient developmental learning in humans? This study investigates this question through robot simulation experiments in which agents learn to perform actions associated with imperative sentences (e.g., \textit{push red cube}) via curiosity-driven self-exploration. Our approach integrates the active inference framework with reinforcement learning, enabling intrinsically motivated developmental learning. The simulations reveal several key findings: i) Generalization improves markedly as the scale of compositional elements increases. ii) Curiosity combined with motor noise yields substantially better learning than exploration without curiosity. iii) Rote pairing of sentences and actions precedes the emergence of compositional generalization. iv) Simpler, prerequisite-like actions develop earlier than more complex actions that depend on them. v) When exception-handling rules were introduced -- where certain imperative sentences required executing inconsistent actions -- the robots successfully acquired these exceptions through exploration and displayed a U-shaped performance curve characteristic of representational redescription in child language learning. Together, these results suggest that curiosity-driven exploration and active inference provide a powerful account of how intrinsic motivation and hierarchical sensorimotor learning can jointly support scalable compositional generalization and exception handling in both humans and artificial agents.

#18 Geopolitics, Geoeconomics and Risk: A Machine Learning Approach

著者: Alvaro Ortiz, Tomasa Rodrigo

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.12416

要約:
We introduce a novel high-frequency daily panel dataset of both markets and news-based indicators -- including Geopolitical Risk, Economic Policy Uncertainty, Trade Policy Uncertainty, and Political Sentiment -- for 42 countries across both emerging and developed markets. Using this dataset, we study how sentiment dynamics shape sovereign risk, measured by Credit Default Swap (CDS) spreads, and evaluate their forecasting value relative to traditional drivers such as global monetary policy and market volatility. Our horse-race analysis of forecasting models demonstrates that incorporating news-based indicators significantly enhances predictive accuracy and enriches the analysis, with non-linear machine learning methods -- particularly Random Forests -- delivering the largest gains. Our analysis reveals that while global financial variables remain the dominant drivers of sovereign risk, geopolitical risk and economic policy uncertainty also play a meaningful role. Crucially, their effects are amplified through non-linear interactions with global financial conditions. Finally, we document pronounced regional heterogeneity, as certain asset classes and emerging markets exhibit heightened sensitivity to shocks in policy rates, global financial volatility, and geopolitical risk.

#19 On the identifiability of causal graphs with multiple environments

著者: Francesco Montagna

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.13583

要約:
Causal discovery from i.i.d. observational data is known to be generally ill-posed. We demonstrate that if we have access to the distribution of a structural causal model, and additional data from only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable. Notably, this is the first result in the literature that guarantees the entire causal graph recovery with a constant number of environments and arbitrary nonlinear mechanisms. Our only constraint is the Gaussianity of the noise terms; however, we propose potential ways to relax this requirement. Of interest on its own, we expand on the well-known duality between independent component analysis (ICA) and causal discovery; recent advancements have shown that nonlinear ICA can be solved from multiple environments, at least as many as the number of sources: we show that the same can be achieved for causal discovery while having access to much less auxiliary information.

#20 Front-door Reducibility: Reducing ADMGs to the Standard Front-door Setting via a Graphical Criterion

著者: Jianqiao Mao, Max A. Little

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.15679

要約:
Front-door adjustment gives a simple closed-form identification formula under the classical front-door criterion, but its applicability is often viewed as narrow. By contrast, the general ID algorithm can identify many more causal effects in arbitrary graphs, yet typically outputs algebraically complex expressions that are hard to estimate and interpret. We show that many such graphs can in fact be reduced to a standard front-door setting via front-door reducibility (FDR), a graphical condition on acyclic directed mixed graphs that aggregates variables into super-nodes $(\boldsymbol{X}^{*},\boldsymbol{Y}^{*},\boldsymbol{M}^{*})$. We characterize the FDR criterion, prove it is equivalent (at the graph level) to the existence of an FDR adjustment, and present FDR-TID, an exact algorithm that finds an admissible FDR triple with correctness, completeness, and finite-termination guarantees. Empirical examples show that many graphs far outside the textbook front-door setting are FDR, yielding simple, estimable adjustments where general ID expressions would be cumbersome. FDR therefore complements existing identification methods by prioritizing interpretability and computational simplicity without sacrificing generality across mixed graphs.

#21 Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

著者: Dimitris Bertsimas, Caio de Prospero Iglesias, Nicholas A. G. Johnson

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21890

要約:
We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing l1 regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and add an l2 penalty for robustness. We solve the resulting non-convex minimax problem via an alternating best response algorithm with two subproblems: the alpha subproblem is a standard kernel SVM dual solved via LIBSVM, while the beta subproblem admits an efficient solution via the Greedy Selector and Simplex Projector algorithm. We reformulate SMKL as a mixed integer semidefinite optimization problem and derive a hierarchy of semidefinite convex relaxations which can be used to certify near-optimality of the solutions returned by our best response algorithm and also to warm start it. On ten UCI benchmarks, our method with random initialization outperforms state-of-the-art MKL approaches in out-of-sample prediction accuracy on average by 3.34 percentage points (relative to the best performing benchmark) while selecting a small number of candidate kernels in comparable runtime. With warm starting, our method outperforms the best performing benchmark's out-of-sample prediction accuracy on average by 4.05 percentage points. Our convex relaxations provide a certificate that in several cases, the solution returned by our best response algorithm is the globally optimal solution.

#22 On SkipGram Word Embedding Models with Negative Sampling: Unified Framework and Impact of Noise Distributions

著者: Dezhi Liu, Richong Zhang, Ziqiao Wang

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2009.04413

要約:
SkipGram word embedding models with negative sampling, or SGN in short, is an elegant family of word embedding models. In this paper, we formulate a framework for word embedding, referred to as Word-Context Classification (WCC), that generalizes SGN to a wide family of models. The framework, which uses some ``noise examples'', is justified through theoretical analysis. The impact of noise distribution on the learning of the WCC embedding models is studied experimentally, suggesting that the best noise distribution is, in fact, the data distribution, in terms of both the embedding performance and the speed of convergence during training. Along our way, we discover several novel embedding models that outperform existing WCC models.

#23 Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

著者: Tong Li, Jacob Nogas, Haochen Song, Anna Rafferty, Eric M. Schwartz, Audrey Durand, Harsh Kumar, Nina Deliu, Sofia S. Villar, Dehan Kong, Joseph J. Williams

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2112.08507

要約:
Traditional randomized A/B experiments assign arms with uniform random (UR) probability, such as 50/50 assignment to two versions of a website to discover whether one version engages users more. To more quickly and automatically use data to benefit users, multi-armed bandit algorithms such as Thompson Sampling (TS) have been advocated. While TS is interpretable and incorporates the randomization key to statistical inference, it can cause biased estimates and increase false positives and false negatives in detecting differences in arm means. We introduce a more Statistically Sensitive algorithm, TS-PostDiff (Posterior Probability of Small Difference), that mixes TS with traditional UR by using an additional adaptive step, where the probability of using UR (vs TS) is proportional to the posterior probability that the difference in arms is small. This allows an experimenter to define what counts as a small difference, below which a traditional UR experiment can obtain informative data for statistical inference at low cost, and above which using more TS to maximize user benefits is key. We evaluate TS-PostDiff against UR, TS, and two other TS variants designed to improve statistical inference. We consider results for the common two-armed experiment across a range of settings inspired by real-world applications. Our results provide insight into when and why TS-PostDiff or alternative approaches provide better tradeoffs between benefiting users (reward) and statistical inference (false positive rate and power). TS-PostDiff's adaptivity helps efficiently reduce false positives and increase statistical power when differences are small, while increasing reward more when differences are large. The work highlights important considerations for future Statistically Sensitive algorithm development that balances reward and statistical analysis in adaptive experimentation.

#24 Gaussian and Non-Gaussian Universality of Data Augmentation

著者: Kevin Han Huang, Peter Orbanz, Morgane Austern

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2202.09134

要約:
We provide universality results that quantify how data augmentation affects the variance and limiting distribution of estimates through simple surrogates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are not either true or false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. As our main theoretical tool, we develop an adaptation of Lindeberg's technique for block dependence. The resulting universality regime may be Gaussian or non-Gaussian.

#25 Unifying Linear-Time Attention via Latent Probabilistic Modelling

著者: Rares Dolga, Lucas Maystre, Marius Cobzarenco, David Barber

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2402.17512

要約:
Transformers have achieved state-of-the-art results across a range of domains, but their quadratic attention mechanism poses significant challenges for long-sequence modelling. Recent efforts to design linear-time attention mechanisms have yielded more scalable alternatives, yet often at the cost of performance, particularly on discrete data such as language. In this work, we revisit linear attention through the lens of probabilistic graphical models. We first show that standard linear attention can be interpreted as an undirected latent variable model, revealing a key limitation: the absence of directionality. To address this, we propose a novel directed parameterisation of linear attention that introduces an asymmetric structure, enabling an interpretation aligned with the causal and sequential nature of language. Our formulation integrates global latent-variable attention with local standard attention in a fully probabilistic framework. Additionally, we introduce a recurrent parameterisation of queries and keys that avoids reliance on relative positional encodings, often incompatible with linear attention. Experiments on language modelling benchmarks demonstrate that our model achieves competitive performance with standard attention and outperforms existing linear attention variants.

#26 Mixture of Experts Softens the Curse of Dimensionality in Operator Learning

著者: Anastasis Kratsios, Takashi Furuya, Jose Antonio Lara Benitez, Matti Lassas, Maarten de Hoop

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2404.09101

要約:
We study the approximation-theoretic implications of mixture-of-experts architectures for operator learning, where the complexity of a single large neural operator is distributed across many small neural operators (NOs), and each input is routed to exactly one NO via a decision tree. We analyze how this tree-based routing and expert decomposition affect approximation power, sample complexity, and stability. Our main result is a distributed universal approximation theorem for mixture of neural operators (MoNOs): any Lipschitz nonlinear operator between $L^2([0,1]^d)$ spaces can be uniformly approximated over the Sobolev unit ball to arbitrary accuracy $\varepsilon>0$ by an MoNO, where each expert NO has a depth, width, and rank scaling as $\mathcal{O}(\varepsilon^{-1})$. Although the number of experts may grow with accuracy, each NO remains small, enough to fit within active memory of standard hardware for reasonable accuracy levels. Our analysis also yields new quantitative approximation rates for classical NOs approximating uniformly continuous nonlinear operators uniformly on compact subsets of $L^2([0,1]^d)$.

#27 Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

著者: Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Yuekai Sun, Mikhail Yurochkin

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2412.06540

要約:
Scaling laws for large language models (LLMs) predict model performance based on parameters like size and training data. However, differences in training configurations and data processing across model families lead to significant variations in benchmark performance, making it difficult for a single scaling law to generalize across all LLMs. On the other hand, training family-specific scaling laws requires training models of varying sizes for every family. In this work, we propose Skills Scaling Laws (SSLaws, pronounced as Sloth), a novel scaling law that leverages publicly available benchmark data and assumes LLM performance is driven by low-dimensional latent skills, such as reasoning and instruction following. These latent skills are influenced by computational resources like model size and training tokens, but with varying efficiencies across model families. Sloth exploits correlations across benchmarks to provide more accurate and interpretable predictions while alleviating the need to train multiple LLMs per family. We present both theoretical results on parameter identification and empirical evaluations on 12 prominent benchmarks, from Open LLM Leaderboard v1/v2, demonstrating that Sloth predicts LLM performance accurately and offers insights into scaling behaviors for complex downstream tasks, increased test-time compute, and compute-optimal scaling of skills.

#28 Decentralized Projection-free Online Upper-Linearizable Optimization with Applications to DR-Submodular Optimization

著者: Yiyang Lu, Mohammad Pedramfar, Vaneet Aggarwal

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.18183

要約:
We introduce a novel framework for decentralized projection-free optimization, extending projection-free methods to a broader class of upper-linearizable functions. Our approach leverages decentralized optimization techniques with the flexibility of upper-linearizable function frameworks, effectively generalizing traditional DR-submodular function optimization. We obtain the regret of $O(T^{1-\theta/2})$ with communication complexity of $O(T^{\theta})$ and number of linear optimization oracle calls of $O(T^{2\theta})$ for decentralized upper-linearizable function optimization, for any $0\le \theta \le 1$. This approach allows for the first results for monotone up-concave optimization with general convex constraints and non-monotone up-concave optimization with general convex constraints. Further, the above results for first order feedback are extended to zeroth order, semi-bandit, and bandit feedback.

#29 Machine Unlearning via Information Theoretic Regularization

privacy

著者: Shizhou Xu, Thomas Strohmer

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.05684

要約:
How can we effectively remove or ''unlearn'' undesirable information, such as specific features or the influence of individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce a unified mathematical framework based on information-theoretic regularization to address both data point unlearning and feature unlearning. For data point unlearning, we introduce the $\textit{Marginal Unlearning Principle}$, an auditable and provable framework inspired by memory suppression studies in neuroscience. Moreover, we provide formal information-theoretic unlearning definition based on the proposed principle, named marginal unlearning, and provable guarantees on sufficiency and necessity of marginal unlearning to the existing approximate unlearning definitions. We then show the proposed framework provide natural solution to the marginal unlearning problems. For feature unlearning, the framework applies to deep learning with arbitrary training objectives. By combining flexibility in learning objectives with simplicity in regularization design, our approach is highly adaptable and practical for a wide range of machine learning and AI applications. From a mathematical perspective, we provide an unified analytic solution to the optimal feature unlearning problem with a variety of information-theoretic training objectives. Our theoretical analysis reveals intriguing connections between machine unlearning, information theory, optimal transport, and extremal sigma algebras. Numerical simulations support our theoretical finding.

#30 On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity

著者: Quentin Bertrand, Anne Gagneux, Mathurin Massias, R\'emi Emonet

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.03719

要約:
Modern deep generative models can now produce high-quality synthetic samples that are often indistinguishable from real training data. A growing body of research aims to understand why recent methods, such as diffusion and flow matching techniques, generalize so effectively. Among the proposed explanations are the inductive biases of deep learning architectures and the stochastic nature of the conditional flow matching loss. In this work, we rule out the noisy nature of the loss as a key factor driving generalization in flow matching. First, we empirically show that in high-dimensional settings, the stochastic and closed-form versions of the flow matching loss yield nearly equivalent losses. Then, using state-of-the-art flow matching models on standard image datasets, we demonstrate that both variants achieve comparable statistical performance, with the surprising observation that using the closed-form can even improve performance.

#31 Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity

著者: Zhiyuan Su, Sunhao Dai, Xiao Zhang

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.12389

要約:
Clustering of Bandits (CB) methods enhance sequential decision-making by grouping bandits into clusters based on similarity and incorporating cluster-level contextual information, demonstrating effectiveness and adaptability in applications like personalized streaming recommendations. However, when extending CB algorithms to their neural version (commonly referred to as Clustering of Neural Bandits, or CNB), they suffer from loss of plasticity, where neural network parameters become rigid and less adaptable over time, limiting their ability to adapt to non-stationary environments (e.g., dynamic user preferences in recommendation). To address this challenge, we propose Selective Reinitialization (SeRe), a novel bandit learning framework that dynamically preserves the adaptability of CNB algorithms in evolving environments. SeRe leverages a contribution utility metric to identify and selectively reset underutilized units, mitigating loss of plasticity while maintaining stable knowledge retention. Furthermore, when combining SeRe with CNB algorithms, the adaptive change detection mechanism adjusts the reinitialization frequency according to the degree of non-stationarity, ensuring effective adaptation without unnecessary resets. Theoretically, we prove that SeRe enables sublinear cumulative regret in piecewise-stationary environments, outperforming traditional CNB approaches in long-term performances. Extensive experiments on six real-world recommendation datasets demonstrate that SeRe-enhanced CNB algorithms can effectively mitigate the loss of plasticity with lower regrets, improving adaptability and robustness in dynamic settings.

#32 Knowledge Adaptation as Posterior Correction

著者: Mohammad Emtiyaz Khan

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.14262

要約:
Adaptation is the holy grail of intelligence, but even the best AI models lack the adaptability of toddlers. In spite of great progress, little is known about the mechanisms by which machines can learn to adapt as fast as humans and animals. Here, we cast adaptation as `correction' of old posteriors and show that a wide-variety of existing adaptation methods follow this very principle, including those used for continual learning, federated learning, unlearning, and model merging. In all these settings, more accurate posteriors often lead to smaller corrections and can enable faster adaptation. Posterior correction is derived by using the dual representation of the Bayesian Learning Rule of Khan and Rue (2023), where the interference between the old representation and new information is quantified by using the natural-gradient mismatch. We present many examples demonstrating how machines can learn to adapt quickly by using posterior correction.

#33 Bridging Human and LLM Judgments: Understanding and Narrowing the Gap

著者: Felipe Maia Polo, Xinhe Wang, Mikhail Yurochkin, Gongjun Xu, Moulinath Banerjee, Yuekai Sun

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.12792

要約:
Large language models are increasingly used as judges (LLM-as-a-judge) to evaluate model outputs at scale, but their assessments often diverge systematically from human judgments. We present Bridge, a unified statistical framework that explicitly bridges human and LLM evaluations under both absolute scoring and pairwise comparison paradigms. Bridge posits a latent human preference score for each prompt-response pair and models LLM deviations as linear transformations of covariates that capture sources of discrepancies. This offers a simple and principled framework for refining LLM ratings and characterizing systematic discrepancies between humans and LLMs. We provide an efficient fitting algorithm with asymptotic guarantees for statistical inference. Using six LLM judges and two benchmarks (BigGen Bench and Chatbot Arena), Bridge achieves higher agreement with human ratings (accuracy, calibration, and KL divergence) and exposes systematic human-LLM gaps.

#34 Forest tree species classification and entropy-derived uncertainty mapping using extreme gradient boosting and Sentinel-1/2 satellite data

著者: Abdulhakim M. Abdi, Fan Wang

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.18228

要約:
We present a new 10-meter map of dominant tree species in Swedish forests accompanied by pixel-level uncertainty estimates. The tree species classification is based on spatiotemporal metrics derived from Sentinel-1 and Sentinel-2 satellite data, combined with field observations from the Swedish National Forest Inventory. We apply an extreme gradient boosting model with Bayesian optimization to relate field observations to satellite-derived features and generate the final species map. Classification uncertainty is quantified using Shannon's entropy of the predicted class probabilities, which provide a spatially explicit measure of model confidence. The final model achieved an overall accuracy of 85% (F1 score = 0.82, Matthews correlation coefficient = 0.81), and mapped species distributions showed strong agreement with official forest statistics (Spearman's rho = 0.94).

#35 GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models

diffusion

著者: Peter Holderrieth, Uriel Singer, Tommi Jaakkola, Ricky T. Q. Chen, Yaron Lipman, Brian Karrer

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.25170

要約:
The performance of flow matching and diffusion models can be greatly improved at inference time using reward alignment algorithms, yet efficiency remains a major limitation. While several algorithms were proposed, we demonstrate that a common bottleneck is the sampling method these algorithms rely on: many algorithms require to sample Markov transitions via SDE sampling, which is significantly less efficient and often less performant than ODE sampling. To remove this bottleneck, we introduce GLASS Flows, a new sampling paradigm that simulates a "flow matching model within a flow matching model" to sample Markov transitions. As we show in this work, this "inner" flow matching model can be retrieved from a pre-trained model without any re-training, combining the efficiency of ODEs with the stochastic evolution of SDEs. On large-scale text-to-image models, we show that GLASS Flows eliminate the trade-off between stochastic evolution and efficiency. Combined with Feynman-Kac Steering, GLASS Flows improve state-of-the-art performance in text-to-image generation, making it a simple, drop-in solution for inference-time scaling of flow and diffusion models.

#36 The Algorithmic Phase Transition in Correlated Spiked Models

著者: Zhangsong Li

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.06040

要約:
We study the computational task of detecting and estimating correlated signals in a pair of spiked matrices $$ X=\tfrac{\lambda}{\sqrt{n}} xu^{\top}+W, \quad Y=\tfrac{\mu}{\sqrt{n}} yv^{\top}+Z $$ where the spikes $x,y$ have correlation $\rho$. Specifically, we consider two fundamental models: (1) Correlated spiked Wigner model with signal-to-noise ratio $\lambda,\mu$; (2) Correlated spiked $n*N$ Wishart (covariance) model with signal-to-noise ratio $\sqrt\lambda,\sqrt\mu$. We propose an efficient detection and estimation algorithm based on counting a specific family of edge-decorated cycles. The algorithm's performance is governed by the function $$ F(\lambda,\mu,\rho,\gamma)=\max\Big\{ \frac{ \lambda^2 }{ \gamma }, \frac{ \mu^2 }{ \gamma }, \frac{ \lambda^2 \rho^2 }{ \gamma-\lambda^2+\lambda^2 \rho^2 } + \frac{ \mu^2 \rho^2 }{ \gamma-\mu^2+\mu^2 \rho^2 } \Big\} \,. $$ We prove our algorithm succeeds for the correlated spiked Wigner model whenever $F(\lambda,\mu,\rho,1)>1$, and succeeds for the correlated spiked Wishart model whenever $F(\lambda,\mu,\rho,\tfrac{n}{N})>1$. Our result shows that an algorithm can leverage the correlation between the spikes to detect and estimate the signals even in regimes where efficiently recovering either $x$ from ${X}$ alone or $y$ from ${Y}$ alone is believed to be computationally infeasible. We complement our algorithmic results with evidence for a matching computational lower bound. In particular, we prove that when $F(\lambda,\mu,\rho,1)<1$ for the correlated spiked Wigner model and when $F(\lambda,\mu,\rho,\tfrac{n}{N})<1$ for the spiked Wishart model, all algorithms based on low-degree polynomials fails to distinguish $({X},{Y})$ with two independent noise matrices. This strongly suggests that $F=1$ is the precise computation threshold for our models.

#37 Non-stationary and Varying-discounting Markov Decision Processes for Reinforcement Learning

著者: Zhizuo Chen, Theodore T. Allen

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.17598

要約:
Algorithms developed under stationary Markov Decision Processes (MDPs) often face challenges in non-stationary environments, and infinite-horizon formulations may not directly apply to finite-horizon tasks. To address these limitations, we introduce the Non-stationary and Varying-discounting MDP (NVMDP) framework, which naturally accommodates non-stationarity and allows discount rates to vary with time and transitions. Infinite-horizon, stationary MDPs emerge as special cases of NVMDPs for identifying an optimal policy, and finite-horizon MDPs are also subsumed within the NVMDP formulations. Moreover, NVMDPs provide a flexible mechanism to shape optimal policies, without altering the state space, action space, or the reward structure. We establish the theoretical foundations of NVMDPs, including assumptions, state- and action-value formulation and recursion, matrix representation, optimality conditions, and policy improvement under finite state and action spaces. Building on these results, we adapt dynamic programming and generalized Q-learning algorithms to NVMDPs, along with formal convergence proofs. For problems requiring function approximation, we extend the Policy Gradient Theorem and the policy improvement bound in Trust Region Policy Optimization (TRPO), offering proofs in both scalar and matrix forms. Empirical evaluations in a non-stationary gridworld environment demonstrate that NVMDP-based algorithms successfully recover optimal trajectories under multiple reward and discounting schemes, whereas original Q-learning fails. These results collectively show that NVMDPs provide a theoretically sound and practically effective framework for reinforcement learning, requiring only minor algorithmic modifications while enabling robust handling of non-stationarity and explicit optimal policy shaping.

#38 On Statistical Inference for High-Dimensional Binary Time Series

著者: Dehao Dai, Yunyi Zhang

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.00338

要約:
The analysis of non-real-valued data, such as binary time series, has attracted great interest in recent years. This manuscript proposes a post-selection estimator for estimating the coefficient matrices of a high-dimensional generalized binary vector autoregressive process and establishes a Gaussian approximation theorem for the proposed estimator. Furthermore, it introduces a second-order wild bootstrap algorithm to enable statistical inference on the coefficient matrices. Numerical studies and empirical applications demonstrate the good finite-sample performance of the proposed method.

#39 Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks

著者: Anish Lakkapragada

公開日: Wed, 03 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.00686

要約:
Classical statistical inference and learning theory often fail to explain the success of modern neural networks. A key reason is that these models are non-identifiable (singular), violating core assumptions behind PAC bounds and asymptotic normality. Singular learning theory (SLT), a physics-inspired framework grounded in algebraic geometry, has gained popularity for its ability to close this theory-practice gap. In this paper, we empirically study SLT in toy settings relevant to interpretability and phase transitions. First, we understand the SLT free energy $\mathcal{F}_n$ by testing an Arrhenius-style rate hypothesis using both a grokking modulo-arithmetic model and Anthropic's Toy Models of Superposition. Second, we understand the local learning coefficient $\lambda_{\alpha}$ by measuring how it scales with problem difficulty across several controlled network families (polynomial regressors, low-rank linear networks, and low-rank autoencoders). Our experiments recover known scaling laws while others yield meaningful deviations from theoretical expectations. Overall, our paper illustrates the many merits of SLT for understanding neural network phase transitions, and poses open research questions for the field.

stat.ML updates on arXiv.org

📋 論文タイトル一覧