arXiv論文一覧 - stat.ML updates on arXiv.org

#1 When Features Beat Noise: A Feature Selection Technique Through Noise-Based Hypothesis Testing

著者: Mousam Sinha, Tirtha Sarathi Ghosh, Ridam Pal

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20851

要約:
Feature selection has remained a daunting challenge in machine learning and artificial intelligence, where increasingly complex, high-dimensional datasets demand principled strategies for isolating the most informative predictors. Despite widespread adoption, many established techniques suffer from notable limitations; some incur substantial computational cost, while others offer no definite statistical driven stopping criteria or assesses the significance of their importance scores. A common heuristic approach introduces multiple random noise features and retains all predictors ranked above the strongest noise feature. Although intuitive, this strategy lacks theoretical justification and depends heavily on heuristics. This paper proposes a novel feature selection method that addresses these limitations. Our approach introduces multiple random noise features and evaluates each feature's importance against the maximum importance value among these noise features incorporating a non-parametric bootstrap-based hypothesis testing framework to establish a solid theoretical foundation. We establish the conceptual soundness of our approach through statistical derivations that articulate the principles guiding the design of our algorithm. To evaluate its reliability, we generated simulated datasets under controlled statistical settings and benchmarked performance against Boruta and Knockoff-based methods, observing consistently stronger recovery of meaningful signal. As a demonstration of practical utility, we applied the technique across diverse real-world datasets, where it surpassed feature selection techniques including Boruta, RFE, and Extra Trees. Hence, the method emerges as a robust algorithm for principled feature selection, enabling the distillation of informative predictors that support reliable inference, enhanced predictive performance, and efficient computation.

#2 Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

著者: Arthur Jacot

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20888

要約:
This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued function $f$ that can be $\epsilon$-approximated with a binary circuit of size at most $c\epsilon^{-\gamma}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $\gamma>2$, allowing for the definition of a HTMC norm on functions. In parallel one can define a complexity measure on the parameters of a ResNets (a weighted $\ell_1$ norm of the parameters), which induce a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.

#3 Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification

著者: Soumojit Das, Nairanjana Dasgupta, Prashanta Dutta

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20960

要約:
Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain. We develop a geometric framework for post-hoc calibration of neural network probability outputs, treating probability vectors as points on the $(c-1)$-dimensional probability simplex equipped with the Fisher--Rao metric. Our approach yields Additive Log-Ratio (ALR) calibration maps that reduce exactly to Platt scaling for binary problems (Proposition~1) while extending naturally to multi-class settings -- providing a principled generalization that existing methods lack. Complementing calibration, we define geometric reliability scores based on Fisher--Rao distance and construct neutral zones for principled deferral of uncertain predictions. Theoretical contributions include: (i) consistency of the calibration estimator at rate $O_p(n^{-1/2})$ via M-estimation theory (Theorem~1), and (ii) tight concentration bounds for reliability scores with explicit sub-Gaussian parameters enabling sample size calculations for validation set design (Theorem~2). We conjecture Neyman--Pearson optimality of our neutral zone construction based on connections to Bhattacharyya coefficients. Empirical validation on Adeno-Associated Virus classification demonstrates that the two-stage framework (calibration followed by reliability-based deferral) captures 72.5\% of errors while deferring 34.5\% of samples. Notably, this operational gain is achievable with any well-calibrated probability output; the contribution of geometric calibration lies in its theoretical foundations rather than empirical superiority over simpler alternatives. This work bridges information geometry and statistical learning, offering formal guarantees relevant to applications requiring rigorous validation.

#4 Nonconvex Penalized LAD Estimation in Partial Linear Models with DNNs: Asymptotic Analysis and Proximal Algorithms

著者: Lechen Feng, Haoran Li, Lucky Li, Xingqiu Zhao

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21115

要約:
This paper investigates the partial linear model by Least Absolute Deviation (LAD) regression. We parameterize the nonparametric term using Deep Neural Networks (DNNs) and formulate a penalized LAD problem for estimation. Specifically, our model exhibits the following challenges. First, the regularization term can be nonconvex and nonsmooth, necessitating the introduction of infinite dimensional variational analysis and nonsmooth analysis into the asymptotic normality discussion. Second, our network must expand (in width, sparsity level and depth) as more samples are observed, thereby introducing additional difficulties for theoretical analysis. Third, the oracle of the proposed estimator is itself defined through a ultra high-dimensional, nonconvex, and discontinuous optimization problem, which already entails substantial computational and theoretical challenges. Under such the challenges, we establish the consistency, convergence rate, and asymptotic normality of the estimator. Furthermore, we analyze the oracle problem itself and its continuous relaxation. We study the convergence of a proximal subgradient method for both formulations, highlighting their structural differences lead to distinct computational subproblems along the iterations. In particular, the relaxed formulation admits significantly cheaper proximal updates, reflecting an inherent trade-off between statistical accuracy and computational tractability.

#5 Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference

著者: Jasraj Singh, Shelvia Wongso, Jeremie Houssineau, Badr-Eddine Ch\'erief-Abdellatif

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21223

要約:
Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models that would otherwise be intractable. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, often rendering analytical treatment impossible and necessitating heavy reliance on approximate learning and inference techniques. Possibility theory, an imprecise probability framework, allows to directly model epistemic uncertainty instead of leveraging subjective probabilities. While this framework provides robustness and interpretability under sparse or imprecise information, adapting VI to the possibilistic setting requires rethinking core concepts such as entropy and divergence, which presuppose additivity. In this work, we develop a principled formulation of possibilistic variational inference and apply it to a special class of exponential-family functions, highlighting parallels with their probabilistic counterparts and revealing the distinctive mathematical structures of possibility theory.

#6 Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities (II)

著者: Alexandra Carpentier, Christophe Giraud, Nicolas Verzelen

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21526

要約:
A fundamental theoretical question in network analysis is to determine under which conditions community recovery is possible in polynomial time in the Stochastic Block Model (SBM). When the number $K$ of communities remains smaller than $\sqrt{n}$ --where $n$ denotes the number of nodes--, non-trivial community recovery is possible in polynomial time above, and only above, the Kesten--Stigum (KS) threshold, originally postulated using arguments from statistical physics. When $K \geq \sqrt{n}$, Chin, Mossel, Sohn, and Wein recently proved that, in the \emph{sparse regime}, community recovery in polynomial time is achievable below the KS threshold by counting non-backtracking paths. This finding led them to postulate a new threshold for the many-communities regime $K \geq \sqrt{n}$. Subsequently, Carpentier, Giraud, and Verzelen established the failure of low-degree polynomials below this new threshold across all density regimes, and demonstrated successful recovery above the threshold in certain moderately sparse settings. While these results provide strong evidence that, in the many community setting, the computational barrier lies at the threshold proposed in~Chin et al., the question of achieving recovery above this threshold still remains open in most density regimes. The present work is a follow-up to~Carpentier et al., in which we prove Conjecture~1.4 stated therein by: \\ 1- Constructing a family of motifs satisfying specific structural properties; and\\ 2- Proving that community recovery is possible above the proposed threshold by counting such motifs.\\ Our results complete the picture of the computational barrier for community recovery in the SBM with $K \geq \sqrt{n}$ communities. They also indicate that, in moderately sparse regimes, the optimal algorithms appear to be fundamentally different from spectral methods.

#7 On Evolution-Based Models for Experimentation Under Interference

著者: Sadegh Shirani, Mohsen Bayati

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21675

要約:
Causal effect estimation in networked systems is central to data-driven decision making. In such settings, interventions on one unit can spill over to others, and in complex physical or social systems, the interaction pathways driving these interference structures remain largely unobserved. We argue that for identifying population-level causal effects, it is not necessary to recover the exact network structure; instead, it suffices to characterize how those interactions contribute to the evolution of outcomes. Building on this principle, we study an evolution-based approach that investigates how outcomes change across observation rounds in response to interventions, hence compensating for missing network information. Using an exposure-mapping perspective, we give an axiomatic characterization of when the empirical distribution of outcomes follows a low-dimensional recursive equation, and identify minimal structural conditions under which such evolution mappings exist. We frame this as a distributional counterpart to difference-in-differences. Rather than assuming parallel paths for individual units, it exploits parallel evolution patterns across treatment scenarios to estimate counterfactual trajectories. A key insight is that treatment randomization plays a role beyond eliminating latent confounding; it induces an implicit sampling from hidden interference channels, enabling consistent learning about heterogeneous spillover effects. We highlight causal message passing as an instantiation of this method in dense networks while extending to more general interference structures, including influencer networks where a small set of units drives most spillovers. Finally, we discuss the limits of this approach, showing that strong temporal trends or endogenous interference can undermine identification.

#8 Pretraining Transformer-Based Models on Diffusion-Generated Synthetic Graphs for Alzheimer's Disease Prediction

diffusion

著者: Abolfazl Moslemi, Hossein Peyvandi

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20704

要約:
Early and accurate detection of Alzheimer's disease (AD) is crucial for enabling timely intervention and improving outcomes. However, developing reliable machine learning (ML) models for AD diagnosis is challenging due to limited labeled data, multi-site heterogeneity, and class imbalance. We propose a Transformer-based diagnostic framework that combines diffusion-based synthetic data generation with graph representation learning and transfer learning. A class-conditional denoising diffusion probabilistic model (DDPM) is trained on the real-world NACC dataset to generate a large synthetic cohort that mirrors multimodal clinical and neuroimaging feature distributions while balancing diagnostic classes. Modality-specific Graph Transformer encoders are first pretrained on this synthetic data to learn robust, class-discriminative representations and are then frozen while a neural classifier is trained on embeddings from the original NACC data. We quantify distributional alignment between real and synthetic cohorts using metrics such as Maximum Mean Discrepancy (MMD), Frechet distance, and energy distance, and complement discrimination metrics with calibration and fixed-specificity sensitivity analyses. Empirically, our framework outperforms standard baselines, including early and late fusion deep neural networks and the multimodal graph-based model MaGNet, yielding higher AUC, accuracy, sensitivity, and specificity under subject-wise cross-validation on NACC. These results show that diffusion-based synthetic pretraining with Graph Transformers can improve generalization in low-sample, imbalanced clinical prediction settings.

#9 Solving Diffusion Inverse Problems with Restart Posterior Sampling

diffusion

著者: Bilal Ahmed, Joseph G. Makin

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20705

要約:
Inverse problems are fundamental to science and engineering, where the goal is to infer an underlying signal or state from incomplete or noisy measurements. Recent approaches employ diffusion models as powerful implicit priors for such problems, owing to their ability to capture complex data distributions. However, existing diffusion-based methods for inverse problems often rely on strong approximations of the posterior distribution, require computationally expensive gradient backpropagation through the score network, or are restricted to linear measurement models. In this work, we propose Restart for Posterior Sampling (RePS), a general and efficient framework for solving both linear and non-linear inverse problems using pre-trained diffusion models. RePS builds on the idea of restart-based sampling, previously shown to improve sample quality in unconditional diffusion, and extends it to posterior inference. Our method employs a conditioned ODE applicable to any differentiable measurement model and introduces a simplified restart strategy that contracts accumulated approximation errors during sampling. Unlike some of the prior approaches, RePS avoids backpropagation through the score network, substantially reducing computational cost. We demonstrate that RePS achieves faster convergence and superior reconstruction quality compared to existing diffusion-based baselines across a range of inverse problems, including both linear and non-linear settings.

#10 A Set of Rules for Model Validation

著者: Jos\'e Camacho

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20711

要約:
The validation of a data-driven model is the process of assessing the model's ability to generalize to new, unseen data in the population of interest. This paper proposes a set of general rules for model validation. These rules are designed to help practitioners create reliable validation plans and report their results transparently. While no validation scheme is flawless, these rules can help practitioners ensure their strategy is sufficient for practical use, openly discuss any limitations of their validation strategy, and report clear, comparable performance metrics.

#11 Constrained deep learning for pricing and hedging european options in incomplete markets

著者: Nicolas Baradel

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20837

要約:
In incomplete financial markets, pricing and hedging European options lack a unique no-arbitrage solution due to unhedgeable risks. This paper introduces a constrained deep learning approach to determine option prices and hedging strategies that minimize the Profit and Loss (P&L) distribution around zero. We employ a single neural network to represent the option price function, with its gradient serving as the hedging strategy, optimized via a loss function enforcing the self-financing portfolio condition. A key challenge arises from the non-smooth nature of option payoffs (e.g., vanilla calls are non-differentiable at-the-money, while digital options are discontinuous), which conflicts with the inherent smoothness of standard neural networks. To address this, we compare unconstrained networks against constrained architectures that explicitly embed the terminal payoff condition, drawing inspiration from PDE-solving techniques. Our framework assumes two tradable assets: the underlying and a liquid call option capturing volatility dynamics. Numerical experiments evaluate the method on simple options with varying non-smoothness, the exotic Equinox option, and scenarios with market jumps for robustness. Results demonstrate superior P&L distributions, highlighting the efficacy of constrained networks in handling realistic payoffs. This work advances machine learning applications in quantitative finance by integrating boundary constraints, offering a practical tool for pricing and hedging in incomplete markets.

#12 Selecting Belief-State Approximations in Simulators with Latent States

著者: Nan Jiang

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20870

要約:
State resetting is a fundamental but often overlooked capability of simulators. It supports sample-based planning by allowing resets to previously encountered simulation states, and enables calibration of simulators using real data by resetting to states observed in real-system traces. While often taken for granted, state resetting in complex simulators can be nontrivial: when the simulator comes with latent variables (states), state resetting requires sampling from the posterior over the latent state given the observable history, a.k.a. the belief state (Silver and Veness, 2010). While exact sampling is often infeasible, many approximate belief-state samplers can be constructed, raising the question of how to select among them using only sampling access to the simulator. In this paper, we show that this problem reduces to a general conditional distribution-selection task and develop a new algorithm and analysis under sampling-only access. Building on this reduction, the belief-state selection problem admits two different formulations: latent state-based selection, which directly targets the conditional distribution of the latent state, and observation-based selection, which targets the induced distribution over the observation. Interestingly, these formulations differ in how their guarantees interact with the downstream roll-out methods: perhaps surprisingly, observation-based selection may fail under the most natural roll-out method (which we call Single-Reset) but enjoys guarantees under the less conventional alternative (which we call Repeated-Reset). Together with discussion on issues such as distribution shift and the choice of sampling policies, our paper reveals a rich landscape of algorithmic choices, theoretical nuances, and open questions, in this seemingly simple problem.

#13 Probabilistic Hash Embeddings for Online Learning of Categorical Features

著者: Aodong Li, Abishek Sankararaman, Balakrishnan Narayanaswamy

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20893

要約:
We study streaming data with categorical features where the vocabulary of categorical feature values is changing and can even grow unboundedly over time. Feature hashing is commonly used as a pre-processing step to map these categorical values into a feature space of fixed size before learning their embeddings. While these methods have been developed and evaluated for offline or batch settings, in this paper we consider online settings. We show that deterministic embeddings are sensitive to the arrival order of categories and suffer from forgetting in online learning, leading to performance deterioration. To mitigate this issue, we propose a probabilistic hash embedding (PHE) model that treats hash embeddings as stochastic and applies Bayesian online learning to learn incrementally from data. Based on the structure of PHE, we derive a scalable inference algorithm to learn model parameters and infer/update the posteriors of hash embeddings and other latent variables. Our algorithm (i) can handle an evolving vocabulary of categorical items, (ii) is adaptive to new items without forgetting old items, (iii) is implementable with a bounded set of parameters that does not grow with the number of distinct observed values on the stream, and (iv) is invariant to the item arrival order. Experiments in classification, sequence modeling, and recommendation systems in online learning setups demonstrate the superior performance of PHE while maintaining high memory efficiency (consumes as low as 2~4 memory of a one-hot embedding table). Supplementary materials are at https://github.com/aodongli/probabilistic-hash-embeddings

#14 Operationalizing Quantized Disentanglement

著者: Vitoria Barin-Pacela, Kartik Ahuja, Simon Lacoste-Julien, Pascal Vincent

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20927

要約:
Recent theoretical work established the unsupervised identifiability of quantized factors under any diffeomorphism. The theory assumes that quantization thresholds correspond to axis-aligned discontinuities in the probability density of the latent factors. By constraining a learned map to have a density with axis-aligned discontinuities, we can recover the quantization of the factors. However, translating this high-level principle into an effective practical criterion remains challenging, especially under nonlinear maps. Here, we develop a criterion for unsupervised disentanglement by encouraging axis-aligned discontinuities. Discontinuities manifest as sharp changes in the estimated density of factors and form what we call cliffs. Following the definition of independent discontinuities from the theory, we encourage the location of the cliffs along a factor to be independent of the values of the other factors. We show that our method, Cliff, outperforms the baselines on all disentanglement benchmarks, demonstrating its effectiveness in unsupervised disentanglement.

#15 Wavelet-Guided Water-Level Estimation for ISAC

著者: Ayoob Salari, Kai Wu, Khawaja Fahad Masood, Y. Jay Guo, J. Andrew Zhang

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20936

要約:
Real-time water-level monitoring across many locations is vital for flood response, infrastructure management, and environmental forecasting. Yet many sensing methods rely on fixed instruments - acoustic, radar, camera, or pressure probes - that are costly to install and maintain and are vulnerable during extreme events. We propose a passive, low-cost water-level tracking scheme that uses only LTE downlink power metrics reported by commodity receivers. The method extracts per-antenna RSRP, RSSI, and RSRQ, applies a continuous wavelet transform (CWT) to the RSRP to isolate the semidiurnal tide component, and forms a summed-coefficient signature that simultaneously marks high/low tide (tide-turn times) and tracks the tide-rate (flow speed) over time. These wavelet features guide a lightweight neural network that learns water-level changes over time from a short training segment. Beyond a single serving base station, we also show a multi-base-station cooperative mode: independent CWTs are computed per carrier and fused by a robust median to produce one tide-band feature that improves stability and resilience to local disturbances. Experiments over a 420 m river path under line-of-sight conditions achieve root-mean-square and mean-absolute errors of 0.8 cm and 0.5 cm, respectively. Under a non-line-of-sight setting with vegetation and vessel traffic, the same model transfers successfully after brief fine-tuning, reaching 1.7 cm RMSE and 0.8 cm MAE. Unlike CSI-based methods, the approach needs no array calibration and runs on standard hardware, making wide deployment practical. When signals from multiple base stations are available, fusion further improves robustness.

#16 Estimating Ising Models in Total Variation Distance

著者: Constantinos Daskalakis, Vardis Kandiros, Rui Yao

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21008

要約:
We consider the problem of estimating Ising models over $n$ variables in Total Variation (TV) distance, given $l$ independent samples from the model. While the statistical complexity of the problem is well-understood [DMR20], identifying computationally and statistically efficient algorithms has been challenging. In particular, remarkable progress has occurred in several settings, such as when the underlying graph is a tree [DP21, BGPV21], when the entries of the interaction matrix follow a Gaussian distribution [GM24, CK24], or when the bulk of its eigenvalues lie in a small interval [AJK+24, KLV24], but no unified framework for polynomial-time estimation in TV exists so far. Our main contribution is a unified analysis of the Maximum Pseudo-Likelihood Estimator (MPLE) for two general classes of Ising models. The first class includes models that have bounded operator norm and satisfy the Modified Log-Sobolev Inequality (MLSI), a functional inequality that was introduced to study the convergence of the associated Glauber dynamics to stationarity. In the second class of models, the interaction matrix has bounded infinity norm (or bounded width), which is the most common assumption in the literature for structure learning of Ising models. We show how our general results for these classes yield polynomial-time algorithms and optimal or near-optimal sample complexity guarantees in a variety of settings. Our proofs employ a variety of tools from tensorization inequalities to measure decompositions and concentration bounds.

#17 Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs

著者: Dongkyu Derek Cho, Huan Song, Arijit Ghosh Chowdhury, Haotian An, Yawei Wang, Rohit Thekkanal, Negin Sokhandan, Sharlina Keshava, Hannah Marlowe

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21050

要約:
Fine-tuning large language models (LLMs) for downstream tasks typically exhibit a fundamental safety-capability tradeoff, where improving task performance degrades safety alignment even on benign datasets. This degradation persists across standard approaches including supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF). While reinforcement learning with verifiable rewards (RLVR) has emerged as a promising alternative that optimizes models on objectively measurable tasks, its safety implications remain unexplored. We present the first comprehensive theoretical and empirical analysis of safety properties in RLVR. Theoretically, we derive upper bounds on safety drift under KL-constrained optimization and prove conditions under which safety degradation is eliminated. Empirically, we conduct extensive experiments across five adversarial safety benchmarks, demonstrating that RLVR can simultaneously enhance reasoning capabilities while maintaining or improving safety guardrails. Our comprehensive ablation studies examine the effects of optimization algorithms, model scale, and task domains. Our findings challenge the prevailing assumption of an inevitable safety capability trade-off, and establish that a specific training methodology can achieve both objectives simultaneously, providing insights for the safe deployment of reasoning-capable LLMs.

#18 Zipf Distributions from Two-Stage Symbolic Processes: Stability Under Stochastic Lexical Filtering

著者: Vladimir Berman

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21060

要約:
Zipf's law in language lacks a definitive origin, debated across fields. This study explains Zipf-like behavior using geometric mechanisms without linguistic elements. The Full Combinatorial Word Model (FCWM) forms words from a finite alphabet, generating a geometric distribution of word lengths. Interacting exponential forces yield a power-law rank-frequency curve, determined by alphabet size and blank symbol probability. Simulations support predictions, matching English, Russian, and mixed-genre data. The symbolic model suggests Zipf-type laws arise from geometric constraints, not communicative efficiency.

#19 G-Net: A Provably Easy Construction of High-Accuracy Random Binary Neural Networks

著者: Alireza Aghasi, Nicholas Marshall, Saeid Pourmand, Wyatt Whiting

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21063

要約:
We propose a novel randomized algorithm for constructing binary neural networks with tunable accuracy. This approach is motivated by hyperdimensional computing (HDC), which is a brain-inspired paradigm that leverages high-dimensional vector representations, offering efficient hardware implementation and robustness to model corruptions. Unlike traditional low-precision methods that use quantization, we consider binary embeddings of data as points in the hypercube equipped with the Hamming distance. We propose a novel family of floating-point neural networks, G-Nets, which are general enough to mimic standard network layers. Each floating-point G-Net has a randomized binary embedding, an embedded hyperdimensional (EHD) G-Net, that retains the accuracy of its floating-point counterparts, with theoretical guarantees, due to the concentration of measure. Empirically, our binary models match convolutional neural network accuracies and outperform prior HDC models by large margins, for example, we achieve almost 30\% higher accuracy on CIFAR-10 compared to prior HDC models. G-Nets are a theoretically justified bridge between neural networks and randomized binary neural networks, opening a new direction for constructing robust binary/quantized deep learning models. Our implementation is available at https://github.com/GNet2025/GNet.

#20 Statistical Inference for Manifold Similarity and Alignability across Noisy High-Dimensional Datasets

著者: Hongrui Chen, Rong Ma

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21074

要約:
The rapid growth of high-dimensional datasets across various scientific domains has created a pressing need for new statistical methods to compare distributions supported on their underlying structures. Assessing similarity between datasets whose samples lie on low-dimensional manifolds requires robust techniques capable of separating meaningful signal from noise. We propose a principled framework for statistical inference of similarity and alignment between distributions supported on manifolds underlying high-dimensional datasets in the presence of heterogeneous noise. The key idea is to link the low-rank structure of observed data matrices to their underlying manifold geometry. By analyzing the spectrum of the sample covariance under a manifold signal-plus-noise model, we develop a scale-invariant distance measure between datasets based on their principal variance structures. We further introduce a consistent estimator for this distance and a statistical test for manifold alignability, and establish their asymptotic properties using random matrix theory. The proposed framework accommodates heterogeneous noise across datasets and offers an efficient, theoretically grounded approach for comparing high-dimensional datasets with low-dimensional manifold structures. Through extensive simulations and analyses of multi-sample single-cell datasets, we demonstrate that our method achieves superior robustness and statistical power compared with existing approaches.

#21 How to Correctly Report LLM-as-a-Judge Evaluations

著者: Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, Kangwook Lee

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21140

要約:
Large language models (LLMs) are increasingly used as evaluators in lieu of humans. While scalable, their judgments are noisy due to imperfect specificity and sensitivity of LLMs, leading to biased accuracy estimates. Although bias-correction methods exist, they are underutilized in LLM research and typically assume exact knowledge of the model's specificity and sensitivity. Furthermore, in general we only have estimates of these values and it is not well known how to properly construct confidence intervals using only estimates. This work presents a simple plug-in framework that corrects such bias and constructs confidence intervals reflecting uncertainty from both test and calibration dataset, enabling practical and statistically sound LLM-based evaluation. Additionally, to reduce uncertainty in the accuracy estimate, we introduce an adaptive algorithm that efficiently allocates calibration sample sizes.

#22 Estimation in high-dimensional linear regression: Post-Double-Autometrics as an alternative to Post-Double-Lasso

著者: Sullivan Hu\'e, S\'ebastien Laurent, Ulrich Aiounou, Emmanuel Flachaire

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21257

要約:
Post-Double-Lasso is becoming the most popular method for estimating linear regression models with many covariates when the purpose is to obtain an accurate estimate of a parameter of interest, such as an average treatment effect. However, this method can suffer from substantial omitted variable bias in finite sample. We propose a new method called Post-Double-Autometrics, which is based on Autometrics, and show that this method outperforms Post-Double-Lasso. Its use in a standard application of economic growth sheds new light on the hypothesis of convergence from poor to rich economies.

#23 A Dynamics-Informed Gaussian Process Framework for 2D Stochastic Navier-Stokes via Quasi-Gaussianity

著者: Boumediene Hamzi, Houman Owhadi

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21281

要約:
The recent proof of quasi-Gaussianity for the 2D stochastic Navier--Stokes (SNS) equations by Coe, Hairer, and Tolomeo establishes that the system's unique invariant measure is equivalent (mutually absolutely continuous) to the Gaussian measure of its corresponding linear Ornstein--Uhlenbeck (OU) process. While Gaussian process (GP) frameworks are increasingly used for fluid dynamics, their priors are often chosen for convenience rather than being rigorously justified by the system's long-term dynamics. In this work, we bridge this gap by introducing a probabilistic framework for 2D SNS built directly upon this theoretical foundation. We construct our GP prior precisely from the stationary covariance of the linear OU model, which is explicitly defined by the forcing spectrum and dissipation. This provides a principled, GP prior with rigorous long-time dynamical justification for turbulent flows, bridging SPDE theory and practical data assimilation.

#24 Some aspects of robustness in modern Markov Chain Monte Carlo

著者: Sam Power, Giorgos Vasdekis

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21563

要約:
Markov Chain Monte Carlo (MCMC) is a flexible approach to approximate sampling from intractable probability distributions, with a rich theoretical foundation and comprising a wealth of exemplar algorithms. While the qualitative correctness of MCMC algorithms is often easy to ensure, their practical efficiency is contingent on the `target' distribution being reasonably well-behaved. In this work, we concern ourself with the scenario in which this good behaviour is called into question, reviewing an emerging line of work on `robust' MCMC algorithms which can perform acceptably even in the face of certain pathologies. We focus on two particular pathologies which, while simple, can already have dramatic effects on standard `local' algorithms. The first is roughness, whereby the target distribution varies so rapidly that the numerical stability of the algorithm is tenuous. The second is flatness, whereby the landscape of the target distribution is instead so barren and uninformative that one becomes lost in uninteresting parts of the state space. In each case, we formulate the pathology in concrete terms, review a range of proposed algorithmic remedies to the pathology, and outline promising directions for future research.

#25 Bridging the Unavoidable A Priori: A Framework for Comparative Causal Modeling

著者: Peter S. Hovmand, Kari O'Donnell, Callie Ogland-Hand, Brian Biroscak, Douglas D. Gunzler

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.21636

要約:
AI/ML models have rapidly gained prominence as innovations for solving previously unsolved problems and their unintended consequences from amplifying human biases. Advocates for responsible AI/ML have sought ways to draw on the richer causal models of system dynamics to better inform the development of responsible AI/ML. However, a major barrier to advancing this work is the difficulty of bringing together methods rooted in different underlying assumptions (i.e., Dana Meadow's "the unavoidable a priori"). This paper brings system dynamics and structural equation modeling together into a common mathematical framework that can be used to generate systems from distributions, develop methods, and compare results to inform the underlying epistemology of system dynamics for data science and AI/ML applications.

#26 Factor-Assisted Federated Learning for Personalized Optimization with Heterogeneous Data

著者: Feifei Wang, Huiyun Tang, Yang Li

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2312.04281

要約:
Federated learning is an emerging distributed machine learning framework aiming at protecting data privacy. Data heterogeneity is one of the core challenges in federated learning, which could severely degrade the convergence rate and prediction performance of deep neural networks. To address this issue, we develop a novel personalized federated learning framework for heterogeneous data, which we refer to as FedSplit. This modeling framework is motivated by the finding that, data in different clients contain both common knowledge and personalized knowledge. Then the hidden elements in each neural layer can be split into the shared and personalized groups. With this decomposition, a novel objective function is established and optimized. We demonstrate FedSplit enjoyers a faster convergence speed than the standard federated learning method both theoretically and empirically. The generalization bound of the FedSplit method is also studied. To practically implement the proposed method on real datasets, factor analysis is introduced to facilitate the decoupling of hidden elements. This leads to a practically implemented model for FedSplit and we further refer to as FedFac. We demonstrated by simulation studies that, using factor analysis can well recover the underlying shared/personalized decomposition. The superior prediction performance of FedFac is further verified empirically by comparison with various state-of-the-art federated learning methods on several real datasets.

#27 Momentum Multi-Marginal Schr\"odinger Bridge Matching

著者: Panagiotis Theodoropoulos, Augustinos D. Saravanos, Evangelos A. Theodorou, Guan-Horng Liu

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.10168

要約:
Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow matching frameworks, current methodologies rely on pairwise interpolation between adjacent snapshots. This hinders their ability to capture long-range temporal dependencies and potentially affects the coherence of the inferred trajectories. To address these issues, we introduce \textbf{Momentum Multi-Marginal Schr\"odinger Bridge Matching (3MSBM)}, a novel matching framework that learns smooth measure-valued splines for stochastic systems that satisfy multiple positional constraints. This is achieved by lifting the dynamics to phase space and generalizing stochastic bridges to be conditioned on several points, forming a multi-marginal conditional stochastic optimal control problem. The underlying dynamics are then learned by minimizing a variational objective, having fixed the path induced by the multi-marginal conditional bridge. As a matching approach, 3MSBM learns transport maps that preserve intermediate marginals throughout training, significantly improving convergence and scalability. Extensive experimentation in a series of real-world applications validates the superior performance of 3MSBM compared to existing methods in capturing complex dynamics with temporal dependencies, opening new avenues for training matching frameworks in multi-marginal settings.

#28 Demystifying Spectral Feature Learning for Instrumental Variable Regression

著者: Dimitri Meunier, Antoine Moulin, Jakub Wornbard, Vladimir R. Kostic, Arthur Gretton

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.10899

要約:
We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression. A leading strategy employs spectral features - that is, learned features spanning the top eigensubspaces of the operator linking treatments to instruments. We derive a generalization error bound for a two-stage least squares estimator based on spectral features, and gain insights into the method's performance and failure modes. We show that performance depends on two key factors, leading to a clear taxonomy of outcomes. In a good scenario, the approach is optimal. This occurs with strong spectral alignment, meaning the structural function is well-represented by the top eigenfunctions of the conditional operator, coupled with this operator's slow eigenvalue decay, indicating a strong instrument. Performance degrades in a bad scenario: spectral alignment remains strong, but rapid eigenvalue decay (indicating a weaker instrument) demands significantly more samples for effective feature learning. Finally, in the ugly scenario, weak spectral alignment causes the method to fail, regardless of the eigenvalues' characteristics. Our synthetic experiments empirically validate this taxonomy. We further introduce a practical procedure to estimate these spectral properties from data, allowing practitioners to diagnose which regime a given problem falls into. We apply this method to the dSprites dataset, demonstrating its utility.

#29 A Common Pipeline for Harmonizing Electronic Health Record Data for Translational Research

著者: Jessica Gronsbell, Vidul Ayakulangara Panickan, Doudou Zhou, Chris Lin, Thomas Charlon, Chuan Hong, Xin Xiong, Linshanshan Wang, Jianhui Gao, Shirley Zhou, Yuan Tian, Yaqi Shi, Ziming Gan, Tianxi Cai

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.08553

要約:
Despite the growing availability of Electronic Health Record (EHR) data, researchers often face substantial barriers in effectively using these data for translational research due to their complexity, heterogeneity, and lack of standardized tools and documentation. To address this critical gap, we introduce PEHRT, a common pipeline for harmonizing EHR data for translational research. PEHRT is a comprehensive, ready-to-use resource that includes open-source code, visualization tools, and detailed documentation to streamline the process of preparing EHR data for analysis. The pipeline provides tools to harmonize structured and unstructured EHR data to standardized ontologies to ensure consistency across diverse coding systems. In the presence of unmapped or heterogeneous local codes, PEHRT further leverages representation learning and pre-trained language models to generate robust embeddings that capture semantic relationships across sites to mitigate heterogeneity and enable integrative downstream analyses. PEHRT also supports cross-institutional co-training through shared representations, allowing participating sites to collaboratively refine embeddings and enhance generalizability without sharing individual-level data. The framework is data model-agnostic and can be seamlessly deployed across diverse healthcare systems to produce interoperable, research-ready datasets. By lowering the technical barriers to EHR-based research, PEHRT empowers investigators to transform raw clinical data into reproducible, analysis-ready resources for discovery and innovation.

#30 IndiSeek learns information-guided disentangled representations

著者: Yu Gui, Cong Ma, Zongming Ma

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.21584

要約:
Learning disentangled representations is a fundamental task in multi-modal learning. In modern applications such as single-cell multi-omics, both shared and modality-specific features are critical for characterizing cell states and supporting downstream analyses. Ideally, modality-specific features should be independent of shared ones while also capturing all complementary information within each modality. This tradeoff is naturally expressed through information-theoretic criteria, but mutual-information-based objectives are difficult to estimate reliably, and their variational surrogates often underperform in practice. In this paper, we introduce IndiSeek, a novel disentangled representation learning approach that addresses this challenge by combining an independence-enforcing objective with a computationally efficient reconstruction loss that bounds conditional mutual information. This formulation explicitly balances independence and completeness, enabling principled extraction of modality-specific features. We demonstrate the effectiveness of IndiSeek on synthetic simulations, a CITE-seq dataset and multiple real-world multi-modal benchmarks.

#31 Optimal control of the future via prospective learning with control

著者: Yuxin Bai, Aranyak Acharyya, Ashwin De Silva, Zeyu Shen, James Hassett, Joshua T. Vogelstein

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.08717

要約:
Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). RL is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility. Here, we extend supervised learning to address learning to \textit{control} in non-stationary, reset-free environments. Using this framework, called ''Prospective Learning with Control'' (PL+C), we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control, foraging -- which is a canonical task for any mobile agent -- be it natural or artificial. We illustrate that modern RL algorithms fail to learn in these non-stationary reset-free environments, and even with modifications, they are orders of magnitude less efficient than our prospective foraging agents.

#32 Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping

privacy

著者: Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Christopher G. Brinton, Tatiana Likhomanenko

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2310.00098

要約:
While federated learning (FL) and differential privacy (DP) have been extensively studied, their application to automatic speech recognition (ASR) remains largely unexplored due to the challenges in training large transformer models. Specifically, large models further exacerbate issues in FL as they are particularly susceptible to gradient heterogeneity across layers, unlike the relatively uniform gradient behavior observed in shallow models. As a result, prior works struggle to converge with standard optimization techniques, even in the absence of DP mechanisms. To the best of our knowledge, no existing work establishes a competitive, practical recipe for FL with DP in the context of ASR. To address this gap, we establish \textbf{the first benchmark for FL with DP in end-to-end ASR}. Our approach centers on per-layer clipping and layer-wise gradient normalization: theoretical analysis reveals that these techniques together mitigate clipping bias and gradient heterogeneity across layers in deeper models. Consistent with these theoretical insights, our empirical results show that FL with DP is viable under strong privacy guarantees, provided a population of at least several million users. Specifically, we achieve user-level (7.2, $10^{-9}$)-DP (resp. (4.5, $10^{-9}$)-DP) with only a 1.3% (resp. 4.6%) absolute drop in word error rate when extrapolating to high (resp. low) population scales for FL with DP in ASR. Although our experiments focus on ASR, the underlying principles we uncover - particularly those concerning gradient heterogeneity and layer-wise gradient normalization - offer broader guidance for designing scalable, privacy-preserving FL algorithms for large models across domains. Code of all experiments and benchmarks is available at https://github.com/apple/ml-pfl4asr.

#33 Decentralized Bilevel Optimization: A Perspective from Transient Iteration Complexity

著者: Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2402.03167

要約:
Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, most decentralized SBO algorithms focus solely on asymptotic convergence rates, overlooking transient iteration complexity-the number of iterations required before asymptotic rates dominate, which results in limited understanding of the influence of network topology, data heterogeneity, and the nested bilevel algorithmic structures. To address this issue, this paper introduces D-SOBA, a Decentralized Stochastic One-loop Bilevel Algorithm framework. D-SOBA comprises two variants: D-SOBA-SO, which incorporates second-order Hessian and Jacobian matrices, and D-SOBA-FO, which relies entirely on first-order gradients. We provide a comprehensive non-asymptotic convergence analysis and establish the transient iteration complexity of D-SOBA. This provides the first theoretical understanding of how network topology, data heterogeneity, and nested bilevel structures influence decentralized SBO. Extensive experimental results demonstrate the efficiency and theoretical advantages of D-SOBA.

#34 Data Valuation by Fusing Global and Local Statistical Information

著者: Xiaoling Zhou, Ou Wu, Michael K. Ng, Hao Jiang

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2405.17464

要約:
Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications. Among diverse data valuation approaches, Shapley value-based methods are predominant due to their strong theoretical grounding. However, the exact computation of Shapley values is often computationally prohibitive, prompting the development of numerous approximation techniques. Despite notable advancements, existing methods generally neglect the incorporation of value distribution information and fail to account for dynamic data conditions, thereby compromising their performance and application potential. In this paper, we highlight the crucial role of both global and local statistical properties of value distributions in the context of data valuation for machine learning. First, we conduct a comprehensive analysis of these distributions across various simulated and real-world datasets, uncovering valuable insights and key patterns. Second, we propose an enhanced data valuation method that fuses the explored distribution characteristics into two regularization terms to refine Shapley value estimation. The proposed regularizers can be seamlessly incorporated into various existing data valuation methods. Third, we introduce a novel approach for dynamic data valuation that infers updated data values without recomputing Shapley values, thereby significantly improving computational efficiency. Extensive experiments have been conducted across a range of tasks, including Shapley value estimation, value-based data addition and removal, mislabeled data detection, and dynamic data valuation. The results showcase the consistent effectiveness and efficiency of our proposed methodologies, affirming the significant potential of global and local value distributions in data valuation.

#35 CTSyn: A Foundation Model for Cross Tabular Data Generation

著者: Xiaofeng Lin, Chenheng Xu, Matthew Yang, Guang Cheng

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2406.04619

要約:
Generative Foundation Models (GFMs) have achieved remarkable success in producing high-quality synthetic data for images and text. However, their application to tabular data presents significant challenges due to the heterogeneous nature of table features. Current cross-table learning frameworks struggle because they lack a generative model backbone and an effective mechanism to decode heterogeneous feature values. To address these challenges, we propose the Cross-Table Synthesizer (CTSyn), a diffusion-based generative foundation model for tabular data generation. CTSyn comprises two key components. The first is an autoencoder network that consolidates diverse tables into a unified latent space. It dynamically reconstructs table values using a table schema embedding, allowing adaptation to heterogeneous datasets. The second is a conditional latent diffusion model that generates samples from the learned latent space, conditioned on the table schema. Through large-scale pre-training, CTSyn outperforms existing table synthesizers on standard benchmarks in both utility and diversity. These results position CTSyn as a promising framework for synthetic table generation and lay the groundwork for developing large-scale tabular foundation models.

#36 Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel-Young Losses

著者: Yuzhou Cao, Han Bao, Lei Feng, Bo An

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.09432

要約:
Surrogate regret bounds, also known as excess risk bounds, bridge the gap between the convergence rates of surrogate and target losses. The regret transfer is lossless if the surrogate regret bound is linear. While convex smooth surrogate losses are appealing in particular due to the efficient estimation and optimization, the existence of a trade-off between the loss smoothness and linear regret bound has been believed in the community. Under this scenario, the better optimization and estimation properties of convex smooth surrogate losses may inevitably deteriorate after undergoing the regret transfer onto a target loss. We overcome this dilemma for arbitrary discrete target losses by constructing a convex smooth surrogate loss, which entails a linear surrogate regret bound composed with a tailored prediction link. The construction is based on Fenchel--Young losses generated by the convolutional negentropy, which are equivalent to the infimal convolution of a generalized negentropy and the target Bayes risk. Consequently, the infimal convolution enables us to derive a smooth loss while maintaining the surrogate regret bound linear. We additionally benefit from the infimal convolution to have a consistent estimator of the underlying class probability. Our results are overall a novel demonstration of how convex analysis penetrates into optimization and statistical efficiency in risk minimization.

#37 Sample-optimal learning of quantum states using gentle measurements

著者: Cristina Butucea, Jan Johannes, Henning Stein

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.24587

要約:
Gentle measurements of quantum states do not entirely collapse the initial state. Instead, they provide a post-measurement state at a prescribed trace distance $\alpha$ from the initial state together with a random variable used for quantum learning of the initial state. We introduce here the class of $\alpha-$locally-gentle measurements ($\alpha-$LGM) on a finite dimensional quantum system which are product measurements on product states and prove a strong quantum Data-Processing Inequality (qDPI) on this class using an improved relation between gentleness and quantum differential privacy. We further show a gentle quantum Neyman-Pearson lemma which implies that our qDPI is asymptotically optimal (for small $\alpha$). This inequality is employed to show that the necessary number of quantum states for prescribed accuracy $\epsilon$ is of order $1/(\epsilon^2 \alpha^2)$ for both quantum tomography and quantum state certification. Finally, we propose an $\alpha-$LGM called quantum Label Switch that attains these bounds. It is a general implementable method to turn any two-outcome measurement into an $\alpha-$LGM.

#38 Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control

著者: Thomas T. Zhang, Daniel Pfrommer, Chaoyi Pan, Nikolai Matni, Max Simchowitz

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.09061

要約:
This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can suffer errors that compound exponentially with task horizon in continuous settings, we demonstrate that action chunking and exploratory data collection circumvent exponential compounding errors in different regimes. Our results identify control-theoretic stability as the key mechanism underlying the benefits of these interventions. On the empirical side, we validate our predictions and the role of control-theoretic stability through experimentation on popular robot learning benchmarks. On the theoretical side, we demonstrate that the control-theoretic lens provides fine-grained insights into how compounding error arises, leading to tighter statistical guarantees on imitation learning error when these interventions are applied than previous techniques based on information-theoretic considerations alone.

#39 Extreme value theory for singular subspace estimation in the matrix denoising model

著者: Junhyung Chang, Joshua Cape

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.19978

要約:
This paper studies fine-grained singular subspace estimation in the matrix denoising model where a deterministic low-rank signal matrix is additively perturbed by a stochastic matrix of Gaussian noise. We establish that the maximum Euclidean row norm (i.e., the two-to-infinity norm) of the aligned difference between the leading sample and population singular vectors approaches the Gumbel distribution in the large-matrix limit, under suitable signal-to-noise conditions and after appropriate centering and scaling. We apply our novel asymptotic distributional theory to test hypotheses of low-rank signal structure encoded in the leading singular vectors and their corresponding principal subspace. We provide de-biased estimators for the corresponding nuisance signal singular values and show that our proposed plug-in test statistic has desirable properties. Notably, compared to using the Frobenius norm subspace distance, our test statistic based on the two-to-infinity norm empirically has higher power to detect structured alternatives that differ from the null in only a few matrix entries or rows. Our main results are obtained by a novel synthesis of and technical analysis involving row-wise matrix perturbation analysis, extreme value theory, saddle point approximation methods, and random matrix theory. Our contributions complement the existing literature for matrix denoising focused on minimaxity, mean squared error analysis, unitarily invariant distances between subspaces, component-wise asymptotic distributional theory, and row-wise uniform error bounds. Numerical simulations illustrate our main results and demonstrate the robustness properties of our testing procedure to non-Gaussian noise distributions.

#40 Reconstructing the local density field with combined convolutional and point cloud architecture

著者: Baptiste Barthe-Gold, Nhat-Minh Nguyen, Leander Thiele

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.08573

要約:
We construct a neural network to perform regression on the local dark-matter density field given line-of-sight peculiar velocities of dark-matter halos, biased tracers of the dark matter field. Our architecture combines a convolutional U-Net with a point-cloud DeepSets. This combination enables efficient use of small-scale information and improves reconstruction quality relative to a U-Net-only approach. Specifically, our hybrid network recovers both clustering amplitudes and phases better than the U-Net on small scales.

#41 A Connection Between Score Matching and Local Intrinsic Dimension

著者: Eric Yeats, Aaron Jacobson, Darryl Hannan, Yiran Jia, Timothy Doster, Henry Kvinge, Scott Mahan

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.12975

要約:
The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory, but quantifying the LID of high-dimensional, complex data has been a historically challenging task. Recent works have discovered that diffusion models capture the LID of data through the spectra of their score estimates and through the rate of change of their density estimates under various noise perturbations. While these methods can accurately quantify LID, they require either many forward passes of the diffusion model or use of gradient computation, limiting their applicability in compute- and memory-constrained scenarios. We show that the LID is a lower bound on the denoising score matching loss, motivating use of the denoising score matching loss as a LID estimator. Moreover, we show that the equivalent implicit score matching loss also approximates LID via the normal dimension and is closely related to a recent LID estimator, FLIPD. Our experiments on a manifold benchmark and with Stable Diffusion 3.5 indicate that the denoising score matching loss is a highly competitive and scalable LID estimator, achieving superior accuracy and memory footprint under increasing problem size and quantization level.

#42 Weak-to-Strong Generalization under Distribution Shifts

著者: Myeongho Jeon, Jan Sobotka, Suhwan Choi, Maria Brbi\'c

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.21332

要約:
As future superhuman models become increasingly complex, accurately supervising their behavior may exceed human capabilities. Recent works have demonstrated that in such scenarios, weak models can effectively supervise strong models, a phenomenon known as weak-to-strong generalization. However, we find that naive weak-to-strong generalization fails under distribution shifts, often leading to worse performance of the strong model than its weak supervisors. To address this, we propose RAVEN, a robust weak-to-strong generalization framework that dynamically learns the optimal combinations of weak models in addition to parameters of the strong model. We demonstrate the effectiveness of RAVEN on image classification, text classification, and preference alignment tasks. RAVEN outperforms alternative baselines by over 30% on out-of-distribution tasks while matching or surpassing existing methods on in-distribution tasks. Moreover, our results show that RAVEN assigns higher weights to more accurate weak models, demonstrating its ability to automatically identify trustworthy supervision.

#43 Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators

著者: Bruno Fava

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.04957

要約:
As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample-splitting attains valid inference on model properties by using separate subsamples to estimate the model and to evaluate it. However, this approach has two drawbacks, since each task uses only part of the data, and different splits can lead to widely different estimates. Averaging across multiple splits, I develop an inference approach that uses more data for training, uses the entire sample for testing, and improves reproducibility. I address the statistical dependence from reusing observations across splits by proving a new central limit theorem for a large class of split-sample estimators under arguably mild and general conditions. Importantly, I make no restrictions on model complexity or convergence rates. I show that confidence intervals based on the normal approximation are valid for many applications, but may undercover in important cases of interest, such as comparing the performance between two models. I develop a new inference approach for such cases, explicitly accounting for the dependence across splits. Moreover, I provide a measure of reproducibility for p-values obtained from split-sample estimators. Finally, I apply my results to two important problems in development and public economics: predicting poverty and learning heterogeneous treatment effects in randomized experiments. I show that my inference approach with repeated cross-fitting achieves better power than existing alternatives, often enough to reveal statistical significance that would otherwise be missed.

#44 Approximation rates of quantum neural networks for periodic functions via Jackson's inequality

著者: Ariel Neufeld, Philipp Schmocker, Viet Khoa Tran

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.16149

要約:
Quantum neural networks (QNNs) are an analog of classical neural networks in the world of quantum computing, which are represented by a unitary matrix with trainable parameters. Inspired by the universal approximation property of classical neural networks, ensuring that every continuous function can be arbitrarily well approximated uniformly on a compact set of a Euclidean space, some recent works have established analogous results for QNNs, ranging from single-qubit to multi-qubit QNNs, and even hybrid classical-quantum models. In this paper, we study the approximation capabilities of QNNs for periodic functions with respect to the supremum norm. We use the Jackson inequality to approximate a given function by implementing its approximating trigonometric polynomial via a suitable QNN. In particular, we see that by restricting to the class of periodic functions, one can achieve a quadratic reduction of the number of parameters, producing better approximation results than in the literature. Moreover, the smoother the function, the fewer parameters are needed to construct a QNN to approximate the function.

#45 An Adaptive Resonance Theory-based Topological Clustering Algorithm with a Self-Adjusting Vigilance Parameter

著者: Naoki Masuyama, Yuichiro Toda, Yusuke Nojima, Hisao Ishibuchi

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.17983

要約:
Clustering in stationary and nonstationary settings, where data distributions remain static or evolve over time, requires models that can adapt to distributional shifts while preserving previously learned cluster structures. This paper proposes an Adaptive Resonance Theory (ART)-based topological clustering algorithm that autonomously adjusts its recalculation interval and vigilance threshold through a diversity-driven adaptation mechanism. This mechanism enables hyperparameter-free learning that maintains cluster stability and continuity in dynamic environments. Experiments on 24 real-world datasets demonstrate that the proposed algorithm outperforms state-of-the-art methods in both clustering performance and continual learning capability. These results highlight the effectiveness of the proposed parameter adaptation in mitigating catastrophic forgetting and maintaining consistent clustering in evolving data streams. Source code is available at https://github.com/Masuyama-lab/IDAT

#46 Lower Complexity Bounds for Nonconvex-Strongly-Convex Bilevel Optimization with First-Order Oracles

著者: Kaiyi Ji

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.19656

要約:
Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $\Omega(\kappa^{3/2}\epsilon^{-2})$ oracle calls to find an $\epsilon$-accurate stationary point, improving the optimal lower bounds known for single-level nonconvex optimization and for nonconvex-strongly-convex min-max problems. In the stochastic case, we show that at least $\Omega(\kappa^{5/2}\epsilon^{-4})$ stochastic oracle calls are necessary, again strengthening the best known bounds in related settings. Our results expose substantial gaps between current upper and lower bounds for bilevel optimization and suggest that even simplified regimes, such as those with quadratic lower-level objectives, warrant further investigation toward understanding the optimal complexity of bilevel optimization under standard first-order oracles.

#47 Adaptive SGD with Line-Search and Polyak Stepsizes: Nonconvex Convergence and Accelerated Rates

著者: Haotian Wu

公開日: Thu, 27 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.20207

要約:
We extend the convergence analysis of AdaSLS and AdaSPS in [Jiang and Stich, 2024] to the nonconvex setting, presenting a unified convergence analysis of stochastic gradient descent with adaptive Armijo line-search (AdaSLS) and Polyak stepsize (AdaSPS) for nonconvex optimization. Our contributions include: (1) an $\mathcal{O}(1/\sqrt{T})$ convergence rate for general nonconvex smooth functions, (2) an $\mathcal{O}(1/T)$ rate under quasar-convexity and interpolation, and (3) an $\mathcal{O}(1/T)$ rate under the strong growth condition for general nonconvex functions.

stat.ML updates on arXiv.org

📋 論文タイトル一覧