stat.ML updates on arXiv.org

更新日時: Wed, 28 Jan 2026 05:00:23 +0000
論文数: 59件
0件選択中

📋 論文タイトル一覧

1. Statistical Inference for Explainable Boosting Machines
2. Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration
3. Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget
4. Convergence of Muon with Newton-Schulz
5. Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making
6. Regularized $f$-Divergence Kernel Tests
7. Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts
8. Prediction Markets as Bayesian Inverse Problems: Uncertainty Quantification, Identifiability, and Information Gain from Price-Volume Histories under Latent Types
9. Time series forecasting with Hahn Kolmogorov-Arnold networks
10. RealStats: A Rigorous Real-Only Statistical Framework for Fake Image Detection
11. Advances in Diffusion-Based Generative Compression diffusion
12. FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation diffusion
13. Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach
14. Recommending Composite Items Using Multi-Level Preference Information: A Joint Interaction Modeling Approach
15. A Unifying View of Coverage in Linear Off-Policy Evaluation
16. XIMP: Cross Graph Inter-Message Passing for Molecular Property Prediction
17. Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
18. Nonlocal Kramers-Moyal formulas and data-driven discovery of stochastic dynamical systems with multiplicative L\'evy noise
19. E-QRGMM: Efficient Generative Metamodeling for Covariate-Dependent Uncertainty Quantification
20. LightSBB-M: Bridging Schr\"odinger and Bass for Generative Diffusion Modeling diffusion
21. Predictive Accuracy versus Interpretability in Energy Markets: A Copula-Enhanced TVP-SVAR Analysis
22. Divergence-Free Diffusion Models for Incompressible Fluid Flows diffusion
23. Intersectional Fairness via Mixed-Integer Optimization
24. Direct Doubly Robust Estimation of Conditional Quantile Contrasts
25. Provable Learning of Random Hierarchy Models and Hierarchical Shallow-to-Deep Chaining
26. To Grok Grokking: Provable Grokking in Ridge Regression
27. Beyond the noise: intrinsic dimension estimation with optimal neighbourhood identification
28. Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks model extraction
29. Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions
30. Bilateral Distribution Compression: Reducing Both Data Size and Dimensionality
31. Incorporating priors in learning: a random matrix study under a teacher-student framework
32. Concept activation vectors: a unifying view and adversarial attacks
33. Doubly-Regressing Approach for Subgroup Fairness
34. A general framework for adaptive nonparametric dimensionality reduction
35. Differential privacy for symmetric log-concave mechanisms privacy
36. Chaotic Hedging with Iterated Integrals and Neural Networks
37. An efficient, provably optimal algorithm for the 0-1 loss linear classification problem
38. Statistical Hypothesis Testing for Information Value (IV)
39. A simple algorithm for output range analysis for deep neural networks
40. Link Representation Learning for Probabilistic Travel Time Estimation
41. Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression
42. CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions
43. Deeply Learned Robust Matrix Completion for Large-scale Low-rank Data Recovery
44. Minimax-Optimal Spectral Clustering with Covariance Projection for High-Dimensional Anisotropic Mixtures
45. Generative Modeling with Bayesian Sample Inference
46. Sample-Efficient Optimization over Generative Priors via Coarse Learnability
47. On the contraction properties of Sinkhorn semigroups
48. Fine Tuning a Simulation-Driven Estimator
49. Imputation-free Learning of Tabular Data with Missing Values using Incremental Feature Partitions in Transformer
50. Theoretical Investigation on Inductive Bias of Isolation Forest
51. Attention-Aided MMSE for OFDM Channel Estimation: Learning Linear Filters with Attention
52. Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards
53. NIMO: a Nonlinear Interpretable MOdel
54. Why is Your Language Model a Poor Implicit Reward Model?
55. Optimal Scaling Needs Optimal Norm
56. Adaptive Regularization for Large-Scale Sparse Feature Embedding Models
57. Conformal Online Learning of Deep Koopman Linear Embeddings
58. Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource
59. A Generalized Adaptive Joint Learning Framework for High-Dimensional Time-Varying Models
📄 論文詳細
著者: Haimo Fang, Kevin Tan, Jonathan Pipping, Giles Hooker
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Explainable boosting machines (EBMs) are popular "glass-box" models that learn a set of univariate functions using boosting trees. These achieve explainability through visualizations of each feature's effect. However, unlike linear model coefficients, uncertainty quantification for the learned univariate functions requires computationally intensive bootstrapping, making it hard to know which features truly matter. We provide an alternative using recent advances in statistical inference for gradient boosting, deriving methods for statistical inference as well as end-to-end theoretical guarantees. Using a moving average instead of a sum of trees (Boulevard regularization) allows the boosting process to converge to a feature-wise kernel ridge regression. This produces asymptotically normal predictions that achieve the minimax-optimal mean squared error for fitting Lipschitz GAMs with $p$ features at rate $O(pn^{-2/3})$, successfully avoiding the curse of dimensionality. We then construct prediction intervals for the response and confidence intervals for each learned univariate function with a runtime independent of the number of datapoints, enabling further explainability within EBMs.
著者: Hwanwoo Kim, Eric Laber
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small can lead to slow progress. We propose implicit variants of Q-learning and SARSA that reformulate their iterative updates as fixed-point equations. This yields an adaptive step-size adjustment that scales inversely with feature norms, providing automatic regularization without manual tuning. Our non-asymptotic analyses demonstrate that implicit methods maintain stability over significantly broader step-size ranges. Under favorable conditions, it permits arbitrarily large step-sizes while achieving comparable convergence rates. Empirical validation across benchmark environments spanning discrete and continuous state spaces shows that implicit Q-learning and SARSA exhibit substantially reduced sensitivity to step-size selection, achieving stable performance with step-sizes that would cause standard methods to fail.
著者: Harsh Vardhan, Arya Mazumdar
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Distributed high dimensional mean estimation is a common aggregation routine used often in distributed optimization methods. Most of these applications call for a communication-constrained setting where vectors, whose mean is to be estimated, have to be compressed before sharing. One could independently encode and decode these to achieve compression, but that overlooks the fact that these vectors are often close to each other. To exploit these similarities, recently Suresh et al., 2022, Jhunjhunwala et al., 2021, Jiang et al, 2023, proposed multiple correlation-aware compression schemes. However, in most cases, the correlations have to be known for these schemes to work. Moreover, a theoretical analysis of graceful degradation of these correlation-aware compression schemes with increasing dissimilarity is limited to only the $\ell_2$-error in the literature. In this paper, we propose four different collaborative compression schemes that agnostically exploit the similarities among vectors in a distributed setting. Our schemes are all simple to implement and computationally efficient, while resulting in big savings in communication. The analysis of our proposed schemes show how the $\ell_2$, $\ell_\infty$ and cosine estimation error varies with the degree of similarity among vectors.
著者: Gyu Yeol Kim, Min-hwan Oh
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We analyze Muon as originally proposed and used in practice -- using the momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a stationary point at the same rate as the SVD-polar idealization, up to a constant factor for a given number $q$ of Newton-Schulz steps. We further analyze this constant factor and prove that it converges to 1 doubly exponentially in $q$ and improves with the degree of the polynomial used in Newton-Schulz for approximating the orthogonalization direction. We also prove that Muon removes the typical square-root-of-rank loss compared to its vector-based counterpart, SGD with momentum. Our results explain why Muon with a few low-degree Newton-Schulz steps matches exact-polar (SVD) behavior at a much faster wall-clock time and explain how much momentum matrix orthogonalization via Newton-Schulz benefits over the vector-based optimizer. Overall, our theory justifies the practical Newton-Schulz design of Muon, narrowing its practice-theory gap.
著者: Zeyu Bian, Lan Wang, Chengchun Shi, Zhengling Qi
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is interventional, it induces two distinct fairness targets: action fairness (equitable action assignments) and outcome fairness (equitable downstream consequences). Crucially, equalizing actions does not generally equalize outcomes when groups face different constraints or respond differently to the same action. We propose a novel double fairness learning (DFL) framework that explicitly manages the trade-off among three objectives: action fairness, outcome fairness, and value maximization. We integrate fairness directly into a multi-objective optimization problem for policy learning and employ a lexicographic weighted Tchebyshev method that recovers Pareto solutions beyond convex settings, with theoretical guarantees on the regret bounds. Our framework is flexible and accommodates various commonly used fairness notions. Extensive simulations demonstrate improved performance relative to competing methods. In applications to a motor third-party liability insurance dataset and an entrepreneurship training dataset, DFL substantially improves both action and outcome fairness while incurring only a modest reduction in overall value.
著者: M\'onica Ribero, Antonin Schrab, Arthur Gretton
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We propose a framework to construct practical kernel-based two-sample tests from the family of $f$-divergences. The test statistic is computed from the witness function of a regularized variational representation of the divergence, which we estimate using kernel methods. The proposed test is adaptive over hyperparameters such as the kernel bandwidth and the regularization parameter. We provide theoretical guarantees for statistical test power across our family of $f$-divergence estimates. While our test covers a variety of $f$-divergences, we bring particular focus to the Hockey-Stick divergence, motivated by its applications to differential privacy auditing and machine unlearning evaluation. For two-sample testing, experiments demonstrate that different $f$-divergences are sensitive to different localized differences, illustrating the importance of leveraging diverse statistics. For machine unlearning, we propose a relative test that distinguishes true unlearning failures from safe distributional variations.
著者: TrungKhang Tran, TrungTin Nguyen, Gersende Fort, Tung Doan, Hien Duy Nguyen, Binh T. Nguyen, Florence Forbes, Christopher Drovandi
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Processing high-volume, streaming data is increasingly common in modern statistics and machine learning, where batch-mode algorithms are often impractical because they require repeated passes over the full dataset. This has motivated incremental stochastic estimation methods, including the incremental stochastic Expectation-Maximization (EM) algorithm formulated via stochastic approximation. In this work, we revisit and analyze an incremental stochastic variant of the Majorization-Minimization (MM) algorithm, which generalizes incremental stochastic EM as a special case. Our approach relaxes key EM requirements, such as explicit latent-variable representations, enabling broader applicability and greater algorithmic flexibility. We establish theoretical guarantees for the incremental stochastic MM algorithm, proving consistency in the sense that the iterates converge to a stationary point characterized by a vanishing gradient of the objective. We demonstrate these advantages on a softmax-gated mixture of experts (MoE) regression problem, for which no stochastic EM algorithm is available. Empirically, our method consistently outperforms widely used stochastic optimizers, including stochastic gradient descent, root mean square propagation, adaptive moment estimation, and second-order clipped stochastic optimization. These results support the development of new incremental stochastic algorithms, given the central role of softmax-gated MoE architectures in contemporary deep neural networks for heterogeneous data modeling. Beyond synthetic experiments, we also validate practical effectiveness on two real-world datasets, including a bioinformatics study of dent maize genotypes under drought stress that integrates high-dimensional proteomics with ecophysiological traits, where incremental stochastic MM yields stable gains in predictive performance.
著者: Juan Pablo Madrigal-Cianci, Camilo Monsalve Maya, Lachlan Breakey
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Prediction markets are often described as mechanisms that ``aggregate information'' into prices, yet the mapping from dispersed private information to observed market histories is typically noisy, endogenous, and shaped by heterogeneous and strategic participation. This paper formulates prediction markets as Bayesian inverse problems in which the unknown event outcome \(Y\in\{0,1\}\) is inferred from an observed history of market-implied probabilities and traded volumes. We introduce a mechanism-agnostic observation model in log-odds space in which price increments conditional on volume arise from a latent mixture of trader types. The resulting likelihood class encompasses informed and uninformed trading, heavy-tailed microstructure noise, and adversarial or manipulative flow, while requiring only price and volume as observables. Within this framework we define posterior uncertainty quantification for \(Y\), provide identifiability and well-posedness criteria in terms of Kullback--Leibler separation between outcome-conditional increment laws, and derive posterior concentration statements and finite-sample error bounds under general regularity assumptions. We further study stability of posterior odds to perturbations of the observed price--volume path and define realized and expected information gain via the posterior-vs-prior KL divergence and mutual information. The inverse-problem formulation yields explicit diagnostics for regimes in which market histories are informative and stable versus regimes in which inference is ill-posed due to type-composition confounding or outcome--nuisance symmetries. Extensive experiments on synthetic data validate our theoretical predictions regarding posterior concentration rates and identifiability thresholds.
著者: Md Zahidul Hasan, A. Ben Hamza, Nizar Bouguila
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Recent Transformer- and MLP-based models have demonstrated strong performance in long-term time series forecasting, yet Transformers remain limited by their quadratic complexity and permutation-equivariant attention, while MLPs exhibit spectral bias. We propose HaKAN, a versatile model based on Kolmogorov-Arnold Networks (KANs), leveraging Hahn polynomial-based learnable activation functions and providing a lightweight and interpretable alternative for multivariate time series forecasting. Our model integrates channel independence, patching, a stack of Hahn-KAN blocks with residual connections, and a bottleneck structure comprised of two fully connected layers. The Hahn-KAN block consists of inter- and intra-patch KAN layers to effectively capture both global and local temporal patterns. Extensive experiments on various forecasting benchmarks demonstrate that our model consistently outperforms recent state-of-the-art methods, with ablation studies validating the effectiveness of its core components.
著者: Haim Zisman, Uri Shaham
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
As generative models continue to evolve, detecting AI-generated images remains a critical challenge. While effective detection methods exist, they often lack formal interpretability and may rely on implicit assumptions about fake content, potentially limiting robustness to distributional shifts. In this work, we introduce a rigorous, statistically grounded framework for fake image detection that focuses on producing a probability score interpretable with respect to the real-image population. Our method leverages the strengths of multiple existing detectors by combining training-free statistics. We compute p-values over a range of test statistics and aggregate them using classical statistical ensembling to assess alignment with the unified real-image distribution. This framework is generic, flexible, and training-free, making it well-suited for robust fake image detection across diverse and evolving settings.
diffusion
著者: Yibo Yang, Stephan Mandt
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Popularized by their strong image generation performance, diffusion and related methods for generative modeling have found widespread success in visual media applications. In particular, diffusion methods have enabled new approaches to data compression, where realistic reconstructions can be generated at extremely low bit-rates. This article provides a unifying review of recent diffusion-based methods for generative lossy compression, with a focus on image compression. These methods generally encode the source into an embedding and employ a diffusion model to iteratively refine it in the decoding procedure, such that the final reconstruction approximately follows the ground truth data distribution. The embedding can take various forms and is typically transmitted via an auxiliary entropy model, and recent methods also explore the use of diffusion models themselves for information transmission via channel simulation. We review representative approaches through the lens of rate-distortion-perception theory, highlighting the role of common randomness and connections to inverse problems, and identify open challenges.
diffusion
著者: Xin Qiao, Shijie Sun, Anqi Dong, Cong Hua, Xia Zhao, Longfei Zhang, Guangming Zhu, Liang Zhang
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Imputing missing node features in graphs is challenging, particularly under high missing rates. Existing methods based on latent representations or global diffusion often fail to produce reliable estimates, and may propagate errors across the graph. We propose FSD-CAP, a two-stage framework designed to improve imputation quality under extreme sparsity. In the first stage, a graph-distance-guided subgraph expansion localizes the diffusion process. A fractional diffusion operator adjusts propagation sharpness based on local structure. In the second stage, imputed features are refined using class-aware propagation, which incorporates pseudo-labels and neighborhood entropy to promote consistency. We evaluated FSD-CAP on multiple datasets. With $99.5\%$ of features missing across five benchmark datasets, FSD-CAP achieves average accuracies of $80.06\%$ (structural) and $81.01\%$ (uniform) in node classification, close to the $81.31\%$ achieved by a standard GCN with full features. For link prediction under the same setting, it reaches AUC scores of $91.65\%$ (structural) and $92.41\%$ (uniform), compared to $95.06\%$ for the fully observed case. Furthermore, FSD-CAP demonstrates superior performance on both large-scale and heterophily datasets when compared to other models.
著者: Mehrdad Mohammadi, Qi Zheng, Ruoqing Zhu
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We propose an (offline) multi-dimensional distributional reinforcement learning framework (KE-DRL) that leverages Hilbert space mappings to estimate the kernel mean embedding of the multi-dimensional value distribution under a proposed target policy. In our setting, the state-action variables are multi-dimensional and continuous. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces Wasserstein metrics with an integral probability metric. This enables efficient estimation in multi-dimensional state-action spaces and reward settings, where direct computation of Wasserstein distances is computationally challenging. Theoretically, we establish contraction properties of the distributional Bellman operator under our proposed metric involving the Matern family of kernels and provide uniform convergence guarantees. Simulations and empirical results demonstrate robust off-policy evaluation and recovery of the kernel mean embedding under mild assumptions, namely, Lipschitz continuity and boundedness of the kernels, highlighting the potential of embedding-based approaches in complex real-world decision-making scenarios and risk evaluation.
著者: Xuan Bi, Yaqiong Wang, Gediminas Adomavicius, Shawn Curley
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
With the advancement of machine learning and artificial intelligence technologies, recommender systems have been increasingly used across a vast variety of platforms to efficiently and effectively match users with items. As application contexts become more diverse and complex, there is a growing need for more sophisticated recommendation techniques. One example is the composite item (for example, fashion outfit) recommendation where multiple levels of user preference information might be available and relevant. In this study, we propose JIMA, a joint interaction modeling approach that uses a single model to take advantage of all data from different levels of granularity and incorporate interactions to learn the complex relationships among lower-order (atomic item) and higher-order (composite item) user preferences as well as domain expertise (e.g., on the stylistic fit). We comprehensively evaluate the proposed method and compare it with advanced baselines through multiple simulation studies as well as with real data in both offline and online settings. The results consistently demonstrate the superior performance of the proposed approach.
著者: Philip Amortila, Audrey Huang, Akshay Krishnamurthy, Nan Jiang
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Off-policy evaluation (OPE) is a fundamental task in reinforcement learning (RL). In the classic setting of linear OPE, finite-sample guarantees often take the form $$ \textrm{Evaluation error} \le \textrm{poly}(C^\pi, d, 1/n,\log(1/\delta)), $$ where $d$ is the dimension of the features and $C^\pi$ is a coverage parameter that characterizes the degree to which the visited features lie in the span of the data distribution. While such guarantees are well-understood for several popular algorithms under stronger assumptions (e.g. Bellman completeness), the understanding is lacking and fragmented in the minimal setting where only the target value function is linearly realizable in the features. Despite recent interest in tight characterizations of the statistical rate in this setting, the right notion of coverage remains unclear, and candidate definitions from prior analyses have undesirable properties and are starkly disconnected from more standard definitions in the literature. We provide a novel finite-sample analysis of a canonical algorithm for this setting, LSTDQ. Inspired by an instrumental-variable view, we develop error bounds that depend on a novel coverage parameter, the feature-dynamics coverage, which can be interpreted as linear coverage in an induced dynamical system for feature evolution. With further assumptions -- such as Bellman-completeness -- our definition successfully recovers the coverage parameters specialized to those settings, finally yielding a unified understanding for coverage in linear OPE.
著者: Anatol Ehrlich, Lorenz Kummer, Vojtech Voracek, Franka Bause, Nils M. Kriege
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Accurate molecular property prediction is central to drug discovery, yet graph neural networks often underperform in data-scarce regimes and fail to surpass traditional fingerprints. We introduce cross-graph inter-message passing (XIMP), which performs message passing both within and across multiple related graph representations. For small molecules, we combine the molecular graph with scaffold-aware junction trees and pharmacophore-encoding extended reduced graphs, integrating complementary abstractions. While prior work is either limited to a single abstraction or non-iterative communication across graphs, XIMP supports an arbitrary number of abstractions and both direct and indirect communication between them in each layer. Across ten diverse molecular property prediction tasks, XIMP outperforms state-of-the-art baselines in most cases, leveraging interpretable abstractions as an inductive bias that guides learning toward established chemical concepts, enhancing generalization in low-data settings.
著者: Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi, Ge Gao
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We study how to fine-tune LLMs using user-edit deployment data consisting of a set of context, an agent's response, and user edits. This deployment data is naturally generated by users in applications such as LLMs-based writing assistants and coding agents. The _natural_ origin of user edits makes it a desired source for adapting and personalizing LLMs. In this setup, there emerges a unification of various feedback types namely preferences, supervised labels, and cost that are typically studied separately in the literature. In this paper, we initiate the theoretical investigation of learning from user edits. We first derive bounds for learning algorithms that learn from each of these feedback types. We prove that these algorithms have different trade-offs depending upon the user, data distribution, and model class. We then propose a simple ensembling procedure to jointly learn from these feedback types. On two domains adapted from Gao et al. 2024, we show our ensembling procedure outperforms these methods that learn from individual feedback. Further, we show that our proposed procedure can robustly adapt to different user-edit distributions at test time.
著者: Yang Li, Jinqiao Duan
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Traditional data-driven methods, effective for deterministic systems or stochastic differential equations (SDEs) with Gaussian noise, fail to handle the discontinuous sample paths and heavy-tailed fluctuations characteristic of L\'evy processes, particularly when the noise is state-dependent. To bridge this gap, we establish nonlocal Kramers-Moyal formulas, rigorously generalizing the classical Kramers-Moyal relations to SDEs with multiplicative L\'evy noise. These formulas provide a direct link between short-time transition probability densities (or sample path statistics) and the underlying SDE coefficients: the drift vector, diffusion matrix, L\'evy jump measure kernel, and L\'evy noise intensity functions. Leveraging these theoretical foundations, we develop novel data-driven algorithms capable of simultaneously identifying all governing components from data and establish convergence results and error analysis for the algorithms. We validate the framework through extensive numerical experiments on prototypical systems. This work provides a principled and practical toolbox for discovering interpretable SDE models governing complex systems influenced by discontinuous, heavy-tailed, state-dependent fluctuations, with broad applicability in climate science, neuroscience, epidemiology, finance, and biological physics.
著者: Zhiyang Liang, Qingkai Zhang
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Covariate-dependent uncertainty quantification in simulation-based inference is crucial for high-stakes decision-making but remains challenging due to the limitations of existing methods such as conformal prediction and classical bootstrap, which struggle with covariate-specific conditioning. We propose Efficient Quantile-Regression-Based Generative Metamodeling (E-QRGMM), a novel framework that accelerates the quantile-regression-based generative metamodeling (QRGMM) approach by integrating cubic Hermite interpolation with gradient estimation. Theoretically, we show that E-QRGMM preserves the convergence rate of the original QRGMM while reducing grid complexity from $O(n^{1/2})$ to $O(n^{1/5})$ for the majority of quantile levels, thereby substantially improving computational efficiency. Empirically, E-QRGMM achieves a superior trade-off between distributional accuracy and training speed compared to both QRGMM and other advanced deep generative models on synthetic and practical datasets. Moreover, by enabling bootstrap-based construction of confidence intervals for arbitrary estimands of interest, E-QRGMM provides a practical solution for covariate-dependent uncertainty quantification.
diffusion
著者: Alexandre Alouadi, Pierre Henry-Labord\`ere, Gr\'egoire Loeper, Othmane Mazhar, Huy\^en Pham, Nizar Touzi
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
The Schrodinger Bridge and Bass (SBB) formulation, which jointly controls drift and volatility, is an established extension of the classical Schrodinger Bridge (SB). Building on this framework, we introduce LightSBB-M, an algorithm that computes the optimal SBB transport plan in only a few iterations. The method exploits a dual representation of the SBB objective to obtain analytic expressions for the optimal drift and volatility, and it incorporates a tunable parameter beta greater than zero that interpolates between pure drift (the Schrodinger Bridge) and pure volatility (Bass martingale transport). We show that LightSBB-M achieves the lowest 2-Wasserstein distance on synthetic datasets against state-of-the-art SB and diffusion baselines with up to 32 percent improvement. We also illustrate the generative capability of the framework on an unpaired image-to-image translation task (adult to child faces in FFHQ). These findings demonstrate that LightSBB-M provides a scalable, high-fidelity SBB solver that outperforms existing SB and diffusion baselines across both synthetic and real-world generative tasks. The code is available at https://github.com/alexouadi/LightSBB-M.
著者: Fredy Pokou (MRE, CRIStAL), Jules Sadefo Kamdem (MRE), Kpante Emmanuel Gnandi (ENAC-LAB)
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
This paper investigates whether structural econometric models can rival machine learning in forecasting energy--macro dynamics while retaining causal interpretability. Using monthly data from 1999 to 2025, we develop a unified framework that integrates Time-Varying Parameter Structural VARs (TVP-SVAR) with advanced dependence structures, including DCC-GARCH, t-copulas, and mixed Clayton--Frank--Gumbel copulas. These models are empirically evaluated against leading machine learning techniques Gaussian Process Regression (GPR), Artificial Neural Networks, Random Forests, and Support Vector Regression across seven macro-financial and energy variables, with Brent crude oil as the central asset. The findings reveal three major insights. First, TVP-SVAR consistently outperforms standard VAR models, confirming structural instability in energy transmission channels. Second, copula-based extensions capture non-linear and tail dependence more effectively than symmetric DCC models, particularly during periods of macroeconomic stress. Third, despite their methodological differences, copula-enhanced econometric models and GPR achieve statistically equivalent predictive accuracy (t-test p = 0.8444). However, only the econometric approach provides interpretable impulse responses, regime shifts, and tail-risk diagnostics. We conclude that machine learning can replicate predictive performance but cannot substitute the explanatory power of structural econometrics. This synthesis offers a pathway where AI accuracy and economic interpretability jointly inform energy policy and risk management.
diffusion
著者: Wilfried Genuist, \'Eric Savin, Filippo Gatti, Didier Clouteau
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Generative diffusion models are extensively used in unsupervised and self-supervised machine learning with the aim to generate new samples from a probability distribution estimated with a set of known samples. They have demonstrated impressive results in replicating dense, real-world contents such as images, musical pieces, or human languages. This work investigates their application to the numerical simulation of incompressible fluid flows, with a view toward incorporating physical constraints such as incompressibility in the probabilistic forecasting framework enabled by generative networks. For that purpose, we explore different conditional, score-based diffusion models where the divergence-free constraint is imposed by the Leray spectral projector, and autoregressive conditioning is aimed at stabilizing the forecasted flow snapshots at distant time horizons. The proposed models are run on a benchmark turbulence problem, namely a Kolmogorov flow, which allows for a fairly detailed analytical and numerical treatment and thus simplifies the evaluation of the numerical methods used to simulate it. Numerical experiments of increasing complexity are performed in order to compare the advantages and limitations of the diffusion models we have implemented and appraise their performances, including: (i) in-distribution assessment over the same time horizons and for similar physical conditions as the ones seen during training; (ii) rollout predictions over time horizons unseen during training; and (iii) out-of-distribution tests for forecasting flows markedly different from those seen during training. In particular, these results illustrate the ability of diffusion models to reproduce the main statistical characteristics of Kolmogorov turbulence in scenarios departing from the ones they were trained on.
著者: Ji\v{r}\'i N\v{e}me\v{c}ek, Mark Kozdoba, Illia Kryvoviaz, Tom\'a\v{s} Pevn\'y, Jakub Mare\v{c}ek
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
The deployment of Artificial Intelligence in high-risk domains, such as finance and healthcare, necessitates models that are both fair and transparent. While regulatory frameworks, including the EU's AI Act, mandate bias mitigation, they are deliberately vague about the definition of bias. In line with existing research, we argue that true fairness requires addressing bias at the intersections of protected groups. We propose a unified framework that leverages Mixed-Integer Optimization (MIO) to train intersectionally fair and intrinsically interpretable classifiers. We prove the equivalence of two measures of intersectional fairness (MSD and SPSF) in detecting the most unfair subgroup and empirically demonstrate that our MIO-based algorithm improves performance in finding bias. We train high-performing, interpretable classifiers that bound intersectional bias below an acceptable threshold, offering a robust solution for regulated industries and beyond.
著者: Josh Givens, Song Liu, Henry W J Reeve, Katarzyna Reluga
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Within heterogeneous treatment effect (HTE) analysis, various estimands have been proposed to capture the effect of a treatment conditional on covariates. Recently, the conditional quantile comparator (CQC) has emerged as a promising estimand, offering quantile-level summaries akin to the conditional quantile treatment effect (CQTE) while preserving some interpretability of the conditional average treatment effect (CATE). It achieves this by summarising the treated response conditional on both the covariates and the untreated response. Despite these desirable properties, the CQC's current estimation is limited by the need to first estimate the difference in conditional cumulative distribution functions and then invert it. This inversion obscures the CQC estimate, hampering our ability to both model and interpret it. To address this, we propose the first direct estimator of the CQC, allowing for explicit modelling and parameterisation. This explicit parameterisation enables better interpretation of our estimate while also providing a means to constrain and inform the model. We show, both theoretically and empirically, that our estimation error depends directly on the complexity of the CQC itself, improving upon the existing estimation procedure. Furthermore, it retains the desirable double robustness property with respect to nuisance parameter estimation. We further show our method to outperform existing procedures in estimation accuracy across multiple data scenarios while varying sample size and nuisance error. Finally, we apply it to real-world data from an employment scheme, uncovering a reduced range of potential earnings improvement as participant age increases.
著者: Yunwei Ren, Yatin Dandi, Florent Krzakala, Jason D. Lee
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
The empirical success of deep learning is often attributed to deep networks' ability to exploit hierarchical structure in data, constructing increasingly complex features across layers. Yet despite substantial progress in deep learning theory, most optimization results sill focus on networks with only two or three layers, leaving the theoretical understanding of hierarchical learning in genuinely deep models limited. This leads to a natural question: can we prove that deep networks, trained by gradient-based methods, can efficiently exploit hierarchical structure? In this work, we consider Random Hierarchy Models -- a hierarchical context-free grammar introduced by arXiv:2307.02129 and conjectured to separate deep and shallow networks. We prove that, under mild conditions, a deep convolutional network can be efficiently trained to learn this function class. Our proof builds on a general observation: if intermediate layers can receive clean signal from the labels and the relevant features are weakly identifiable, then layerwise training each individual layer suffices to hierarchically learn the target function.
著者: Mingyue Xu, Gal Vardi, Itay Safran
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the training data early during training; (ii) poor generalization persists long after overfitting has manifested; and (iii) the generalization error eventually becomes arbitrarily small. Moreover, we show, both theoretically and empirically, that grokking can be amplified or eliminated in a principled manner through proper hyperparameter tuning. To the best of our knowledge, these are the first rigorous quantitative bounds on the generalization delay (which we refer to as the "grokking time") in terms of training hyperparameters. Lastly, going beyond the linear setting, we empirically demonstrate that our quantitative bounds also capture the behavior of grokking on non-linear neural networks. Our results suggest that grokking is not an inherent failure mode of deep learning, but rather a consequence of specific training conditions, and thus does not require fundamental changes to the model architecture or learning algorithm to avoid.
著者: Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, Antonietta Mira
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID depends on the scale at which the data are analysed. Quite typically at a small scale, the ID is very large, as the data are affected by measurement errors. At large scale, the ID can also be erroneously large, due to the curvature and the topology of the manifold containing the data. In this work, we introduce an automatic protocol to select the sweet spot, namely the correct range of scales in which the ID is meaningful and useful. This protocol is based on imposing that for distances smaller than the correct scale the density of the data is constant. In the presented framework, to estimate the density it is necessary to know the ID, therefore, this condition is imposed self-consistently. We illustrate the usefulness and robustness of this procedure to noise by benchmarks on artificial and real-world datasets.
model extraction
著者: Julia Nakhleh, Robert D. Nowak
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network--i.e., the network with the fewest nonzero parameters or neurons--a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing $\ell^p$ quasinorms of the weights for $0 < p < 1$, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.
著者: Yue Kang, Mingshuo Liu, Bongsoo Yi, Jing Lyu, Zhi Zhang, Doudou Zhou, Yao Li
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Generalized linear bandits have been extensively studied due to their broad applicability in real-world online decision-making problems. However, these methods typically assume that the expected reward function is known to the users, an assumption that is often unrealistic in practice. Misspecification of this link function can lead to the failure of all existing algorithms. In this work, we address this critical limitation by introducing a new problem of generalized linear bandits with unknown reward functions, also known as single index bandits. We first consider the case where the unknown reward function is monotonically increasing, and propose two novel and efficient algorithms, STOR and ESTOR, that achieve decent regrets under standard assumptions. Notably, our ESTOR can obtain the nearly optimal regret bound $\tilde{O}_T(\sqrt{T})$ in terms of the time horizon $T$. We then extend our methods to the high-dimensional sparse setting and show that the same regret rate can be attained with the sparsity index. Next, we introduce GSTOR, an algorithm that is agnostic to general reward functions, and establish regret bounds under a Gaussian design assumption. Finally, we validate the efficiency and effectiveness of our algorithms through experiments on both synthetic and real-world datasets.
著者: Dominic Broadbent, Nick Whiteley, Robert Allison, Tom Lovett
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Existing distribution compression methods reduce the number of observations in a dataset by minimising the Maximum Mean Discrepancy (MMD) between original and compressed sets, but modern datasets are often large in both sample size and dimensionality. We propose Bilateral Distribution Compression (BDC), a two-stage framework that compresses along both axes while preserving the underlying distribution, with overall linear time and memory complexity in dataset size and dimension. Central to BDC is the Decoded MMD (DMMD), which we introduce to quantify the discrepancy between the original data and a compressed set decoded from a low-dimensional latent space. BDC proceeds by (i) learning a low-dimensional projection using the Reconstruction MMD (RMMD), and (ii) optimising a latent compressed set with the Encoded MMD (EMMD). We show that this procedure minimises the DMMD, guaranteeing that the compressed set faithfully represents the original distribution. Experiments show that BDC can achieve comparable or superior downstream task performance to ambient-space compression at substantially lower cost and with significantly higher rates of compression.
著者: Malik Tiomoko, Ekkehard Schnoor
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Regularized linear regression is central to machine learning, yet its high-dimensional behavior with informative priors remains poorly understood. We provide the first exact asymptotic characterization of training and test risks for maximum a posteriori (MAP) regression with Gaussian priors centered at a domain-informed initialization. Our framework unifies ridge regression, least squares, and prior-informed estimators, and -- using random matrix theory -- yields closed-form risk formulas that expose the bias-variance-prior tradeoff, explain double descent, and quantify prior mismatch. We also identify a closed-form minimizer of test risk, enabling a simple estimator of the optimal regularization parameter. Simulations confirm the theory with high accuracy. By connecting Bayesian priors, classical regularization, and modern asymptotics, our results provide both conceptual clarity and practical guidance for learning with structured prior knowledge.
著者: Ekkehard Schnoor, Malik Tiomoko, Jawher Said, Alex Jung, Wojciech Samek
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model's latent spaces. They are computed from hidden-layer activations of inputs belonging either to a concept class or to non-concept examples. Adopting a probabilistic perspective, the distribution of the (non-)concept inputs induces a distribution over the CAV, making it a random vector in the latent space. This enables us to derive mean and covariance for different types of CAVs, leading to a unified theoretical view. This probabilistic perspective also reveals a potential vulnerability: CAVs can strongly depend on the rather arbitrary non-concept distribution, a factor largely overlooked in prior work. We illustrate this with a simple yet effective adversarial attack, underscoring the need for a more systematic study.
著者: Kunwoong Kim, Kyungseon Lee, Jihu Lee, Dongyoon Yang, Yongdai Kim
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Algorithmic fairness is a socially crucial topic in real-world applications of AI. Among many notions of fairness, subgroup fairness is widely studied when multiple sensitive attributes (e.g., gender, race, age) are present. However, as the number of sensitive attributes grows, the number of subgroups increases accordingly, creating heavy computational burdens and data sparsity problem (subgroups with too small sizes). In this paper, we develop a novel learning algorithm for subgroup fairness which resolves these issues by focusing on subgroups with sufficient sample sizes as well as marginal fairness (fairness for each sensitive attribute). To this end, we formalize a notion of subgroup-subset fairness and introduce a corresponding distributional fairness measure called the supremum Integral Probability Metric (supIPM). Building on this formulation, we propose the Doubly Regressing Adversarial learning for subgroup Fairness (DRAF) algorithm, which reduces a surrogate fairness gap for supIPM with much less computation than directly reducing supIPM. Theoretically, we prove that the proposed surrogate fairness gap is an upper bound of supIPM. Empirically, we show that the DRAF algorithm outperforms baseline methods in benchmark datasets, specifically when the number of sensitive attributes is large so that many subgroups are very small.
著者: Antonio Di Noia, Federico Ravenda, Antonietta Mira
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Dimensionality reduction is a fundamental task in modern data science. Several projection methods specifically tailored to take into account the non-linearity of the data via local embeddings have been proposed. Such methods are often based on local neighbourhood structures and require tuning the number of neighbours that define this local structure, and the dimensionality of the lower-dimensional space onto which the data are projected. Such choices critically influence the quality of the resulting embedding. In this paper, we exploit a recently proposed intrinsic dimension estimator which also returns the optimal locally adaptive neighbourhood sizes according to some desirable criteria. In principle, this adaptive framework can be employed to perform an optimal hyper-parameter tuning of any dimensionality reduction algorithm that relies on local neighbourhood structures. Numerical experiments on both real-world and simulated datasets show that the proposed method can be used to significantly improve well-known projection methods when employed for various learning tasks, with improvements measurable through both quantitative metrics and the quality of low-dimensional visualizations.
privacy
著者: Staal A. Vinterbo
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Adding random noise to database query results is an important tool for achieving privacy. A challenge is to minimize this noise while still meeting privacy requirements. Recently, a sufficient and necessary condition for $(\epsilon, \delta)$-differential privacy for Gaussian noise was published. This condition allows the computation of the minimum privacy-preserving scale for this distribution. We extend this work and provide a sufficient and necessary condition for $(\epsilon, \delta)$-differential privacy for all symmetric and log-concave noise densities. Our results allow fine-grained tailoring of the noise distribution to the dimensionality of the query result. We demonstrate that this can yield significantly lower mean squared errors than those incurred by the currently used Laplace and Gaussian mechanisms for the same $\epsilon$ and $\delta$.
著者: Ariel Neufeld, Philipp Schmocker
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
In this paper, we derive an $L^p$-chaos expansion based on iterated Stratonovich integrals with respect to a given exponentially integrable continuous semimartingale. By omitting the orthogonality of the expansion, we show that every $p$-integrable functional, $p \in [1,\infty)$, can be approximated by a finite sum of iterated Stratonovich integrals. Using (possibly random) neural networks as integrands, we therefere obtain universal approximation results for $p$-integrable financial derivatives in the $L^p$-sense. Moreover, we can approximately solve the $L^p$-hedging problem (coinciding for $p = 2$ with the quadratic hedging problem), where the approximating hedging strategy can be computed in closed form within short runtime.
著者: Xi He, Max A. Little
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can obtain the exact solution to the corresponding 0-1 loss classification problem efficiently, but for data which is not linearly separable, it has been shown that this problem, in full generality, is NP-hard. Alternative approaches all involve approximations of some kind, such as the use of surrogates for the 0-1 loss (for example, the hinge or logistic loss), none of which can be guaranteed to solve the problem exactly. Finding an efficient, rigorously proven algorithm for obtaining an exact (i.e., globally optimal) solution to the 0-1 loss linear classification problem remains an open problem. By analyzing the combinatorial and incidence relations between hyperplanes and data points, we derive a rigorous construction algorithm, incremental cell enumeration (ICE), that can solve the 0-1 loss classification problem exactly in $O(N^{D+1})$. To the best of our knowledge, this is the first standalone algorithm-one that does not rely on general-purpose solvers-with rigorously proven guarantees for this problem. Moreover, we further generalize ICE to address the polynomial hypersurface classification problem in $O(N^{G+1})$ time, where $G$ is determined by both the data dimension and the polynomial hypersurface degree. The correctness of our algorithm is proved by the use of tools from the theory of hyperplane arrangements and oriented matroids. We demonstrate the effectiveness of our algorithm on real-world datasets, achieving optimal training accuracy for small-scale datasets and higher test accuracy on most datasets. Furthermore, our complexity analysis shows that the ICE algorithm offers superior computational efficiency compared with state-of-the-art branch-and-bound algorithm.
著者: Helder Rojas, Cirilo Alvarez, Nilton Rojas
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Information Value (IV) is a widely used technique for feature selection prior to the modeling phase, particularly in credit scoring and related domains. However, conventional IV-based practices rely on fixed empirical thresholds, which lack statistical justification and may be sensitive to characteristics such as class imbalance. In this work, we develop a formal statistical framework for IV by establishing its connection with Jeffreys divergence and propose a novel nonparametric hypothesis test, referred to as the J-Divergence test. Our method provides rigorous asymptotic guarantees and enables interpretable decisions based on \(p\)-values. Numerical experiments, including synthetic and real-world data, demonstrate that the proposed test is more reliable than traditional IV thresholding, particularly under strong imbalance. The test is model-agnostic, computationally efficient, and well-suited for the pre-modeling phase in high-dimensional or imbalanced settings. An open-source Python library is provided for reproducibility and practical adoption.
著者: Helder Rojas, Nilton Rojas, Espinoza J. B., Luis Huamanchumo
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
This paper presents a novel approach for the output range estimation problem in Deep Neural Networks (DNNs) by integrating a Simulated Annealing (SA) algorithm tailored to operate within constrained domains and ensure convergence towards global optima. The method effectively addresses the challenges posed by the lack of local geometric information and the high non-linearity inherent to DNNs, making it applicable to a wide variety of architectures, with a special focus on Residual Networks (ResNets) due to their practical importance. Unlike existing methods, our algorithm imposes minimal assumptions on the internal architecture of neural networks, thereby extending its usability to complex models. Theoretical analysis guarantees convergence, while extensive empirical evaluations-including optimization tests involving functions with multiple local minima-demonstrate the robustness of our algorithm in navigating non-convex response surfaces. The experimental results highlight the algorithm's efficiency in accurately estimating DNN output ranges, even in scenarios characterized by high non-linearity and complex constraints. For reproducibility, Python codes and datasets used in the experiments are publicly available through our GitHub repository.
著者: Chen Xu, Qiang Wang, Lijun Sun
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Travel time estimation is a key task in navigation apps and web mapping services. Existing deterministic and probabilistic methods, based on the assumption of trip independence, predominantly focus on modeling individual trips while overlooking trip correlations. However, real-world conditions frequently introduce strong correlations between trips, influenced by external and internal factors such as weather and the tendencies of drivers. To address this, we propose a deep hierarchical joint probabilistic model ProbETA for travel time estimation, capturing both inter-trip and intra-trip correlations. The joint distribution of travel times across multiple trips is modeled as a low-rank multivariate Gaussian, parameterized by learnable link representations estimated using the empirical Bayes approach. We also introduce a data augmentation method based on trip sub-sampling, allowing for fine-grained gradient backpropagation when learning link representations. During inference, our model estimates the probability distribution of travel time for a queried trip, conditional on spatiotemporally adjacent completed trips. Evaluation on two real-world GPS trajectory datasets demonstrates that ProbETA outperforms state-of-the-art deterministic and probabilistic baselines, with Mean Absolute Percentage Error decreasing by over 12.60%. Moreover, the learned link representations align with the physical network geometry, potentially making them applicable for other tasks.
著者: Pratik Rathore, Zachary Frangella, Jiaming Yang, Micha{\l} Derezi\'nski, Madeleine Udell
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, full KRR solvers are challenging to scale to large datasets: both direct (i.e., Cholesky decomposition) and iterative methods (i.e., PCG) incur prohibitive computational and storage costs. The standard approach to scale KRR to large datasets chooses a set of inducing points and solves an approximate version of the problem, inducing points KRR. However, the resulting solution tends to have worse predictive performance than the full KRR solution. In this work, we introduce a new solver, ASkotch, for full KRR that provides better solutions faster than state-of-the-art solvers for full and inducing points KRR. ASkotch is a scalable, accelerated, iterative method for full KRR that provably obtains linear convergence. Under appropriate conditions, we show that ASkotch obtains condition-number-free linear convergence. This convergence analysis rests on the theory of ridge leverage scores and determinantal point processes. ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale KRR regression and classification tasks derived from a wide range of application domains, demonstrating the superiority of full KRR over inducing points KRR. Our work opens up the possibility of as-yet-unimagined applications of full KRR across a number of disciplines.
著者: Matthew J. Vowels, Mathieu Rochat, Sina Akbari
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Artificial Neural Networks (ANNs), including fully-connected networks and transformers, are highly flexible and powerful function approximators, widely applied in fields like computer vision and natural language processing. However, their inability to inherently respect causal structures can limit their robustness, making them vulnerable to covariate shift and difficult to interpret/explain. This poses significant challenges for their reliability in real-world applications. In this paper, we introduce Causal Transformers (CaTs), a general model class designed to operate under predefined causal constraints, as specified by a Directed Acyclic Graph (DAG). CaTs retain the powerful function approximation abilities of traditional neural networks while adhering to the underlying structural constraints, improving robustness, reliability, and interpretability at inference time. This approach opens new avenues for deploying neural networks in more demanding, real-world scenarios where robustness and explainability is critical.
著者: HanQin Cai, Chandra Kundu, Jialin Liu, Wotao Yin
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Robust matrix completion (RMC) is a widely used machine learning tool that simultaneously tackles two critical issues in low-rank data analysis: missing data entries and extreme outliers. This paper proposes a novel scalable and learnable non-convex approach, coined Learned Robust Matrix Completion (LRMC), for large-scale RMC problems. LRMC enjoys low computational complexity with linear convergence. Motivated by the proposed theorem, the free parameters of LRMC can be effectively learned via deep unfolding to achieve optimum performance. Furthermore, this paper proposes a flexible feedforward-recurrent-mixed neural network framework that extends deep unfolding from fix-number iterations to infinite iterations. The superior empirical performance of LRMC is verified with extensive experiments against state-of-the-art on synthetic datasets and real applications, including video background subtraction, ultrasound imaging, face modeling, and cloud removal from satellite imagery.
著者: Chengzhu Huang, Yuqi Gu
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
In mixture models, anisotropic noise within each cluster is widely present in real-world data. This work investigates both computationally efficient procedures and fundamental statistical limits for clustering in high-dimensional anisotropic mixtures. We propose a new clustering method, Covariance Projected Spectral Clustering (COPO), which adapts to a wide range of dependent noise structures. We first project the data onto a low-dimensional space via eigen-decomposition of a diagonal-deleted Gram matrix. Our central methodological idea is to sharpen clustering in this embedding space by a covariance-aware reassignment step, using quadratic distances induced by estimated projected covariances. Through a novel row-wise analysis of the subspace estimation step in weak-signal regimes, which is of independent interest, we establish tight performance guarantees and algorithmic upper bounds for COPO, covering both Gaussian noise with flexible covariance and general noise with local dependence. To characterize the fundamental difficulty of clustering high-dimensional anisotropic Gaussian mixtures, we further establish two distinct and complementary minimax lower bounds, each highlighting different covariance-driven barriers. Our results show that COPO attains minimax-optimal misclustering rates in Gaussian settings. Extensive simulation studies across diverse noise structures, along with a real data application, demonstrate the superior empirical performance of our method.
著者: Marten Lienen, Marcel Kollovieh, Stephan G\"unnemann
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We derive a novel generative model from iterative Gaussian posterior inference. By treating the generated sample as an unknown variable, we can formulate the sampling process in the language of Bayesian probability. Our model uses a sequence of prediction and posterior update steps to iteratively narrow down the unknown sample starting from a broad initial belief. In addition to a rigorous theoretical analysis, we establish a connection between our model and diffusion models and show that it includes Bayesian Flow Networks (BFNs) as a special case. In our experiments, we demonstrate that our model improves sample quality on ImageNet32 over both BFNs and the closely related Variational Diffusion Models, while achieving equivalent log-likelihoods on ImageNet32 and ImageNet64. Find our code at https://github.com/martenlienen/bsi.
著者: Pranjal Awasthi, Sreenivas Gollapudi, Ravi Kumar, Kamesh Munagala
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
In zeroth-order optimization, we seek to minimize a function $d(\cdot)$, which may encode combinatorial feasibility, using only function evaluations. We focus on the setting where solutions must also satisfy qualitative constraints or conform to a complex prior distribution. To address this, we introduce a new framework in which such constraints are represented by an initial generative prior $\L(\cdot)$, for example, a Large Language Model (LLM). The objective is to find solutions $s$ that minimize $d(s)$ while having high probability under $\L(s)$, effectively sampling from a target distribution proportional to $\L(s) \cdot e^{-T \cdot d(s)}$ for a temperature parameter $T$. While this framework aligns with classical Model-Based Optimization (e.g., the Cross-Entropy method), existing theory is ill-suited for deriving sample complexity bounds in black-box deep generative models. We therefore propose a novel learning assumption, which we term \emph{coarse learnability}, where an agent with access to a polynomial number of samples can learn a model whose point-wise density approximates the target within a polynomial factor. Leveraging this assumption, we design an iterative algorithm that employs a Metropolis-Hastings correction to provably approximate the target distribution using a polynomial number of samples. To the best of our knowledge, this is one of the first works to establish such sample-complexity guarantees for model-based optimization with deep generative priors. We provide two lines of evidence supporting the coarse learnability assumption. Theoretically, we show that maximum likelihood estimation naturally induces the required coverage properties, holding for both standard exponential families and for misspecified models. Empirically, we demonstrate that LLMs can adapt their learned distributions to zeroth-order feedback to solve combinatorial optimization problems.
著者: O. Deniz Akyildiz, Pierre del Moral, Joaquin Miguez
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We develop a novel stability theory for Sinkhorn semigroups based on Lyapunov techniques and quantitative contraction coefficients, and establish exponential convergence of Sinkhorn iterations on weighted Banach spaces. This operator-theoretic framework yields explicit exponential decay rates of Sinkhorn iterates toward Schr\"odinger bridges with respect to a broad class of $\phi$-divergences and Kantorovich-type distances, including relative entropy, squared Hellinger integrals, $\alpha$-divergences, weighted total variation norms, and Wasserstein distances. To the best of our knowledge, these results provide the first systematic contraction inequalities of this kind for entropic transport and the Sinkhorn algorithm. We further introduce Lyapunov contraction principles under minimal regularity assumptions, leading to quantitative exponential stability estimates for a large family of Sinkhorn semigroups. The framework applies to models with polynomially growing potentials and heavy-tailed marginals on general normed spaces, as well as to more structured boundary state-space models, including semicircle transitions and Beta, Weibull, and exponential marginals, together with semi-compact settings. Finally, our approach extends naturally to statistical finite mixtures of such models, including kernel-based density estimators arising in modern generative modeling.
著者: Braghadeesh Lakshminarayanan, Margarita A. Guerrero, Cristian R. Rojas
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping from observations to parameters. However, when the true parameters lie outside the sampled range, predictions suffer from an out-of-distribution (OOD) error. This paper introduces a fine-tuning approach for the Two-Stage estimator that mitigates OOD effects and improves accuracy. The effectiveness of the proposed method is verified through numerical simulations.
著者: Manar D. Samad, Kazi Fuad B. Akhter, Shourav B. Rabbani, Ibna Kowsar
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often raise concerns regarding data quality and the reliability of data-driven outcomes. To address these concerns, this article proposes an imputation-free incremental attention learning (IFIAL) method for tabular data with missing values. A pair of attention masks is derived and retrofitted to a transformer to directly streamline tabular data without imputing or initializing missing values. The proposed method incrementally learns partitions of overlapping and fixed-size feature sets to enhance the performance of the transformer. The average classification performance rank order across 17 diverse tabular data sets highlights the superiority of IFIAL over 11 state-of-the-art learning methods with or without missing value imputations. Additional experiments corroborate the robustness of IFIAL to varying types and proportions of missing data, demonstrating its superiority over methods that rely on explicit imputations. A feature partition size equal to one-half the original feature space yields the best trade-off between computational efficiency and predictive performance. IFIAL is one of the first solutions that enables deep attention models to learn directly from tabular data, eliminating the need to impute missing values. %without the need for imputing missing values. The source code for this paper is publicly available.
著者: Qin-Cheng Zheng, Shao-Qun Zhang, Shen-Huan Lyu, Yuan Jiang, Zhi-Hua Zhou
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector, primarily owing to its remarkable runtime efficiency and superior performance in large-scale tasks. Despite its widespread adoption, a theoretical foundation explaining iForest's success remains unclear. This paper focuses on the inductive bias of iForest, which theoretically elucidates under what circumstances and to what extent iForest works well. The key is to formulate the growth process of iForest, where the split dimensions and split values are randomly selected. We model the growth process of iForest as a random walk, enabling us to derive the expected depth function, which is the outcome of iForest, using transition probabilities. The case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor. Our study provides a theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
著者: TaeJun Ha, Chaehyun Jung, Hyeonuk Kim, Jeongwoo Park, Jeonghun Park
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
In orthogonal frequency division multiplexing (OFDM), accurate channel estimation is crucial. Classical signal processing-based approaches, such as linear minimum mean-squared error (LMMSE) estimation, often require second-order statistics that are difficult to obtain in practice. Recent deep neural network (DNN)-based methods have been introduced to address this; yet they often suffer from high inference complexity. This paper proposes an Attention-aided MMSE (A-MMSE), a model-based DNN framework that learns the linear MMSE filter via the Attention Transformer. Once trained, the A-MMSE performs channel estimation through a single linear operation, eliminating nonlinear activations during inference and thus reducing computational complexity. To improve the learning efficiency of the A-MMSE, we develop a two-stage Attention encoder that captures the frequency and temporal correlation structure of OFDM channels. We also introduce a rank-adaptive extension that enables a flexible performance-complexity trade-off. Numerical simulations show that the proposed A-MMSE consistently outperforms other baseline methods in terms of normalized MSE across a wide range of signal-to-noise ratio (SNR) conditions. In particular, the A-MMSE and its rank-adaptive extension provide an improved performance-complexity trade-off, providing a powerful and highly efficient solution for practical channel estimation.
著者: Artin Tajdini, Jonathan Scarlett, Kevin Jamieson
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+\epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best prior known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+\epsilon}})$. While a lower with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only a $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3\epsilon}{2(1+\epsilon)}} T^{\frac{1}{1+\epsilon}})$, thus improving the dependence on $d$ for all $\epsilon \in (0,1)$ and recovering a known optimal result for $\epsilon = 1$. We also establish a lower bound of $\Omega(d^{\frac{2\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for regret. Finally, we provide action set dependent regret upper bounds showing that for some geometries, such as $l_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Mat\'ern kernel that are the first to be sublinear for all $\epsilon \in (0, 1]$.
著者: Shijian Xu, Marcello Massimo Negri, Volker Roth
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Deep learning has achieved remarkable success across many domains, but it has also created a growing demand for interpretability in model predictions. Although many explainable machine learning methods have been proposed, post-hoc explanations lack guaranteed fidelity and are sensitive to hyperparameter choices, highlighting the appeal of inherently interpretable models. For example, linear regression provides clear feature effects through its coefficients. However, such models are often outperformed by more complex neural networks (NNs) that usually lack inherent interpretability. To address this dilemma, we introduce NIMO, a framework that combines inherent interpretability with the expressive power of neural networks. Building on the simple linear regression, NIMO is able to provide flexible and intelligible feature effects. Relevantly, we develop an optimization method based on parameter elimination, that allows for optimizing the NN parameters and linear coefficients effectively and efficiently. By relying on adaptive ridge regression we can easily incorporate sparsity as well. We show empirically that our model can provide faithful and intelligible feature effects while maintaining good predictive performance.
著者: Noam Razin, Yong Lin, Jiarui Yao, Sanjeev Arora
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Reward models are key to language model post-training and inference pipelines. Conveniently, recent work showed that every language model defines an implicit reward model (IM-RM), without requiring any architectural changes. However, such IM-RMs tend to generalize worse, especially out-of-distribution, compared to explicit reward models (EX-RMs) that apply a dedicated linear head over the hidden representations of a language model. The existence of a generalization gap is puzzling, as EX-RMs and IM-RMs are nearly identical. They can be trained using the same data, loss function, and language model, and differ only in how the reward is computed. Toward a fundamental understanding of the implicit biases underlying different reward model types, we investigate the root cause of this gap. Our main finding, backed by theory and experiments, is that IM-RMs rely more heavily on superficial token-level cues. Consequently, they often generalize worse than EX-RMs under token-level distribution shifts, as well as in-distribution. Furthermore, we provide evidence against alternative hypotheses for the generalization gap. Most notably, we challenge the claim that IM-RMs struggle in tasks where generation is harder than verification because they can operate both as a verifier and a generator. Overall, our results highlight that seemingly minor design choices can substantially impact the generalization behavior of reward models.
著者: Oleg Filatov, Jiangtao Wang, Jan Ebert, Stefan Kesselheim
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Despite recent progress in optimal hyperparameter transfer under model and dataset scaling, no unifying explanatory principle has been established. For Adam and Scion optimizers, we discover that joint optimal scaling across model and dataset sizes is conditioned on a single invariant: the operator norm of the output layer. Across models with up to 1.3B parameters trained on up to 138B tokens, the optimal learning rate/batch size pair $(\eta^{\ast}, B^{\ast})$ consistently has the same operator norm value - a phenomenon we term norm transfer. This constant norm condition is necessary but not sufficient: while for each dataset size, multiple $(\eta, B)$ reach the optimal norm, only a unique $(\eta^{\ast}, B^{\ast})$ achieves the best loss. As a sufficient condition, we provide the first measurement of $(\eta^{\ast}, B^{\ast})$ scaling with dataset size for Scion, and find that the scaling rules are consistent with those of Adam. Tuning per-layer-group learning rates also improves model performance, with the output layer being the most sensitive and hidden layers benefiting from lower learning rates. We provide practical insights on norm-guided optimal scaling and release our Distributed Scion (Disco) implementation with logs from over two thousand runs to support research on LLM training dynamics at scale.
著者: Mang Li, Wei Lyu
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
The one-epoch overfitting problem has drawn widespread attention, especially in CTR and CVR estimation models in search, advertising, and recommendation domains. These models which rely heavily on large-scale sparse categorical features, often suffer a significant decline in performance when trained for multiple epochs. Although recent studies have proposed heuristic solutions, the fundamental cause of this phenomenon remains unclear. In this work, we present a theoretical explanation grounded in Rademacher complexity, supported by empirical experiments, to explain why overfitting occurs in models with large-scale sparse categorical features. Based on this analysis, we propose a regularization method that constrains the norm budget of embedding layers adaptively. Our approach not only prevents the severe performance degradation observed during multi-epoch training, but also improves model performance within a single epoch. This method has already been deployed in online production systems.
著者: Ben Gao, Jordan Patracone, St\'ephane Chr\'etien, Olivier Alata
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
We introduce Conformal Online Learning of Koopman embeddings (COLoKe), a novel framework for adaptively updating Koopman-invariant representations of nonlinear dynamical systems from streaming data. Our modeling approach combines deep feature learning with multistep prediction consistency in the lifted space, where the dynamics evolve linearly. To prevent overfitting, COLoKe employs a conformal-style mechanism that shifts the focus from evaluating the conformity of new states to assessing the consistency of the current Koopman model. Updates are triggered only when the current model's prediction error exceeds a dynamically calibrated threshold, allowing selective refinement of the Koopman operator and embedding. Empirical results on benchmark dynamical systems demonstrate the effectiveness of COLoKe in maintaining long-term predictive accuracy while significantly reducing unnecessary updates and avoiding overfitting.
著者: Sofiya Zaichyk
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
Statistical learning under distributional drift remains insufficiently characterized: when each observation alters the data-generating law, classical generalization bounds can collapse. We introduce a new statistical primitive, the reproducibility budget $C_T$, which quantifies a system's finite capacity for statistical reproducibility: the extent to which its sampling process can remain governed by a consistent underlying distribution in the presence of both exogenous change and endogenous feedback. Formally, $C_T$ is defined as the cumulative Fisher-Rao path length of the coupled learner-environment evolution, measuring the total distributional motion accumulated during learning. From this construct we derive a drift-feedback generalization bound of order $O(T^{-1/2} + C_T/T)$, and we prove a matching minimax lower bound showing that this rate is minimax-optimal. Consequently, the results establish a reproducibility speed limit: no algorithm can achieve smaller worst-case generalization error than that imposed by the average Fisher-Rao drift rate $C_T/T$ of the data-generating process. The framework situates exogenous drift, adaptive data analysis, and performative prediction within a common geometric structure, with $C_T$ emerging as the intrinsic quantity measuring distributional motion across these settings.
著者: Baolin Chen, Mengfei Ran
公開日: Wed, 28 Jan 2026 00:00:00 -0500
要約:
In modern biomedical and econometric studies, longitudinal processes are often characterized by complex time-varying associations and abrupt regime shifts that are shared across correlated outcomes. Standard functional data analysis (FDA) methods, which prioritize smoothness, often fail to capture these dynamic structural features, particularly in high-dimensional settings. This article introduces Adaptive Joint Learning (AJL), a hierarchical regularization framework designed to integrate functional variable selection with structural changepoint detection in multivariate time-varying coefficient models. Unlike standard simultaneous estimation approaches, we propose a theoretically grounded two-stage screening-and-refinement procedure. This framework first synergizes adaptive group-wise penalization with sure screening principles to robustly identify active predictors, followed by a refined fused regularization step that effectively borrows strength across multiple outcomes to detect local regime shifts. We provide a rigorous theoretical analysis of the estimator in the ultra-high-dimensional regime (p >> n). Crucially, we establish the sure screening consistency of the first stage, which serves as the foundation for proving that the refined estimator achieves the oracle property-performing as well as if the true active set and changepoint locations were known a priori. A key theoretical contribution is the explicit handling of approximation bias via undersmoothing conditions to ensure valid asymptotic inference. The proposed method is validated through comprehensive simulations and an application to Sleep-EDF data, revealing novel dynamic patterns in physiological states.
生成日時: 2026-01-28 18:00:02