arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Gradient-based Active Learning with Gaussian Processes for Global Sensitivity Analysis

著者: Guerlain Lambert, C\'eline Helbert, Claire Lauvernet

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.11790

要約:
Global sensitivity analysis of complex numerical simulators is often limited by the small number of model evaluations that can be afforded. In such settings, surrogate models built from a limited set of simulations can substantially reduce the computational burden, provided that the design of computer experiments is enriched efficiently. In this context, we propose an active learning approach that, for a fixed evaluation budget, targets the most informative regions of the input space to improve sensitivity analysis accuracy. More specifically, our method builds on recent advances in active learning for sensitivity analysis (Sobol' indices and derivative-based global sensitivity measures, DGSM) that exploit derivatives obtained from a Gaussian process (GP) surrogate. By leveraging the joint posterior distribution of the GP gradient, we develop acquisition functions that better account for correlations between partial derivatives and their impact on the response surface, leading to a more comprehensive and robust methodology than existing DGSM-oriented criteria. The proposed approach is first compared to state-of-the-art methods on standard benchmark functions, and is then applied to a real environmental model of pesticide transfers.

#2 A Kernel Approach for Semi-implicit Variational Inference

著者: Longlin Yu, Ziheng Cheng, Shiyue Zhang, Cheng Zhang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12023

要約:
Semi-implicit variational inference (SIVI) enhances the expressiveness of variational families through hierarchical semi-implicit distributions, but the intractability of their densities makes standard ELBO-based optimization biased. Recent score-matching approaches to SIVI (SIVI-SM) address this issue via a minimax formulation, at the expense of an additional lower-level optimization problem. In this paper, we propose kernel semi-implicit variational inference (KSIVI), a principled and tractable alternative that eliminates the lower-level optimization by leveraging kernel methods. We show that when optimizing over a reproducing kernel Hilbert space, the lower-level problem admits an explicit solution, reducing the objective to the kernel Stein discrepancy (KSD). Exploiting the hierarchical structure of semi-implicit distributions, the resulting KSD objective can be efficiently optimized using stochastic gradient methods. We establish optimization guarantees via variance bounds on Monte Carlo gradient estimators and derive statistical generalization bounds of order $\tilde{\mathcal{O}}(1/\sqrt{n})$. We further introduce a multi-layer hierarchical extension that improves expressiveness while preserving tractability. Empirical results on synthetic and real-world Bayesian inference tasks demonstrate the effectiveness of KSIVI.

#3 On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

著者: Sharan Sahu, Cameron J. Hogan, Martin T. Wells

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12238

要約:
While momentum-based acceleration has been studied extensively in deterministic optimization problems, its behavior in nonstationary environments -- where the data distribution and optimal parameters drift over time -- remains underexplored. We analyze the tracking performance of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak heavy-ball and Nesterov) under uniform strong convexity and smoothness in varying stepsize regimes. We derive finite-time bounds in expectation and with high probability for the tracking error, establishing a sharp decomposition into three components: a transient initialization term, a noise-induced variance term, and a drift-induced tracking lag. Crucially, our analysis uncovers a fundamental trade-off: while momentum can suppress gradient noise, it incurs an explicit penalty on the tracking capability. We show that momentum can substantially amplify drift-induced tracking error, with amplification that becomes unbounded as the momentum parameter approaches one, formalizing the intuition that using 'stale' gradients hinders adaptation to rapid regime shifts. Complementing these upper bounds, we establish minimax lower bounds for dynamic regret under gradient-variation constraints. These lower bounds prove that the inertia-induced penalty is not an artifact of analysis but an information-theoretic barrier: in drift-dominated regimes, momentum creates an unavoidable 'inertia window' that fundamentally degrades performance. Collectively, these results provide a definitive theoretical grounding for the empirical instability of momentum in dynamic environments and delineate the precise regime boundaries where SGD provably outperforms its accelerated counterparts.

#4 A Theory of Diversity for Random Matrices with Applications to In-Context Learning of Schr\"odinger Equations

著者: Frank Cole, Yulong Lu, Shaurya Sehgal

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12587

要約:
We address the following question: given a collection $\{\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(N)}\}$ of independent $d \times d$ random matrices drawn from a common distribution $\mathbb{P}$, what is the probability that the centralizer of $\{\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(N)}\}$ is trivial? We provide lower bounds on this probability in terms of the sample size $N$ and the dimension $d$ for several families of random matrices which arise from the discretization of linear Schr\"odinger operators with random potentials. When combined with recent work on machine learning theory, our results provide guarantees on the generalization ability of transformer-based neural networks for in-context learning of Schr\"odinger equations.

#5 Approximate full conformal prediction in RKHS

著者: Davidson Lova Razafindrakoto, Alain Celisse, J\'er\^ome Lacaille

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13102

要約:
Full conformal prediction is a framework that implicitly formulates distribution-free confidence prediction regions for a wide range of estimators. However, a classical limitation of the full conformal framework is the computation of the confidence prediction regions, which is usually impossible since it requires training infinitely many estimators (for real-valued prediction for instance). The main purpose of the present work is to describe a generic strategy for designing a tight approximation to the full conformal prediction region that can be efficiently computed. Along with this approximate confidence region, a theoretical quantification of the tightness of this approximation is developed, depending on the smoothness assumptions on the loss and score functions. The new notion of thickness is introduced for quantifying the discrepancy between the approximate confidence region and the full conformal one.

#6 Empirical Risk Minimization with $f$-Divergence Regularization

著者: Francisco Daunas, I\~naki Esnaola, Samir M. Perlaza, H. Vincent Poor

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13191

要約:
In this paper, the solution to the empirical risk minimization problem with $f$-divergence regularization (ERM-$f$DR) is presented and conditions under which the solution also serves as the solution to the minimization of the expected empirical risk subject to an $f$-divergence constraint are established. The proposed approach extends applicability to a broader class of $f$-divergences than previously reported and yields theoretical results that recover previously known results. Additionally, the difference between the expected empirical risk of the ERM-$f$DR solution and that of its reference measure is characterized, providing insights into previously studied cases of $f$-divergences. A central contribution is the introduction of the normalization function, a mathematical object that is critical in both the dual formulation and practical computation of the ERM-$f$DR solution. This work presents an implicit characterization of the normalization function as a nonlinear ordinary differential equation (ODE), establishes its key properties, and subsequently leverages them to construct a numerical algorithm for approximating the normalization factor under mild assumptions. Further analysis demonstrates structural equivalences between ERM-$f$DR problems with different $f$-divergences via transformations of the empirical risk. Finally, the proposed algorithm is used to compute the training and test risks of ERM-$f$DR solutions under different $f$-divergence regularizers. This numerical example highlights the practical implications of choosing different functions $f$ in ERM-$f$DR problems.

#7 Distribution-Free Confidence Ellipsoids for Ridge Regression with PAC Bounds

著者: Szabolcs Szentp\'eteri, Bal\'azs Csan\'ad Cs\'aji

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13436

要約:
Linearly parametrized models are widely used in control and signal processing, with the least-squares (LS) estimate being the archetypical solution. When the input is insufficiently exciting, the LS problem may be unsolvable or numerically unstable. This issue can be resolved through regularization, typically with ridge regression. Although regularized estimators reduce the variance error, it remains important to quantify their estimation uncertainty. A possible approach for linear regression is to construct confidence ellipsoids with the Sign-Perturbed Sums (SPS) ellipsoidal outer approximation (EOA) algorithm. The SPS EOA builds non-asymptotic confidence ellipsoids under the assumption that the noises are independent and symmetric about zero. This paper introduces an extension of the SPS EOA algorithm to ridge regression, and derives probably approximately correct (PAC) upper bounds for the resulting region sizes. Compared with previous analyses, our result explicitly show how the regularization parameter affects the region sizes, and provide tighter bounds under weaker excitation assumptions. Finally, the practical effect of regularization is also demonstrated via simulation experiments.

#8 Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs

著者: Zihan Dong, Ruijia Wu, Linjun Zhang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13458

要約:
The increasing reliance on human preference feedback to judge AI-generated pseudo labels has created a pressing need for principled, budget-conscious data acquisition strategies. We address the crucial question of how to optimally allocate a fixed annotation budget between ground-truth labels and pairwise preferences in AI. Our solution, grounded in semi-parametric inference, casts the budget allocation problem as a monotone missing data framework. Building on this formulation, we introduce Preference-Calibrated Active Learning (PCAL), a novel method that learns the optimal data acquisition strategy and develops a statistically efficient estimator for functionals of the data distribution. Theoretically, we prove the asymptotic optimality of our PCAL estimator and establish a key robustness guarantee that ensures robust performance even with poorly estimated nuisance models. Our flexible framework applies to a general class of problems, by directly optimizing the estimator's variance instead of requiring a closed-form solution. This work provides a principled and statistically efficient approach for budget-constrained learning in modern AI. Simulations and real-data analysis demonstrate the practical benefits and superior performance of our proposed method.

#9 Small Gradient Norm Regret for Online Convex Optimization

著者: Wenzhi Gao, Chang He, Madeleine Udell

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13519

要約:
This paper introduces a new problem-dependent regret measure for online convex optimization with smooth losses. The notion, which we call the $G^\star$ regret, depends on the cumulative squared gradient norm evaluated at the decision in hindsight $\sum_{t=1}^T \|\nabla \ell(x^\star)\|^2$. We show that the $G^\star$ regret strictly refines the existing $L^\star$ (small loss) regret, and that it can be arbitrarily sharper when the losses have vanishing curvature around the hindsight decision. We establish upper and lower bounds on the $G^\star$ regret and extend our results to dynamic regret and bandit settings. As a byproduct, we refine the existing convergence analysis of stochastic optimization algorithms in the interpolation regime. Some experiments validate our theoretical findings.

#10 Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning

agent

著者: Yuchen Jiao, Jiin Woo, Gen Li, Gauri Joshi, Yuejie Chi

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13642

要約:
Average-reward reinforcement learning offers a principled framework for long-term decision-making by maximizing the mean reward per time step. Although Q-learning is a widely used model-free algorithm with established sample complexity in discounted and finite-horizon Markov decision processes (MDPs), its theoretical guarantees for average-reward settings remain limited. This work studies a simple but effective Q-learning algorithm for average-reward MDPs with finite state and action spaces under the weakly communicating assumption, covering both single-agent and federated scenarios. For the single-agent case, we show that Q-learning with carefully chosen parameters achieves sample complexity $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|\|h^{\star}\|_{\mathsf{sp}}^3}{\varepsilon^3}\right)$, where $\|h^{\star}\|_{\mathsf{sp}}$ is the span norm of the bias function, improving previous results by at least a factor of $\frac{\|h^{\star}\|_{\mathsf{sp}}^2}{\varepsilon^2}$. In the federated setting with $M$ agents, we prove that collaboration reduces the per-agent sample complexity to $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|\|h^{\star}\|_{\mathsf{sp}}^3}{M\varepsilon^3}\right)$, with only $\widetilde{O}\left(\frac{\|h^{\star}\|_{\mathsf{sp}}}{\varepsilon}\right)$ communication rounds required. These results establish the first federated Q-learning algorithm for average-reward MDPs, with provable efficiency in both sample and communication complexity.

#11 Unified Unbiased Variance Estimation for MMD: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses

著者: Shijie Zhong, Jiangfeng Fu, Yikun Yang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13874

要約:
The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from $\mathcal O(n^2)$ to $\mathcal O(n \log n)$.

#12 Intermittent time series forecasting: local vs global models

著者: Stefano Damato, Nicol\`o Rubattu, Dario Azzimonti, Giorgio Corani

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.14031

要約:
Intermittent time series, characterised by the presence of a significant amount of zeros, constitute a large percentage of inventory items in supply chain. Probabilistic forecasts are needed to plan the inventory levels; the predictive distribution should cover non-negative values, have a mass in zero and a long upper tail. Intermittent time series are commonly forecast using local models, which are trained individually on each time series. In the last years global models, which are trained on a large collection of time series, have become popular for time series forecasting. Global models are often based on neural networks. However, they have not yet been exhaustively tested on intermittent time series. We carry out the first study comparing state-of-the-art local (iETS, TweedieGP) and global models (D-Linear, DeepAR, Transformers) on intermittent time series. For neural networks models we consider three different distribution heads suitable for intermittent time series: negative binomial, hurdle-shifted negative binomial and Tweedie. We use, for the first time, the last two distribution heads with neural networks. We perform experiments on five large datasets comprising more than 40'000 real-world time series. Among neural networks D-Linear provides best accuracy; it also consistently outperforms the local models. Moreover, it has also low computational requirements. Transformers-based architectures are instead much more computationally demanding and less accurate. Among the distribution heads, the Tweedie provides the best estimates of the highest quantiles, while the negative binomial offers overall the best performance.

#13 Verifying Physics-Informed Neural Network Fidelity using Classical Fisher Information from Differentiable Dynamical System

著者: Josafat Ribeiro Leal Filho, Ant\^onio Augusto Fr\"ohlich

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.11638

要約:
Physics-Informed Neural Networks (PINNs) have emerged as a powerful tool for solving differential equations and modeling physical systems by embedding physical laws into the learning process. However, rigorously quantifying how well a PINN captures the complete dynamical behavior of the system, beyond simple trajectory prediction, remains a challenge. This paper proposes a novel experimental framework to address this by employing Fisher information for differentiable dynamical systems, denoted $g_F^C$. This Fisher information, distinct from its statistical counterpart, measures inherent uncertainties in deterministic systems, such as sensitivity to initial conditions, and is related to the phase space curvature and the net stretching action of the state space evolution. We hypothesize that if a PINN accurately learns the underlying dynamics of a physical system, then the Fisher information landscape derived from the PINN's learned equations of motion will closely match that of the original analytical model. This match would signify that the PINN has achieved comprehensive fidelity capturing not only the state evolution but also crucial geometric and stability properties. We outline an experimental methodology using the dynamical model of a car to compute and compare $g_F^C$ for both the analytical model and a trained PINN. The comparison, based on the Jacobians of the respective system dynamics, provides a quantitative measure of the PINN's fidelity in representing the system's intricate dynamical characteristics.

#14 Stability and Accuracy Trade-offs in Statistical Estimation

著者: Abhinav Chakraborty, Yuetian Luo, Rina Foygel Barber

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.11701

要約:
Algorithmic stability is a central concept in statistics and learning theory that measures how sensitive an algorithm's output is to small changes in the training data. Stability plays a crucial role in understanding generalization, robustness, and replicability, and a variety of stability notions have been proposed in different learning settings. However, while stability entails desirable properties, it is typically not sufficient on its own for statistical learning -- and indeed, it may be at odds with accuracy, since an algorithm that always outputs a constant function is perfectly stable but statistically meaningless. Thus, it is essential to understand the potential statistical cost of stability. In this work, we address this question by adopting a statistical decision-theoretic perspective, treating stability as a constraint in estimation. Focusing on two representative notions-worst-case stability and average-case stability-we first establish general lower bounds on the achievable estimation accuracy under each type of stability constraint. We then develop optimal stable estimators for four canonical estimation problems, including several mean estimation and regression settings. Together, these results characterize the optimal trade-offs between stability and accuracy across these tasks. Our findings formalize the intuition that average-case stability imposes a qualitatively weaker restriction than worst-case stability, and they further reveal that the gap between these two can vary substantially across different estimation problems.

#15 On Nonasymptotic Confidence Intervals for Treatment Effects in Randomized Experiments

著者: Ricardo J. Sandoval, Sivaraman Balakrishnan, Avi Feller, Michael I. Jordan, Ian Waudby-Smith

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.11744

要約:
We study nonasymptotic (finite-sample) confidence intervals for treatment effects in randomized experiments. In the existing literature, the effective sample sizes of nonasymptotic confidence intervals tend to be looser than the corresponding central-limit-theorem-based confidence intervals by a factor depending on the square root of the propensity score. We show that this performance gap can be closed, designing nonasymptotic confidence intervals that have the same effective sample size as their asymptotic counterparts. Our approach involves systematic exploitation of negative dependence or variance adaptivity (or both). We also show that the nonasymptotic rates that we achieve are unimprovable in an information-theoretic sense.

#16 Task-tailored Pre-processing: Fair Downstream Supervised Learning

著者: Jinwon Sohn, Guang Lin, Qifan Song

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.11897

要約:
Fairness-aware machine learning has recently attracted various communities to mitigate discrimination against certain societal groups in data-driven tasks. For fair supervised learning, particularly in pre-processing, there have been two main categories: data fairness and task-tailored fairness. The former directly finds an intermediate distribution among the groups, independent of the type of the downstream model, so a learned downstream classification/regression model returns similar predictive scores to individuals inputting the same covariates irrespective of their sensitive attributes. The latter explicitly takes the supervised learning task into account when constructing the pre-processing map. In this work, we study algorithmic fairness for supervised learning and argue that the data fairness approaches impose overly strong regularization from the perspective of the HGR correlation. This motivates us to devise a novel pre-processing approach tailored to supervised learning. We account for the trade-off between fairness and utility in obtaining the pre-processing map. Then we study the behavior of arbitrary downstream supervised models learned on the transformed data to find sufficient conditions to guarantee their fairness improvement and utility preservation. To our knowledge, no prior work in the branch of task-tailored methods has theoretically investigated downstream guarantees when using pre-processed data. We further evaluate our framework through comparison studies based on tabular and image data sets, showing the superiority of our framework which preserves consistent trade-offs among multiple downstream models compared to recent competing models. Particularly for computer vision data, we see our method alters only necessary semantic features related to the central machine learning task to achieve fairness.

#17 Lost in Aggregation: The Causal Interpretation of the IV Estimand

著者: Danielle Tsao, Krikamol Muandet, Frederick Eberhardt, Emilija Perkovi\'c

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12120

要約:
Instrumental variable based estimation of a causal effect has emerged as a standard approach to mitigate confounding bias in the social sciences and epidemiology, where conducting randomized experiments can be too costly or impossible. However, justifying the validity of the instrument often poses a significant challenge. In this work, we highlight a problem generally neglected in arguments for instrumental variable validity: the presence of an ''aggregate treatment variable'', where the treatment (e.g., education, GDP, caloric intake) is composed of finer-grained components that each may have a different effect on the outcome. We show that the causal effect of an aggregate treatment is generally ambiguous, as it depends on how interventions on the aggregate are instantiated at the component level, formalized through the aggregate-constrained component intervention distribution. We then characterize conditions on the interventional distribution and the aggregate setting under which standard instrumental variable estimators identify the aggregate effect. The contrived nature of these conditions implies major limitations on the interpretation of instrumental variable estimates based on aggregate treatments and highlights the need for a broader justificatory base for the exclusion restriction in such settings.

#18 Federated Learning for the Design of Parametric Insurance Indices under Heterogeneous Renewable Production Losses

著者: Fallou Niakh

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12178

要約:
We propose a federated learning framework for the calibration of parametric insurance indices under heterogeneous renewable energy production losses. Producers locally model their losses using Tweedie generalized linear models and private data, while a common index is learned through federated optimization without sharing raw observations. The approach accommodates heterogeneity in variance and link functions and directly minimizes a global deviance objective in a distributed setting. We implement and compare FedAvg, FedProx and FedOpt, and benchmark them against an existing approximation-based aggregation method. An empirical application to solar power production in Germany shows that federated learning recovers comparable index coefficients under moderate heterogeneity, while providing a more general and scalable framework.

#19 One-Sided Matrix Completion from Ultra-Sparse Samples

著者: Hongyang R. Zhang, Zhenshuo Zhang, Huy L. Nguyen, Guanghui Lan

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12213

要約:
Matrix completion is a classical problem that has received recurring interest across a wide range of fields. In this paper, we revisit this problem in an ultra-sparse sampling regime, where each entry of an unknown, $n\times d$ matrix $M$ (with $n \ge d$) is observed independently with probability $p = C / d$, for a fixed integer $C \ge 2$. This setting is motivated by applications involving large, sparse panel datasets, where the number of rows far exceeds the number of columns. When each row contains only $C$ entries -- fewer than the rank of $M$ -- accurate imputation of $M$ is impossible. Instead, we estimate the row span of $M$ or the averaged second-moment matrix $T = M^{\top} M / n$. The empirical second-moment matrix computed from observed entries exhibits non-random and sparse missingness. We propose an unbiased estimator that normalizes each nonzero entry of the second moment by its observed frequency, followed by gradient descent to impute the missing entries of $T$. The normalization divides a weighted sum of $n$ binomial random variables by the total number of ones. We show that the estimator is unbiased for any $p$ and enjoys low variance. When the row vectors of $M$ are drawn uniformly from a rank-$r$ factor model satisfying an incoherence condition, we prove that if $n \ge O({d r^5 \epsilon^{-2} C^{-2} \log d})$, any local minimum of the gradient-descent objective is approximately global and recovers $T$ with error at most $\epsilon^2$. Experiments on both synthetic and real-world data validate our approach. On three MovieLens datasets, our algorithm reduces bias by $88\%$ relative to baseline estimators. We also empirically validate the linear sampling complexity of $n$ relative to $d$ on synthetic data. On an Amazon reviews dataset with sparsity $10^{-7}$, our method reduces the recovery error of $T$ by $59\%$ and $M$ by $38\%$ compared to baseline methods.

#20 How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge

著者: Wayne Gao, Sukjin Han, Annie Liang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12343

要約:
Large language models (LLMs) are increasingly used to predict human behavior. We propose a measure for evaluating how much knowledge a pretrained LLM brings to such a prediction: its equivalent sample size, defined as the amount of task-specific data needed to match the predictive accuracy of the LLM. We estimate this measure by comparing the prediction error of a fixed LLM in a given domain to that of flexible machine learning models trained on increasing samples of domain-specific data. We further provide a statistical inference procedure by developing a new asymptotic theory for cross-validated prediction error. Finally, we apply this method to the Panel Study of Income Dynamics. We find that LLMs encode considerable predictive information for some economic variables but much less for others, suggesting that their value as substitutes for domain-specific data differs markedly across settings.

#21 Statistical-Neural Interaction Networks for Interpretable Mixed-Type Data Imputation

著者: Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12380

要約:
Real-world tabular databases routinely combine continuous measurements and categorical records, yet missing entries are pervasive and can distort downstream analysis. We propose Statistical-Neural Interaction (SNI), an interpretable mixed-type imputation framework that couples correlation-derived statistical priors with neural feature attention through a Controllable-Prior Feature Attention (CPFA) module. CPFA learns head-wise prior-strength coefficients $\{\lambda_h\}$ that softly regularize attention toward the prior while allowing data-driven deviations when nonlinear patterns appear to be present in the data. Beyond imputation, SNI aggregates attention maps into a directed feature-dependency matrix that summarizes which variables the imputer relied on, without requiring post-hoc explainers. We evaluate SNI against six baselines (Mean/Mode, MICE, KNN, MissForest, GAIN, MIWAE) on six datasets spanning ICU monitoring, population surveys, socio-economic statistics, and engineering applications. Under MCAR/strict-MAR at 30\% missingness, SNI is generally competitive on continuous metrics but is often outperformed by accuracy-first baselines (MissForest, MIWAE) on categorical variables; in return, it provides intrinsic dependency diagnostics and explicit statistical-neural trade-off parameters. We additionally report MNAR stress tests (with a mask-aware variant) and discuss computational cost, limitations -- particularly for severely imbalanced categorical targets -- and deployment scenarios where interpretability may justify the trade-off.

#22 Cooperative Multi-agent RL with Communication Constraints

agent

著者: Nuoya Xiong, Aarti Singh

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12518

要約:
Cooperative MARL often assumes frequent access to global information in a data buffer, such as team rewards or other agents' actions, which is typically unrealistic in decentralized MARL systems due to high communication costs. When communication is limited, agents must rely on outdated information to estimate gradients and update their policies. A common approach to handle missing data is called importance sampling, in which we reweigh old data from a base policy to estimate gradients for the current policy. However, it quickly becomes unstable when the communication is limited (i.e. missing data probability is high), so that the base policy in importance sampling is outdated. To address this issue, we propose a technique called base policy prediction, which utilizes old gradients to predict the policy update and collect samples for a sequence of base policies, which reduces the gap between the base policy and the current policy. This approach enables effective learning with significantly fewer communication rounds, since the samples of predicted base policies could be collected within one communication round. Theoretically, we show that our algorithm converges to an $\varepsilon$-Nash equilibrium in potential games with only $O(\varepsilon^{-3/4})$ communication rounds and $O(poly(\max_i |A_i|)\varepsilon^{-11/4})$ samples, improving existing state-of-the-art results in communication cost, as well as sample complexity without the exponential dependence on the joint action space size. We also extend these results to general Markov Cooperative Games to find an agent-wise local maximum. Empirically, we test the base policy prediction algorithm in both simulated games and MAPPO for complex environments.

#23 What Trace Powers Reveal About Log-Determinants: Closed-Form Estimators, Certificates, and Failure Modes

著者: Piyush Sao

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12612

要約:
Computing $\log\det(A)$ for large symmetric positive definite matrices arises in Gaussian process inference and Bayesian model comparison. Standard methods combine matrix-vector products with polynomial approximations. We study a different model: access to trace powers $p_k = \tr(A^k)$, natural when matrix powers are available. Classical moment-based approximations Taylor-expand $\log(\lambda)$ around the arithmetic mean. This requires $|\lambda - \AM| < \AM$ and diverges when $\kappa > 4$. We work instead with the moment-generating function $M(t) = \E[X^t]$ for normalized eigenvalues $X = \lambda/\AM$. Since $M'(0) = \E[\log X]$, the log-determinant becomes $\log\det(A) = n(\log \AM + M'(0))$ -- the problem reduces to estimating a derivative at $t = 0$. Trace powers give $M(k)$ at positive integers, but interpolating $M(t)$ directly is ill-conditioned due to exponential growth. The transform $K(t) = \log M(t)$ compresses this range. Normalization by $\AM$ ensures $K(0) = K(1) = 0$. With these anchors fixed, we interpolate $K$ through $m+1$ consecutive integers and differentiate to estimate $K'(0)$. However, this local interpolation cannot capture arbitrary spectral features. We prove a fundamental limit: no continuous estimator using finitely many positive moments can be uniformly accurate over unbounded conditioning. Positive moments downweight the spectral tail; $K'(0) = \E[\log X]$ is tail-sensitive. This motivates guaranteed bounds. From the same traces we derive upper bounds on $(\det A)^{1/n}$. Given a spectral floor $r \leq \lambda_{\min}$, we obtain moment-constrained lower bounds, yielding a provable interval for $\log\det(A)$. A gap diagnostic indicates when to trust the point estimate and when to report bounds. All estimators and bounds cost $O(m)$, independent of $n$. For $m \in \{4, \ldots, 8\}$, this is effectively constant time.

#24 New Trends in the Stability of Sinkhorn Semigroups

著者: Pierre Del Moral, Ajay Jasra

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12633

要約:
Entropic optimal transport problems play an increasingly important role in machine learning and generative modelling. In contrast with optimal transport maps which often have limited applicability in high dimensions, Schrodinger bridges can be solved using the celebrated Sinkhorn's algorithm, a.k.a. the iterative proportional fitting procedure. The stability properties of Sinkhorn bridges when the number of iterations tends to infinity is a very active research area in applied probability and machine learning. Traditional proofs of convergence are mainly based on nonlinear versions of Perron-Frobenius theory and related Hilbert projective metric techniques, gradient descent, Bregman divergence techniques and Hamilton-Jacobi-Bellman equations, including propagation of convexity profiles based on coupling diffusions by reflection methods. The objective of this review article is to present, in a self-contained manner, recently developed Sinkhorn/Gibbs-type semigroup analysis based upon contraction coefficients and Lyapunov-type operator-theoretic techniques. These powerful, off-the-shelf semigroup methods are based upon transportation cost inequalities (e.g. log-Sobolev, Talagrand quadratic inequality, curvature estimates), $\phi$-divergences, Kantorovich-type criteria and Dobrushin contraction-type coefficients on weighted Banach spaces as well as Wasserstein distances. This novel semigroup analysis allows one to unify and simplify many arguments in the stability of Sinkhorn algorithm. It also yields new contraction estimates w.r.t. generalized $\phi$-entropies, as well as weighted total variation norms, Kantorovich criteria and Wasserstein distances.

#25 Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization

著者: Junyi Liao, Zihan Zhu, Ethan Fang, Zhuoran Yang, Vahid Tarokh

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12707

要約:
Estimating the unknown reward functions driving agents' behaviors is of central interest in inverse reinforcement learning and game theory. To tackle this problem, we develop a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization, where we aim to reconstruct the underlying reward functions given observed players' strategies and actions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of feasible rewards, and limited observational data coverage. To address these challenges, we establish the reward function's identifiability using the quantal response equilibrium (QRE) under linear assumptions. Building upon this theoretical foundation, we propose a novel algorithm to learn reward functions from observed actions. Our algorithm works in both static and dynamic settings and is adaptable to incorporate different methods, such as Maximum Likelihood Estimation (MLE). We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm. Further, we conduct extensive numerical studies to demonstrate the practical effectiveness of the proposed framework, offering new insights into decision-making in competitive environments.

#26 Online Continual Learning for Time Series: a Natural Score-driven Approach

著者: Edoardo Urettini, Daniele Atzeni, Ioanna-Yvonni Tsaknaki, Antonio Carta

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12931

要約:
Online continual learning (OCL) methods adapt to changing environments without forgetting past knowledge. Similarly, online time series forecasting (OTSF) is a real-world problem where data evolve in time and success depends on both rapid adaptation and long-term memory. Indeed, time-varying and regime-switching forecasting models have been extensively studied, offering a strong justification for the use of OCL in these settings. Building on recent work that applies OCL to OTSF, this paper aims to strengthen the theoretical and practical connections between time series methods and OCL. First, we reframe neural network optimization as a parameter filtering problem, showing that natural gradient descent is a score-driven method and proving its information-theoretic optimality. Then, we show that using a Student's t likelihood in addition to natural gradient induces a bounded update, which improves robustness to outliers. Finally, we introduce Natural Score-driven Replay (NatSR), which combines our robust optimizer with a replay buffer and a dynamic scale heuristic that improves fast adaptation at regime drifts. Empirical results demonstrate that NatSR achieves stronger forecasting performance than more complex state-of-the-art methods.

#27 Multi-level Monte Carlo Dropout for Efficient Uncertainty Quantification

著者: Aaron Pim, Tristan Pryer

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13272

要約:
We develop a multilevel Monte Carlo (MLMC) framework for uncertainty quantification with Monte Carlo dropout. Treating dropout masks as a source of epistemic randomness, we define a fidelity hierarchy by the number of stochastic forward passes used to estimate predictive moments. We construct coupled coarse--fine estimators by reusing dropout masks across fidelities, yielding telescoping MLMC estimators for both predictive means and predictive variances that remain unbiased for the corresponding dropout-induced quantities while reducing sampling variance at fixed evaluation budget. We derive explicit bias, variance and effective cost expressions, together with sample-allocation rules across levels. Numerical experiments on forward and inverse PINNs--Uzawa benchmarks confirm the predicted variance rates and demonstrate efficiency gains over single-level MC-dropout at matched cost.

#28 Fairness-informed Pareto Optimization : An Efficient Bilevel Framework

著者: Sofiane Tanji, Samuel Vaiter, Yassine Laguel

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13448

要約:
Despite their promise, fair machine learning methods often yield Pareto-inefficient models, in which the performance of certain groups can be improved without degrading that of others. This issue arises frequently in traditional in-processing approaches such as fairness-through-regularization. In contrast, existing Pareto-efficient approaches are biased towards a certain perspective on fairness and fail to adapt to the broad range of fairness metrics studied in the literature. In this paper, we present BADR, a simple framework to recover the optimal Pareto-efficient model for any fairness metric. Our framework recovers its models through a Bilevel Adaptive Rescalarisation procedure. The lower level is a weighted empirical risk minimization task where the weights are a convex combination of the groups, while the upper level optimizes the chosen fairness objective. We equip our framework with two novel large-scale, single-loop algorithms, BADR-GD and BADR-SGD, and establish their convergence guarantees. We release badr, an open-source Python toolbox implementing our framework for a variety of learning tasks and fairness metrics. Finally, we conduct extensive numerical experiments demonstrating the advantages of BADR over existing Pareto-efficient approaches to fairness.

#29 Preconditioning Benefits of Spectral Orthogonalization in Muon

著者: Jianhao Ma, Yu Huang, Yuejie Chi, Yuxin Chen

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13474

要約:
The Muon optimizer, a matrix-structured algorithm that leverages spectral orthogonalization of gradients, is a milestone in the pretraining of large language models. However, the underlying mechanisms of Muon -- particularly the role of gradient orthogonalization -- remain poorly understood, with very few works providing end-to-end analyses that rigorously explain its advantages in concrete applications. We take a step by studying the effectiveness of a simplified variant of Muon through two case studies: matrix factorization, and in-context learning of linear transformers. For both problems, we prove that simplified Muon converges linearly with iteration complexities independent of the relevant condition number, provably outperforming gradient descent and Adam. Our analysis reveals that the Muon dynamics decouple into a collection of independent scalar sequences in the spectral domain, each exhibiting similar convergence behavior. Our theory formalizes the preconditioning effect induced by spectral orthogonalization, offering insight into Muon's effectiveness in these matrix optimization problems and potentially beyond.

#30 Does Privacy Always Harm Fairness? Data-Dependent Trade-offs via Chernoff Information Neural Estimation

privacy

著者: Arjun Nichani (Richard), Hsiang Hsu (Richard), Chun-Fu (Richard), Chen, Haewon Jeong

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13698

要約:
Fairness and privacy are two vital pillars of trustworthy machine learning. Despite extensive research on these individual topics, the relationship between fairness and privacy has received significantly less attention. In this paper, we utilize the information-theoretic measure Chernoff Information to highlight the data-dependent nature of the relationship among the triad of fairness, privacy, and accuracy. We first define Noisy Chernoff Difference, a tool that allows us to analyze the relationship among the triad simultaneously. We then show that for synthetic data, this value behaves in 3 distinct ways (depending on the distribution of the data). We highlight the data distributions involved in these cases and explore their fairness and privacy implications. Additionally, we show that Noisy Chernoff Difference acts as a proxy for the steepness of the fairness-accuracy curves. Finally, we propose a method for estimating Chernoff Information on data from unknown distributions and utilize this framework to examine the triad dynamic on real datasets. This work builds towards a unified understanding of the fairness-privacy-accuracy relationship and highlights its data-dependent nature.

#31 Orthogonium : A Unified, Efficient Library of Orthogonal and 1-Lipschitz Building Blocks

著者: Thibaut Boissin (IRIT-MISFIT), Franck Mamalet (ANITI, IMT), Valentin Lafargue (ANITI, IMT), Mathieu Serrurier (IRIT-MISFIT)

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13776

要約:
Orthogonal and 1-Lipschitz neural network layers are essential building blocks in robust deep learning architectures, crucial for certified adversarial robustness, stable generative models, and reliable recurrent networks. Despite significant advancements, existing implementations remain fragmented, limited, and computationally demanding. To address these issues, we introduce Orthogonium , a unified, efficient, and comprehensive PyTorch library providing orthogonal and 1-Lipschitz layers. Orthogonium provides access to standard convolution features-including support for strides, dilation, grouping, and transposed-while maintaining strict mathematical guarantees. Its optimized implementations reduce overhead on large scale benchmarks such as ImageNet. Moreover, rigorous testing within the library has uncovered critical errors in existing implementations, emphasizing the importance of standardized and reliable tools. Orthogonium thus significantly lowers adoption barriers, enabling scalable experimentation and integration across diverse applications requiring orthogonality and robust Lipschitz constraints. Orthogonium is available at https://github.com/deel-ai/orthogonium.

#32 Inverting Self-Organizing Maps: A Unified Activation-Based Framework

著者: Alessandro Londei, Matteo Benati, Denise Lanzieri, Vittorio Loreto

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.13851

要約:
Self-Organizing Maps provide topology-preserving projections of high-dimensional data and have been widely used for visualization, clustering, and vector quantization. In this work, we show that the activation pattern of a SOM - the squared distances to its prototypes - can be inverted to recover the exact input under mild geometric conditions. This follows from a classical fact in Euclidean distance geometry: a point in $D$ dimensions is uniquely determined by its distances to $D{+}1$ affinely independent references. We derive the corresponding linear system and characterize the conditions under which the inversion is well-posed. Building upon this mechanism, we introduce the Manifold-Aware Unified SOM Inversion and Control (MUSIC) update rule, which enables controlled, semantically meaningful trajectories in latent space. MUSIC modifies squared distances to selected prototypes while preserving others, resulting in a deterministic geometric flow aligned with the SOM's piecewise-linear structure. Tikhonov regularization stabilizes the update rule and ensures smooth motion on high-dimensional datasets. Unlike variational or probabilistic generative models, MUSIC does not rely on sampling, latent priors, or encoder-decoder architectures. If no perturbation is applied, inversion recovers the exact input; when a target cluster or prototype is specified, MUSIC produces coherent semantic variations while remaining on the data manifold. This leads to a new perspective on data augmentation and controllable latent exploration based solely on prototype geometry. We validate the approach using synthetic Gaussian mixtures, the MNIST and the Faces in the Wild dataset. Across all settings, MUSIC produces smooth, interpretable trajectories that reveal the underlying geometry of the learned manifold, illustrating the advantages of SOM-based inversion over unsupervised clustering.

#33 Demystifying the trend of the healthcare index: Is historical price a key driver?

著者: Payel Sadhukhan, Samrat Gupta, Subhasis Ghosh, Tanujit Chakraborty

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.14062

要約:
Healthcare sector indices consolidate the economic health of pharmaceutical, biotechnology, and healthcare service firms. The short-term movements in these indices are closely intertwined with capital allocation decisions affecting research and development investment, drug availability, and long-term health outcomes. This research investigates whether historical open-high-low-close (OHLC) index data contain sufficient information for predicting the directional movement of the opening index on the subsequent trading day. The problem is formulated as a supervised classification task involving a one-step-ahead rolling window. A diverse feature set is constructed, comprising original prices, volatility-based technical indicators, and a novel class of nowcasting features derived from mutual OHLC ratios. The framework is evaluated on data from healthcare indices in the U.S. and Indian markets over a five-year period spanning multiple economic phases, including the COVID-19 pandemic. The results demonstrate robust predictive performance, with accuracy exceeding 0.8 and Matthews correlation coefficients above 0.6. Notably, the proposed nowcasting features have emerged as a key determinant of the market movement. We have employed the Shapley-based explainability paradigm to further elucidate the contribution of the features: outcomes reveal the dominant role of the nowcasting features, followed by a more moderate contribution of original prices. This research offers a societal utility: the proposed features and model for short-term forecasting of healthcare indices can reduce information asymmetry and support a more stable and equitable health economy.

#34 Penalizing Localized Dirichlet Energies in Low Rank Tensor Products

著者: Paris A. Karakasis, Nicholas D. Sidiropoulos

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.14173

要約:
We study low-rank tensor-product B-spline (TPBS) models for regression tasks and investigate Dirichlet energy as a measure of smoothness. We show that TPBS models admit a closed-form expression for the Dirichlet energy, and reveal scenarios where perfect interpolation is possible with exponentially small Dirichlet energy. This renders global Dirichlet energy-based regularization ineffective. To address this limitation, we propose a novel regularization strategy based on local Dirichlet energies defined on small hypercubes centered at the training points. Leveraging pretrained TPBS models, we also introduce two estimators for inference from incomplete samples. Comparative experiments with neural networks demonstrate that TPBS models outperform neural networks in the overfitting regime for most datasets, and maintain competitive performance otherwise. Overall, TPBS models exhibit greater robustness to overfitting and consistently benefit from regularization, while neural networks are more sensitive to overfitting and less effective in leveraging regularization.

#35 Q-learning with Adjoint Matching

著者: Qiyang Li, Sergey Levine

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.14234

要約:
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.

#36 Opportunities in AI/ML for the Rubin LSST Dark Energy Science Collaboration

著者: LSST Dark Energy Science Collaboration, Eric Aubourg, Camille Avestruz, Matthew R. Becker, Biswajit Biswas, Rahul Biswas, Boris Bolliet, Adam S. Bolton, Clecio R. Bom, Rapha\"el Bonnet-Guerrini, Alexandre Boucaud, Jean-Eric Campagne, Chihway Chang, Aleksandra \'Ciprijanovi\'c, Johann Cohen-Tanugi, Michael W. Coughlin, John Franklin Crenshaw, Juan C. Cuevas-Tello, Juan de Vicente, Seth W. Digel, Steven Dillmann, Mariano Javier de Le\'on Dominguez Romero, Alex Drlica-Wagner, Sydney Erickson, Alexander T. Gagliano, Christos Georgiou, Aritra Ghosh, Matthew Grayling, Kirill A. Grishin, Alan Heavens, Lindsay R. House, Mustapha Ishak, Wassim Kabalan, Arun Kannawadi, Fran\c{c}ois Lanusse, C. Danielle Leonard, Pierre-Fran\c{c}ois L\'eget, Michelle Lochner, Yao-Yuan Mao, Peter Melchior, Grant Merz, Martin Millon, Anais M\"oller, Gautham Narayan, Yuuki Omori, Hiranya Peiris, Laurence Perreault-Levasseur, Andr\'es A. Plazas Malag\'on, Nesar Ramachandra, Benjamin Remy, C\'ecile Roucelle, Jaime Ruiz-Zapatero, Stefan Schuldt, Ignacio Sevilla-Noarbe, Ved G. Shah, Tjitske Starkenburg, Stephen Thorp, Laura Toribio San Cipriano, Tilman Tr\"oster, Roberto Trotta, Padma Venkatraman, Amanda Wasserman, Tim White, Justine Zeghal, Tianqing Zhang, Yuanyuan Zhang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.14235

要約:
The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will produce unprecedented volumes of heterogeneous astronomical data (images, catalogs, and alerts) that challenge traditional analysis pipelines. The LSST Dark Energy Science Collaboration (DESC) aims to derive robust constraints on dark energy and dark matter from these data, requiring methods that are statistically powerful, scalable, and operationally reliable. Artificial intelligence and machine learning (AI/ML) are already embedded across DESC science workflows, from photometric redshifts and transient classification to weak lensing inference and cosmological simulations. Yet their utility for precision cosmology hinges on trustworthy uncertainty quantification, robustness to covariate shift and model misspecification, and reproducible integration within scientific pipelines. This white paper surveys the current landscape of AI/ML across DESC's primary cosmological probes and cross-cutting analyses, revealing that the same core methodologies and fundamental challenges recur across disparate science cases. Since progress on these cross-cutting challenges would benefit multiple probes simultaneously, we identify key methodological research priorities, including Bayesian inference at scale, physics-informed methods, validation frameworks, and active learning for discovery. With an eye on emerging techniques, we also explore the potential of the latest foundation model methodologies and LLM-driven agentic AI systems to reshape DESC workflows, provided their deployment is coupled with rigorous evaluation and governance. Finally, we discuss critical software, computing, data infrastructure, and human capital requirements for the successful deployment of these new methodologies, and consider associated risks and opportunities for broader coordination with external actors.

#37 Classification of high-dimensional data with spiked covariance matrix structure

著者: Yin-Jen Chen, Minh Tang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2110.01950

要約:
We study the classification problem for high-dimensional data with $n$ observations on $p$ features where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the vector $\zeta$, given by the difference between the {\em whitened} mean vectors, is sparse. We analyze an adaptive classifier (adaptive with respect to the sparsity $s$) that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space, i.e., the classifier whitens the data, then screens the features by keeping only those corresponding to the $s$ largest coordinates of $\zeta$ and finally applies Fisher linear discriminant on the selected features. Leveraging recent results on entrywise matrix perturbation bounds for covariance matrices, we show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$. Notably, our theory also guarantees Bayes optimality for the corresponding quadratic discriminant analysis (QDA). Experimental results on real and synthetic data further indicate that the proposed approach is competitive with state-of-the-art methods while operating on a substantially lower-dimensional representation.

#38 U-learning for Prediction Inference via Combinatory Multi-Subsampling: With Applications to LASSO and Neural Networks

著者: Zhe Fei, Yi Li

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2407.15301

要約:
Epigenetic aging clocks play a pivotal role in estimating an individual's biological age through the examination of DNA methylation patterns at numerous CpG (Cytosine-phosphate-Guanine) sites within their genome. However, making valid inferences on predicted epigenetic ages, or more broadly, on predictions derived from high-dimensional inputs, presents challenges. We introduce a novel U-learning approach via combinatory multi-subsampling for making ensemble predictions and constructing confidence intervals for predictions of continuous outcomes when traditional asymptotic methods are not applicable. More specifically, our approach conceptualizes the ensemble estimators within the framework of generalized U-statistics and invokes the H\'ajek projection for deriving the variances of predictions and constructing confidence intervals with valid conditional coverage probabilities. We apply our approach to two commonly used predictive algorithms, Lasso and deep neural networks (DNNs), and illustrate the validity of inferences with extensive numerical studies. We have applied these methods to predict the DNA methylation age (DNAmAge) of patients with various health conditions, aiming to accurately characterize the aging process and potentially guide anti-aging interventions.

#39 A survey on Clustered Federated Learning: Taxonomy, Analysis and Applications

著者: Michael Ben Ali (IRIT), Omar El-Rifai (CIS-ENSMSE), Imen Megdiche (IRIT-SIG, INUC), Andr\'e Peninou (IRIT-SIG, UT2J), Olivier Teste (IRIT-SIG)

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.17512

要約:
As Federated Learning (FL) expands, the challenge of non-independent and identically distributed (non-IID) data becomes critical. Clustered Federated Learning (CFL) addresses this by training multiple specialized models, each representing a group of clients with similar data distributions. However, the term ''CFL'' has increasingly been applied to operational strategies unrelated to data heterogeneity, creating significant ambiguity. This survey provides a systematic review of the CFL literature and introduces a principled taxonomy that classifies algorithms into Server-side, Client-side, and Metadata-based approaches. Our analysis reveals a distinct dichotomy: while theoretical research prioritizes privacy-preserving Server/Client-side methods, real-world applications in IoT, Mobility, and Energy overwhelmingly favor Metadata-based efficiency. Furthermore, we explicitly distinguish ''Core CFL'' (grouping clients for non-IID data) from ''Clustered X FL'' (operational variants for system heterogeneity). Finally, we outline lessons learned and future directions to bridge the gap between theoretical privacy and practical efficiency.

#40 Variable transformations in consistent loss functions

著者: Hristos Tyralis, Georgia Papacharalampous

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.16542

要約:
The empirical use of variable transformations within (strictly) consistent loss functions is widespread, yet a theoretical understanding is lacking. To address this gap, we develop a theoretical framework that establishes formal characterizations of (strict) consistency for such transformed loss functions. Our analysis focuses on two interrelated cases: (a) transformations applied solely to the realization variable and (b) bijective transformations applied jointly to both the realization and prediction variables. These cases extend the well-established framework of transformations applied exclusively to the prediction variable, as formalized by Osband's revelation principle. We further develop analogous characterizations for (strict) identification functions. The resulting theoretical framework is broadly applicable to statistical and machine learning methodologies. For instance, we apply the framework to Bregman and expectile loss functions to interpret empirical findings from models trained with transformed loss functions and systematically construct new identifiable and elicitable functionals, which we term respectively $g$-transformed expectation and $g$-transformed expectile. Applications of the framework to simulated and real-world data illustrate its practical utility in diverse settings. By unifying theoretical insights with practical applications, this work advances principled methodologies for designing loss functions in complex predictive tasks.

#41 Flow-based Generative Modeling of Potential Outcomes and Counterfactuals

著者: Dongze Wu, David I. Inouye, Yao Xie

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.16051

要約:
Predicting potential and counterfactual outcomes from observational data is central to individualized decision-making, particularly in clinical settings where treatment choices must be tailored to each patient rather than guided solely by population averages. We propose PO-Flow, a continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcome distributions and factual-conditioned counterfactual outcomes. Trained via flow matching, PO-Flow provides a unified approach to individualized potential outcome prediction, conditional average treatment effect estimation, and counterfactual prediction. By encoding an observed factual outcome into a shared latent representation and decoding it under an alternative treatment, PO-Flow relates factual and counterfactual realizations at the individual level, rather than generating counterfactuals independently from marginal conditional distributions. In addition, PO-Flow supports likelihood-based evaluation of potential outcomes, enabling uncertainty-aware assessment of predictions. A supporting recovery guarantee is established under certain assumptions, and empirical results on benchmark datasets demonstrate strong performance across a range of causal inference tasks within the potential outcomes framework.

#42 ALPCAHUS: Subspace Clustering for Heteroscedastic Data

著者: Javier Salazar Cavazos, Jeffrey A Fessler, Laura Balzano

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.18918

要約:
Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction. Various methods have been proposed to extend PCA to the union of subspace (UoS) setting for clustering data that comes from multiple subspaces like K-Subspaces (KSS). However, some applications involve heterogeneous data that vary in quality due to noise characteristics associated with each data sample. Heteroscedastic methods aim to deal with such mixed data quality. This paper develops a heteroscedastic-based subspace clustering method, named ALPCAHUS, that can estimate the sample-wise noise variances and use this information to improve the estimate of the subspace bases associated with the low-rank structure of the data. This clustering algorithm builds on K-Subspaces (KSS) principles by extending the recently proposed heteroscedastic PCA method, named LR-ALPCAH, for clusters with heteroscedastic noise in the UoS setting. Simulations and real-data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing clustering algorithms. Code available at https://github.com/javiersc1/ALPCAHUS.

#43 Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

著者: Korel Gundem, Juncheng Dong, Dennis Zhang, Vahid Tarokh, Zhengling Qi

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.23783

要約:
In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performances in classification. While calibration techniques are proposed to mitigate these biases, we show that, in the logit space, many of these methods are equivalent to merely shifting the LLM's decision boundary without having the ability to alter its orientation. This proves inadequate when biases cause the LLM to be severely misdirected. To address these limitations and provide a unifying framework, we propose Supervised Calibration (SC), a loss-minimization based framework which learns an optimal, per-class affine transformation of the LLM's predictive probabilities in the logit space without requiring external data beyond the context. By using a more expressive functional class, SC not only subsumes many existing calibration methods in ICL as special cases, but also enables the ability to alter and even completely reverse the orientation of the LLM's decision boundary. Furthermore, SC's loss-based nature facilitates the seamless integration of two purpose-built regularization techniques: context-invariance and directional trust-region. The former is designed to tackle the instability issue in ICL, while the latter controls the degree of calibration. Finally, SC delivers state-of-the-art performance over calibration baselines in the 4-shot, 8-shot, and 16-shot settings across all nine datasets for Mistral-7B-Instruct-v0.3, LLaMA-2-7B-chat, and Qwen2-7B-Instruct.

#44 Thompson Sampling in Function Spaces via Neural Operators

著者: Rafael Oliveira, Xuesong Wang, Kian Ming A. Chai, Edwin V. Bonilla

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.21894

要約:
We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator's output. We assume that queries to the operator (such as running a high-fidelity simulator or physical experiment) are costly, while functional evaluations on the operator's output are inexpensive. Our algorithm employs a sample-then-optimize approach using neural operator surrogates. This strategy avoids explicit uncertainty quantification by treating trained neural operators as approximate samples from a Gaussian process (GP) posterior. We derive regret bounds and theoretical results connecting neural operators with GPs in infinite-dimensional settings. Experiments benchmark our method against other Bayesian optimization baselines on functional optimization tasks involving partial differential equations of physical systems, demonstrating better sample efficiency and significant performance gains.

#45 Predicting Parkinson's Disease Progression Using Statistical and Neural Mixed Effects Models: Comparative Study on Longitudinal Biomarkers

著者: Ran Tong, Lanruo Wang, Tong Wang, Wei Yan

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.20058

要約:
Predicting Parkinson's Disease (PD) progression is crucial, and voice biomarkers offer a non-invasive method for tracking symptom severity (UPDRS scores) through telemonitoring. Analyzing this longitudinal data is challenging due to within-subject correlations and complex, nonlinear patient-specific progression patterns. This study benchmarks LMMs against two advanced hybrid approaches: the Generalized Neural Network Mixed Model (GNMM) (Mandel 2021), which embeds a neural network within a GLMM structure, and the Neural Mixed Effects (NME) model (Wortwein 2023), allowing nonlinear subject-specific parameters throughout the network. Using the Oxford Parkinson's telemonitoring voice dataset, we evaluate these models' performance in predicting Total UPDRS to offer practical guidance for PD research and clinical applications.

#46 PAC Learnability in the Presence of Performativity

著者: Ivan Kirev, Lyuben Baltadzhiev, Nikola Konstantinov

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.08335

要約:
Following the wide-spread adoption of machine learning models in real-world applications, the phenomenon of performativity, i.e. model-dependent shifts in the test distribution, becomes increasingly prevalent. Unfortunately, since models are usually trained solely based on samples from the original (unshifted) distribution, this performative shift may lead to decreased test-time performance. In this paper, we study the question of whether and when performative binary classification problems are learnable, via the lens of the classic PAC (Probably Approximately Correct) learning framework. We motivate several performative scenarios, accounting in particular for linear shifts in the label distribution, as well as for more general changes in both the labels and the features. We construct a performative empirical risk function, which depends only on data from the original distribution and on the type performative effect, and is yet an unbiased estimate of the true risk of a classifier on the shifted distribution. Minimizing this notion of performative risk allows us to show that any PAC-learnable hypothesis space in the standard binary classification setting remains PAC-learnable for the considered performative scenarios. We also conduct an extensive experimental evaluation of our performative risk minimization method and showcase benefits on synthetic and real data.

#47 Calibrating Generative Models to Distributional Constraints

著者: Henry D. Smith, Nathaniel L. Diamant, Brian L. Trippe

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.10020

要約:
Generative models frequently suffer miscalibration, wherein statistics of the sampling distribution such as class probabilities deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly, we introduce two surrogate objectives for fine-tuning: (1) the relax loss, which replaces the constraint with a miscalibration penalty, and (2) the reward loss, which converts calibration into a reward fine-tuning problem. We demonstrate that these approaches substantially reduce calibration error across hundreds of simultaneous constraints and models with up to one billion parameters, spanning applications in protein design, image generation, and language modeling.

#48 Geopolitics, Geoeconomics and Risk: A Machine Learning Approach

著者: Alvaro Ortiz, Tomasa Rodrigo, Pablo Saborido

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.12416

要約:
We introduce a novel high-frequency daily panel dataset of both markets and news-based indicators -- including Geopolitical Risk, Economic Policy Uncertainty, Trade Policy Uncertainty, and Political Sentiment -- for 42 countries across both emerging and developed markets. Using this dataset, we study how sentiment dynamics shape sovereign risk, measured by Credit Default Swap (CDS) spreads, and evaluate their forecasting value relative to traditional drivers such as global monetary policy and market volatility. Our horse-race analysis of forecasting models demonstrates that incorporating news-based indicators significantly enhances predictive accuracy and enriches the analysis, with non-linear machine learning methods -- particularly Random Forests -- delivering the largest gains. Our analysis reveals that while global financial variables remain the dominant drivers of sovereign risk, geopolitical risk and economic policy uncertainty also play a meaningful role. Crucially, their effects are amplified through non-linear interactions with global financial conditions. Finally, we document pronounced regional heterogeneity, as certain asset classes and emerging markets exhibit heightened sensitivity to shocks in policy rates, global financial volatility, and geopolitical risk.

#49 Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis

著者: Orit Davidovich, Shimrit Shtern, Segev Wasserkrug, Nimrod Megiddo

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.08601

要約:
Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts to design heuristics per problem type. Due to their structure, many hard CO problems are amenable to treatment through reinforcement learning (RL). Indeed, we find a wealth of literature training NNs using value-based, policy gradient, or actor-critic approaches, with promising results, both in terms of empirical optimality gaps and inference runtimes. Nevertheless, there has been a paucity of theoretical work undergirding the use of RL for CO problems. To this end, we introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using RL techniques. We provide easy-to-test assumptions under which CO problems can be formulated as equivalent undiscounted MDPs that provide optimal solutions to the original CO problems. Moreover, we establish conditions under which value-based RL techniques converge to approximate solutions of the CO problem with a guarantee on the associated optimality gap. Our convergence analysis provides: (1) a sufficient rate of increase in batch size and projected gradient descent steps at each RL iteration; (2) the resulting optimality gap in terms of problem parameters and targeted RL accuracy; and (3) the importance of a choice of state-space embedding. Together, our analysis illuminates the success (and limitations) of the celebrated deep Q-learning algorithm in this problem context.

#50 Improving the Accuracy of Amortized Model Comparison with Self-Consistency

著者: \v{S}imon Kucharsk\'y, Aayush Mishra, Daniel Habermann, Stefan T. Radev, Paul-Christian B\"urkner

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.14308

要約:
Amortized Bayesian inference (ABI) offers fast, scalable approximations to posterior densities by training neural surrogates on data simulated from the statistical model. However, ABI methods are highly sensitive to model misspecification: when observed data fall outside the training distribution (generative scope of the statistical models), neural surrogates can behave unpredictably. This makes it a challenge in a model comparison setting, where multiple statistical models are considered, of which at least some are misspecified. Recent work on self-consistency (SC) provides a promising remedy to this issue, accessible even for empirical data (without ground-truth labels). In this work, we investigate how SC can improve amortized model comparison conceptualized in four different ways. Across two synthetic and two real-world case studies, we find that approaches for model comparison that estimate marginal likelihoods through approximate parameter posteriors consistently outperform methods that directly approximate model evidence or posterior model probabilities. SC training improves robustness when the likelihood is available, even under severe model misspecification. The benefits of SC for methods without access of analytic likelihoods are more limited and inconsistent. Our results suggest practical guidance for reliable amortized Bayesian model comparison: prefer parameter posterior-based methods and augment them with SC training on empirical datasets to mitigate extrapolation bias under model misspecification.

#51 Local EGOP for Continuous Index Learning

著者: Alex Kokot, Anand Hemmady, Vydhourie Thiyageswaran, Marina Meila

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07061

要約:
We introduce the setting of continuous index learning, in which a function of many variables varies only along a small number of directions at each point. For efficient estimation, it is beneficial for a learning algorithm to adapt, near each point $x$, to the subspace that captures the local variability of the function $f$. We pose this task as kernel adaptation along a manifold with noise, and introduce Local EGOP learning, a recursive algorithm that utilizes the Expected Gradient Outer Product (EGOP) quadratic form as both a metric and inverse-covariance of our target distribution. We prove that Local EGOP learning adapts to the regularity of the function of interest, showing that under a supervised noisy manifold hypothesis, intrinsic dimensional learning rates are achieved for arbitrarily high-dimensional noise. Empirically, we compare our algorithm to the feature learning capabilities of deep learning. Additionally, we demonstrate improved regression quality compared to two-layer neural networks in the continuous single-index setting.

#52 Memorize Early, Then Query: Inlier-Memorization-Guided Active Outlier Detection

privacy

著者: Minseo Kang, Seunghwan Park, Dongha Kim

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.10993

要約:
Outlier detection (OD) aims to identify abnormal instances, known as outliers or anomalies, by learning typical patterns of normal data, or inliers. Performing OD under an unsupervised regime-without any information about anomalous instances in the training data-is challenging. A recently observed phenomenon, known as the inlier-memorization (IM) effect, where deep generative models (DGMs) tend to memorize inlier patterns during early training, provides a promising signal for distinguishing outliers. However, existing unsupervised approaches that rely solely on the IM effect still struggle when inliers and outliers are not well-separated or when outliers form dense clusters. To address these limitations, we incorporate active learning to selectively acquire informative labels, and propose IMBoost, a novel framework that explicitly reinforces the IM effect to improve outlier detection. Our method consists of two stages: 1) a warm-up phase that induces and promotes the IM effect, and 2) a polarization phase in which actively queried samples are used to maximize the discrepancy between inlier and outlier scores. In particular, we propose a novel query strategy and tailored loss function in the polarization phase to effectively identify informative samples and fully leverage the limited labeling budget. We provide a theoretical analysis showing that the IMBoost consistently decreases inlier risk while increasing outlier risk throughout training, thereby amplifying their separation. Extensive experiments on diverse benchmark datasets demonstrate that IMBoost not only significantly outperforms state-of-the-art active OD methods but also requires substantially less computational cost.

#53 On the Global Convergence of Risk-Averse Natural Policy Gradient Methods with Expected Conditional Risk Measures

著者: Xian Yu, Lei Ying

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2301.10932

要約:
Risk-sensitive reinforcement learning (RL) has become a popular tool for controlling the risk of uncertain outcomes and ensuring reliable performance in highly stochastic sequential decision-making problems. While it has been shown that policy gradient methods can find globally optimal policies in the risk-neutral setting, it remains unclear if the risk-averse variants enjoy the same global convergence guarantees. In this paper, we consider a class of dynamic time-consistent risk measures, named Expected Conditional Risk Measures (ECRMs), and derive natural policy gradient (NPG) updates for ECRMs-based RL problems. We provide global optimality and iteration complexity of the proposed risk-averse NPG algorithm with softmax parameterization and entropy regularization under both exact and inexact policy evaluation. Furthermore, we test our risk-averse NPG algorithm on a stochastic Cliffwalk environment to demonstrate the efficacy of our method.

#54 Hidden Minima in Two-Layer ReLU Networks

model extraction

著者: Yossi Arjevani

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2312.16819

要約:
We consider the optimization problem associated with training two-layer ReLU networks with $d$ inputs under the squared loss, where the labels are generated by a target network. Recent work has identified two distinct classes of infinite families of minima: one whose training loss vanishes in the high-dimensional limit, and another whose loss remains bounded away from zero. The latter family is empirically avoided by stochastic gradient descent, hence \emph{hidden}, motivating the search for analytic criteria that distinguish hidden from non-hidden minima. A key challenge is that prior analyses have shown the Hessian spectra at hidden and non-hidden minima to coincide up to terms of order $O(d^{-1/2})$, seemingly limiting the discriminative power of spectral methods. We therefore take a different route, studying instead certain curves along which the loss is locally minimized. Our main result shows that arcs emanating from hidden minima exhibit distinctive structural and symmetry properties, arising precisely from $\Omega(d^{-1/2})$ eigenvalue contributions that are absent from earlier analyses.

#55 Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

著者: Yahong Yang, Juncai He

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2402.00152

要約:
Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.

#56 Trading off Consistency and Dimensionality of Convex Surrogates for the Mode

著者: Enrique Nueve, Bo Waggoner, Dhamma Kimpara, Jessie Finocchiaro

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2402.10818

要約:
In multiclass classification over $n$ outcomes, the outcomes must be embedded into the reals with dimension at least $n-1$ in order to design a consistent surrogate loss that leads to the "correct" classification, regardless of the data distribution. For large $n$, such as in information retrieval and structured prediction tasks, optimizing a surrogate in $n-1$ dimensions is often intractable. We investigate ways to trade off surrogate loss dimension, the number of problem instances, and restricting the region of consistency in the simplex for multiclass classification. Following past work, we examine an intuitive embedding procedure that maps outcomes into the vertices of convex polytopes in a low-dimensional surrogate space. We show that full-dimensional subsets of the simplex exist around each point mass distribution for which consistency holds, but also, with less than $n-1$ dimensions, there exist distributions for which a phenomenon called hallucination occurs, which is when the optimal report under the surrogate loss is an outcome with zero probability. Looking towards application, we derive a result to check if consistency holds under a given polytope embedding and low-noise assumption, providing insight into when to use a particular embedding. We provide examples of embedding $n = 2^{d}$ outcomes into the $d$-dimensional unit cube and $n = d!$ outcomes into the $d$-dimensional permutahedron under low-noise assumptions. Finally, we demonstrate that with multiple problem instances, we can learn the mode with $\frac{n}{2}$ dimensions over the whole simplex.

#57 On the Identification of Temporally Causal Representation with Instantaneous Dependence

著者: Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Guangyi Chen, Kun Zhang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2405.15325

要約:
Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observations, which are in general difficult to obtain in real-world scenarios. To fill this gap, we propose an \textbf{ID}entification framework for instantane\textbf{O}us \textbf{L}atent dynamics (\textbf{IDOL}) by imposing a sparse influence constraint that the latent causal processes have sparse time-delayed and instantaneous relations. Specifically, we establish identifiability results of the latent causal process based on sufficient variability and the sparse influence constraint by employing contextual information of time series data. Based on these theories, we incorporate a temporally variational inference architecture to estimate the latent variables and a gradient-based sparsity regularization to identify the latent causal process. Experimental results on simulation datasets illustrate that our method can identify the latent causal process. Furthermore, evaluations on multiple human motion forecasting benchmarks with instantaneous dependencies indicate the effectiveness of our method in real-world settings.

#58 CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration

著者: Sophie Jaffard (CSBD, MPI-CBG), Samuel Vaiter (CNRS, LJAD), Patricia Reynaud-Bouret (CNRS, LJAD)

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2405.18828

要約:
The present work aims at proving mathematically that a neural network inspired by biology can learn a classification task thanks to local transformations only. In this purpose, we propose a spiking neural network named CHANI (Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration), whose neurons activity is modeled by Hawkes processes. Synaptic weights are updated thanks to an expert aggregation algorithm, providing a local and simple learning rule. We were able to prove that our network can learn on average and asymptotically. Moreover, we demonstrated that it automatically produces neuronal assemblies in the sense that the network can encode several classes and that a same neuron in the intermediate layers might be activated by more than one class, and we provided numerical simulations on synthetic dataset. This theoretical approach contrasts with the traditional empirical validation of biologically inspired networks and paves the way for understanding how local learning rules enable neurons to form assemblies able to represent complex concepts.

#59 Winners with Confidence: Discrete Argmin Inference with an Application to Model Selection

著者: Tianyu Zhang, Hao Lee, Jing Lei

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2408.02060

要約:
We study the problem of finding the index of the minimum value of a vector from noisy observations. This problem is relevant in population/policy comparison, discrete maximum likelihood, and model selection. We develop an asymptotically normal test statistic, even in high-dimensional settings and with potentially many ties in the population mean vector, by integrating concepts and tools from cross-validation and differential privacy. The key technical ingredient is a central limit theorem for globally dependent data. We also propose practical ways to select the tuning parameter that adapts to the signal landscape. Numerical experiments and data examples demonstrate the ability of the proposed method to achieve a favorable bias-variance trade-off in practical scenarios.

#60 On Expressive Power of Quantized Neural Networks under Fixed-Point Arithmetic

著者: Yeachan Park, Sejun Park, Geonho Hwang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2409.00297

要約:
Existing works on the expressive power of neural networks typically assume real parameters and exact operations. In this work, we study the expressive power of quantized networks under discrete fixed-point parameters and inexact fixed-point operations with round-off errors. We first provide a necessary condition and a sufficient condition on fixed-point arithmetic and activation functions for quantized networks to represent all fixed-point functions from fixed-point vectors to fixed-point numbers. Then, we show that various popular activation functions satisfy our sufficient condition, e.g., Sigmoid, ReLU, ELU, SoftPlus, SiLU, Mish, and GELU. In other words, networks using those activation functions are capable of representing all fixed-point functions. We further show that our necessary condition and sufficient condition coincide under a mild condition on activation functions: e.g., for an activation function $\sigma$, there exists a fixed-point number $x$ such that $\sigma(x)=0$. Namely, we find a necessary and sufficient condition for a large class of activation functions. We lastly show that even quantized networks using binary weights in $\{-1,1\}$ can also represent all fixed-point functions for practical activation functions.

#61 Neural timescales from a computational perspective

著者: Roxana Zeraati, Anna Levina, Jakob H. Macke, Richard Gao

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2409.02684

要約:
Neural activity fluctuates over a wide range of timescales within and across brain areas. Experimental observations suggest that diverse neural timescales reflect information in dynamic environments. However, how timescales are defined and measured from brain recordings vary across the literature. Moreover, these observations do not specify the mechanisms underlying timescale variations, nor whether specific timescales are necessary for neural computation and brain function. Here, we synthesize three directions where computational approaches can distill the broad set of empirical observations into quantitative and testable theories: We review (i) how different data analysis methods quantify timescales across distinct behavioral states and recording modalities, (ii) how biophysical models provide mechanistic explanations for the emergence of diverse timescales, and (iii) how task-performing networks and machine learning models uncover the functional relevance of neural timescales. This integrative computational perspective thus complements experimental investigations, providing a holistic view on how neural timescales reflect the relationship between brain structure, dynamics, and behavior.

#62 Entropy contraction of the Gibbs sampler under log-concavity

著者: Filippo Ascolani, Hugo Lavenant, Giacomo Zanella

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.00858

要約:
The Gibbs sampler (a.k.a. Glauber dynamics and heat-bath algorithm) is a popular Markov Chain Monte Carlo algorithm which iteratively samples from the conditional distributions of a probability measure $\pi$ of interest. Under the assumption that $\pi$ is strongly log-concave, we show that the random scan Gibbs sampler contracts in relative entropy and provide a sharp characterization of the associated contraction rate. Assuming that evaluating conditionals is cheap compared to evaluating the joint density, our results imply that the number of full evaluations of $\pi$ needed for the Gibbs sampler to mix grows linearly with the condition number and is independent of the dimension. If $\pi$ is non-strongly log-concave, the convergence rate in entropy degrades from exponential to polynomial. Our techniques are versatile and extend to Metropolis-within-Gibbs schemes and the Hit-and-Run algorithm. A comparison with gradient-based schemes and the connection with the optimization literature are also discussed.

#63 TabDPT: Scaling Tabular Foundation Models on Real Data

著者: Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Alex Labach, Hamidreza Kamkari, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, Maksims Volkovs

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.18164

要約:
Tabular data is one of the most ubiquitous sources of information worldwide, spanning a wide variety of domains. This inherent heterogeneity has slowed the development of Tabular Foundation Models (TFMs) capable of fast generalization to unseen datasets. In-Context Learning (ICL) has recently emerged as a promising solution for TFMs, enabling dynamic adaptation to new tasks without additional tuning. While many studies have attempted to re-purpose large language models for tabular ICL, they have had limited success, so recent works have focused on developing tabular-specific foundation models. In this work, we propose an approach to combine ICL-based retrieval with self supervised learning to train tabular foundation models. We also investigate the utility of real vs. synthetic data for model pre-training, and show that real data can contain useful signal not easily captured in synthetic training. Specifically, we show that incorporating real data during the pre-training phase can lead to significantly faster training and better downstream generalization to unseen data. Our resulting model, TabDPT, achieves strong performance on both regression (CTR23) and classification (CC18) benchmarks. Importantly, we also demonstrate that with our pre-training procedure, scaling both model and data size leads to consistent performance improvements that follow power laws. This echoes scaling laws in LLMs and other foundation models, and suggests that large-scale TFMs can be achievable. We open-source our full pipeline: inference code including trained model weights can be found at github.com/layer6ai-labs/TabDPT-inference, and the training code to reproduce experiments can be found at github.com/layer6ai-labs/TabDPT-training.

#64 CausAdv: A Causal-based Framework for Detecting Adversarial Examples

著者: Hichem Debbi

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2411.00839

要約:
Deep learning has led to tremendous success in computer vision, largely due to Convolutional Neural Networks (CNNs). However, CNNs have been shown to be vulnerable to crafted adversarial perturbations. This vulnerability of adversarial examples has has motivated research into improving model robustness through adversarial detection and defense methods. In this paper, we address the adversarial robustness of CNNs through causal reasoning. We propose CausAdv: a causal framework for detecting adversarial examples based on counterfactual reasoning. CausAdv learns both causal and non-causal features of every input, and quantifies the counterfactual information (CI) of every filter of the last convolutional layer. We then perform a statistical analysis of the filters' CI across clean and adversarial samples, to demonstrate that adversarial examples exhibit different CI distributions compared to clean samples. Our results show that causal reasoning enhances the process of adversarial detection without the need to train a separate detector. Moreover, we illustrate the efficiency of causal explanations as a helpful detection tool by visualizing the extracted causal features.

#65 Optimization Insights into Deep Diagonal Linear Networks

著者: Hippolyte Labarri\`ere, Cesare Molinari, Lorenzo Rosasco, Cristian Vega, Silvia Villa

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2412.16765

要約:
Gradient-based methods successfully train highly overparameterized models in practice, even though the associated optimization problems are markedly nonconvex. Understanding the mechanisms that make such methods effective has become a central problem in modern optimization. To investigate this question in a tractable setting, we study Deep Diagonal Linear Networks. These are multilayer architectures with a reparameterization that preserves convexity in the effective parameter, while inducing a nontrivial geometry in the optimization landscape. Under mild initialization conditions, we show that gradient flow on the layer parameters induces a mirror-flow dynamic in the effective parameter space. This structural insight yields explicit convergence guarantees, including exponential decay of the loss under a Polyak-Lojasiewicz condition, and clarifies how the parametrization and initialization scale govern the training speed. Overall, our results demonstrate that deep diagonal over parameterizations, despite their apparent complexity, can endow standard gradient methods with well-behaved and interpretable optimization dynamics.

#66 From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training

diffusion

著者: Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.06148

要約:
We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.

#67 Network-Level Measures of Mobility from Aggregated Origin-Destination Data

著者: Alisha Foster, David A. Meyer, Asif Shakeel

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.04162

要約:
We introduce a framework for defining and interpreting collective mobility measures from spatially and temporally aggregated origin--destination (OD) data. Rather than characterizing individual behavior, these measures describe properties of the mobility system itself: how network organization, spatial structure, and routing constraints shape and channel population movement. In this view, aggregate mobility flows reveal aspects of connectivity, functional organization, and large-scale daily activity patterns encoded in the underlying transport and spatial network. To support interpretation and provide a controlled reference for the proposed time-elapsed calculations, we first employ an independent, network-driven synthetic data generator in which trajectories arise from prescribed system structure rather than observed data. This controlled setting provides a concrete reference for understanding how the proposed measures reflect network organization and flow constraints. We then apply the measures to fully anonymized data from the NetMob 2024 Data Challenge, examining their behavior under realistic limitations of spatial and temporal aggregation. While such data constraints restrict dynamical resolution, the resulting metrics still exhibit interpretable large-scale structure and temporal variation at the city scale.

#68 Semiparametric Off-Policy Inference for Optimal Policy Values under Possible Non-Uniqueness

著者: Haoyu Wei

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.13809

要約:
Off-policy evaluation (OPE) constructs confidence intervals for the value of a target policy using data generated under a different behavior policy. Most existing inference methods focus on fixed target policies and may fail when the target policy is estimated as optimal, particularly when the optimal policy is non-unique or nearly deterministic. We study inference for the value of optimal policies in Markov decision processes. We characterize the existence of the efficient influence function and show that non-regularity arises under policy non-uniqueness. Motivated by this analysis, we propose a novel \textit{N}onparametric \textit{S}equenti\textit{A}l \textit{V}alue \textit{E}valuation (NSAVE) method, which achieves semiparametric efficiency and retains the double robustness property when the optimal policy is unique, and remains stable in degenerate regimes beyond the scope of existing asymptotic theory. We further develop a smoothing-based approach for valid inference under non-unique optimal policies, and a post-selection procedure with uniform coverage for data-selected optimal policies. Simulation studies support the theoretical results. An application to the OhioT1DM mobile health dataset provides patient-specific confidence intervals for optimal policy values and their improvement over observed treatment policies.

#69 Modeling Hierarchical Spaces: A Review and Unified Framework for Surrogate-Based Architecture Design

著者: Paul Saves, Edward Hall\'e-Hannan, Jasper Bussemaker, Youssef Diouane, Nathalie Bartoli

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.22621

要約:
Simulation-based problems involving mixed-variable inputs frequently feature domains that are hierarchical, conditional, heterogeneous, or tree-structured. These characteristics pose challenges for data representation, modeling, and optimization. This paper reviews extensive literature on these structured input spaces and proposes a unified framework that generalizes existing approaches. In this framework, input variables may be continuous, integer, or categorical. A variable is described as meta if its value governs the presence of other decreed variables, enabling the modeling of conditional and hierarchical structures. We further introduce the concept of partially-decreed variables, whose activation depends on contextual conditions. To capture these inter-variable hierarchical relationships, we introduce design space graphs, combining principles from feature modeling and graph theory. This allows the definition of general hierarchical domains suitable for describing complex system architectures. Our framework defines hierarchical distances and kernels to enable surrogate modeling and optimization on hierarchical domains. We demonstrate its effectiveness on complex system design problems, including a neural network and a green-aircraft case study. Our methods are available in the open-source Surrogate Modeling Toolbox (SMT 2.0).

#70 Matrix-Free Two-to-Infinity and One-to-Two Norms Estimation

著者: Askar Tsyganov, Evgeny Frolov, Sergey Samsonov, Maxim Rakhuba

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.04444

要約:
In this paper, we propose new randomized algorithms for estimating the two-to-infinity and one-to-two norms in a matrix-free setting, using only matrix-vector multiplications. Our methods are based on appropriate modifications of Hutchinson's diagonal estimator and its Hutch++ version. We provide oracle complexity bounds for both modifications. We further illustrate the practical utility of our algorithms for Jacobian-based regularization in deep neural network training on image classification tasks. We also demonstrate that our methodology can be applied to mitigate the effect of adversarial attacks in the domain of recommender systems.

#71 Single-Step Reconstruction-Free Anomaly Detection and Segmentation via Diffusion Models

privacydiffusion

著者: Mehrdad Moradi, Marco Grasso, Bianca Maria Colosimo, Kamran Paynabar

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.04818

要約:
Generative models have demonstrated significant success in anomaly detection and segmentation over the past decade. Recently, diffusion models have emerged as a powerful alternative, outperforming previous approaches such as GANs and VAEs. In typical diffusion-based anomaly detection, a model is trained on normal data, and during inference, anomalous images are perturbed to a predefined intermediate step in the forward diffusion process. The corresponding normal image is then reconstructed through iterative reverse sampling. However, reconstruction-based approaches present three major challenges: (1) the reconstruction process is computationally expensive due to multiple sampling steps, making real-time applications impractical; (2) for complex or subtle patterns, the reconstructed image may correspond to a different normal pattern rather than the original input; and (3) Choosing an appropriate intermediate noise level is challenging because it is application-dependent and often assumes prior knowledge of anomalies, an assumption that does not hold in unsupervised settings. We introduce Reconstruction-free Anomaly Detection with Attention-based diffusion models in Real-time (RADAR), which overcomes the limitations of reconstruction-based anomaly detection. Unlike current SOTA methods that reconstruct the input image, RADAR directly produces anomaly maps from the diffusion model, improving both detection accuracy and computational efficiency. We evaluate RADAR on real-world 3D-printed material and the MVTec-AD dataset. Our approach surpasses state-of-the-art diffusion-based and statistical machine learning models across all key metrics, including accuracy, precision, recall, and F1 score. Specifically, RADAR improves F1 score by 7% on MVTec-AD and 13% on the 3D-printed material dataset compared to the next best model. Code available at: https://github.com/mehrdadmoradi124/RADAR

#72 Discovering equations from data: symbolic regression in dynamical systems

著者: Beatriz R. Brum, Luiza Lober, Isolde Previdelli, Francisco A. Rodrigues

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.20257

要約:
The process of discovering equations from data lies at the heart of physics and in many other areas of research, including mathematical ecology and epidemiology. Recently, machine learning methods known as symbolic regression emerged as a way to automate this task. This study presents an overview of the current literature on symbolic regression, while also comparing the efficiency of five state-of-the-art methods in recovering the governing equations from nine processes, including chaotic dynamics and epidemic models. Benchmark results demonstrate the PySR method as the most suitable for inferring equations, with some estimates being indistinguishable from the original analytical forms. These results highlight the potential of symbolic regression as a robust tool for inferring and modeling real-world phenomena.

#73 Message passing-based inference in an autoregressive active inference agent

agent

著者: Wouter M. Kouw, Tim N. Nisslbeck, Wouter L. N. Nuijten

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.25482

要約:
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.

#74 Chain-of-Influence: Tracing Interdependencies Across Time and Features in Clinical Predictive Modelings

著者: Yubo Li, Rema Padman

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.09895

要約:
Modeling clinical time-series data is hampered by the challenge of capturing latent, time-varying dependencies among features. State-of-the-art approaches often rely on black-box mechanisms or simple aggregation, failing to explicitly model how the influence of one clinical variable propagates through others over time. We propose $\textbf{Chain-of-Influence (CoI)}$, an interpretable deep learning framework that constructs an explicit, time-unfolded graph of feature interactions. CoI enables the tracing of influence pathways, providing a granular audit trail that shows how any feature at any time contributes to the final prediction, both directly and through its influence on other variables. We evaluate CoI on mortality and disease progression tasks using the MIMIC-IV dataset and a chronic kidney disease cohort. Our framework achieves state-of-the-art predictive performance (AUROC of 0.960 on CKD progression and 0.950 on ICU mortality), with deletion-based sensitivity analyses confirming that CoI's learned attributions faithfully reflect its decision process. Through case studies, we demonstrate that CoI uncovers clinically meaningful, patient-specific patterns of disease progression, offering enhanced transparency into the temporal and cross-feature dependencies that inform clinical decision-making.

#75 Joint Score-Threshold Optimization for Interpretable Risk Assessment

著者: Fardin Gankhanloo, Emmett Springer, Erik H. Hoyer, Daniel L. Young, Kimia Ghobadi

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.21934

要約:
Risk assessment tools in healthcare commonly employ point-based scoring systems that map patients to ordinal risk categories via thresholds. While electronic health record (EHR) data presents opportunities for data-driven optimization of these tools, two fundamental challenges impede standard supervised learning: (1) labels are often available only for extreme risk categories due to intervention-censored outcomes, and (2) misclassification cost is asymmetric and increases with ordinal distance. We propose a mixed-integer programming (MIP) framework that jointly optimizes scoring weights and category thresholds in the face of these challenges. Our approach prevents label-scarce category collapse via threshold constraints, and utilizes an asymmetric, distance-aware objective. The MIP framework supports governance constraints, including sign restrictions, sparsity, and minimal modifications to incumbent tools, ensuring practical deployability in clinical workflows. We further develop a continuous relaxation of the MIP problem to provide warm-start solutions for more efficient MIP optimization. We apply the proposed score optimization framework to a case study of inpatient falls risk assessment using the Johns Hopkins Fall Risk Assessment Tool.

#76 Cyclic Counterfactuals under Shift-Scale Interventions

著者: Saptarshi Saha, Dhruv Vansraj Rathore, Utpal Garain

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.25005

要約:
Most counterfactual inference frameworks traditionally assume acyclic structural causal models (SCMs), i.e. directed acyclic graphs (DAGs). However, many real-world systems (e.g. biological systems) contain feedback loops or cyclic dependencies that violate acyclicity. In this work, we study counterfactual inference in cyclic SCMs under shift-scale interventions, i.e., soft, policy-style changes that rescale and/or shift a variable's mechanism.

#77 GraphBench: Next-generation graph learning benchmarking

著者: Timo Stoll, Chendi Qian, Ben Finkelshtein, Ali Parviz, Darius Weber, Fabrizio Frasca, Hadar Shavit, Antoine Siraudin, Arman Mielke, Marie Anastacio, Erik M\"uller, Maya Bechler-Speicher, Michael Bronstein, Mikhail Galkin, Holger Hoos, Mathias Niepert, Bryan Perozzi, Jan T\"onshoff, Christopher Morris

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.04475

要約:
Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, which hampers reproducibility and broader progress. To address this, we introduce GraphBench, a comprehensive benchmarking suite that spans diverse domains and prediction tasks, including node-level, edge-level, graph-level, and generative settings. GraphBench provides standardized evaluation protocols -- with consistent dataset splits and performance metrics that account for out-of-distribution generalization -- as well as a unified hyperparameter tuning framework. Additionally, we benchmark GraphBench using message-passing neural networks and graph transformer models, providing principled baselines and establishing a reference performance. See www.graphbench.io for further details.

#78 Cluster-Dags as Powerful Background Knowledge For Causal Discovery

著者: Jan Marco Ruiz de Vargas, Kirtan Padh, Niki Kilbertus

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.10032

要約:
Finding cause-effect relationships is of key importance in science. Causal discovery aims to recover a graph from data that succinctly describes these cause-effect relationships. However, current methods face several challenges, especially when dealing with high-dimensional data and complex dependencies. Incorporating prior knowledge about the system can aid causal discovery. In this work, we leverage Cluster-DAGs as a prior knowledge framework to warm-start causal discovery. We show that Cluster-DAGs offer greater flexibility than existing approaches based on tiered background knowledge and introduce two modified constraint-based algorithms, Cluster-PC and Cluster-FCI, for causal discovery in the fully and partially observed setting, respectively. Empirical evaluation on simulated data demonstrates that Cluster-PC and Cluster-FCI outperform their respective baselines without prior knowledge.

#79 From Many Models, One: Macroeconomic Forecasting with Reservoir Ensembles

著者: Giovanni Ballarin, Lyudmila Grigoryeva, Yui Ching Li

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13642

要約:
Model combination is a powerful approach for achieving superior performance compared to selecting a single model. We study both theoretically and empirically the effectiveness of ensembles of Multi-Frequency Echo State Networks (MFESNs), which have been shown to achieve state-of-the-art macroeconomic time series forecasting results (Ballarin et al., 2024a). The Hedge and Follow-the-Leader schemes are discussed, and their online learning guarantees are extended to settings with dependent data. In empirical applications, the proposed Ensemble Echo State Networks demonstrate significantly improved predictive performance relative to individual MFESN models.

#80 When Does Pairing Seeds Reduce Variance? Evidence from a Multi-Agent Economic Simulation

agent

著者: Udit Sharma

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.24145

要約:
Machine learning systems appear stochastic but are deterministically random, as seeded pseudorandom number generators produce identical realisations across repeated executions. Standard evaluation practice typically treats runs across alternatives as independent and does not exploit shared sources of randomness. This paper analyses the statistical structure of comparative evaluation under shared random seeds. Under this design, competing systems are evaluated using identical seeds, inducing matched stochastic realisations and yielding strict variance reduction whenever outcomes are positively correlated at the seed level. We demonstrate these effects using an extended learning-based multi-agent economic simulator, where paired evaluation exposes systematic differences in aggregate and distributional outcomes that remain statistically inconclusive under independent evaluation at fixed budgets.

#81 A Targeted Learning Framework for Estimating Restricted Mean Survival Time Difference using Pseudo-observations

著者: Man Jin, Yixin Fang

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06296

要約:
A targeted learning (TL) framework is developed to estimate the difference in the restricted mean survival time (RMST) for a clinical trial with time-to-event outcomes. The approach starts by defining the target estimand as the RMST difference between investigational and control treatments. Next, an efficient estimation method is introduced: a targeted minimum loss estimator (TMLE) utilizing pseudo-observations. Moreover, a version of the copy reference (CR) approach is developed to perform a sensitivity analysis for right-censoring. The proposed TL framework is demonstrated using a real data application.

#82 Geometric Stability: The Missing Axis of Representations

著者: Prashant C. Raju

公開日: Wed, 21 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.09173

要約:
Analysis of learned representations has a blind spot: it focuses on $similarity$, measuring how closely embeddings align with external references, but similarity reveals only what is represented, not whether that structure is robust. We introduce $geometric$ $stability$, a distinct dimension that quantifies how reliably representational geometry holds under perturbation, and present $Shesha$, a framework for measuring it. Across 2,463 configurations in seven domains, we show that stability and similarity are empirically uncorrelated ($\rho \approx 0.01$) and mechanistically distinct: similarity metrics collapse after removing the top principal components, while stability retains sensitivity to fine-grained manifold structure. This distinction yields actionable insights: for safety monitoring, stability acts as a functional geometric canary, detecting structural drift nearly 2$\times$ more sensitively than CKA while filtering out the non-functional noise that triggers false alarms in rigid distance metrics; for controllability, supervised stability predicts linear steerability ($\rho = 0.89$-$0.96$); for model selection, stability dissociates from transferability, revealing a geometric tax that transfer optimization incurs. Beyond machine learning, stability predicts CRISPR perturbation coherence and neural-behavioral coupling. By quantifying $how$ $reliably$ systems maintain structure, geometric stability provides a necessary complement to similarity for auditing representations across biological and computational systems.

stat.ML updates on arXiv.org

📋 論文タイトル一覧