arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

著者: Dorival Le\~ao, Alberto Ohashi, Simone Scotti, Adolfo M. D da Silva

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13147

要約:
This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.

#2 Rare Event Analysis via Stochastic Optimal Control

著者: Yuanqi Du, Jiajun He, Dinghuai Zhang, Eric Vanden-Eijnden, Carles Domingo-Enrich

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13213

要約:
Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.

#3 Identifiability of Potentially Degenerate Gaussian Mixture Models With Piecewise Affine Mixing

著者: Danru Xu, S\'ebastien Lachapelle, Sara Magliacane

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13218

要約:
Causal representation learning (CRL) aims to identify the underlying latent variables from high-dimensional observations, even when variables are dependent with each other. We study this problem for latent variables that follow a potentially degenerate Gaussian mixture distribution and that are only observed through the transformation via a piecewise affine mixing function. We provide a series of progressively stronger identifiability results for this challenging setting in which the probability density functions are ill-defined because of the potential degeneracy. For identifiability up to permutation and scaling, we leverage a sparsity regularization on the learned representation. Based on our theoretical results, we propose a two-stage method to estimate the latent variables by enforcing sparsity and Gaussianity in the learned representations. Experiments on synthetic and image data highlight our method's effectiveness in recovering the ground-truth latent variables.

#4 Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization

著者: Sida Liu, Yangzi Guo, Mingyuan Wang

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13484

要約:
Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g. linear projection or a neural network) and cluster the data based on the resulting features (e.g. under a Gaussian Mixture Model framework). The framework searches for the dimension reduction parameters and the optimal clusters by traversing a manifold,using Gradient Manifold Optimization. The obtained The proposed framework is exemplified with a Gaussian Mixture Model as one simple but efficient example, in a process that is somehow similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised training of simulated data as well as a benchmark image dataset (i.e. MNIST). The experimental results indicate that our algorithm has better performance than popular clustering algorithms from the literature.

#5 Robust Low-Rank Tensor Completion based on M-product with Weighted Correlated Total Variation and Sparse Regularization

著者: Biswarup Karmakar, Ratikanta Behera

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13525

要約:
The robust low-rank tensor completion problem addresses the challenge of recovering corrupted high-dimensional tensor data with missing entries, outliers, and sparse noise commonly found in real-world applications. Existing methodologies have encountered fundamental limitations due to their reliance on uniform regularization schemes, particularly the tensor nuclear norm and $\ell_1$ norm regularization approaches, which indiscriminately apply equal shrinkage to all singular values and sparse components, thereby compromising the preservation of critical tensor structures. The proposed tensor weighted correlated total variation (TWCTV) regularizer addresses these shortcomings through an $M$-product framework that combines a weighted Schatten-$p$ norm on gradient tensors for low-rankness with smoothness enforcement and weighted sparse components for noise suppression. The proposed weighting scheme adaptively reduces the thresholding level to preserve both dominant singular values and sparse components, thus improving the reconstruction of critical structural elements and nuanced details in the recovered signal. Through a systematic algorithmic approach, we introduce an enhanced alternating direction method of multipliers (ADMM) that offers both computational efficiency and theoretical substantiation, with convergence properties comprehensively analyzed within the $M$-product framework.Comprehensive numerical evaluations across image completion, denoising, and background subtraction tasks validate the superior performance of this approach relative to established benchmark methods.

#6 Covariance-adapting algorithm for semi-bandits with application to sparse rewards

著者: Pierre Perrault, Vianney Perchet, Michal Valko

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13738

要約:
We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret on this family, that is parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.

#7 Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

著者: Saumya Goyal, Rohith Rongali, Ritabrata Ray, Barnab\'as P\'oczos

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13130

要約:
We study learning to learn for regression problems through the lens of hyperparameter tuning. We propose the Langevin Gradient Descent Algorithm (LGD), which approximates the mean of the posterior distribution defined by the loss function and regularizer of a convex regression task. We prove the existence of an optimal hyperparameter configuration for which the LGD algorithm achieves the Bayes' optimal solution for squared loss. Subsequently, we study generalization guarantees on meta-learning optimal hyperparameters for the LGD algorithm from a given set of tasks in the data-driven setting. For a number of parameters $d$ and hyperparameter dimension $h$, we show a pseudo-dimension bound of $O(dh)$, upto logarithmic terms under mild assumptions on LGD. This matches the dimensional dependence of the bounds obtained in prior work for the elastic net, which only allows for $h=2$ hyperparameters, and extends their bounds to regression on convex loss. Finally, we show empirical evidence of the success of LGD and the meta-learning procedure for few-shot learning on linear regression using a few synthetically created datasets.

#8 Bias-Corrected Adaptive Conformal Inference for Multi-Horizon Time Series Forecasting

著者: Ankit Lade, Sai Krishna J., Indar Kumar

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13253

要約:
Adaptive Conformal Inference (ACI) provides distribution-free prediction intervals with asymptotic coverage guarantees for time series under distribution shift. However, ACI only adapts the quantile threshold -- it cannot shift the interval center. When a base forecaster develops persistent bias after a regime change, ACI compensates by widening intervals symmetrically, producing unnecessarily conservative bands. We propose Bias-Corrected ACI (BC-ACI), which augments standard ACI with an online exponentially weighted moving average (EWM) estimate of forecast bias. BC-ACI corrects nonconformity scores before quantile computation and re-centers prediction intervals, addressing the root cause of miscalibration rather than its symptom. An adaptive dead-zone threshold suppresses corrections when estimated bias is indistinguishable from noise, ensuring no degradation on well-calibrated data. In controlled experiments across 688 runs spanning two base models, four synthetic regimes, and three real datasets, BC-ACI reduces Winkler interval scores by 13--17% under mean and compound distribution shifts (Wilcoxon p < 0.001) while maintaining equivalent performance on stationary data (ratio 1.002x). We provide finite-sample analysis showing that coverage guarantees degrade gracefully with bias estimation error.

#9 Some Theoretical Limitations of t-SNE

著者: Rupert Li, Elchanan Mossel

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13295

要約:
t-SNE has gained popularity as a dimension reduction technique, especially for visualizing data. It is well-known that all dimension reduction techniques may lose important features of the data. We provide a mathematical framework for understanding this loss for t-SNE by establishing a number of results in different scenarios showing how important features of data are lost by using t-SNE.

#10 A short proof of near-linear convergence of adaptive gradient descent under fourth-order growth and convexity

著者: Damek Davis, Dmitriy Drusvyatskiy

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13393

要約:
Davis, Drusvyatskiy, and Jiang showed that gradient descent with an adaptive stepsize converges locally at a nearly-linear rate for smooth functions that grow at least quartically away from their minimizers. The argument is intricate, relying on monitoring the performance of the algorithm relative to a certain manifold of slow growth -- called the ravine. In this work, we provide a direct Lyapunov-based argument that bypasses these difficulties when the objective is in addition convex and a has a unique minimizer. As a byproduct of the argument, we obtain a more adaptive variant than the original algorithm with encouraging numerical performance.

#11 Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression

著者: Seok-Jin Kim, Kaizheng Wang

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13410

要約:
We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on covariates, creating selection bias that makes direct regression of the response on treatment unreliable. To address this issue, we propose a two-stage kernel ridge regression method. In the first stage, we learn a model for the response as a function of both treatment and covariates; in the second stage, we use this model to construct pseudo-outcomes that correct for distribution shift, and then fit a second model to estimate the treatment effect. Although the response varies with both treatment and covariates, the induced effect function obtained by averaging over covariates is typically much simpler, and our estimator adapts to this structure. Furthermore, we introduce a fully data-driven model selection procedure that achieves provable adaptivity to both the unknown degree of overlap and the regularity (eigenvalue decay) of the underlying kernel.

#12 Universality of Gaussian-Mixture Reverse Kernels in Conditional Diffusion

diffusion

著者: Nafiz Ishtiaque, Syed Arefinul Haque, Kazi Ashraful Alam, Fatima Jahara

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13470

要約:
We prove that conditional diffusion models whose reverse kernels are finite Gaussian mixtures with ReLU-network logits can approximate suitably regular target distributions arbitrarily well in context-averaged conditional KL divergence, up to an irreducible terminal mismatch that typically vanishes with increasing diffusion horizon. A path-space decomposition reduces the output error to this mismatch plus per-step reverse-kernel errors; assuming each reverse kernel factors through a finite-dimensional feature map, each step becomes a static conditional density approximation problem, solved by composing Norets' Gaussian-mixture theory with quantitative ReLU bounds. Under exact terminal matching the resulting neural reverse-kernel class is dense in conditional KL.

#13 Ordinary Least Squares is a Special Case of Transformer

著者: Xiaojun Tan, Yuchen Zhao

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13656

要約:
The statistical essence of the Transformer architecture has long remained elusive: Is it a universal approximator, or a neural network version of known computational algorithms? Through rigorous algebraic proof, we show that the latter better describes Transformer's basic nature: Ordinary Least Squares (OLS) is a special case of the single-layer Linear Transformer. Using the spectral decomposition of the empirical covariance matrix, we construct a specific parameter setting where the attention mechanism's forward pass becomes mathematically equivalent to the OLS closed-form projection. This means attention can solve the problem in one forward pass, not by iterating. Building upon this prototypical case, we further uncover a decoupled slow and fast memory mechanism within Transformers. Finally, the evolution from our established linear prototype to standard Transformers is discussed. This progression facilitates the transition of the Hopfield energy function from linear to exponential memory capacity, thereby establishing a clear continuity between modern deep architectures and classical statistical inference.

#14 Spectral Thompson sampling

著者: Tomas Kocak, Michal Valko, Remi Munos, Shipra Agrawal

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13739

要約:
Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in the computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper, we describe and analyze SpectralTS algorithm for a bandit problem, where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of the neighboring nodes are assumed to be similar. Although the setting has application both in recommender systems and advertising, the traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We deliver the analysis showing that the regret of SpectralTS scales as d*sqrt(T ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d*sqrt(T ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.

#15 Online learning with noisy side observations

著者: Tom\'a\v{s} Koc\'ak, Gergely Neu, Michal Valko

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13740

要約:
We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of $\widetilde{O}(\sqrt{\alpha^* T})$ after $T$ rounds, where $\alpha^*$ is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of $\alpha^*$. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor and Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.

#16 Forecasting Multivariate Time Series under Predictive Heterogeneity: A Validation-Driven Clustering Framework

著者: Ziling Ma, \'Angel L\'opez Oriona, Hernando Ombao, Ying Sun

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13748

要約:
We study adaptive pooling under predictive heterogeneity in high-dimensional multivariate time series forecasting, where global models improve statistical efficiency but may fail to capture heterogeneous predictive structure, while naive specialization can induce negative transfer. We formulate adaptive pooling as a statistical decision problem and propose a validation-driven framework that determines when and how specialization should be applied. Rather than grouping series based on representation similarity, we define partitions through out-of-sample predictive performance, thereby aligning data organization with predictive risk, defined as expected out-of-sample loss and approximated via validation error. Cluster assignments are iteratively updated using validation losses for both point (Huber) and probabilistic (pinball) forecasting, improving robustness to heavy-tailed errors and local anomalies. To ensure reliability, we introduce a leakage-free fallback mechanism that reverts to a global model whenever specialization fails to improve validation performance, providing a safeguard against performance degradation under a strict training-validation-test protocol. Experiments on large-scale traffic datasets demonstrate consistent improvements over strong baselines while avoiding degradation when heterogeneity is weak. Overall, the proposed framework provides a principled and practically reliable approach to adaptive pooling in high-dimensional forecasting problems.

#17 Sandpile Economics: Theory, Identification, and Evidence

著者: Diego Vallarino

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13890

要約:
Why do capitalist economies recurrently generate crises whose severity is disproportionate to the size of the triggering shock? This paper proposes a structural answer grounded in the evolutionary geometry of production networks. As economies evolve through specialization, integration, and competitive selection, their inter-sectoral linkages drift toward configurations of increasing geometric fragility, eventually crossing a threshold beyond which small disturbances generate disproportionately large cascades. We introduce Sandpile Economics, a formal framework that interprets macroeconomic instability as an emergent property of disequilibrium production networks. The key state variable is the Forman--Ricci curvature of the input--output graph, capturing local substitution possibilities when supply chains are disrupted. We show that when curvature falls below an endogenous threshold, the distribution of cascade sizes follows a power law with tail index $\alpha \in (1,2)$, implying a regime of unbounded amplification. The underlying mechanism is evolutionary: specialization reduces input substitutability, pushing the economy toward criticality, while crisis episodes induce endogenous network reconfiguration and path dependence. These dynamics are inherently non-ergodic and cannot be captured by representative-agent frameworks. Empirically, using global input--output data, we document that production networks operate in persistently negative curvature regimes and that curvature robustly predicts medium-run output dynamics. A one-standard-deviation increase in curvature is associated with higher cumulative growth over three-year horizons, and curvature systematically outperforms standard network metrics in explaining cross-country differences in resilience.

#18 BOAT: Navigating the Sea of In Silico Predictors for Antibody Design via Multi-Objective Bayesian Optimization

著者: Jackie Rao, Ferran Gonzalez Hernandez, Leon Gerard, Alexandra Gessner

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.13980

要約:
Antibody lead optimization is inherently a multi-objective challenge in drug discovery. Achieving a balance between different drug-like properties is crucial for the development of viable candidates, and this search becomes exponentially challenging as desired properties grow. The ever-growing zoo of sophisticated in silico tools for predicting antibody properties calls for an efficient joint optimization procedure to overcome resource-intensive sequential filtering pipelines. We present BOAT, a versatile Bayesian optimization framework for multi-property antibody engineering. Our `plug-and-play' framework couples uncertainty-aware surrogate modeling with a genetic algorithm to jointly optimize various predicted antibody traits while enabling efficient exploration of sequence space. Through systematic benchmarking against genetic algorithms and newer generative learning approaches, we demonstrate competitive performance with state-of-the-art methods for multi-objective protein optimization. We identify clear regimes where surrogate-driven optimization outperforms expensive generative approaches and establish practical limits imposed by sequence dimensionality and oracle costs.

#19 Two-Sided Bounds for Entropic Optimal Transport via a Rate-Distortion Integral

著者: Jingbo Liu

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.14061

要約:
We show that the maximum expected inner product between a random vector and the standard normal vector over all couplings subject to a mutual information constraint or regularization is equivalent to a truncated integral involving the rate-distortion function, up to universal multiplicative constants. The proof is based on a lifting technique, which constructs a Gaussian process indexed by a random subset of the type class of the probability distribution involved in the information-theoretic inequality, and then applying a form of the majorizing measure theorem.

#20 Multistage Conditional Compositional Optimization

著者: Buse \c{S}en, Yifan Hu, Daniel Kuhn

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.14075

要約:
We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The na\"ive nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.

#21 Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

著者: Arseniy Andreyev, Advikar Ananthkumar, Marc Walden, Tomaso Poggio, Pierfrancesco Beneventano

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.14108

要約:
Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability. We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold. Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes: at small batch sizes it converges to a lower plateau $2(1-\beta)/\eta$, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau $2(1+\beta)/\eta$, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics. We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling.

#22 Nonparametric Sparse Online Learning of the Koopman Operator

著者: Boya Hou, Sina Sanjari, Nathan Dahlin, Alec Koppel, Subhonmesh Bose

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2405.07432

要約:
The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. However, existing data-driven approaches to learning the Koopman operator rely on batch data. In this work, we present a sparse online learning algorithm that learns the Koopman operator iteratively via stochastic approximation, with explicit control over model complexity and provable convergence guarantees. Specifically, we study the Koopman operator via its action on the reproducing kernel Hilbert space (RKHS), and address the mis-specified scenario where the dynamics may escape the chosen RKHS. In this mis-specified setting, we relate the Koopman operator to the conditional mean embeddings (CME) operator. We further establish both asymptotic and finite-time convergence guarantees for our learning algorithm in mis-specified setting, with trajectory-based sampling where the data arrive sequentially over time. Numerical experiments demonstrate the algorithm's capability to learn unknown nonlinear dynamics.

#23 Flow-based Generative Modeling of Potential Outcomes and Counterfactuals

著者: Dongze Wu, David I. Inouye, Yao Xie

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.16051

要約:
Predicting potential and counterfactual outcomes from observational data is central to individualized decision-making, particularly in clinical settings where treatment choices must be tailored to each patient rather than guided solely by population averages. We propose PO-Flow, a continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcome distributions and factual-conditioned counterfactual outcomes. Trained via flow matching, PO-Flow provides a unified approach to individualized potential outcome prediction, conditional average treatment effect estimation, and counterfactual prediction. By encoding an observed factual outcome and decoding under an alternative treatment, PO-Flow provides an encode-decode mechanism for factual-conditioned counterfactual prediction. In addition, PO-Flow supports likelihood-based evaluation of potential outcomes, enabling uncertainty-aware assessment of predictions. A supporting recovery guarantee is established under certain assumptions, and empirical results on benchmark datasets demonstrate strong performance across a range of causal inference tasks within the potential outcomes framework.

#24 Random Walk Learning and the Pac-Man Attack

著者: Xingran Chen, Parimal Parag, Rohit Bhagat, Zonghong Liu, Salim El Rouayheb

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.05663

要約:
Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the ``Pac-Man'' attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the Average Crossing (AC) algorithm--a fully decentralized mechanism for duplicating RWs to prevent RW extinction in the presence of Pac-Man. Our theoretical analysis establishes that (i) the RW population remains almost surely bounded under AC and (ii) RW-based stochastic gradient descent remains convergent under AC, even in the presence of Pac-Man, with a quantifiable deviation from the true optimum. Our extensive empirical results on both synthetic and real-world datasets corroborate our theoretical findings. Furthermore, they uncover a phase transition in the extinction probability as a function of the duplication threshold. We offer theoretical insights by analyzing a simplified variant of the AC, which sheds light on the observed phase transition.

#25 Mini-Batch Covariance, Diffusion Limits, and Oracle Complexity in Stochastic Gradient Descent: A Sampling-Design Perspective

diffusion

著者: Daniel Zantedeschi, Kumar Muthuraman

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.02417

要約:
Stochastic gradient descent (SGD) is central to simulation optimization, stochastic programming, and online M-estimation, where sampling effort is a decision variable. We study the mini-batch gradient noise as a sampling-design object. Under exchangeable fresh-sampling mini-batches, the conditional covariance given the de Finetti directing measure mu is b^{-1} G_mu(theta), and under identifiability the projected population object is b^{-1} G*(theta) -- projected Fisher information for correctly specified likelihoods, the sandwich partner of the Hessian otherwise. This identification fixes the noise matrix entering the diffusion analysis of constant-step SGD: the raw iterate path has a deterministic fluid limit, and the sqrt(b/eta)-scaled fluctuations satisfy a functional CLT with noise covariance G*; near a nondegenerate optimum the limit is Ornstein-Uhlenbeck, and its Lyapunov covariance scaled by eta/b matches the linearized discrete recursion at leading order. Under a curvature-noise compatibility condition mu_F > 0, we prove 1/N mean-square upper bounds and an i.i.d. parametric Fisher van Trees lower bound of the same rate order, with oracle-complexity guarantees depending on an effective dimension d_eff and condition number kappa_F. Numerical experiments verify the identification and confirm the Lyapunov predictions in direct SGD.

#26 Cost-optimal Sequential Testing via Doubly Robust Q-learning

著者: Doudou Zhou, Yiran Zhang, Dian Jin, Yingye Zheng, Lu Tian, Tianxi Cai

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.11165

要約:
Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.

#27 A ghost mechanism: An analytical model of abrupt learning in recurrent networks

著者: Fatih Dinc, Ege Cirakman, Bariscan Kurtkaya, Mert Yuksekgonul, Yiqi Jiang, Mark J. Schnitzer, Hidenori Tanaka

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2501.02378

要約:
Abrupt learning is a common phenomenon in recurrent neural networks (RNNs) trained on working memory tasks. In such cases, the networks develop transient slow regions in state space that extend the effective timescales of computation. However, the mechanisms driving sudden performance improvements and their causal role remain unclear. To address this gap, we introduce the ghost mechanism, a process by which dynamical systems exhibit transient slowdown near the remnant of a saddle-node bifurcation. By reducing the high-dimensional dynamics near ghost points, we derive a one-dimensional canonical form that analytically captures learning as a process controlled by a single scale parameter. Using this model, we study a form of abrupt learning emerging from ghost points and identify a critical learning rate that scales as an inverse power law with the timescale of the learned computation. Beyond this rate, learning collapses through two interacting modes: (i) vanishing gradients and (ii) oscillatory gradients near minima. These features can lock the system into high-confidence but incorrect predictions when parameter updates trigger a no-learning zone, a region of parameter space where gradients vanish. We validate these predictions in low-rank RNNs, where ghost points precede abrupt transitions, and further demonstrate their generality in full-rank RNNs trained on canonical working memory tasks. Our theory offers two approaches to address these learning difficulties: increasing trainable ranks stabilizes learning trajectories, while reducing output confidence mitigates entrapment in no-learning zones. Overall, the ghost mechanism reveals how the computational demands of a task constrain the optimization landscape, demonstrating that well-known learning difficulties in RNNs partly arise from the dynamical systems they must learn to implement.

#28 The Gaussian Latent Machine: Efficient Prior and Posterior Sampling for Inverse Problems

著者: Muhamed Kuric, Martin Zach, Andreas Habring, Michael Unser, Thomas Pock

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.12836

要約:
We consider the problem of sampling from a product-of-experts-type model that encompasses many standard prior and posterior distributions commonly found in Bayesian imaging. We show that this model can be easily lifted into a novel latent variable model, which we refer to as a Gaussian latent machine. This leads to a general sampling approach that unifies and generalizes many existing sampling algorithms in the literature. Most notably, it yields a highly efficient and effective two-block Gibbs sampling approach in the general case, while also specializing to direct sampling algorithms in particular cases. Finally, we present detailed numerical experiments that demonstrate the efficiency and effectiveness of our proposed sampling approach across a wide range of prior and posterior sampling problems from Bayesian imaging.

#29 Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes

著者: Daniel Jenson, Jhonathan Navott, Piotr Grynfelder, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.09163

要約:
Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. While early architectures were developed primarily as a scalable alternative to Gaussian Processes (GPs), modern NPs tackle far more complex and data-hungry applications spanning geology, epidemiology, climate, and robotics. These applications have placed increasing pressure on the scalability of these models, with many architectures compromising accuracy for scalability. In this paper, we demonstrate that this trade-off is often unnecessary, particularly when modeling fully or partially translation-invariant processes. We propose a versatile new architecture, the Biased Scan Attention Transformer Neural Process (BSA-TNP), which introduces Kernel Regression Blocks (KRBlocks), group-invariant attention biases, and memory-efficient Biased Scan Attention (BSA). BSA-TNP is able to: (1) match or exceed the accuracy of the best models while often training in a fraction of the time, (2) exhibit translation invariance, enabling learning at multiple resolutions simultaneously, (3) transparently model processes that evolve in both space and time, (4) support high-dimensional fixed effects, and (5) scale gracefully, running inference on over 1M test points and 100K context points in under a minute on a single 24GB GPU. Code is provided as part of the `dl4bi` package.

#30 Heavy-Tailed Class-Conditional Priors for Long-Tailed Generative Modeling

著者: Aymene Mohammed Bouayed, Samuel Deslauriers-Gauthier, Adrian Iaccovelli, David Naccache

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.02154

要約:
Variational Autoencoders (VAEs) with global priors trained under an imbalanced empirical class distribution can lead to underrepresentation of tail classes in the latent space. While $t^3$VAE improves robustness via heavy-tailed Student's $t$-distribution priors, its single global prior still allocates mass proportionally to class frequency. We address this latent geometric bias by introducing C-$t^3$VAE, which assigns a per-class Student's $t$ joint prior over latent and output variables. This design promotes uniform prior mass across class-conditioned components. To optimize our model we derive a closed-form objective from the $\gamma$-power divergence, and we introduce an equal-weight latent mixture for class-balanced generation. On SVHN-LT, CIFAR100-LT, and CelebA datasets, C-$t^3$VAE consistently attains lower FID scores than $t^3$VAE and Gaussian-based VAE baselines under severe class imbalance while remaining competitive in balanced or mildly imbalanced settings. In per-class F1 evaluations, our model outperforms the conditional Gaussian VAE across highly imbalanced settings. Moreover, we identify the mild imbalance threshold $\rho < 5$, for which Gaussian-based models remain competitive. However, for $\rho \geq 5$ our approach yields improved class-balanced generation and mode coverage.

#31 Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching

著者: Zhengyan Wan, Yidong Ouyang, Liyan Xie, Fang Fang, Hongyuan Zha, Guang Cheng

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.21912

要約:
Guidance provides a simple and effective framework for posterior sampling by steering the generation process towards the desired distribution. When modeling discrete data, existing approaches mostly focus on guidance with the first-order approximation to improve the sampling efficiency. However, such an approximation is inappropriate in discrete state spaces since the approximation error could be large. A novel guidance framework for discrete data is proposed to address this problem: we derive the exact transition rate for the desired distribution given a learned discrete flow matching model, leading to guidance that only requires a single forward pass in each sampling step, significantly improving efficiency. This unified novel framework is general enough, encompassing existing guidance methods as special cases, and it can also be seamlessly applied to the masked diffusion model. We demonstrate the effectiveness of our proposed guidance on energy-guided simulations and preference alignment on text-to-image generation and multimodal understanding tasks. The code is available at https://github.com/WanZhengyan/Discrete-Guidance-Matching.

#32 A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios

diffusion

著者: Haley Rosso, Talea Mayo

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.23748

要約:
For complex simulation problems, inferring parameters often precludes the use of classical likelihood-based techniques due to intractable likelihoods. Simulation-based inference (SBI) methods offer a likelihood-free approach to directly learn posterior distributions $p(\bftheta \mid \xobs)$ from simulator outputs. Recently, diffusion models have emerged as promising tools for SBI, addressing limitations of earlier neural methods such as neural likelihood/posterior estimation and normalizing flows. This review examines diffusion-based SBI from first principles to applications, emphasizing robustness in three non-ideal data scenarios common to scientific computing: model misspecification (simulator-reality mismatch), unstructured or infinite-dimensional observations, and missing data. We synthesize mathematical foundations and survey eight methods addressing these challenges, such as conditional diffusion for irregular data, guided diffusion for prior adaptation, sequential and factorized approaches for efficiency, and consistency models for fast sampling. Throughout, we maintain consistent notation and emphasize conditions required for accurate posteriors. We conclude with open problems and applications to geophysical uncertainty quantification, where these challenges are acute.

#33 A discrete Benamou-Brenier formulation of Optimal Transport on graphs

著者: Kieran Morris, Oliver Johnson

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.04193

要約:
We propose a discrete transport equation on graphs which connects distributions on both vertices and edges. We then derive a discrete analogue of the Benamou-Brenier formulation for Wasserstein-$1$ distance on a graph and as a result classify all $W_1$ geodesics on graphs.

#34 Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare

著者: Elizabeth W. Miller, Jeffrey D. Blume

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.00192

要約:
In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a substantial source of variability often goes undetected. Even when the data and model architecture are held fixed, randomness introduced by optimization and initialization can lead to materially different risk estimates for the same patient. This problem is largely obscured by standard evaluation practices, which rely on aggregate performance metrics (e.g., log-loss, accuracy) that are agnostic to individual-level stability. As a result, models with indistinguishable aggregate performance can nonetheless exhibit substantial procedural arbitrariness, which can undermine clinical trust. We propose an evaluation framework that quantifies individual-level prediction instability by using two complementary diagnostics: empirical prediction interval width (ePIW), which captures variability in continuous risk estimates, and empirical decision flip rate (eDFR), which measures instability in threshold-based clinical decisions. We apply these diagnostics to simulated data and GUSTO-I clinical dataset. Across observed settings, we find that for flexible machine-learning models, randomness arising solely from optimization and initialization can induce individual-level variability comparable to that produced by resampling the entire training dataset. Neural networks exhibit substantially greater instability in individual risk predictions compared to logistic regression models. Risk estimate instability near clinically relevant decision thresholds can alter treatment recommendations. These findings that stability diagnostics should be incorporated into routine model validation for assessing clinical reliability.

#35 Spectral methods: crucial for machine learning, natural for quantum computers?

著者: Vasilis Belis, Joseph Bowles, Rishabh Gupta, Evan Peters, Maria Schuld

公開日: Thu, 16 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.24654

要約:
This article presents an argument for why quantum computers could unlock new methods for machine learning. We argue that spectral methods, in particular those that learn, regularise, or otherwise manipulate the Fourier spectrum of a machine learning model, are often natural for quantum computers. For example, if a generative machine learning model is represented by a quantum state, the Quantum Fourier Transform allows us to manipulate the Fourier spectrum of the state using the entire toolbox of quantum routines, an operation that is usually prohibitive for classical models. At the same time, spectral methods are surprisingly fundamental to machine learning: A spectral bias has recently been hypothesised to be the core principle behind the success of deep learning; support vector machines have been known for decades to regularise in Fourier space, and convolutional neural nets build filters in the Fourier space of images. Could, then, quantum computing open fundamentally different, much more direct and resource-efficient ways to design the spectral properties of a model? We discuss this potential in detail here, hoping to stimulate a direction in quantum machine learning research that puts the question of ``why quantum?'' first.

stat.ML updates on arXiv.org

📋 論文タイトル一覧