arXiv paper list - stat.ML updates on arXiv.org

#1 Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs

Authors: Nathan Weill, Kaizheng Wang

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19422

Abstract:
We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
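
The split-and-impute workflow described in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: a 1-D ridge regression through the origin stands in for the kernel GLMs, and the function names, penalty grid, and data are hypothetical, not the authors' implementation.

```python
def ridge_slope(xs, ys, lam):
    """Slope of 1-D ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def select_with_pseudo_labels(source, target_x, lambdas, impute_lam=0.01):
    """Pick the ridge penalty whose model best matches pseudo-labels on the target."""
    half = len(source) // 2
    train, held_out = source[:half], source[half:]

    # Batch 1: one candidate model per penalty value.
    xs, ys = zip(*train)
    candidates = {lam: ridge_slope(xs, ys, lam) for lam in lambdas}

    # Batch 2: imputation model that pseudo-labels the unlabeled target inputs.
    hx, hy = zip(*held_out)
    w_impute = ridge_slope(hx, hy, impute_lam)
    pseudo = [w_impute * x for x in target_x]

    def target_risk(w):
        # Mean squared error against the pseudo-labels on the target inputs.
        return sum((w * x - y) ** 2 for x, y in zip(target_x, pseudo)) / len(target_x)

    return min(lambdas, key=lambda lam: target_risk(candidates[lam]))


# Toy data (hypothetical): y = 2x on the source, target inputs shifted to larger x.
source = [(float(x), 2.0 * x) for x in (1, 2, 3, 4, 5, 6)]
target_x = [8.0, 9.0, 10.0]
best = select_with_pseudo_labels(source, target_x, lambdas=[0.0, 1.0, 10.0])
print(best)
```

In this noiseless toy case the unpenalized candidate is selected, since heavier penalties shrink the slope away from the imputed labels on the shifted target inputs.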

#2 Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs

Authors: Mohammad Eini, Abdullah Karaaslanli, Vassilis Kalantzis, Panagiotis A. Traganitis

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19439

Abstract:
Several graph data mining, signal processing, and machine learning downstream tasks rely on information related to the eigenvectors of the associated adjacency or Laplacian matrix. Classical eigendecomposition methods are powerful when the matrix remains static but cannot be applied to problems where the matrix entries are updated or the number of rows and columns increases frequently. Such scenarios occur routinely in graph analytics when the graph is changing dynamically and either edges and/or nodes are being added and removed. This paper puts forth a new algorithmic framework to update the eigenvectors associated with the leading eigenvalues of an initial adjacency or Laplacian matrix as the graph evolves dynamically. The proposed algorithm is based on Rayleigh-Ritz projections, in which the original eigenvalue problem is projected onto a restricted subspace which ideally encapsulates the invariant subspace associated with the sought eigenvectors. Following ideas from eigenvector perturbation analysis, we present a new methodology to build the projection subspace. The proposed framework features lower computational and memory complexity with respect to competitive alternatives while empirical results show strong qualitative performance, both in terms of eigenvector approximation and accuracy of downstream learning tasks of central node identification and node clustering.

#3 Near-Equivalent Q-learning Policies for Dynamic Treatment Regimes

Authors: Sophia Yazzourh, Erica E. M. Moodie

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19440

Abstract:
Precision medicine aims to tailor therapeutic decisions to individual patient characteristics. This objective is commonly formalized through dynamic treatment regimes, which use statistical and machine learning methods to derive sequential decision rules adapted to evolving clinical information. In most existing formulations, these approaches produce a single optimal treatment at each stage, leading to a unique decision sequence. However, in many clinical settings, several treatment options may yield similar expected outcomes, and focusing on a single optimal policy may conceal meaningful alternatives. We extend the Q-learning framework for retrospective data by introducing a worst-value tolerance criterion controlled by a hyperparameter $\varepsilon$, which specifies the maximum acceptable deviation from the optimal expected value. Rather than identifying a single optimal policy, the proposed approach constructs sets of $\varepsilon$-optimal policies whose performance remains within a controlled neighborhood of the optimum. This formulation shifts Q-learning from a vector-valued representation to a matrix-valued one, allowing multiple admissible value functions to coexist during backward recursion. The approach yields families of near-equivalent treatment strategies and explicitly identifies regions of treatment indifference where several decisions achieve comparable outcomes. We illustrate the framework in two settings: a single-stage problem highlighting indifference regions around the decision boundary, and a multi-stage decision process based on a simulated oncology model describing tumor size and treatment toxicity dynamics.
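
As a rough single-stage illustration of the tolerance idea, the sketch below returns every treatment whose estimated Q-value lies within $\varepsilon$ of the best one. The Q-values and the function name are hypothetical, and the multi-stage backward recursion described in the abstract is not reproduced here.

```python
def near_optimal_actions(q_values, eps):
    """Return every action whose Q-value is within eps of the best one."""
    best = max(q_values.values())
    return sorted(a for a, q in q_values.items() if q >= best - eps)


# Hypothetical Q-value estimates for three treatments at one decision stage.
q_stage = {"A": 0.80, "B": 0.78, "C": 0.55}

greedy = near_optimal_actions(q_stage, eps=0.0)     # only the single best action
tolerant = near_optimal_actions(q_stage, eps=0.05)  # A and B are near-equivalent
print(greedy, tolerant)
```

With eps = 0 the set collapses to the usual greedy choice; a larger eps exposes the region of treatment indifference.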

#4 On the role of memorization in learned priors for geophysical inverse problems

privacy

Authors: Ali Siahkoohi, Davide Sabeddu

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19629

Abstract:
Learned priors based on deep generative models offer data-driven regularization for seismic inversion, but training them requires a dataset of representative subsurface models -- a resource that is inherently scarce in geoscience applications. Since the training objective of most generative models can be cast as maximum likelihood on a finite dataset, any such model risks converging to the empirical distribution -- effectively memorizing the training examples rather than learning the underlying geological distribution. We show that the posterior under such a memorized prior reduces to a reweighted empirical distribution -- i.e., a likelihood-weighted lookup among the stored training examples. For diffusion models specifically, memorization yields a Gaussian mixture prior in closed form, and linearizing the forward operator around each training example gives a Gaussian mixture posterior whose components have widths and shifts governed by the local Jacobian. We validate these predictions on a stylized inverse problem and demonstrate the consequences of memorization through diffusion posterior sampling for full waveform inversion.
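
The "likelihood-weighted lookup" can be made concrete with a scalar sketch. It assumes a prior that is uniform over the stored training examples and additive Gaussian noise with standard deviation `sigma`; the forward map, data, and names are hypothetical stand-ins, not the seismic setup of the paper.

```python
import math


def memorized_posterior_weights(y, train_examples, forward, sigma):
    """Posterior weights over stored training examples under an empirical prior.

    With a prior that is uniform on the training set, Bayes' rule gives
    weights proportional to the likelihood p(y | x_i).
    """
    log_lik = [-(y - forward(x)) ** 2 / (2 * sigma ** 2) for x in train_examples]
    shift = max(log_lik)  # subtract the maximum for numerical stability
    unnorm = [math.exp(l - shift) for l in log_lik]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical nonlinear forward operator and three memorized examples.
forward = lambda x: x ** 2
train = [0.5, 1.0, 2.0]
weights = memorized_posterior_weights(y=1.1, train_examples=train,
                                      forward=forward, sigma=0.5)
print(weights)
```

The posterior mass concentrates on the stored example whose simulated data best matches the observation, which is the lookup behaviour the abstract describes.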

#5 Model Selection and Parameter Estimation of Multi-dimensional Gaussian Mixture Model

Authors: Xinyu Liu, Hai Zhang

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19657

Abstract:
In this paper, we study the problem of learning multi-dimensional Gaussian Mixture Models (GMMs), with a specific focus on model order selection and efficient mixing distribution estimation. We first establish an information-theoretic lower bound on the critical sample complexity required for reliable model selection. More specifically, we show that distinguishing a $k$-component mixture from a simpler model necessitates a sample size scaling of $\Omega(\Delta^{-(4k-4)})$. We then propose a thresholding-based estimation algorithm that evaluates the spectral gap of an empirical covariance matrix constructed from random Fourier measurement vectors. This parameter-free estimator operates with an efficient time complexity of $\mathcal{O}(k^2 n)$, scaling linearly with the sample size. We demonstrate that the sample complexity of our method matches the established lower bound, confirming its minimax optimality with respect to the component separation distance $\Delta$. Conditioned on the estimated model order, we subsequently introduce a gradient-based minimization method for parameter estimation. To effectively navigate the non-convex objective landscape, we employ a data-driven, score-based initialization strategy that guarantees rapid convergence. We prove that this method achieves the optimal parametric convergence rate of $\mathcal{O}_p(n^{-1/2})$ for estimating the component means. To enhance the algorithm's efficiency in high-dimensional regimes where the ambient dimension exceeds the number of mixture components (i.e., $d > k$), we integrate principal component analysis (PCA) for dimension reduction. Numerical experiments demonstrate that our Fourier-based algorithmic framework outperforms conventional Expectation-Maximization (EM) methods in both estimation accuracy and computational time.

#6 A two-step sequential approach for hyperparameter selection in finite context models

Authors: José Contente, Ana Martins, Armando J. Pinho, Sónia Gouveia

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19736

Abstract:
Finite-context models (FCMs) are widely used for compressing symbolic sequences such as DNA, where predictive performance depends critically on the context length $k$ and smoothing parameter $\alpha$. In practice, these hyperparameters are typically selected through exhaustive search, which is computationally expensive and scales poorly with model complexity. This paper proposes a statistically grounded two-step sequential approach for efficient hyperparameter selection in FCMs. The key idea is to decompose the joint optimization problem into two independent stages. First, the context length $k$ is estimated using categorical serial dependence measures, including Cramér's $\nu$, Cohen's $\kappa$ and partial mutual information (pami). Second, the smoothing parameter $\alpha$ is estimated via maximum likelihood conditional on the selected context length $k$. Simulation experiments were conducted on synthetic symbolic sequences generated by FCMs across multiple ($k$, $\alpha$) configurations, considering a four-letter alphabet and different sample sizes. Results show that the dependence measures are substantially more sensitive to variations in $k$ than in $\alpha$, supporting the sequential estimation strategy. As expected, the accuracy of the hyperparameter estimation improves with increasing sample size. Furthermore, the proposed method achieves compression performance comparable to exhaustive grid search in terms of average bitrate (bits per symbol), while substantially reducing computational cost. Overall, the results on simulated data show that the proposed sequential approach is a practical and computationally efficient alternative to exhaustive hyperparameter tuning in FCMs.
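
To show how the two hyperparameters enter the bitrate, here is a minimal sketch of an adaptive order-k model. It assumes the commonly used additive-smoothing estimator, in which a symbol seen n times in a context observed N times gets probability (n + alpha) / (N + alpha * |alphabet|); the function name and the test sequence are hypothetical, and the dependence-measure step of the paper is not reproduced.

```python
import math
from collections import defaultdict


def fcm_bitrate(seq, k, alpha, alphabet="ACGT"):
    """Average bits per symbol of an adaptive order-k finite-context model."""
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    coded = 0
    for i in range(k, len(seq)):
        ctx, sym = seq[i - k:i], seq[i]
        n_ctx = sum(counts[ctx].values())
        # Smoothed estimate from the counts seen so far in this context.
        p = (counts[ctx][sym] + alpha) / (n_ctx + alpha * len(alphabet))
        total_bits += -math.log2(p)
        counts[ctx][sym] += 1  # update only after coding the symbol
        coded += 1
    return total_bits / coded


seq = "ACGT" * 50  # hypothetical sequence with strong order-1 dependence
rates = {k: fcm_bitrate(seq, k, alpha=1.0) for k in (0, 1, 2)}
print(rates)
```

On this periodic sequence an order-0 model stays near 2 bits per symbol, while k = 1 already captures the dependence, which is the kind of sensitivity to k that motivates estimating it first.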

#7 Explainable cluster analysis: a bagging approach

Authors: Federico Maria Quetti, Elena Ballante, Silvia Figini, Paolo Giudici

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19840

Abstract:
A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of similar observations. To address this limitation, we propose an ensemble-based clustering framework that integrates bagging and feature dropout to generate feature importance scores, in analogy with feature importance mechanisms in supervised random forests. By leveraging multiple bootstrap resampling schemes and aggregating the resulting partitions, the method improves stability and robustness of the cluster definition, particularly in small-sample or noisy settings. Feature importance is assessed through an information-theoretic approach: at each step, the mutual information between each feature and the estimated cluster labels is computed and weighted by a measure of clustering validity to emphasize well-formed partitions, before being aggregated into a final score. The method outputs both a consensus partition and a corresponding measure of feature importance, enabling a unified interpretation of clustering structure and variable relevance. Its effectiveness is demonstrated on multiple simulated and real-world datasets.
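
The information-theoretic scoring step can be sketched as follows, under simplifying assumptions: features are discrete, every clustering run labels the same observations, and the validity weights are given numbers. The function names and data are hypothetical; the paper's bagging and feature-dropout scheme is not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def importance(feature, label_runs, validity_weights):
    """Validity-weighted average of the MI between a feature and each run's labels."""
    total = sum(validity_weights)
    return sum(
        w * mutual_information(feature, labels)
        for labels, w in zip(label_runs, validity_weights)
    ) / total


# Hypothetical data: two clustering runs that agree on the partition.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
informative = [0, 0, 0, 0, 1, 1, 1, 1]  # tracks the clusters exactly
noise = [0, 1, 0, 1, 0, 1, 0, 1]        # independent of the clusters

score_informative = importance(informative, [labels, labels], [0.9, 0.5])
score_noise = importance(noise, [labels, labels], [0.9, 0.5])
print(score_informative, score_noise)
```

The feature that tracks the partition scores log 2, the maximum for two balanced clusters, while the independent feature scores zero.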

#8 Minimax Generalized Cross-Entropy

Authors: Kartheek Bondugula, Santiago Mazuelas, Aritz Pérez, Anqi Liu

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19874

Abstract:
Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, generalized cross-entropy (GCE) has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performances with complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCEs can provide an upper bound on the classification error. The proposed bilevel convex optimization can be efficiently implemented using stochastic gradient computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.
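
For orientation, the standard (non-minimax) GCE loss from the earlier literature is commonly written as (1 - p^q) / q, where p is the probability assigned to the true class and q in (0, 1] controls the interpolation. The sketch below checks its two limiting cases numerically; it does not implement the minimax formulation proposed in the paper, and the function name is hypothetical.

```python
import math


def gce_loss(p_true, q):
    """Generalized cross-entropy on the probability assigned to the true class."""
    if q == 0:
        return -math.log(p_true)  # limiting case q -> 0: cross-entropy
    return (1.0 - p_true ** q) / q


p = 0.7
print(gce_loss(p, 1.0))    # 1 - p, the MAE loss up to a constant factor
print(gce_loss(p, 1e-6))   # close to the cross-entropy -log(p)
print(gce_loss(p, 0.0))    # exact cross-entropy
```

Intermediate values of q trade the fast optimization of CE against the robustness of MAE.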

#9 Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects

Authors: Hao Wang, Licheng Pan, Qingsong Wen, Jialin Yu, Zhichao Chen, Chunyuan Zheng, Xiaoxi Li, Zhixuan Chu, Chao Xu, Mingming Gong, Haoxuan Li, Yuan Lu, Zhouchen Lin, Philip Torr, Yan Liu

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19899

Abstract:
Autocorrelation is a defining characteristic of time-series data, where each observation is statistically dependent on its predecessors. In the context of deep time-series forecasting, autocorrelation arises in both the input history and the label sequences, presenting two central research challenges: (1) designing neural architectures that model autocorrelation in history sequences, and (2) devising learning objectives that model autocorrelation in label sequences. Recent studies have made strides in tackling these challenges, but a systematic survey examining both aspects remains lacking. To bridge this gap, this paper provides a comprehensive review of deep time-series forecasting from the perspective of autocorrelation modeling. In contrast to existing surveys, this work makes two distinctive contributions. First, it proposes a novel taxonomy that encompasses recent literature on both model architectures and learning objectives -- whereas prior surveys neglect or inadequately discuss the latter aspect. Second, it offers a thorough analysis of the motivations, insights, and progression of the surveyed literature from a unified, autocorrelation-centric perspective, providing a holistic overview of the evolution of deep time-series forecasting. The full list of papers and resources is available at https://github.com/Master-PLC/Awesome-TSF-Papers.

#10 Graph-Informed Adversarial Modeling: Infimal Subadditivity of Interpolative Divergences

Authors: Panagiota Birmpa (Heriot-Watt University, Maxwell Institute for Mathematical Sciences), Eric Joseph Hall (Heriot-Watt University, Maxwell Institute for Mathematical Sciences)

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.20025

Abstract:
We study adversarial learning when the target distribution factorizes according to a known Bayesian network. For interpolative divergences, including $(f,\Gamma)$-divergences, we prove a new infimal subadditivity principle showing that, under suitable conditions, a global variational discrepancy is controlled by an average of family-level discrepancies aligned with the graph. In an additive regime, this surrogate is exact. This provides a variational justification for replacing a graph-agnostic GAN with a monolithic discriminator by a graph-informed GAN with localized family-level discriminators. The result does not require the optimizer itself to factorize according to the graph. We also obtain parallel results for integral probability metrics and proximal optimal transport divergences, identify natural discriminator classes for which the theory applies, and present experiments showing improved stability and structural recovery relative to graph-agnostic baselines.

#11 A Visualization for Comparative Analysis of Regression Models

Authors: Nassime Mountasir (ICube), Baptiste Lafabregue (ICube), Bruno Albert (ICube), Nicolas Lachiche (ICube)

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19291

Abstract:
As regression is a widely studied problem, many methods have been proposed to solve it, each of them often requiring setting different hyper-parameters. Therefore, selecting the proper method for a given application may be very difficult and relies on comparing their performances. Performance is usually measured using various metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), or R-squared ($R^2$). These metrics provide a numerical summary of predictive accuracy by quantifying the difference between predicted and actual values. However, while these metrics are widely used in the literature for summarizing model performance and useful to distinguish between models performing poorly and well, they often aggregate too much information. This article addresses these limitations by introducing a novel visualization approach that highlights key aspects of regression model performance. The proposed method builds upon three main contributions: (1) considering the residuals in a 2D space, which allows for simultaneous evaluation of errors from two models, (2) leveraging the Mahalanobis distance to account for correlations and differences in scale within the data, and (3) employing a colormap to visualize the percentile-based distribution of errors, making it easier to identify dense regions and outliers. By graphically representing the distribution of errors and their correlations, this approach provides a more detailed and comprehensive view of model performance, enabling users to uncover patterns that traditional aggregate metrics may obscure. The proposed visualization method facilitates a deeper understanding of regression model performance differences and error distributions, enhancing the evaluation and comparison process.
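
Contributions (1) and (2) can be sketched numerically: pair the residuals of two models into 2D points and compute each point's Mahalanobis distance from the sample mean. The residual values and the function name below are hypothetical, and the colormap rendering is left out.

```python
def mahalanobis_2d(points):
    """Mahalanobis distance of each 2D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Unbiased sample covariance entries.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(d2 ** 0.5)
    return dists


# Hypothetical residual pairs (model A error, model B error) per test sample.
residuals = [(0.1, 0.2), (-0.3, -0.1), (0.5, 0.4), (-0.2, -0.4), (2.0, -1.5)]
distances = mahalanobis_2d(residuals)
print(distances)
```

Ranking the distances, for example by percentile, gives the quantity a colormap would encode; here the last sample, where the two models err in opposite directions, stands out.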

#12 FalconBC: Flow matching for Amortized inference of Latent-CONditioned physiologic Boundary Conditions

Authors: Chloe H. Choi, Alison L. Marsden, Daniele E. Schiavazzi

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19331

Abstract:
Boundary condition tuning is a fundamental step in patient-specific cardiovascular modeling. Despite an increase in offline training cost, recent methods in data-driven variational inference can efficiently estimate the joint posterior distribution of boundary conditions, with amortization of training efforts over clinical targets. However, even the most modern approaches fall short in two important scenarios: open-loop models with known mean flow and assumed waveform shapes, and anatomies affected by vascular lesions where segmentation influences the reachability of pressure or flow split targets. In both cases, boundary conditions cannot be tuned in isolation. We introduce a general amortized inference framework based on probabilistic flow that treats clinical targets, inflow features, and point cloud embeddings of patient-specific anatomies as either conditioning variables or quantities to be jointly estimated. We demonstrate the approach on two patient-specific models: an aorto-iliac bifurcation with varying stenosis locations and severity, and a coronary arterial tree.

#13 Alternating Diffusion for Proximal Sampling with Zeroth Order Queries

diffusion

Authors: Hirohane Takagi, Atsushi Nitanda

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19633

Abstract:
This work introduces a new approximate proximal sampler that operates solely with zeroth-order information of the potential function. Prior theoretical analyses have revealed that proximal sampling corresponds to alternating forward and backward iterations of the heat flow. The backward step was originally implemented by rejection sampling, whereas we directly simulate the dynamics. Unlike diffusion-based sampling methods that estimate scores via learned models or by invoking auxiliary samplers, our method treats the intermediate particle distribution as a Gaussian mixture, thereby yielding a Monte Carlo score estimator from directly samplable distributions. Theoretically, when the score estimation error is sufficiently controlled, our method inherits the exponential convergence of proximal sampling under isoperimetric conditions on the target distribution. In practice, the algorithm avoids rejection sampling, permits flexible step sizes, and runs with a deterministic runtime budget. Numerical experiments demonstrate that our approach converges rapidly to the target distribution, driven by interactions among multiple particles and by exploiting parallel computation.

#14 Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis

Authors: Siddharth Chandak, Anuj Yadav, Ayfer Ozgur, Nicholas Bambos

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19648

Abstract:
Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including finance and communications, frequently encounter heavy-tailed and long-range dependent (LRD) noise. In this work, we study SA for finding the root of a strongly monotone operator under these non-classical noise models. We establish the first finite-time moment bounds in both settings, providing explicit convergence rates that quantify the impact of heavy tails and temporal dependence. Our analysis employs a noise-averaging argument that regularizes the impact of noise without modifying the iteration. Finally, we apply our general framework to stochastic gradient descent (SGD) and gradient play, and corroborate our finite-time analysis through numerical experiments.

#15 Regularity of Solutions to Beckmann's Parametric Optimal Transport

Authors: Hanno Gottschalk, Tobias J. Riedlinger

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19755

Abstract:
Beckmann's problem in optimal transport minimizes the total squared flux in a continuous transport problem from a source to a target distribution. In this article, the regularity theory for solutions to Beckmann's problem in optimal transport is developed utilizing an unconstrained Lagrangian formulation and solving the variational first order optimality conditions. It turns out that the Lagrangian multiplier that enforces Beckmann's divergence constraint fulfills a Poisson equation and the flux vector field is obtained as the potential's gradient. Utilizing Schauder estimates from elliptic regularity theory, the exact Hölder regularity of the potential, the flux, and the generated flow is derived on the basis of Hölder regularity of source and target densities on a bounded, regular domain. If the target distribution depends on parameters, as is the case in conditional ("promptable") generative learning, we provide sufficient conditions for separate and joint Hölder continuity of the resulting vector field in the parameter and the data dimension. Following a recent result by Belomestny et al., one can thus approximate such vector fields with deep ReQu neural networks in the $C^{k,\alpha}$-Hölder norm. We also show that this approach generalizes to other probability paths, like Fisher-Rao gradient flows.

#16 Scalable Learning of Multivariate Distributions via Coresets

Authors: Zeyu Ding, Katja Ickstadt, Nadja Klein, Alexander Munteanu, Simon Omlor

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19792

Abstract:
Efficient and scalable non-parametric or semi-parametric regression analysis and density estimation are of crucial importance to the fields of statistics and machine learning. However, available methods are limited in their ability to handle large-scale data. We address this issue by developing a novel coreset construction for multivariate conditional transformation models (MCTMs) to enhance their scalability and training efficiency. To the best of our knowledge, these are the first coresets for semi-parametric distributional models. Our approach yields substantial data reduction via importance sampling. It ensures with high probability that the log-likelihood remains within multiplicative error bounds of $(1\pm\varepsilon)$ and thereby maintains statistical model accuracy. Compared to conventional full-parametric models, where coresets have been incorporated before, our semi-parametric approach exhibits enhanced adaptability, particularly in scenarios where complex distributions and non-linear relationships are present, but not fully understood. To address numerical problems associated with normalizing logarithmic terms, we follow a geometric approximation based on the convex hull of input data. This ensures feasible, stable, and accurate inference in scenarios involving large amounts of data. Numerical experiments demonstrate substantially improved computational efficiency when handling large and complex datasets, thus laying the foundation for a broad range of applications within the statistics and machine learning communities.
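
The generic importance-sampling step behind such coresets can be sketched as below: draw points with probability proportional to a score and reweight each draw by the inverse of its expected number of selections, so that weighted sums stay unbiased. The scores here are hypothetical placeholders; the paper's specific scores for MCTMs and its convex-hull component are not reproduced.

```python
import random


def importance_coreset(scores, m, seed=0):
    """Sample m indices with probability proportional to scores.

    Returns (index, weight) pairs with weight 1 / (m * p_i), so that the
    weighted sum over the sample is an unbiased estimate of the full sum.
    """
    total = sum(scores)
    probs = [s / total for s in scores]
    rng = random.Random(seed)
    idx = rng.choices(range(len(scores)), weights=probs, k=m)
    return [(i, 1.0 / (m * probs[i])) for i in idx]


# Hypothetical importance scores: the third point matters most.
scores = [1.0, 1.0, 8.0]
coreset = importance_coreset(scores, m=5)
print(coreset)
```

Points with high scores are drawn often but carry small weights, while rarely drawn points carry large ones, which is what keeps the weighted objective close to the full-data objective.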

#17 Uncertainty Quantification Via the Posterior Predictive Variance

Authors: Sanjay Chaudhuri, Dean Dustin, Bertrand Clarke

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19804

Abstract:
We use the law of total variance to generate multiple expansions for the posterior predictive variance. These expansions are sums of terms involving conditional expectations and conditional variances and provide a quantification of the sources of predictive uncertainty. Since the posterior predictive variance is fixed given the model, it represents a constant quantity that is conserved over these expansions. The terms in the expansions can be assessed in absolute or relative sense to understand the main contributors to the length of prediction intervals. We quantify the term-wise uncertainty across expansions varying in the number of terms and the order of conditionates. In particular, given that a specific term in one expansion is small or zero, we identify the other terms in other expansions that must also be small or zero. We illustrate this approach to predictive model assessment in several well-known models.
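
The simplest such expansion has two terms: Var(Y) = E[Var(Y | theta)] + Var(E[Y | theta]). The sketch below verifies it exactly, using rational arithmetic, on a hypothetical discrete model with a two-point distribution over theta; the numbers are illustrative only.

```python
from fractions import Fraction as F


def mean(dist):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(p * v for v, p in dist)


def var(dist):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist)


# Hypothetical model: theta is 0 or 1 with equal weight, Y | theta is discrete.
p_theta = {0: F(1, 2), 1: F(1, 2)}
y_given_theta = {
    0: [(F(0), F(1, 2)), (F(2), F(1, 2))],
    1: [(F(3), F(1, 4)), (F(7), F(3, 4))],
}

marginal = [(v, p_theta[t] * p) for t, d in y_given_theta.items() for v, p in d]
within = sum(p_theta[t] * var(d) for t, d in y_given_theta.items())
between = var([(mean(d), p_theta[t]) for t, d in y_given_theta.items()])

print(var(marginal), within, between)
```

The total is fixed, so any other way of conditioning only redistributes the same variance across terms, which is the conservation property the abstract refers to.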

#18 Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training

Authors: Giacomo Borghi, Hyesung Im, Lorenzo Pareschi

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19808

Abstract:
Population-based learning paradigms, including evolutionary strategies, Population-Based Training (PBT), and recent model-merging methods, combine fast within-model optimisation with slower population-level adaptation. Despite their empirical success, a general mathematical description of the resulting collective training dynamics remains incomplete. We introduce a theoretical framework for neural network training based on two-time-scale population dynamics. We model a population of neural networks as an interacting agent system in which network parameters evolve through fast noisy gradient updates of SGD/Langevin type, while hyperparameters evolve through slower selection--mutation dynamics. We prove the large-population limit for the joint distribution of parameters and hyperparameters and, under strong time-scale separation, derive a selection--mutation equation for the hyperparameter density. For each fixed hyperparameter, the fast parameter dynamics relaxes to a Boltzmann--Gibbs measure, inducing an effective fitness for the slow evolution. The averaged dynamics connects population-based learning with bilevel optimisation and classical replicator--mutator models, yields conditions under which the population mean moves toward the fittest hyperparameter, and clarifies the role of noise and diversity in balancing optimisation and exploration. Numerical experiments illustrate both the large-population regime and the reduced two-time-scale dynamics, and indicate that access to the effective fitness, either in closed form or through population-level estimation, can improve population-level updates.

#19 A Federated Many-to-One Hopfield model for associative Neural Networks

Authors: Andrea Alessandrelli, Fabrizio Durante, Andrea Ladiana, Andrea Lepre

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.19902

Abstract:
Federated learning enables collaborative training without sharing raw data, but struggles under client heterogeneity and streaming distribution shifts, where drift and novel data can impair convergence and cause forgetting. We propose a federated associative-memory framework that learns shared archetypes in heterogeneous, continual settings, where client data are independent but not necessarily balanced. Each client encodes its experience as a low-rank Hebbian operator, sent to a central server for aggregation and factorization into global archetypes. This approach preserves privacy, avoids centralized replay buffers, and is robust to small, noisy, or evolving datasets. We cast aggregation as a low-rank-plus-noise spectral inference problem, deriving theoretical thresholds for detectability and retrieval robustness. An entropy-based controller balances stability and plasticity in streaming regimes. Experiments with heterogeneous clients, drift, and novelty show improved global archetype reconstruction and associative retrieval, supporting the spectral view of federated consolidation.

#20 The monotonicity of the Franz-Parisi potential is equivalent with Low-degree MMSE lower bounds

Authors: Konstantinos Tsirkas, Leda Wang, Ilias Zadik

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.20070

Abstract:
Over the last decades, two distinct approaches have been instrumental to our understanding of the computational complexity of statistical estimation. The statistical physics literature predicts algorithmic hardness through local stability and monotonicity properties of the Franz--Parisi (FP) potential \cite{franz1995recipes,franz1997phase}, while the mathematically rigorous literature characterizes hardness via the limitations of restricted algorithmic classes, most notably low-degree polynomial estimators \cite{hopkins2017efficient}. For many inference models, these two perspectives yield strikingly consistent predictions, giving rise to a long-standing open problem of establishing a precise mathematical relationship between them. In this work, we show that for estimation problems the power of low-degree polynomials is equivalent to the monotonicity of the annealed FP potential for a broad family of Gaussian additive models (GAMs) with signal-to-noise ratio $\lambda$. In particular, subject to a low-degree conjecture for GAMs, our results imply that the polynomial-time limits of these models are directly implied by the monotonicity of the annealed FP potential, in conceptual agreement with predictions from the physics literature dating back to the 1990s.

#21 Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

diffusion

Authors: Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.20155

Abstract:
It is currently difficult to distill discrete diffusion models. In contrast, continuous diffusion literature has many distillation methods that can reduce sampling steps to a handful. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text and image datasets. Moreover, the newly distilled generators can outperform their teachers.

#22 Kolmogorov-Arnold causal generative models

Authors: Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.20184

Abstract:
Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data. However, many deep causal models rely on highly expressive architectures with opaque mechanisms, limiting auditability in high-stakes domains. We propose KaCGM, a causal generative model for mixed-type tabular data where each structural equation is parameterized by a Kolmogorov--Arnold Network (KAN). This decomposition enables direct inspection of learned causal mechanisms, including symbolic approximations and visualization of parent--child relationships, while preserving query-agnostic generative semantics. We introduce a validation pipeline based on distributional matching and independence diagnostics of inferred exogenous variables, allowing assessment using observational data alone. Experiments on synthetic and semi-synthetic benchmarks show competitive performance against state-of-the-art methods. A real-world cardiovascular case study further demonstrates the extraction of simplified structural equations and interpretable causal effects. These results suggest that expressive causal generative modeling and functional transparency can be achieved jointly, supporting trustworthy deployment in tabular decision-making settings. Code: https://github.com/aalmodovares/kacgm

#23 A new paradigm for global sensitivity analysis

Authors: Gildas Mazo (MaIAGE)

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2409.06271

Abstract:
It is well-known that Sobol indices, which count among the most popular sensitivity indices, are based on the Sobol decomposition. Here we challenge this construction by redefining Sobol indices without the Sobol decomposition. In fact, we show that Sobol indices are a particular instance of a more general concept which we call sensitivity measures. A sensitivity measure of a system taking inputs and returning outputs is a set function that is null at a subset of inputs if and only if, with probability one, the output actually does not depend on those inputs. A sensitivity measure evaluated at the whole set of inputs represents the uncertainty about the output. We show that measuring sensitivity to a particular subset is akin to measuring the expected output's uncertainty conditionally on the fact that the inputs belonging to that subset have been fixed to random values. By considering all of the possible combinations of inputs, sensitivity measures induce an implicit symmetric factorial experiment with two levels, the factorial effects of which can be calculated. This new paradigm generalizes many known sensitivity indices, can create new ones, and defines interaction effects independently of the choice of the sensitivity measure. No assumption about the distribution of the inputs is required.
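For readers who want a concrete reference point, the classical first-order Sobol index $S_i = \mathrm{Var}(\mathbb{E}[Y \mid X_i]) / \mathrm{Var}(Y)$ can be estimated by fixing one input and resampling the others. The sketch below uses the standard pick-freeze Monte Carlo estimator, not the generalized sensitivity measures of the paper. The toy model `model`, the sample size, and the uniform input distribution are assumptions chosen for illustration; for this model the exact indices are 0.2 and 0.8.

```python
import random

def model(x1, x2):
    # Toy additive model (an assumption for illustration): Y = X1 + 2*X2.
    return x1 + 2.0 * x2

def covariance(a, b):
    # Plain sample covariance (divides by n).
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n

def first_order_sobol(f, n=20000, seed=0):
    # Pick-freeze: keep one input fixed, redraw the other, and measure
    # how much of the output variance survives in the covariance.
    rng = random.Random(seed)
    y, y_keep1, y_keep2 = [], [], []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        x1_new, x2_new = rng.random(), rng.random()
        y.append(f(x1, x2))
        y_keep1.append(f(x1, x2_new))  # X1 frozen, X2 redrawn
        y_keep2.append(f(x1_new, x2))  # X2 frozen, X1 redrawn
    var_y = covariance(y, y)
    return covariance(y, y_keep1) / var_y, covariance(y, y_keep2) / var_y

s1, s2 = first_order_sobol(model)
print(f"S1 ~ {s1:.3f}, S2 ~ {s2:.3f}")
```

Because the toy model is additive, the two first-order indices sum to roughly one; interaction terms would show up as a shortfall.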

#24 Learning Representations for Independence Testing

Authors: Nathaniel Xu, Feng Liu, Danica J. Sutherland

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2409.06890

Abstract:
Many tools exist to detect dependence between random variables, a core question across a wide range of machine learning, statistical, and scientific endeavors. Although several statistical tests guarantee eventual detection of any dependence with enough samples, standard tests may require an exorbitant amount of samples for detecting subtle dependencies between high-dimensional random variables with complex distributions. In this work, we study two related ways to learn powerful independence tests. First, we show how to construct powerful statistical tests with finite-sample validity by using variational estimators of mutual information, such as the InfoNCE or NWJ estimators. Second, we establish a close connection between these variational mutual information-based tests and tests based on the Hilbert-Schmidt Independence Criterion (HSIC); in particular, learning a variational bound (typically parameterized by a deep network) for mutual information is closely related to learning a kernel for HSIC. Finally, we show how to, rather than selecting a representation to maximize the statistic itself, select a representation which can maximize the power of a test, in either setting; we term the former case a Neural Dependency Statistic (NDS). While HSIC power optimization has been recently considered in the literature, we correct some important misconceptions and expand to considering deep kernels. In our experiments, while all approaches can yield powerful tests with exact level control, optimized HSIC tests generally outperform the other approaches on difficult problems of detecting structured dependence.
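As a point of reference for the HSIC-based tests mentioned above, here is a minimal sketch of a permutation test using the biased HSIC estimate $\mathrm{tr}(KHLH)/n^2$ with fixed Gaussian kernels. It does not learn a kernel or optimize test power, which is what the paper studies; the bandwidth, the sample size, and the toy data ($y = x^2$ plus noise, which is uncorrelated with $x$ but dependent on it) are assumptions for illustration.

```python
import math
import random

def gram(values, bandwidth=1.0):
    # Gaussian kernel Gram matrix for scalar samples.
    return [[math.exp(-(a - b) ** 2 / (2 * bandwidth ** 2)) for b in values]
            for a in values]

def center(K):
    # Double-centering H K H for a symmetric matrix K.
    n = len(K)
    row = [sum(r) / n for r in K]
    total = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)]
            for i in range(n)]

def hsic_permutation_test(x, y, n_perm=200, seed=0):
    rng = random.Random(seed)
    n = len(x)
    Kc, L = center(gram(x)), gram(y)

    def stat(p):
        # Biased HSIC estimate with the y-samples reordered by p.
        return sum(Kc[i][j] * L[p[i]][p[j]]
                   for i in range(n) for j in range(n)) / n ** 2

    idx = list(range(n))
    observed = stat(idx)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        exceed += stat(idx) >= observed
    # The +1 correction keeps the test valid at finite sample sizes.
    return observed, (1 + exceed) / (1 + n_perm)

data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(80)]
y = [xi ** 2 + 0.1 * data_rng.gauss(0, 1) for xi in x]
hsic_value, p_value = hsic_permutation_test(x, y)
print(f"HSIC = {hsic_value:.4f}, p-value = {p_value:.4f}")
```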

#25 Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning

Authors: Feng Yu, MD Saifur Rahman Mazumder, Ying Su, Oscar Contreras Velasco

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2512.18720

Abstract:
Effective feature selection is essential for high-dimensional data analysis and machine learning. Unsupervised feature selection (UFS) aims to simultaneously cluster data and identify the most discriminative features. Most existing UFS methods linearly project features into a pseudo-label space for clustering, but they suffer from two critical limitations: (1) an oversimplified linear mapping that fails to capture complex feature relationships, and (2) an assumption of uniform cluster distributions, ignoring outliers prevalent in real-world data. To address these issues, we propose the Robust Autoencoder-based Unsupervised Feature Selection (RAEUFS) model, which leverages a deep autoencoder to learn nonlinear feature representations while inherently improving robustness to outliers. We further develop an efficient optimization algorithm for RAEUFS. Extensive experiments demonstrate that our method outperforms state-of-the-art UFS approaches in both clean and outlier-contaminated data settings.

#26 Learnability with Partial Labels and Adaptive Nearest Neighbors

Authors: Nicolas A. Errandonea, Santiago Mazuelas, Jose A. Lozano, Sanjoy Dasgupta

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15781

Abstract:
Prior work on partial labels learning (PLL) has shown that learning is possible even when each instance is associated with a bag of labels, rather than a single accurate but costly label. However, the necessary conditions for learning with partial labels remain unclear, and existing PLL methods are effective only in specific scenarios. In this work, we mathematically characterize the settings in which PLL is feasible. In addition, we present PL A-$k$NN, an adaptive nearest-neighbors algorithm for PLL that is effective in general scenarios and enjoys strong performance guarantees. Experimental results corroborate that PL A-$k$NN can outperform state-of-the-art methods in general PLL scenarios.
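To make the partial-label setting concrete, the sketch below shows a plain, non-adaptive nearest-neighbor vote over candidate label bags. It is a simple baseline for illustration, not PL A-$k$NN: the fixed `k`, the one-vote-per-candidate rule, and the toy one-dimensional data are all assumptions.

```python
from collections import Counter

def pl_knn_predict(train_x, train_bags, query, k=3):
    # Rank training points by distance to the query (1-D for simplicity).
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    votes = Counter()
    for i in order[:k]:
        # Every candidate label in a neighbor's bag receives one vote.
        for label in train_bags[i]:
            votes[label] += 1
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(votes), key=lambda label: votes[label])

train_x = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
train_bags = [{"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}, {"b", "a"}, {"b"}]

print(pl_knn_predict(train_x, train_bags, 0.05))
print(pl_knn_predict(train_x, train_bags, 1.05))
```

The true label tends to win because it appears in every bag of a neighborhood, while distractor labels vary from bag to bag; whether that holds in general is exactly the kind of condition the paper characterizes.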

#27 In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies

diffusion

Authors: Yunbum Kook, Santosh S. Vempala, Matthew S. Zhang

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2405.01425

Abstract:
We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously known, namely in Rényi divergence (which implies TV, $\mathcal{W}_2$, KL, $\chi^2$). The proof departs from known approaches for polytime algorithms for the problem: we utilize a stochastic diffusion perspective to show contraction to the target distribution with the rate of convergence determined by functional isoperimetric constants of the target distribution.

#28 End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic data

Authors: Nicolas Chatzikiriakos, Robin Strässer, Frank Allgöwer, Andrea Iannelli

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2409.18010

Abstract:
In this paper we propose an end-to-end algorithm for indirect data-driven control for bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data is affected by probabilistic noise with possibly unbounded support and leverage tools from statistical learning theory to derive finite sample identification error bounds. To this end, we solve the bilinear identification problem by solving a set of linear and affine identification problems, by a particular choice of a control input during the data collection phase. We provide a priori as well as data-dependent finite sample identification error bounds on the individual matrices as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds in a robust controller design to obtain an exponentially stable closed-loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
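The idea of splitting bilinear identification into linear and affine subproblems can be illustrated on a scalar system $x^+ = a x + b u + n\,u x + w$. The sketch below assumes two input choices during data collection, $u = 0$ (giving the linear model $x^+ = a x + w$) and $u = 1$ (giving the affine model $x^+ = (a + n) x + b + w$), and solves each by ordinary least squares. The scalar setting, these specific inputs, the Gaussian noise level, and all function names are assumptions for illustration, not the paper's algorithm or bounds.

```python
import random

def fit_line(xs, ys, intercept):
    # Ordinary least squares for y = slope * x (+ offset).
    n = len(xs)
    if not intercept:
        return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs), 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def identify_bilinear(a, b, n_bil, samples=500, noise=0.01, seed=0):
    rng = random.Random(seed)

    def step(x, u):
        # One transition of the (hypothetical) true system.
        return a * x + b * u + n_bil * u * x + rng.gauss(0, noise)

    # Experiment 1: u = 0 isolates the linear part a.
    xs0 = [rng.gauss(0, 1) for _ in range(samples)]
    a_hat, _ = fit_line(xs0, [step(x, 0.0) for x in xs0], intercept=False)

    # Experiment 2: u = 1 gives an affine model with slope a + n, offset b.
    xs1 = [rng.gauss(0, 1) for _ in range(samples)]
    slope, b_hat = fit_line(xs1, [step(x, 1.0) for x in xs1], intercept=True)
    return a_hat, b_hat, slope - a_hat

a_hat, b_hat, n_hat = identify_bilinear(a=0.5, b=1.0, n_bil=0.3)
print(f"a ~ {a_hat:.3f}, b ~ {b_hat:.3f}, n ~ {n_hat:.3f}")
```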

#29 Simulation-based Inference with the Python Package sbijax

Authors: Simon Dirmeier, Antonietta Mira, Carlo Albert

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2409.19435

Abstract:
Neural simulation-based inference (SBI) describes an emerging family of methods for Bayesian inference with intractable likelihood functions that use neural networks as surrogate models. Here we introduce sbijax, a Python package that implements a wide variety of state-of-the-art methods in neural simulation-based inference using a user-friendly programming interface. sbijax offers high-level functionality to quickly construct SBI estimators, and compute and visualize posterior distributions with only a few lines of code. In addition, the package provides functionality for conventional approximate Bayesian computation, to compute model diagnostics, and to automatically estimate summary statistics. By virtue of being entirely written in JAX, sbijax is extremely computationally efficient, allowing rapid training of neural networks and executing code automatically in parallel on both CPU and GPU.
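For orientation, conventional approximate Bayesian computation in its simplest rejection form can be sketched in a few lines. The example below deliberately uses only the Python standard library and does not use or imitate the sbijax API. The Gaussian simulator, the uniform prior on $[-5, 5]$, the sample mean as summary statistic, and the tolerance `eps` are all assumptions chosen for illustration.

```python
import random

def simulate_mean(mu, n, rng):
    # Simulator: draw n observations from N(mu, 1), return the sample mean.
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n

def abc_rejection(observed_summary, n_obs, n_draws=20000, eps=0.1, seed=0):
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw a parameter from the prior
        # Keep the draw if the simulated summary is close to the observed one.
        if abs(simulate_mean(mu, n_obs, rng) - observed_summary) < eps:
            accepted.append(mu)
    return accepted

data_rng = random.Random(42)
observed = [data_rng.gauss(1.5, 1.0) for _ in range(20)]
s_obs = sum(observed) / len(observed)

posterior = abc_rejection(s_obs, len(observed))
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean ~ {posterior_mean:.3f}")
```

Rejection ABC never evaluates the likelihood, only the simulator, which is the property that neural SBI methods retain while replacing the accept/reject step with a learned surrogate.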

#30 Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands

Authors: Lars van der Laan, Aurelien Bibaut, Nathan Kallus, Alex Luedtke

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2501.11868

Abstract:
We develop a unified framework for automatic debiased machine learning (autoDML) for inference on a broad class of statistical parameters. The framework applies to any smooth functional of a nonparametric M-estimand, defined as the minimizer of a population risk over an infinite-dimensional linear space. Examples include counterfactual regression, quantile, and survival functions, as well as conditional average treatment effects. Rather than requiring manual derivation of influence functions, our approach automates the construction of debiased estimators using three ingredients: the gradient and Hessian of the loss function and a linear approximation of the target functional. Estimation reduces to solving two risk minimization problems, one for the M-estimand and one for a Riesz representer. The framework accommodates Neyman-orthogonal loss functions that depend on nuisance parameters and extends to vector-valued M-estimands through joint risk minimization. We characterize the efficient influence function and construct efficient autoDML estimators via one-step correction, targeted minimum loss estimation, and sieve-based plug-in methods. Under quadratic risk, these estimators satisfy double robustness for linear functionals. We further show that they are robust to mild misspecification of the M-estimand model, incurring only second-order bias. We illustrate the method by estimating long-term survival probabilities under a semiparametric two-parameter beta-geometric failure model.

#31 Flow-based Conformal Prediction for Multi-dimensional Time Series

Authors: Junghwan Lee, Chen Xu, Yao Xie

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2502.05709

Abstract:
Time series prediction underpins a broad range of downstream tasks across many scientific domains. Recent advances and increasing adoption of black-box machine learning models for time series prediction highlight the critical need for uncertainty quantification. While conformal prediction has gained attention as a reliable uncertainty quantification method, conformal prediction for time series faces two key challenges: (1) leveraging correlations in observations and non-conformity scores to overcome the exchangeability assumption, and (2) constructing prediction sets for multi-dimensional outcomes. To address these challenges, we propose a novel conformal prediction method for time series using flow with classifier-free guidance. We provide coverage guarantees by establishing exact non-asymptotic marginal coverage and a finite-sample bound on conditional coverage for the proposed method. Evaluations on real-world time series datasets demonstrate that our method constructs significantly smaller prediction sets than existing conformal prediction methods while maintaining target coverage.
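As background, the baseline that this line of work builds on is split conformal prediction, sketched below for a scalar outcome with exchangeable data. This is the standard textbook procedure, not the flow-based method of the paper, and it ignores both temporal correlation and multi-dimensional outcomes. The stand-in predictor `predict`, the data-generating process, and the miscoverage level `alpha = 0.1` are assumptions for illustration.

```python
import math
import random

def predict(x):
    # Stand-in for a pre-trained black-box model (hypothetical).
    return 2.0 * x

def sample(rng, n):
    # Exchangeable toy data: y = 2x + standard Gaussian noise.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]

def conformal_radius(residuals, alpha):
    # Finite-sample-corrected quantile of the calibration scores.
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(residuals)[rank - 1]

rng = random.Random(0)
calibration = sample(rng, 500)
radius = conformal_radius([abs(y - predict(x)) for x, y in calibration], alpha=0.1)

test_set = sample(rng, 2000)
coverage = sum(abs(y - predict(x)) <= radius for x, y in test_set) / len(test_set)
print(f"interval half-width = {radius:.3f}, empirical coverage = {coverage:.3f}")
```

The prediction set for a new input `x` is the interval `[predict(x) - radius, predict(x) + radius]`; its marginal coverage guarantee relies on exchangeability, which is the assumption time series data violate.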

#32 Interpretable Early Warnings using Machine Learning in an Online Game-experiment

Authors: Guillaume Falmagne, Anna B. Stephenson, Simon A. Levin

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2502.09880

Abstract:
Stemming from physics and later applied to other fields such as ecology, the theory of critical transitions suggests that some regime shifts are preceded by statistical early warning signals. Reddit's r/place experiment, a large-scale social game, provides a unique opportunity to test these signals consistently across thousands of subsystems undergoing critical transitions. In r/place, millions of users collaboratively created "compositions", or pixel-art drawings, in which transitions occur when one composition rapidly replaces another. We develop a machine-learning-based early warning system that combines the predictive power of multiple system-specific time series via gradient-boosted decision trees with memory-retaining features. Our method significantly outperforms standard early warning indicators. Trained on the 2022 r/place data, our algorithm detects half of the transitions occurring within 20 min at a false positive rate of just 3.6%. Its performance remains robust when tested on the 2023 r/place event, demonstrating generalizability across different contexts. Using SHapley Additive exPlanations (SHAP) for interpreting the predictions, we investigate the underlying drivers of warnings, which could be relevant to other complex systems, especially online social systems. We reveal an interplay of patterns preceding transitions, such as critical slowing down or speeding up, a lack of innovation or coordination, turbulent histories, and a lack of image complexity. These findings show the potential of machine learning indicators in socio-ecological systems for predicting regime shifts and understanding their dynamics.
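The "standard early warning indicators" that serve as the baseline here are typically the variance and lag-1 autocorrelation of a time series, both of which rise under critical slowing down. The sketch below computes them on a toy AR(1) process whose coefficient drifts toward one. The simulated series, the window length, and the drift schedule are assumptions for illustration and have nothing to do with the r/place data or the paper's gradient-boosted model.

```python
import random

def variance(window):
    mean = sum(window) / len(window)
    return sum((v - mean) ** 2 for v in window) / len(window)

def lag1_autocorrelation(window):
    # Correlation between the series and itself shifted by one step.
    mean = sum(window) / len(window)
    centered = [v - mean for v in window]
    numerator = sum(a * b for a, b in zip(centered, centered[1:]))
    return numerator / sum(c * c for c in centered)

# Toy AR(1) process whose memory increases over time, mimicking a
# system that recovers ever more slowly from perturbations.
rng = random.Random(0)
steps = 2000
series, x = [], 0.0
for t in range(steps):
    phi = 0.1 + 0.85 * t / (steps - 1)
    x = phi * x + rng.gauss(0, 1)
    series.append(x)

early, late = series[:500], series[-500:]
print(f"variance: early {variance(early):.2f}, late {variance(late):.2f}")
print(f"lag-1 AC: early {lag1_autocorrelation(early):.2f}, "
      f"late {lag1_autocorrelation(late):.2f}")
```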

#33 Does Weak-to-strong Generalization Happen under Spurious Correlations?

Authors: Chenruo Liu, Yijun Dong, Qi Lei

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2509.24005

Abstract:
We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S happen, and how to improve it upon failures? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction $\eta_\ell$, and (ii) a group-imbalanced unlabeled set pseudolabeled by the teacher with a minority group of fraction $\eta_u$. Theoretically, a precise characterization of W2S gain at the proportional asymptotic limit shows that W2S always happens with sufficient pseudolabels when $\eta_u = \eta_\ell$ but may fail when $\eta_u \ne \eta_\ell$, where W2S gain diminishes as $(\eta_u - \eta_\ell)^2$ increases. Our theory is corroborated by extensive experiments on various spurious correlation benchmarks and teacher-student pairs. To boost W2S performance upon failures, we further propose a simple, effective algorithmic remedy that retrains the strong student on its high-confidence data subset after W2S fine-tuning. Our algorithm is group-label-free and achieves consistent, substantial improvements over vanilla W2S fine-tuning.

#34 DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing

diffusion

Authors: Hao Chen, Renzheng Zhang, Scott S. Howard

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2511.17038

Abstract:
From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its practical behavior: the prior offers limited guidance, while reconstruction is largely driven by the measurement-consistency term, leading to an inference process that is effectively decoupled from the diffusion dynamics. We show that the diffusion prior in these solvers functions primarily as a warm initializer that places estimates near the data manifold, while reconstruction is driven almost entirely by measurement consistency. Based on this observation, we introduce DAPS++, which fully decouples diffusion-based initialization from likelihood-driven refinement, allowing the likelihood term to guide inference more directly while maintaining numerical stability and providing insight into why unified diffusion trajectories remain effective in practice. By requiring fewer function evaluations (NFEs) and measurement-optimization steps, DAPS++ achieves high computational efficiency and robust reconstruction performance across diverse image restoration tasks.

#35 A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula

Authors: Chenruo Liu, Yijun Dong, Yiqiu Shen, Qi Lei

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2602.10014

Abstract:
Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improvement, the theoretical foundation of this generative, iterative procedure in a practical, finite-sample setting remains limited. We make progress toward this goal by modeling each round of self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution and deriving finite-sample guarantees for the expected reward. Our analysis reveals an explicit feedback loop where better models accept more data per iteration, supporting sustained self-improvement while explaining eventual saturation of such improvement. Adopting a task-centric view by considering reasoning tasks with multiple difficulty levels, we further prove quantifiable conditions on model initialization, task difficulty, and sample budget where easy-to-hard curricula provably achieve better guarantees than training on fixed mixtures of tasks. Our analyses are validated through Monte-Carlo simulations and experiments spanning a synthetic graph-based reasoning task and multiple standard mathematical reasoning benchmarks.

#36 An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination

agent

Authors: Minchul Shin

Published: Mon, 23 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.17381

Abstract:
AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries.
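The role of a post-search holdout can be illustrated with a small forecast-combination sketch: combination weights are chosen on a search sample and then scored on data that played no part in choosing them. The example uses inverse-MSE weighting, a common textbook combination rule that is not necessarily what the paper's agents discover. The two synthetic forecasters, their noise levels, and the sample sizes are assumptions for illustration.

```python
import random

def mse(forecast, truth):
    return sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth)

def inverse_mse_weights(forecasts, truth):
    # More accurate forecasters on the search sample get larger weights.
    inverse = [1.0 / mse(f, truth) for f in forecasts]
    total = sum(inverse)
    return [v / total for v in inverse]

def combine(forecasts, weights):
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(horizon)]

def make_sample(rng, n):
    # Two hypothetical forecasters with independent errors.
    truth = [rng.gauss(0, 1) for _ in range(n)]
    precise = [y + rng.gauss(0, 0.5) for y in truth]
    noisy = [y + rng.gauss(0, 1.0) for y in truth]
    return truth, [precise, noisy]

rng = random.Random(0)
search_truth, search_forecasts = make_sample(rng, 2000)
holdout_truth, holdout_forecasts = make_sample(rng, 2000)

# Weights are fixed on the search sample, then evaluated once on the holdout.
weights = inverse_mse_weights(search_forecasts, search_truth)
holdout_mse_combined = mse(combine(holdout_forecasts, weights), holdout_truth)
holdout_mse_individual = [mse(f, holdout_truth) for f in holdout_forecasts]
print("weights:", [round(w, 3) for w in weights])
print("holdout MSE combined:", round(holdout_mse_combined, 3))
print("holdout MSE individual:", [round(m, 3) for m in holdout_mse_individual])
```

In this synthetic setting the improvement carries over to the holdout because the data-generating process is stable; a gain that appears only on the search sample would be the sample-specific discovery the abstract warns about.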


📋 Paper title list