arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Distributional Computational Graphs: Error Bounds

著者: Olof Hallqvist Elias, Michael Selby, Phillip Stanley-Marbell

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16250

要約:
We study a general framework of distributional computational graphs: computational graphs whose inputs are probability distributions rather than point values. We analyze the discretization error that arises when these graphs are evaluated using finite approximations of continuous probability distributions. Such an approximation might be the result of representing a continuous real-valued distribution using a discrete representation or from constructing an empirical distribution from samples (or might be the output of another distributional computational graph). We establish non-asymptotic error bounds in terms of the Wasserstein-1 distance, without imposing structural assumptions on the computational graph.

#2 Perfect Clustering for Sparse Directed Stochastic Block Models

著者: Behzad Aalipur, Yichen Qin

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16427

要約:
Exact recovery in stochastic block models (SBMs) is well understood in undirected settings, but remains considerably less developed for directed and sparse networks, particularly when the number of communities diverges. Spectral methods for directed SBMs often lack stability in asymmetric, low-degree regimes, and existing non-spectral approaches focus primarily on undirected or dense settings. We propose a fully non-spectral, two-stage procedure for community detection in sparse directed SBMs with potentially growing numbers of communities. The method first estimates the directed probability matrix using a neighborhood-smoothing scheme tailored to the asymmetric setting, and then applies $K$-means clustering to the estimated rows, thereby avoiding the limitations of eigen- or singular value decompositions in sparse, asymmetric networks. Our main theoretical contribution is a uniform row-wise concentration bound for the smoothed estimator, obtained through new arguments that control asymmetric neighborhoods and separate in- and out-degree effects. These results imply the exact recovery of all community labels with probability tending to one, under mild sparsity and separation conditions that allow both $\gamma_n \to 0$ and $K_n \to \infty$. Simulation studies, including highly directed, sparse, and non-symmetric block structures, demonstrate that the proposed procedure performs reliably in regimes where directed spectral and score-based methods deteriorate. To the best of our knowledge, this provides the first exact recovery guarantee for this class of non-spectral, neighborhood-smoothing methods in the sparse, directed setting.

#3 Efficient Learning of Stationary Diffusions with Stein-type Discrepancies

diffusion

著者: Fabian Bleile, Sarah Lumpp, Mathias Drton

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16597

要約:
Learning a stationary diffusion amounts to estimating the parameters of a stochastic differential equation whose stationary distribution matches a target distribution. We build on the recently introduced kernel deviation from stationarity (KDS), which enforces stationarity by evaluating expectations of the diffusion's generator in a reproducing kernel Hilbert space. Leveraging the connection between KDS and Stein discrepancies, we introduce the Stein-type KDS (SKDS) as an alternative formulation. We prove that a vanishing SKDS guarantees alignment of the learned diffusion's stationary distribution with the target. Furthermore, under broad parametrizations, SKDS is convex with an empirical version that is $\epsilon$-quasiconvex with high probability. Empirically, learning with SKDS attains comparable accuracy to KDS while substantially reducing computational cost and yields improvements over the majority of competitive baselines.

#4 Towards Latent Diffusion Suitable For Text

diffusion

著者: Nesta Midavaine, Christian A. Naesseth, Grigory Bartosh

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16220

要約:
Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of continuous diffusion models to discrete state spaces. NFDM learns a multivariate forward process from the data, ensuring that the forward process and generative trajectory are a good fit for language modeling. Our model substantially reduces the likelihood gap with autoregressive models of the same size, while achieving sample quality comparable to that of previous latent diffusion models.

#5 Long-Term Probabilistic Forecast of Vegetation Conditions Using Climate Attributes in the Four Corners Region

著者: Erika McPhillips, Hyeongseong Lee, Xiangyu Xie, Kathy Baylis, Chris Funk, Mengyang Gu

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16347

要約:
Weather conditions can drastically alter the state of crops and rangelands, and in turn, impact the incomes and food security of individuals worldwide. Satellite-based remote sensing offers an effective way to monitor vegetation and climate variables on regional and global scales. The annual peak Normalized Difference Vegetation Index (NDVI), derived from satellite observations, is closely associated with crop development, rangeland biomass, and vegetation growth. Although various machine learning methods have been developed to forecast NDVI over short time ranges, such as one-month-ahead predictions, long-term forecasting approaches, such as one-year-ahead predictions of vegetation conditions, are not yet available. To fill this gap, we develop a two-phase machine learning model to forecast the one-year-ahead peak NDVI over high-resolution grids, using the Four Corners region of the Southwestern United States as a testbed. In phase one, we identify informative climate attributes, including precipitation and maximum vapor pressure deficit, and develop the generalized parallel Gaussian process that captures the relationship between climate attributes and NDVI. In phase two, we forecast these climate attributes using historical data at least one year before the NDVI prediction month, which then serve as inputs to forecast the peak NDVI at each spatial grid. We developed open-source tools that outperform alternative methods for both gross NDVI and grid-based NDVI one-year forecasts, providing information that can help farmers and ranchers make actionable plans a year in advance.

#6 Conformal prediction for full and sparse polynomial chaos expansions

著者: A. Hatstatt, X. Zhu, B. Sudret

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16636

要約:
Polynomial Chaos Expansions (PCEs) are widely recognized for their efficient computational performance in surrogate modeling. Yet, a robust framework to quantify local model errors is still lacking. While the local uncertainty of PCE prediction can be captured using bootstrap resampling, other methods offering more rigorous statistical guarantees are needed, especially in the context of small training datasets. Recently, conformal predictions have demonstrated strong potential in machine learning, providing statistically robust and model-agnostic prediction intervals. Due to its generality and versatility, conformal prediction is especially valuable, as it can be adapted to suit a variety of problems, making it a compelling choice for PCE-based surrogate models. In this contribution, we explore its application to PCE-based surrogate models. More precisely, we present the integration of two conformal prediction methods, namely the full conformal and the Jackknife+ approaches, into both full and sparse PCEs. For full PCEs, we introduce computational shortcuts inspired by the inherent structure of regression methods to optimize the implementation of both conformal methods. For sparse PCEs, we incorporate the two approaches with appropriate modifications to the inference strategy, thereby circumventing the non-symmetrical nature of the regression algorithm and ensuring valid prediction intervals. Our developments yield better-calibrated prediction intervals for both full and sparse PCEs, achieving superior coverage over existing approaches, such as the bootstrap, while maintaining a moderate computational cost.

#7 Kernel smoothing on manifolds

著者: Eunseong Bae, Wolfgang Polonik

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16777

要約:
Under the assumption that data lie on a compact (unknown) manifold without boundary, we derive finite sample bounds for kernel smoothing and its (first and second) derivatives, and we establish asymptotic normality through Berry-Esseen type bounds. Special cases include kernel density estimation, kernel regression and the heat kernel signature. Connections to the graph Laplacian are also discussed.

#8 Parametric Mean-Field empirical Bayes in high-dimensional linear regression

著者: Seunghyun Lee, Nabarun Deb

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16842

要約:
In this paper, we consider the problem of parametric empirical Bayes estimation of an i.i.d. prior in high-dimensional Bayesian linear regression, with random design. We obtain the asymptotic distribution of the variational Empirical Bayes (vEB) estimator, which approximately maximizes a variational lower bound of the intractable marginal likelihood. We characterize a sharp phase transition behavior for the vEB estimator -- namely that it is information theoretically optimal (in terms of limiting variance) up to $p=o(n^{2/3})$ while it suffers from a sub-optimal convergence rate in higher dimensions. In the first regime, i.e., when $p=o(n^{2/3})$, we show how the estimated prior can be calibrated to enable valid coordinate-wise and delocalized inference, both under the \emph{empirical Bayes posterior} and the oracle posterior. In the second regime, we propose a debiasing technique as a way to improve the performance of the vEB estimator beyond $p=o(n^{2/3})$. Extensive numerical experiments corroborate our theoretical findings.

#9 Multigrade Neural Network Approximation

著者: Shijun Zhang, Zuowei Shen, Yuesheng Xu

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16884

要約:
We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly non-convex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably one-hidden-layer $\texttt{ReLU}$ models, training admits convex reformulations with global guarantees, motivating learning paradigms that improve stability while scaling to depth. MGDL builds upon this insight by training deep networks grade by grade: previously learned grades are frozen, and each new residual block is trained solely to reduce the remaining approximation error, yielding an interpretable and stable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL and prove that, for any continuous target function, there exists a fixed-width multigrade $\texttt{ReLU}$ scheme whose residuals decrease strictly across grades and converge uniformly to zero. To the best of our knowledge, this work provides the first rigorous theoretical guarantee that grade-wise training yields provable vanishing approximation error in deep networks. Numerical experiments further illustrate the theoretical results.

#10 FedSGM: A Unified Framework for Constraint Aware, Bidirectionally Compressed, Multi-Step Federated Optimization

著者: Antesh Upadhyay, Sang Bin Moon, Abolfazl Hashemi

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16897

要約:
We introduce FedSGM, a unified framework for federated constrained optimization that addresses four major challenges in federated learning (FL): functional constraints, communication bottlenecks, local updates, and partial client participation. Building on the switching gradient method, FedSGM provides projection-free, primal-only updates, avoiding expensive dual-variable tuning or inner solvers. To handle communication limits, FedSGM incorporates bi-directional error feedback, correcting the bias introduced by compression while explicitly understanding the interaction between compression noise and multi-step local updates. We derive convergence guarantees showing that the averaged iterate achieves the canonical $\boldsymbol{\mathcal{O}}(1/\sqrt{T})$ rate, with additional high-probability bounds that decouple optimization progress from sampling noise due to partial participation. Additionally, we introduce a soft switching version of FedSGM to stabilize updates near the feasibility boundary. To our knowledge, FedSGM is the first framework to unify functional constraints, compression, multiple local updates, and partial client participation, establishing a theoretically grounded foundation for constrained federated learning. Finally, we validate the theoretical guarantees of FedSGM via experimentation on Neyman-Pearson classification and constrained Markov decision process (CMDP) tasks.

#11 Group-realizable multi-group learning by minimizing empirical risk

著者: Navid Ardeshir, Samuel Deng, Daniel Hsu, Jingwen Liu

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16922

要約:
The sample complexity of multi-group learning is shown to improve in the group-realizable setting over the agnostic setting, even when the family of groups is infinite so long as it has finite VC dimension. The improved sample complexity is obtained by empirical risk minimization over the class of group-realizable concepts, which itself could have infinite VC dimension. Implementing this approach is also shown to be computationally intractable, and an alternative approach is suggested based on improper learning.

#12 A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs

著者: Dayal Singh Kalra, Jean-Christophe Gagnon-Audet, Andrey Gromov, Ishita Mediratta, Kelvin Niu, Alexander H Miller, Michael Shvartsman

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16979

要約:
Understanding the curvature evolution of the loss landscape is fundamental to analyzing the training dynamics of neural networks. The most commonly studied measure, Hessian sharpness ($\lambda_{\max}^H$) -- the largest eigenvalue of the loss Hessian -- determines local training stability and interacts with the learning rate throughout training. Despite its significance in analyzing training dynamics, direct measurement of Hessian sharpness remains prohibitive for Large Language Models (LLMs) due to high computational cost. We analyze $\textit{critical sharpness}$ ($\lambda_c$), a computationally efficient measure requiring fewer than $10$ forward passes given the update direction $\Delta \mathbf{\theta}$. Critically, this measure captures well-documented Hessian sharpness phenomena, including progressive sharpening and Edge of Stability. Using this measure, we provide the first demonstration of these sharpness phenomena at scale, up to $7$B parameters, spanning both pre-training and mid-training of OLMo-2 models. We further introduce $\textit{relative critical sharpness}$ ($\lambda_c^{1\to 2}$), which quantifies the curvature of one loss landscape while optimizing another, to analyze the transition from pre-training to fine-tuning and guide data mixing strategies. Critical sharpness provides practitioners with a practical tool for diagnosing curvature dynamics and informing data composition choices at scale. More broadly, our work shows that scalable curvature measures can provide actionable insights for large-scale training.

#13 Bayesian Joint Additive Factor Models for Multiview Learning

著者: Niccolo Anceschi, Federico Ferrari, David B. Dunson, Himel Mallick

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2406.00778

要約:
It is increasingly common to collect data of multiple different types on the same set of samples. Our focus is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. To address these challenges, we introduce two complementary factor regression models. A baseline Joint Factor Regression (\textsc{jfr}) captures combined variation across views via a single factor set, and a more nuanced Joint Additive FActor Regression (\textsc{jafar}) that decomposes variation into shared and view-specific components. For \textsc{jfr}, we use independent cumulative shrinkage process (\textsc{i-cusp}) priors, while for \textsc{jafar} we develop a dependent version (\textsc{d-cusp}) designed to ensure identifiability of the components. We develop Gibbs samplers that exploit the model structure and accommodate flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (\texttt{R} package) is available at https://github.com/niccoloanceschi/jafar.

#14 Local minima of the empirical risk in high dimension: General theorems and convex examples

著者: Kiana Asgari, Andrea Montanari, Basil Saeed

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.01953

要約:
We consider a general model for high-dimensional empirical risk minimization whereby the data $\mathbf{x}_i$ are $d$-dimensional Gaussian vectors, the model is parametrized by $\mathbf{\Theta}\in\mathbb{R}^{d\times k}$, and the loss depends on the data via the projection $\mathbf{\Theta}^\mathsf{T}\mathbf{x}_i$. This setting covers as special cases classical statistics methods (e.g. multinomial regression and other generalized linear models), but also two-layer fully connected neural networks with $k$ hidden neurons. We use the Kac-Rice formula from Gaussian process theory to derive a bound on the expected number of local minima of this empirical risk, under the proportional asymptotics in which $n,d\to\infty$, with $n\asymp d$. Via Markov's inequality, this bound allows to determine the positions of these minimizers (with exponential deviation bounds) and hence derive sharp asymptotics on the estimation and prediction error. As a special case, we apply our characterization to convex losses. We show that our approach is tight and allows to prove previously conjectured results. In addition, we characterize the spectrum of the Hessian at the minimizer. A companion paper applies our general result to non-convex examples.

#15 Dynamic Pricing with Adversarially-Censored Demands

著者: Jianyu Xu, Yining Wang, Xi Chen, Yu-Xiang Wang

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.06168

要約:
We study an online dynamic pricing problem where the potential demand at each time period $t=1,2,\ldots, T$ is stochastic and dependent on the price. However, a perishable inventory is imposed at the beginning of each time $t$, censoring the potential demand if it exceeds the inventory level. To address this problem, we introduce a pricing algorithm based on the optimistic estimates of derivatives. We show that our algorithm achieves $\tilde{O}(\sqrt{T})$ optimal regret even with adversarial inventory series. Our findings advance the state-of-the-art in online decision-making problems with censored feedback, offering a theoretically optimal solution against adversarial observations.

#16 Kernel Embeddings and the Separation of Measure Phenomenon

著者: Leonardo V. Santoro, Kartik G. Waghmare, Victor M. Panaretos

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.04613

要約:
We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. In statistical terms, we establish that testing for the \emph{equality} of two non-atomic (Borel) probability measures on a locally compact Polish space is \emph{equivalent} to testing for the \emph{singularity} between two centered Gaussian measures on a reproducing kernel Hilbert space. The corresponding Gaussians are defined via the notion of kernel covariance embedding of a probability measure, and the Hilbert space is that generated by the embedding kernel. Distinguishing singular Gaussians is structurally simpler from an information-theoretic perspective than non-parametric two-sample testing, particularly in complex or high-dimensional domains. This is because singular Gaussians are supported on essentially separate and affine subspaces. Our proof leverages the classical Feldman-H\'{a}jek dichotomy, and shows that even a small perturbation of a continuous distribution will be maximally magnified through its Gaussian embedding. This ``separation of measure phenomenon'' appears to be a blessing of infinite dimensionality, by means of embedding, with the potential to inform the design of efficient inference tools in considerable generality. The elicitation of this phenomenon also appears to crystallize, in a precise and simple mathematical statement, a core mechanism underpinning the empirical effectiveness of kernel methods.

#17 Kernel-Based Nonparametric Tests For Shape Constraints

著者: Rohan Sen

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.16745

要約:
We propose a kernel-based nonparametric framework for mean-variance optimization that enables inference on economically motivated shape constraints in finance, including positivity, monotonicity, and convexity. Many central hypotheses in financial econometrics are naturally expressed as shape relations on latent functions (e.g., term premia, CAPM relations, and the pricing kernel), yet enforcing such constraints during estimation can mask economically meaningful violations; our approach therefore separates learning from validation by first estimating an unconstrained solution and then testing shape properties. We establish statistical properties of the regularized sample estimator and derive rigorous guarantees, including asymptotic consistency, a functional central limit theorem, and a finite-sample deviation bound achieving the Monte Carlo rate up to a regularization term. Building on these results, we construct a joint Wald-type statistic to test shape constraints on finite grids. An efficient algorithm based on a pivoted Cholesky factorization yields scalability to large datasets. Numerical studies, including an options-based asset-pricing application, illustrate the usefulness of the proposed method for evaluating monotonicity and convexity restrictions.

#18 Variance-Reduced Diffusion Sampling via Target Score Identity

diffusion

著者: Alois Duston, Tan Bui-Thanh

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.01594

要約:
We study variance reduction for score estimation and diffusion-based sampling in settings where the clean (target) score is available or can be approximated. Starting from the Target Score Identity (TSI), which expresses the noisy marginal score as a conditional expectation of the target score under the forward diffusion, we develop: (i) a plug-and-play nonparametric self-normalized importance sampling estimator compatible with standard reverse-time solvers, (ii) a variance-minimizing \emph{state- and time-dependent} blending rule between Tweedie-type and TSI estimators together with an anti-correlation analysis, (iii) a data-only extension based on locally fitted proxy scores, and (iv) a likelihood-tilting extension to Bayesian inverse problems. We also propose a \emph{Critic--Gate} distillation scheme that amortizes the state-dependent blending coefficient into a neural gate. Experiments on synthetic targets and PDE-governed inverse problems demonstrate improved sample quality for a fixed simulation budget.

#19 Mass Distribution versus Density Distribution in the Context of Clustering

著者: Kai Ming Ting, Ye Zhu, Hang Zhang, Tianrun Liang

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.10759

要約:
This paper investigates two fundamental descriptors of data, i.e., density distribution versus mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has its fundamental limitation -- high-density bias, irrespective of the algorithms used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using the mass distribution as a better foundation, we propose a new algorithm which maximizes the total mass of all clusters, called mass-maximization clustering (MMC). The algorithm can be easily changed to maximize the total density of all clusters in order to examine the fundamental limitation of using density distribution versus mass distribution. The key advantage of the MMC over the density-maximization clustering is that the maximization is conducted without a bias towards dense clusters.

#20 ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection

著者: Erik Wallin, Lennart Svensson, Fredrik Kahl, Lars Hammarstrand

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2407.11735

要約:
In open-set semi-supervised learning (OSSL), we consider unlabeled datasets that may contain unknown classes. Existing OSSL methods often use the softmax confidence for classifying data as in-distribution (ID) or out-of-distribution (OOD). Additionally, many works for OSSL rely on ad-hoc thresholds for ID/OOD classification, without considering the statistics of the problem. We propose a new score for ID/OOD classification based on angles in feature space between data and an ID subspace. Moreover, we propose an approach to estimate the conditional distributions of scores given ID or OOD data, enabling probabilistic predictions of data being ID or OOD. These components are put together in a framework for OSSL, termed ProSub, that is experimentally shown to reach SOTA performance on several benchmark problems. Our code is available at https://github.com/walline/prosub.

#21 Spectral decomposition-assisted multi-study factor analysis

著者: Lorenzo Mauri, Niccol\`o Anceschi, David B. Dunson

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.14600

要約:
This article focuses on covariance estimation for multi-study data. Popular approaches employ factor-analytic terms with shared and study-specific loadings that decompose the variance into (i) a shared low-rank component, (ii) study-specific low-rank components, and (iii) a diagonal term capturing idiosyncratic variability. Our proposed methodology estimates the latent factors via spectral decompositions, with a novel approach for separating shared and specific factors, and infers the factor loadings and residual variances via surrogate Bayesian regressions. The resulting posterior has a simple product form across outcomes, bypassing the need for Markov chain Monte Carlo sampling and facilitating parallelization. The proposed methodology has major advantages over current Bayesian competitors in terms of computational speed, scalability and stability while also having strong frequentist guarantees. The theory and methods also add to the rich literature on frequentist methods for factor models with shared and group-specific components of variation. The approximation error decreases as the sample size and the data dimension diverge, formalizing a blessing of dimensionality. We show favorable asymptotic properties, including central limit theorems for point estimators and posterior contraction, and excellent empirical performance in simulations. The methods are applied to integrate three studies on gene associations among immune cells.

#22 Clustered random forests with correlated data for optimal estimation and inference under potential covariate shift

著者: Elliot H. Young, Peter B\"uhlmann

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.12634

要約:
We develop Clustered Random Forests, a random forests algorithm for clustered data, arising from independent groups that exhibit within-cluster dependence. The leaf-wise predictions for each decision tree making up clustered random forests takes the form of a weighted least squares estimator, which leverage correlations between observations for improved prediction accuracy and tighter confidence intervals when performing inference. We show that approximately linear time algorithms exist for fitting classes of clustered random forests, matching the computational complexity of standard random forests. Further, we observe that the optimality of a clustered random forest, with regards to how optimal weights are chosen within this framework i.e. those that minimise mean squared prediction error, vary under covariate distribution shift. In light of this, we advocate weight estimation to be determined by a user-chosen covariate distribution, or test dataset of covariates, with respect to which optimal prediction or inference is desired. This highlights a key distinction between correlated and independent data with regards to optimality of nonparametric conditional mean estimation under covariate shift. We demonstrate our theoretical findings numerically in a number of simulated and real-world settings.

#23 Estimation of discrete distributions in relative entropy, and the deviations of the missing mass

著者: Jaouad Mourtada

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.21787

要約:
We study the problem of estimating a distribution over a finite alphabet from an i.i.d. sample, with accuracy measured in relative entropy (Kullback-Leibler divergence). While optimal bounds on the expected risk are known, high-probability guarantees remain less well-understood. First, we analyze the classical Laplace (add-one) estimator, obtaining matching upper and lower bounds on its performance and establishing its optimality among confidence-independent estimators. We then characterize the minimax-optimal high-probability risk and show that it is achieved by a simple confidence-dependent smoothing technique. Notably, the optimal non-asymptotic risk incurs an additional logarithmic factor compared to the ideal asymptotic rate. Next, motivated by regimes in which the alphabet size exceeds the sample size, we investigate methods that adapt to the sparsity of the underlying distribution. We introduce an estimator using data-dependent smoothing, for which we establish a high-probability risk bound depending on two effective sparsity parameters. As part of our analysis, we also derive a sharp high-probability upper bound on the missing mass.

#24 Bayesian Ensembling: Insights from Online Optimization and Empirical Bayes

著者: Daniel Waxman, Fernando Llorente, Petar M. Djuri\'c

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.15638

要約:
We revisit the classical problem of Bayesian ensembles and address the challenge of learning optimal combinations of Bayesian models in an online, continual learning setting. To this end, we reinterpret existing approaches such as Bayesian model averaging (BMA) and Bayesian stacking through a novel empirical Bayes lens, shedding new light on the limitations and pathologies of BMA. Further motivated by insights from online optimization, we propose Online Bayesian Stacking (OBS), a method that optimizes the log-score over predictive distributions to adaptively combine Bayesian models. A key contribution of our work is establishing a novel connection between OBS and portfolio selection, bridging Bayesian ensemble learning with a rich, well-studied theoretical framework that offers efficient algorithms and extensive regret analysis. We further clarify the relationship between OBS and online BMA, showing that they optimize related but distinct cost functions. Through theoretical analysis and empirical evaluation, we identify scenarios where OBS outperforms online BMA and provide principled methods and guidance on when practitioners should prefer one approach over the other.

#25 Statistical Analysis of Conditional Group Distributionally Robust Optimization with Cross-Entropy Loss

著者: Zijian Guo, Zhenyu Wang, Yifan Hu, Francis Bach

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.09905

要約:
In multi-source learning with discrete labels, distributional heterogeneity across domains poses a central challenge to developing predictive models that transfer reliably to unseen domains. We study multi-source unsupervised domain adaptation, where labeled data are available from multiple source domains and only unlabeled data are observed from the target domain. To address potential distribution shifts, we propose a novel Conditional Group Distributionally Robust Optimization (CG-DRO) framework that learns a classifier by minimizing the worst-case cross-entropy loss over the convex combinations of the conditional outcome distributions from sources domains. We develop an efficient Mirror Prox algorithm for solving the minimax problem and employ a double machine learning procedure to estimate the risk function, ensuring that errors in nuisance estimation contribute only at higher-order rates. We establish fast statistical convergence rates for the empirical CG-DRO estimator by constructing two surrogate minimax optimization problems that serve as theoretical bridges. A distinguishing challenge for CG-DRO is the emergence of nonstandard asymptotics: the empirical CG-DRO estimator may fail to converge to a standard limiting distribution due to boundary effects and system instability. To address this, we introduce a perturbation-based inference procedure that enables uniformly valid inference, including confidence interval construction and hypothesis testing.

#26 Differentiable Cyclic Causal Discovery Under Unmeasured Confounders

著者: Muralikrishnna G. Sethuraman, Faramarz Fekri

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.08450

要約:
Understanding causal relationships between variables is fundamental across scientific disciplines. Most causal discovery algorithms rely on two key assumptions: (i) all variables are observed, and (ii) the underlying causal graph is acyclic. While these assumptions simplify theoretical analysis, they are often violated in real-world systems, such as biological networks. Existing methods that account for confounders either assume linearity or struggle with scalability. To address these limitations, we propose DCCD-CONF, a novel framework for differentiable learning of nonlinear cyclic causal graphs in the presence of unmeasured confounders using interventional data. Our approach alternates between optimizing the graph structure and estimating the confounder distribution by maximizing the log-likelihood of the data. Through experiments on synthetic data and real-world gene perturbation datasets, we show that DCCD-CONF outperforms state-of-the-art methods in both causal graph recovery and confounder identification. Additionally, we also provide consistency guarantees for our framework, reinforcing its theoretical soundness.

#27 Flow Matching with Semidiscrete Couplings

著者: Alireza Mousavi-Hosseini, Stephen Y. Zhang, Michal Klein, Marco Cuturi

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.25519

要約:
Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e. by sampling random pairs of noise and target points $(\mathbf{x}_0,\mathbf{x}_1)$ and ensuring that the velocity field is aligned, on average, with $\mathbf{x}_1-\mathbf{x}_0$ when evaluated along a segment linking $\mathbf{x}_0$ to $\mathbf{x}_1$. While these pairs are sampled independently by default, they can also be selected more carefully by matching batches of $n$ noise to $n$ target points using an optimal transport (OT) solver. Although promising in theory, the OT flow matching (OT-FM) approach is not widely used in practice. Zhang et al. (2025) pointed out recently that OT-FM truly starts paying off when the batch size $n$ grows significantly, which only a multi-GPU implementation of the Sinkhorn algorithm can handle. Unfortunately, the costs of running Sinkhorn can quickly balloon, requiring $O(n^2/\varepsilon^2)$ operations for every $n$ pairs used to fit the velocity field, where $\varepsilon$ is a regularization parameter that should be typically small to yield better results. To fulfill the theoretical promises of OT-FM, we propose to move away from batch-OT and rely instead on a semidiscrete formulation that leverages the fact that the target dataset distribution is usually of finite size $N$. The SD-OT problem is solved by estimating a dual potential vector using SGD; using that vector, freshly sampled noise vectors at train time can then be matched with data points at the cost of a maximum inner product search (MIPS). Semidiscrete FM (SD-FM) removes the quadratic dependency on $n/\varepsilon$ that bottlenecks OT-FM. SD-FM beats both FM and OT-FM on all training metrics and inference budget constraints, across multiple datasets, on unconditional/conditional generation, or when using mean-flow models.

#28 AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing

著者: Samuel Bright-Thonney, Christina Reissel, Gaia Grosso, Nathaniel Woodward, Katya Govorkova, Andrzej Novak, Sang Eon Park, Eric Moreno, Philip Harris

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.21935

要約:
Novelty detection in large scientific datasets faces two key challenges: the noisy and high-dimensional nature of experimental data, and the necessity of making statistically robust statements about any observed outliers. While there is a wealth of literature on anomaly detection via dimensionality reduction, most methods do not produce outputs compatible with quantifiable claims of scientific discovery. In this work we directly address these challenges, presenting the first step towards a unified pipeline for novelty detection adapted for the rigorous statistical demands of science. We introduce AutoSciDACT (Automated Scientific Discovery with Anomalous Contrastive Testing), a general-purpose pipeline for detecting novelty in scientific data. AutoSciDACT begins by creating expressive low-dimensional data representations using a contrastive pre-training, leveraging the abundance of high-quality simulated data in many scientific domains alongside expertise that can guide principled data augmentation strategies. These compact embeddings then enable an extremely sensitive machine learning-based two-sample test using the New Physics Learning Machine (NPLM) framework, which identifies and statistically quantifies deviations in observed data relative to a reference distribution (null hypothesis). We perform experiments across a range of astronomical, physical, biological, image, and synthetic datasets, demonstrating strong sensitivity to small injections of anomalous data across all domains.

#29 Joint learning of a network of linear dynamical systems via total variation penalization

著者: Claire Donnat, Olga Klopp, Hemant Tyagi

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.18737

要約:
We consider the problem of joint estimation of the parameters of $m$ linear dynamical systems, given access to single realizations of their respective trajectories, each of length $T$. The linear systems are assumed to reside on the nodes of an undirected and connected graph $G = ([m], \mathcal{E})$, and the system matrices are assumed to either vary smoothly or exhibit small number of ``jumps'' across the edges. We consider a total variation penalized least-squares estimator and derive non-asymptotic bounds on the mean squared error (MSE) which hold with high probability. In particular, the bounds imply for certain choices of well connected $G$ that the MSE goes to zero as $m$ increases, even when $T$ is constant. The theoretical results are supported by extensive experiments on synthetic and real data.

#30 On Nonasymptotic Confidence Intervals for Treatment Effects in Randomized Experiments

著者: Ricardo J. Sandoval, Sivaraman Balakrishnan, Avi Feller, Michael I. Jordan, Ian Waudby-Smith

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.11744

要約:
We study nonasymptotic (finite-sample) confidence intervals for treatment effects in randomized experiments. In the existing literature, the effective sample sizes of nonasymptotic confidence intervals tend to be looser than the corresponding central-limit-theorem-based confidence intervals by a factor depending on the square root of the propensity score. We show that this performance gap can be closed, designing nonasymptotic confidence intervals that have the same effective sample size as their asymptotic counterparts. Our approach involves systematic exploitation of negative dependence or variance adaptivity (or both). We also show that the nonasymptotic rates that we achieve are unimprovable in an information-theoretic sense.

#31 Q-learning with Adjoint Matching

著者: Qiyang Li, Sergey Levine

公開日: Mon, 26 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.14234

要約:
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.

stat.ML updates on arXiv.org

📋 論文タイトル一覧