arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Knowledge vs. Experience: Asymptotic Limits of Impatience in Edge Tenants

著者: Anthony Kiggundu, Bin Han, Hans D. Schotten

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13763

要約:
We study how two information feeds, a closed-form Markov estimator of residual sojourn and an online trained actor-critic, affect reneging and jockeying in a dual M/M/1 system. Analytically, for unequal service rates and total-time patience, we show that total wait grows linearly so abandonment is inevitable and the probability of a successful jockey vanishes as the backlog approaches towards infinity. Furthermore, under a mild sub-linear error condition both information models yield the same asymptotic limits (robustness). We empirically validate these limits and quantify finite backlog differences. Our findings show that learned and analytic feeds produce different delays, reneging rates and transient jockeying behavior at practical sizes, but converge to the same asymptotic outcome implied by our theory. The results characterize when value-of-information matters (finite regimes) and when it does not (asymptotics), informing lightweight telemetry and decision-logic design for low-cost, jockeying-aware systems.

#2 Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands

著者: Vasiliki Tassopoulou, Charis Stamouli, Haochang Shou, George J. Pappas, Christos Davatzikos

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13911

要約:
Despite recent progress in predicting biomarker trajectories from real clinical data, uncertainty in the predictions poses high-stakes risks (e.g., misdiagnosis) that limit their clinical deployment. To enable safe and reliable use of such predictions in healthcare, we introduce a conformal method for uncertainty-calibrated prediction of biomarker trajectories resulting from randomly-timed clinical visits of patients. Our approach extends conformal prediction to the setting of randomly-timed trajectories via a novel nonconformity score that produces prediction bands guaranteed to cover the unknown biomarker trajectories with a user-prescribed probability. We apply our method across a wide range of standard and state-of-the-art predictors for two well-established brain biomarkers of Alzheimer's disease, using neuroimaging data from real clinical studies. We observe that our conformal prediction bands consistently achieve the desired coverage, while also being tighter than baseline prediction bands. To further account for population heterogeneity, we develop group-conditional conformal bands and test their coverage guarantees across various demographic and clinically relevant subpopulations. Moreover, we demonstrate the clinical utility of our conformal bands in identifying subjects at high risk of progression to Alzheimer's disease. Specifically, we introduce an uncertainty-calibrated risk score that enables the identification of 17.5% more high-risk subjects compared to standard risk scores, highlighting the value of uncertainty calibration in real-world clinical decision making. Our code is available at github.com/vatass/ConformalBiomarkerTrajectories.

#3 Empirical Likelihood for Random Forests and Ensembles

著者: Harold D. Chiang, Yukitoshi Matsushita, Taisuke Otsu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13934

要約:
We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when subsampling induced by incompleteness is not overly sparse. Under sparser subsampling regimes, the EL statistic tends to over-cover due to loss of pivotality; we therefore propose a modified EL that restores pivotality through a simple adjustment. Our method retains key properties of EL while remaining computationally efficient. Theory for honest random forests and simulations demonstrate that modified EL achieves accurate coverage and practical reliability relative to existing inference methods.

#4 Splat Regression Models

著者: Mara Daniels, Philippe Rigollet

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14042

要約:
We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, each weighted by an output vector. The power of splat modeling lies in its ability to locally adjust the scale and direction of each splat, achieving both high interpretability and accuracy. Fitting splat models reduces to optimization over the space of mixing measures, which can be implemented using Wasserstein-Fisher-Rao gradient flows. As a byproduct, we recover the popular Gaussian Splatting methodology as a special case, providing a unified theoretical framework for this state-of-the-art technique that clearly disambiguates the inverse problem, the model, and the optimization algorithm. Through numerical experiments, we demonstrate that the resulting models and algorithms constitute a flexible and promising approach for solving diverse approximation, estimation, and inverse problems involving low-dimensional data.

#5 SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation

著者: Renjie Chen, Viet Anh Nguyen, Huifu Xu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14146

要約:
We propose a distributionally robust formulation for simultaneously estimating the covariance matrix and the precision matrix of a random vector.The proposed model minimizes the worst-case weighted sum of the Frobenius loss of the covariance estimator and Stein's loss of the precision matrix estimator against all distributions from an ambiguity set centered at the nominal distribution. The radius of the ambiguity set is measured via convex spectral divergence. We demonstrate that the proposed distributionally robust estimation model can be reduced to a convex optimization problem, thereby yielding quasi-analytical estimators. The joint estimators are shown to be nonlinear shrinkage estimators. The eigenvalues of the estimators are shrunk nonlinearly towards a positive scalar, where the scalar is determined by the weight coefficient of the loss terms. By tuning the coefficient carefully, the shrinkage corrects the spectral bias of the empirical covariance/precision matrix estimator. By this property, we call the proposed joint estimator the Spectral concentrated COvariance and Precision matrix Estimator (SCOPE). We demonstrate that the shrinkage effect improves the condition number of the estimator. We provide a parameter-tuning scheme that adjusts the shrinkage target and intensity that is asymptotically optimal. Numerical experiments on synthetic and real data show that our shrinkage estimators perform competitively against state-of-the-art estimators in practical applications.

#6 Causal Discovery on Higher-Order Interactions

著者: Alessio Zanga, Marco Scutari, Fabio Stella

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14206

要約:
Causal discovery combines data with knowledge provided by experts to learn the DAG representing the causal relationships between a given set of variables. When data are scarce, bagging is used to measure our confidence in an average DAG obtained by aggregating bootstrapped DAGs. However, the aggregation step has received little attention from the specialized literature: the average DAG is constructed using only the confidence in the individual edges of the bootstrapped DAGs, thus disregarding complex higher-order edge structures. In this paper, we introduce a novel theoretical framework based on higher-order structures and describe a new DAG aggregation algorithm. We perform a simulation study, discussing the advantages and limitations of the proposed approach. Our proposal is both computationally efficient and effective, outperforming state-of-the-art solutions, especially in low sample size regimes and under high dimensionality settings.

#7 Skewness-Robust Causal Discovery in Location-Scale Noise Models

著者: Daniel Klippert, Alexander Marx

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14441

要約:
To distinguish Markov equivalent graphs in causal discovery, it is necessary to restrict the structural causal model. Crucially, we need to be able to distinguish cause $X$ from effect $Y$ in bivariate models, that is, distinguish the two graphs $X \to Y$ and $Y \to X$. Location-scale noise models (LSNMs), in which the effect $Y$ is modeled based on the cause $X$ as $Y = f(X) + g(X)N$, form a flexible class of models that is general and identifiable in most cases. Estimating these models for arbitrary noise terms $N$, however, is challenging. Therefore, practical estimators are typically restricted to symmetric distributions, such as the normal distribution. As we showcase in this paper, when $N$ is a skewed random variable, which is likely in real-world domains, the reliability of these approaches decreases. To approach this limitation, we propose SkewD, a likelihood-based algorithm for bivariate causal discovery under LSNMs with skewed noise distributions. SkewD extends the usual normal-distribution framework to the skew-normal setting, enabling reliable inference under symmetric and skewed noise. For parameter estimation, we employ a combination of a heuristic search and an expectation conditional maximization algorithm. We evaluate SkewD on novel synthetically generated datasets with skewed noise as well as established benchmark datasets. Throughout our experiments, SkewD exhibits a strong performance and, in comparison to prior work, remains robust under high skewness.

#8 DeepBlip: Estimating Conditional Average Treatment Effects Over Time

著者: Haorui Ma, Dennis Frauen, Stefan Feuerriegel

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14545

要約:
Structural nested mean models (SNMMs) are a principled approach to estimate the treatment effects over time. A particular strength of SNMMs is to break the joint effect of treatment sequences over time into localized, time-specific ``blip effects''. This decomposition promotes interpretability through the incremental effects and enables the efficient offline evaluation of optimal treatment policies without re-computation. However, neural frameworks for SNMMs are lacking, as their inherently sequential g-estimation scheme prevents end-to-end, gradient-based training. Here, we propose DeepBlip, the first neural framework for SNMMs, which overcomes this limitation with a novel double optimization trick to enable simultaneous learning of all blip functions. Our DeepBlip seamlessly integrates sequential neural networks like LSTMs or transformers to capture complex temporal dependencies. By design, our method correctly adjusts for time-varying confounding to produce unbiased estimates, and its Neyman-orthogonal loss function ensures robustness to nuisance model misspecification. Finally, we evaluate our DeepBlip across various clinical datasets, where it achieves state-of-the-art performance.

#9 Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization

著者: Zonghao Chen, Atsushi Nitanda, Arthur Gretton, Taiji Suzuki

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14710

要約:
We establish the first global convergence result of neural networks for two stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD), unlike standard MFLD, however, our setting of 2SLS entails a \emph{bilevel} optimization problem in the space of probability measures. To address this challenge, we leverage the penalty gradient approach recently developed for bilevel optimization which formulates bilevel optimization as a Lagrangian problem. This leads to a novel fully first-order algorithm, termed \texttt{F$^2$BMLD}. Apart from the convergence bound, we further provide a generalization bound, revealing an inherent trade-off in the choice of the Lagrange multiplier between optimization and statistical guarantees. Finally, we empirically validate the effectiveness of the proposed method on an offline reinforcement learning benchmark.

#10 On the Gradient Complexity of Private Optimization with Private Oracles

privacy

著者: Michael Menart, Aleksandar Nikolov

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13999

要約:
We study the running time, in terms of first order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We first consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $\Omega(\min\{\frac{\sqrt{d}}{\alpha^2}, \frac{d}{\log(1/\alpha)}\})$ is necessary to achieve $\alpha$ excess risk on problems of dimension $d$ when $d \geq 1/\alpha^2$. Upper bounds via DP-SGD show these results are tight when $d>\tilde{\Omega}(1/\alpha^4)$. We further show our lower bound can be strengthened to $\Omega(\min\{\frac{d}{\bar{m}\alpha^2}, \frac{d}{\log(1/\alpha)} \})$ for algorithms which use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first order oracle calls by $\tilde{\Omega}\big(\frac{\sqrt{d}}{\alpha} + \min\{\frac{1}{\alpha^2}, n\}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show this bound is nearly tight. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information limited oracles. Specifically, if the proxy oracle transmits at most $\Gamma$-bits of information about the gradients in the minibatch, then $\Omega\big(\min\{\frac{d}{\alpha^2\Gamma}, \frac{d}{\log(1/\alpha)}\}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.

#11 SmallML: Bayesian Transfer Learning for Small-Data Predictive Analytics

著者: Semen Leontev

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14049

要約:
Small and medium-sized enterprises (SMEs) represent 99.9% of U.S. businesses yet remain systematically excluded from AI due to a mismatch between their operational scale and modern machine learning's data requirements. This paper introduces SmallML, a Bayesian transfer learning framework achieving enterprise-level prediction accuracy with datasets as small as 50-200 observations. We develop a three-layer architecture integrating transfer learning, hierarchical Bayesian modeling, and conformal prediction. Layer 1 extracts informative priors from 22,673 public records using a SHAP-based procedure transferring knowledge from gradient boosting to logistic regression. Layer 2 implements hierarchical pooling across J=5-50 SMEs with adaptive shrinkage, balancing population patterns with entity-specific characteristics. Layer 3 provides conformal sets with finite-sample coverage guarantees P(y in C(x)) >= 1-alpha for distribution-free uncertainty quantification. Validation on customer churn data demonstrates 96.7% +/- 4.2% AUC with 100 observations per business -- a +24.2 point improvement over independent logistic regression (72.5% +/- 8.1%), with p < 0.000001. Conformal prediction achieves 92% empirical coverage at 90% target. Training completes in 33 minutes on standard CPU hardware. By enabling enterprise-grade predictions for 33 million U.S. SMEs previously excluded from machine learning, SmallML addresses a critical gap in AI democratization. Keywords: Bayesian transfer learning, hierarchical models, conformal prediction, small-data analytics, SME machine learning

#12 Radial Compensation: Stable and Semantically Decoupled Generative Models on Riemannian Manifolds

著者: Marios Papamichals, Regina Ruane

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14056

要約:
Generative models on curved spaces rely on charts to map Euclidean spaces to manifolds. Exponential maps preserve geodesics but have stiff, radius-dependent Jacobians, while volume-preserving charts maintain densities but distort geodesic distances. Both approaches entangle curvature with model parameters, inflating gradient variance. In high-dimensional latent normalizing flows, the wrapped exponential prior can stretch radii far beyond the curvature scale, leading to poor test likelihoods and stiff solvers. We introduce Radial Compensation (RC), an information-geometric method that selects the base density in the tangent space so that the likelihood depends only on geodesic distance from a pole, decoupling parameter semantics from curvature. RC lets radial parameters retain their usual meaning in geodesic units, while the chart can be tuned as a numerical preconditioner. We extend RC to manifolds with known geodesic polar volume and show that RC is the only construction for geodesic-radial likelihoods with curvature-invariant Fisher information. We derive the Balanced-Exponential (bExp) chart family, balancing volume distortion and geodesic error. Under RC, all bExp settings preserve the same manifold density and Fisher information, with smaller dial values reducing gradient variance and flow cost. Empirically, RC yields stable generative models across densities, VAEs, flows on images and graphs, and protein models. RC improves likelihoods, restores clean geodesic radii, and prevents radius blow-ups in high-dimensional flows, making RC-bExp a robust default for likelihood-trained generative models on manifolds.

#13 Synthetic Survival Control: Extending Synthetic Controls for "When-If" Decision

著者: Jessy Xinyi Han, Devavrat Shah

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14133

要約:
Estimating causal effects on time-to-event outcomes from observational data is particularly challenging due to censoring, limited sample sizes, and non-random treatment assignment. The need for answering such "when-if" questions--how the timing of an event would change under a specified intervention--commonly arises in real-world settings with heterogeneous treatment adoption and confounding. To address these challenges, we propose Synthetic Survival Control (SSC) to estimate counterfactual hazard trajectories in a panel data setting where multiple units experience potentially different treatments over multiple periods. In such a setting, SSC estimates the counterfactual hazard trajectory for a unit of interest as a weighted combination of the observed trajectories from other units. To provide formal justification, we introduce a panel framework with a low-rank structure for causal survival analysis. Indeed, such a structure naturally arises under classical parametric survival models. Within this framework, for the causal estimand of interest, we establish identification and finite sample guarantees for SSC. We validate our approach using a multi-country clinical dataset of cancer treatment outcomes, where the staggered introduction of new therapies creates a quasi-experimental setting. Empirically, we find that access to novel treatments is associated with improved survival, as reflected by lower post-intervention hazard trajectories relative to their synthetic counterparts. Given the broad relevance of survival analysis across medicine, economics, and public policy, our framework offers a general and interpretable tool for counterfactual survival inference using observational data.

#14 Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer

著者: Kallol Mondal (Department of Electronics and Communication Engineering, National Institute of Technology Allahabad, Prayagraj, Centre for Nanotechnology, Indian Institute of Technology Roorkee), Ankush Kumar (Centre for Nanotechnology, Indian Institute of Technology Roorkee)

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14691

要約:
Attention is the brain's ability to selectively focus on a few specific aspects while ignoring irrelevant ones. This biological principle inspired the attention mechanism in modern Transformers. Transformers now underpin large language models (LLMs) such as GPT, but at the cost of massive training and inference energy, leading to a large carbon footprint. While brain attention emerges from neural circuits, Transformer attention relies on dot-product similarity to weight elements in the input sequence. Neuromorphic computing, especially spiking neural networks (SNNs), offers a brain-inspired path to energy-efficient intelligence. Despite recent work on attention-based spiking Transformers, the core attention layer remains non-neuromorphic. Current spiking attention (i) relies on dot-product or element-wise similarity suited to floating-point operations, not event-driven spikes; (ii) keeps attention matrices that suffer from the von Neumann bottleneck, limiting in-memory computing; and (iii) still diverges from brain-like computation. To address these issues, we propose the Spiking STDP Transformer (S$^{2}$TDPT), a neuromorphic Transformer that implements self-attention through spike-timing-dependent plasticity (STDP), embedding query--key correlations in synaptic weights. STDP, a core mechanism of memory and learning in the brain and widely studied in neuromorphic devices, naturally enables in-memory computing and supports non-von Neumann hardware. On CIFAR-10 and CIFAR-100, our model achieves 94.35\% and 78.08\% accuracy with only four timesteps and 0.49 mJ on CIFAR-100, an 88.47\% energy reduction compared to a standard ANN Transformer. Grad-CAM shows that the model attends to semantically relevant regions, enhancing interpretability. Overall, S$^{2}$TDPT illustrates how biologically inspired attention can yield energy-efficient, hardware-friendly, and explainable neuromorphic models.

#15 Look-Ahead Reasoning on Learning Platforms

著者: Haiqing Zhu, Tijana Zrnic, Celestine Mendler-D\"unner

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14745

要約:
On many learning platforms, the optimization criteria guiding model training reflect the priorities of the designer rather than those of the individuals they affect. Consequently, users may act strategically to obtain more favorable outcomes, effectively contesting the platform's predictions. While past work has studied strategic user behavior on learning platforms, the focus has largely been on strategic responses to a deployed model, without considering the behavior of other users. In contrast, look-ahead reasoning takes into account that user actions are coupled, and -- at scale -- impact future predictions. Within this framework, we first formalize level-$k$ thinking, a concept from behavioral economics, where users aim to outsmart their peers by looking one step ahead. We show that, while convergence to an equilibrium is accelerated, the equilibrium remains the same, providing no benefit of higher-level reasoning for individuals in the long run. Then, we focus on collective reasoning, where users take coordinated actions by optimizing through their joint impact on the model. By contrasting collective with selfish behavior, we characterize the benefits and limits of coordination; a new notion of alignment between the learner's and the users' utilities emerges as a key concept. We discuss connections to several related mathematical frameworks, including strategic classification, performative prediction, and algorithmic collective action.

#16 Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

著者: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2406.13036

要約:
Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $\pi$ as a perturbation of a given reference measure $\mu$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincar\'e inequality offers improved bounds.

#17 Continuum Dropout for Neural Differential Equations

privacy

著者: Jonghun Lee, YongKyung Oh, Sungil Kim, Dong-Young Lim

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.10446

要約:
Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental challenge in adopting dropout, a cornerstone of deep learning regularization, making them susceptible to overfitting. To address this research gap, we introduce Continuum Dropout, a universally applicable regularization technique for NDEs built upon the theory of alternating renewal processes. Continuum Dropout formulates the on-off mechanism of dropout as a stochastic process that alternates between active (evolution) and inactive (paused) states in continuous time. This provides a principled approach to prevent overfitting and enhance the generalization capabilities of NDEs. Moreover, Continuum Dropout offers a structured framework to quantify predictive uncertainty via Monte Carlo sampling at test time. Through extensive experiments, we demonstrate that Continuum Dropout outperforms existing regularization methods for NDEs, achieving superior performance on various time series and image classification tasks. It also yields better-calibrated and more trustworthy probability estimates, highlighting its effectiveness for uncertainty-aware modeling.

#18 Equivariant neural networks and equivarification

著者: Erkao Bao, Jingcheng Lu, Linqi Song, Nathan Hart-Hodgson, William Parson, Yanheng Zhou

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/1906.07172

要約:
Equivariant neural networks are a class of neural networks designed to preserve symmetries inherent in the data. In this paper, we introduce a general method for modifying a neural network to enforce equivariance, a process we refer to as equivarification. We further show that group convolutional neural networks (G-CNNs) arise as a special case of our framework.

#19 Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

著者: Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Hideyoshi Igata, Masashi Yoshikawa, Yoshiaki Ota, Hiroki Okui, Kei Akita, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2306.10656

要約:
Virtual Human Generative Model (VHGM) is a generative model that approximates the joint probability over more than 2000 human healthcare-related attributes. This paper presents the core algorithm, VHGM-MAE, a masked autoencoder (MAE) tailored for handling high-dimensional, sparse healthcare data. VHGM-MAE tackles four key technical challenges: (1) heterogeneity of healthcare data types, (2) probability distribution modeling, (3) systematic missingness in the training dataset arising from multiple data sources, and (4) the high-dimensional, small-$n$-large-$p$ problem. To address these challenges, VHGM-MAE employs a likelihood-based approach to model distributions with heterogeneous types, a transformer-based MAE to capture complex dependencies among observed and missing attributes, and a novel training scheme that effectively leverages available samples with diverse missingness patterns to mitigate the small-n-large-p problem. Experimental results demonstrate that VHGM-MAE outperforms existing methods in both missing value imputation and synthetic data generation.

#20 Improved Sample Complexity Bounds for Diffusion Model Training

diffusion

著者: Shivam Gupta, Aditya Parulekar, Eric Price, Zhiyang Xun

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2311.13745

要約:
Diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. From a theoretical standpoint, a number of recent works have studied the iteration complexity of sampling, assuming access to an accurate diffusion model. In this work, we focus on understanding the sample complexity of training such a model; how many samples are needed to learn an accurate diffusion model using a sufficiently expressive neural network? Prior work showed bounds polynomial in the dimension, desired Total Variation error, and Wasserstein error. We show an exponential improvement in the dependence on Wasserstein error and depth, along with improved dependencies on other relevant parameters.

#21 Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data

著者: Ellen Graham (University of Washington), Marco Carone (University of Washington), Andrea Rotnitzky (University of Washington)

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2409.09973

要約:
We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution. While this theory proves effective in many significant contexts, it falls short in certain common data fusion problems, such as two-sample instrumental variable analysis, settings that integrate data from epidemiological studies with diverse designs (e.g., prospective cohorts and retrospective case-control studies), and studies with variables prone to measurement error that are supplemented by validation studies. In this paper, we extend the aforementioned comprehensive theory to allow for the fusion of individual-level data from sources aligned with conditional distributions that do not correspond to a single factorization of the target distribution. Assuming conditional and marginal distribution alignments, we provide universal results that characterize the class of all influence functions of regular asymptotically linear estimators and the efficient influence function of any pathwise differentiable parameter, irrespective of the number of data sources, the specific parameter of interest, or the statistical model for the target distribution. This theory paves the way for machine-learning debiased, semiparametric efficient estimation.

#22 Gradient descent inference in empirical risk minimization

著者: Qiyang Han, Xiaocong Xu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2412.09498

要約:
Gradient descent is one of the most widely used iterative algorithms in modern statistical learning. However, its precise algorithmic dynamics in high-dimensional settings remain only partially understood, which has limited its broader potential for statistical inference applications. This paper provides a precise, non-asymptotic joint distributional characterization of gradient descent iterates and their debiased statistics in a broad class of empirical risk minimization problems, in the so-called mean-field regime where the sample size is proportional to the signal dimension. Our non-asymptotic state evolution theory holds for both general non-convex loss functions and non-Gaussian data, and reveals the central role of two Onsager correction matrices that precisely characterize the non-trivial dependence among all gradient descent iterates in the mean-field regime. Leveraging the joint state evolution characterization, we show that the gradient descent iterate retrieves approximate normality after a debiasing correction via a linear combination of all past iterates, where the debiasing coefficients can be estimated by the proposed gradient descent inference algorithm. This leads to a new algorithmic statistical inference framework based on debiased gradient descent, which (i) applies to a broad class of models with both convex and non-convex losses, (ii) remains valid at each iteration without requiring algorithmic convergence, and (iii) exhibits a certain robustness to possible model misspecification. As a by-product, our framework also provides algorithmic estimates of the generalization error at each iteration. As canonical examples, we demonstrate our theory and inference methods in the single-index regression model and a generalized logistic regression model, where the natural loss functions may exhibit arbitrarily non-convex landscapes.

#23 Exploring Variance Reduction in Importance Sampling for Efficient DNN Training

著者: Takuro Kutsuna

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.13296

要約:
Importance sampling is widely used to improve the efficiency of deep neural network (DNN) training by reducing the variance of gradient estimators. However, efficiently assessing the variance reduction relative to uniform sampling remains challenging due to computational overhead. This paper proposes a method for estimating variance reduction during DNN training using only minibatches sampled under importance sampling. By leveraging the proposed method, the paper also proposes an effective minibatch size to enable automatic learning rate adjustment. An absolute metric to quantify the efficiency of importance sampling is also introduced as well as an algorithm for real-time estimation of importance scores based on moving gradient statistics. Theoretical analysis and experiments on benchmark datasets demonstrated that the proposed algorithm consistently reduces variance, improves training efficiency, and enhances model accuracy compared with current importance-sampling approaches while maintaining minimal computational overhead.

#24 Closed-Form Feedback-Free Learning with Forward Projection

著者: Robert O'Shea, Bipin Rajendran

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.16476

要約:
State-of-the-art methods for backpropagation-free learning employ local error feedback to direct iterative optimisation via gradient descent. In this study, we examine the more restrictive setting where retrograde communication from neuronal outputs is unavailable for pre-synaptic weight optimisation. To address this challenge, we propose Forward Projection (FP). This randomised closed-form training method requires only a single forward pass over the entire dataset for model fitting, without retrograde communication. Our method generates target values for pre-activation membrane potentials at each layer through randomised nonlinear projections of pre-synaptic inputs and the labels, thereby encoding information from both sources. Local loss functions are optimised over pre-synaptic inputs using closed-form regression, without feedback from neuronal outputs or downstream layers. Interpretability is a key advantage of FP training; membrane potentials of hidden neurons in FP-trained networks encode information which are interpretable layer-wise as label predictions. We demonstrate the effectiveness of FP across four biomedical datasets, comparing it with backpropagation and local learning techniques such as Forward-Forward training and Local Supervision in multi-layer perceptron and convolutional architectures. In some few-shot learning tasks, FP yielded more generalisable models than those optimised via backpropagation. In large-sample tasks, FP-based models achieve generalisation comparable to gradient descent-based local learning methods while requiring only a single forward propagation step, achieving significant speed up for training.

#25 1-Lipschitz Network Initialization for Certifiably Robust Classification Applications: A Decay Problem

著者: Marius F. R. Juston, Ramavarapu S. Sreenivas, William R. Norris, Dustin Nottage, Ahmet Soylemezoglu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.00240

要約:
This paper discusses the weight parametrization of two standard 1-Lipschitz network architectures, the Almost-Orthogonal-Layers (AOL) and the SDP-based Lipschitz Layers (SLL). It examines their impact on initialization for deep 1-Lipschitz feedforward networks, and discusses underlying issues surrounding this initialization. These networks are mainly used in certifiably robust classification applications to combat adversarial attacks by limiting the impact of perturbations on the classification output. Exact and upper bounds for the parameterized weight variance were calculated assuming a standard Normal distribution initialization; additionally, an upper bound was computed assuming a Generalized Normal Distribution, generalizing the proof for Uniform, Laplace, and Normal distribution weight initializations. It is demonstrated that the weight variance holds no bearing on the output variance distribution and that only the dimension of the weight matrices matters. Additionally, this paper demonstrates that the weight initialization always causes deep 1-Lipschitz networks to decay to zero.

#26 Purifying Approximate Differential Privacy with Randomized Post-processing

privacy

著者: Yingyu Lin, Erchi Wang, Yi-An Ma, Yu-Xiang Wang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.21071

要約:
We propose a framework to convert $(\varepsilon, \delta)$-approximate Differential Privacy (DP) mechanisms into $(\varepsilon', 0)$-pure DP mechanisms under certain conditions, a process we call ``purification.'' This algorithmic technique leverages randomized post-processing with calibrated noise to eliminate the $\delta$ parameter while achieving near-optimal privacy-utility tradeoff for pure DP. It enables a new design strategy for pure DP algorithms: first run an approximate DP algorithm with certain conditions, and then purify. This approach allows one to leverage techniques such as strong composition and propose-test-release that require $\delta>0$ in designing pure-DP methods with $\delta=0$. We apply this framework in various settings, including Differentially Private Empirical Risk Minimization (DP-ERM), stability-based release, and query release tasks. To the best of our knowledge, this is the first work with a statistically and computationally efficient reduction from approximate DP to pure DP. Finally, we illustrate the use of this reduction for proving lower bounds under approximate DP constraints with explicit dependence in $\delta$, avoiding the sophisticated fingerprinting code construction.

#27 Proofs as Explanations: Short Certificates for Reliable Predictions

著者: Avrim Blum, Steve Hanneke, Chirag Pabbaraju, Donya Saless

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.08377

要約:
We consider a model for explainable AI in which an explanation for a prediction $h(x)=y$ consists of a subset $S'$ of the training data (if it exists) such that all classifiers $h' \in H$ that make at most $b$ mistakes on $S'$ predict $h'(x)=y$. Such a set $S'$ serves as a proof that $x$ indeed has label $y$ under the assumption that (1) the target function $h^\star$ belongs to $H$, and (2) the set $S$ contains at most $b$ corrupted points. For example, if $b=0$ and $H$ is the family of linear classifiers in $\mathbb{R}^d$, and if $x$ lies inside the convex hull of the positive data points in $S$ (and hence every consistent linear classifier labels $x$ as positive), then Carath\'eodory's theorem states that $x$ lies inside the convex hull of $d+1$ of those points. So, a set $S'$ of size $d+1$ could be released as an explanation for a positive prediction, and would serve as a short proof of correctness of the prediction under the assumption of realizability. In this work, we consider this problem more generally, for general hypothesis classes $H$ and general values $b\geq 0$. We define the notion of the robust hollow star number of $H$ (which generalizes the standard hollow star number), and show that it precisely characterizes the worst-case size of the smallest certificate achievable, and analyze its size for natural classes. We also consider worst-case distributional bounds on certificate size, as well as distribution-dependent bounds that we show tightly control the sample size needed to get a certificate for any given test example. In particular, we define a notion of the certificate coefficient $\varepsilon_x$ of an example $x$ with respect to a data distribution $D$ and target function $h^\star$, and prove matching upper and lower bounds on sample size as a function of $\varepsilon_x$, $b$, and the VC dimension $d$ of $H$.

#28 Bayesian information theoretic model-averaging stochastic item selection for computer adaptive testing

著者: Tina Su, Edison Choe, Joshua C. Chang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.15543

要約:
Computer Adaptive Testing (CAT) aims to accurately estimate an individual's ability using only a subset of an Item Response Theory (IRT) instrument. For many applications of CAT, one also needs to ensure diverse item exposure across different testing sessions, preventing any single item from being over or underutilized. In CAT, items are selected sequentially based on a running estimate of a respondent's ability. Prior methods almost universally see item selection through an optimization lens, motivating greedy item selection procedures. While efficient, these deterministic methods tend to have poor item exposure. Existing stochastic methods for item selection are ad-hoc, where item sampling weights lack theoretical justification. In this manuscript, we formulate stochastic CAT as a Bayesian model averaging problem. We seek item sampling probabilities, treated in the long run frequentist sense, that perform optimal model averaging for the ability estimate in a Bayesian sense. In doing so we derive a cross-entropy information criterion that yields optimal stochastic mixing. We tested our new method on the eight independent IRT models that comprise the Work Disability Functional Assessment Battery, comparing it to prior art. We found that our stochastic methodology had superior item exposure while not compromising in terms of test accuracy and efficiency.

#29 Bayes optimal learning of attention-indexed models

著者: Fabrizio Boncoraglio, Emanuele Troiani, Vittorio Erba, Lenka Zdeborov\'a

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.01582

要約:
We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in self-attention layers, that are key components of modern architectures.

#30 Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions

著者: Yanna Ding, Songtao Lu, Yingdong Lu, Tomasz Nowicki, Jianxi Gao

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.18638

要約:
Transformer architectures can solve unseen tasks based on input-output pairs in a given prompt due to in-context learning (ICL). Existing theoretical studies on ICL have mainly focused on linear regression tasks, often with i.i.d. inputs. To understand how transformers express ICL when modeling dynamics-driven functions, we investigate Markovian function learning through a structured ICL setup, where we characterize the loss landscape to reveal underlying optimization behaviors. Specifically, we (1) provide the closed-form expression of the global minimizer (in an enlarged parameter space) for a single-layer linear self-attention (LSA) model; (2) prove that recovering transformer parameters that realize the optimal solution is NP-hard in general, revealing a fundamental limitation of one-layer LSA in representing structured dynamical functions; and (3) supply a novel interpretation of a multilayer LSA as performing preconditioned gradient descent to optimize multiple objectives beyond the square loss. These theoretical results are numerically validated using simplified transformers.

#31 Posterior Sampling by Combining Diffusion Models with Annealed Langevin Dynamics

diffusion

著者: Zhiyang Xun, Shivam Gupta, Eric Price

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.26324

要約:
Given a noisy linear measurement $y = Ax + \xi$ of a distribution $p(x)$, and a good approximation to the prior $p(x)$, when can we sample from the posterior $p(x \mid y)$? Posterior sampling provides an accurate and fair framework for tasks such as inpainting, deblurring, and MRI reconstruction, and several heuristics attempt to approximate it. Unfortunately, approximate posterior sampling is computationally intractable in general. To sidestep this hardness, we focus on (local or global) log-concave distributions $p(x)$. In this regime, Langevin dynamics yields posterior samples when the exact scores of $p(x)$ are available, but it is brittle to score--estimation error, requiring an MGF bound (sub-exponential error). By contrast, in the unconditional setting, diffusion models succeed with only an $L^2$ bound on the score error. We prove that combining diffusion models with an annealed variant of Langevin dynamics achieves conditional sampling in polynomial time using merely an $L^4$ bound on the score error.

#32 Geometric Decomposition of Statistical Inference through Gradient Flow and Co-Monotonicity Measures

著者: Pawel Gajer, Jacques Ravel

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.04599

要約:
Understanding feature-outcome associations in high-dimensional data remains challenging when relationships vary across subpopulations, yet standard methods assuming global associations miss context-dependent patterns, reducing statistical power and interpretability. We develop a geometric decomposition framework offering two strategies for partitioning inference problems into regional analyses on data-derived Riemannian graphs. Gradient flow decomposition uses path-monotonicity-validated discrete Morse theory to partition samples into gradient flow cells where outcomes exhibit monotonic behavior. Co-monotonicity decomposition leverages association structure: vertex-level coefficients measuring directional concordance between outcome and features, or between feature pairs, define embeddings of samples into association space. These embeddings induce Riemannian k-NN graphs on which biclustering identifies co-monotonicity cells (coherent regions) and feature modules. This extends naturally to multi-modal integration across multiple feature sets. Both strategies apply independently or jointly, with Bayesian posterior sampling providing credible intervals.

#33 A Closed-Form Framework for Schr\"odinger Bridges Between Arbitrary Densities

著者: Hanwen Huang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.07786

要約:
Score-based generative models have recently attracted significant attention for their ability to generate high-fidelity data by learning maps from simple Gaussian priors to complex data distributions. A natural generalization of this idea to transformations between arbitrary probability distributions leads to the Schr\"odinger Bridge (SB) problem. However, SB solutions rarely admit closed-form expressios and are commonly obtained through iterative stochastic simulation procedures, which are computationally intensive and can be unstable. In this work, we introduce a unified closed-form framework for representing the stochastic dynamics of SB systems. Our formulation subsumes previously known analytical solutions including the Schr\"odinger F\"ollmer process and the Gaussian SB as specific instances. Notably, the classical Gaussian SB solution, previously derived using substantially more sophisticated tools such as Riemannian geometry and generator theory, follows directly from our formulation as an immediate corollary. Leveraging this framework, we develop a simulation-free algorithm that infers SB dynamics directly from samples of the source and target distributions. We demonstrate the versatility of our approach in two settings: (i) modeling developmental trajectories in single-cell genomics and (ii) solving image restoration tasks such as inpainting and deblurring. This work opens a new direction for efficient and scalable nonlinear diffusion modeling across scientific and machine learning applications.

#34 Adaptive Stepsizing for Stochastic Gradient Langevin Dynamics in Bayesian Neural Networks

著者: Rajit Rajpal, Benedict Leimkuhler, Yuanhao Jiang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.11666

要約:
Bayesian neural networks (BNNs) require scalable sampling algorithms to approximate posterior distributions over parameters. Existing stochastic gradient Markov Chain Monte Carlo (SGMCMC) methods are highly sensitive to the choice of stepsize and adaptive variants such as pSGLD typically fail to sample the correct invariant measure without addition of a costly divergence correction term. In this work, we build on the recently proposed `SamAdams' framework for timestep adaptation (Leimkuhler, Lohmann, and Whalley 2025), introducing an adaptive scheme: SA-SGLD, which employs time rescaling to modulate the stepsize according to a monitored quantity (typically the local gradient norm). SA-SGLD can automatically shrink stepsizes in regions of high curvature and expand them in flatter regions, improving both stability and mixing without introducing bias. We show that our method can achieve more accurate posterior sampling than SGLD on high-curvature 2D toy examples and in image classification with BNNs using sharp priors.

stat.ML updates on arXiv.org

📋 論文タイトル一覧