arXiv paper list - stat.ML updates on arXiv.org

#1 The Theory behind UMAP?

Authors: David Wegmann

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03375

Abstract:
In 2018, McInnes et al. introduced a dimensionality reduction algorithm called UMAP, which enjoys wide popularity among data scientists. Their work introduces a finite variant of a functor called the metric realization, based on an unpublished draft by Spivak. This draft contains many errors, most of which are reproduced by McInnes et al. and subsequent publications. This article aims to repair these errors and provide a self-contained document with the full derivation of Spivak's functors and McInnes et al.'s finite variant. We contribute an explicit description of the metric realization and related functors. At the end, we discuss the UMAP algorithm, as well as claims about properties of the algorithm and the correspondence of McInnes et al.'s finite variant to the UMAP algorithm.

#2 Learning Order Forest for Qualitative-Attribute Data Clustering

Authors: Mingjie Zhao, Sen Feng, Yiqun Zhang, Mengke Li, Yang Lu, Yiu-ming Cheung

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03387

Abstract:
Clustering is a fundamental approach to understanding data patterns, wherein the intuitive Euclidean distance space is commonly adopted. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes such as symptoms or marital status. This paper therefore discovers a tree-like distance structure to flexibly represent the local order relationships among intra-attribute qualitative values. That is, treating a value as a vertex of the tree makes it possible to capture rich order relationships between that value and the others. To obtain the trees in a clustering-friendly form, a joint learning mechanism is proposed to iteratively obtain more appropriate tree structures and clusters. It turns out that the latent distance space of the whole dataset can be well represented by a forest consisting of the learned trees. Extensive experiments demonstrate that the joint learning adapts the forest to the clustering task to yield accurate results. Comparisons with 10 counterparts on 12 real benchmark datasets, with significance tests, verify the superiority of the proposed method.

#3 Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

Authors: Xiaotong Liu, Yunwen Lei, Xiangyu Chang, Shao-Bo Lin

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03401

Abstract:
This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify iteration increments in KGD, deriving an adaptive parameter selection strategy that is implementable. Theoretical verifications are provided within the framework of learning theory. Utilizing the recently developed integral operator approach, we rigorously demonstrate that KGD, equipped with the proposed adaptive parameter selection strategy, achieves the optimal generalization error bound and adapts effectively to different kernels, target functions, and error metrics. Consequently, this strategy showcases significant advantages over existing parameter selection methods for KGD.

#4 Surprisal-Rényi Free Energy

Authors: Shion Matsumoto, Raul Castillo, Benjamin Prada, Ankur Arjun Mali

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03405

Abstract:
The forward and reverse Kullback-Leibler (KL) divergences arise as limiting objectives in learning and inference yet induce markedly different inductive biases that cannot be explained at the level of expectations alone. In this work, we introduce the Surprisal-Rényi Free Energy (SRFE), a log-moment-based functional of the likelihood ratio that lies outside the class of $f$-divergences. We show that SRFE recovers forward and reverse KL divergences as singular endpoint limits and derive local expansions around both limits in which the variance of the log-likelihood ratio appears as a first-order correction. This reveals an explicit mean-variance tradeoff governing departures from KL-dominated regimes. We further establish a Gibbs-type variational characterization of SRFE as the unique minimizer of a weighted sum of KL divergences and prove that SRFE directly controls large deviations of excess code-length via Chernoff-type bounds, yielding a precise Minimum Description Length interpretation. Together, these results identify SRFE as a variance- and tail-sensitive free-energy functional that clarifies the geometric and large-deviation structure underlying forward and reverse KL limits, without unifying or subsuming distinct learning frameworks.

#5 Scalable Contrastive Causal Discovery under Unknown Soft Interventions

Authors: Mingxuan Zhang, Khushi Desai, Sopho Kevlishvili, Elham Azizi

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03411

Abstract:
Observational causal discovery is only identifiable up to the Markov equivalence class. While interventions can reduce this ambiguity, in practice interventions are often soft with multiple unknown targets. In many realistic scenarios, only a single intervention regime is observed. We propose a scalable causal discovery model for paired observational and interventional settings with shared underlying causal structure and unknown soft interventions. The model aggregates subset-level PDAGs and applies contrastive cross-regime orientation rules to construct a globally consistent maximal PDAG under Meek closure, enabling generalization to both in-distribution and out-of-distribution settings. Theoretically, we prove that our model is sound with respect to a restricted $\Psi$ equivalence class induced solely by the information available in the subset-restricted setting. We further show that the model asymptotically recovers the corresponding identifiable PDAG and can orient additional edges compared to non-contrastive subset-restricted methods. Experiments on synthetic data demonstrate improved causal structure recovery, generalization to unseen graphs with held-out causal mechanisms, and scalability to larger graphs, with ablations supporting the theoretical results.

#6 Empirical Evaluation of No Free Lunch Violations in Permutation-Based Optimization

Authors: Grzegorz Sroka

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03613

Abstract:
The No Free Lunch (NFL) theorem guarantees equal average performance only under uniform sampling of a function space closed under permutation (c.u.p.). We ask when this averaging ceases to reflect what benchmarking actually reports. We study an iterative-search setting with sampling without replacement, where algorithms differ only in evaluation order. Binary objectives allow exhaustive evaluation in the fully enumerable case, and efficiency is defined by the first time the global minimum is reached. We then construct two additional benchmarks by algebraically recombining the same baseline functions through sums and differences. Function-algorithm relations are examined via correlation structure, hierarchical clustering, delta heatmaps, and PCA. A one-way ANOVA with Tukey contrasts confirms that algebraic reformulations induce statistically meaningful shifts in performance patterns. The uniformly sampled baseline remains consistent with the global NFL symmetry. In contrast, the algebraically modified benchmarks yield stable re-rankings and coherent clusters of functions and sampling policies. Composite objectives can also exhibit non-additive search effort despite being built from simpler components. Monte Carlo experiments indicate that order effects persist in larger spaces and depend on function class. Taken together, the results show how objective reformulation and benchmark design can generate structured local departures from NFL intuition. They motivate algorithm choice that is aware of both the problem class and the objective representation. This message applies to evolutionary computation as well as to statistical procedures based on relabeling, resampling, and permutation tests.
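The exhaustive setting described in this abstract can be illustrated with a small sketch. It assumes a four-point search space, binary objectives stored as tuples, and search policies that are simply fixed evaluation orders; the names `first_hit_time` and `mean_effort` are hypothetical. The restricted benchmark at the end is an illustrative subset that is not closed under permutation, not the paper's sum/difference construction.

```python
from itertools import permutations, product


def first_hit_time(order, f):
    """1-based step at which the global minimum of f is first evaluated."""
    best = min(f)
    for step, x in enumerate(order, start=1):
        if f[x] == best:
            return step


def mean_effort(order, functions):
    """Average first-hit time of one evaluation order over a set of objectives."""
    return sum(first_hit_time(order, f) for f in functions) / len(functions)


n_points = 4  # assumed size of the search space
all_functions = list(product([0, 1], repeat=n_points))  # every binary objective
orders = list(permutations(range(n_points)))  # every no-revisit evaluation order

# Uniform sampling over a set closed under permutation: all orders tie.
efforts = {order: mean_effort(order, all_functions) for order in orders}

# Illustrative restricted benchmark (not closed under permutation):
# only objectives whose value at point 0 is already the minimum value 0.
biased = [f for f in all_functions if f[0] == 0]
biased_efforts = {order: mean_effort(order, biased) for order in orders}

print(len(set(efforts.values())), len(set(biased_efforts.values())))
```

On the full function set every order has the same average effort, while on the restricted set orders that evaluate point 0 first are strictly better, which is the kind of benchmark-induced re-ranking the abstract discusses.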

#7 Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme

Authors: Zhiyuan Zhan, Masashi Sugiyama

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03626

Abstract:
Low-dimensional structure in real-world data plays an important role in the success of generative models, which motivates diffusion models defined on intrinsic data manifolds. Such models are driven by stochastic differential equations (SDEs) on manifolds, which raises the need for convergence theory of numerical schemes for manifold-valued SDEs. In Euclidean space, the Euler--Maruyama (EM) scheme achieves strong convergence with order $1/2$, but an analogous result for manifold discretizations is less understood in general settings. In this work, we study a geometric version of the EM scheme for SDEs on Riemannian manifolds and prove strong convergence with order $1/2$ under geometric and regularity conditions. As an application, we obtain a Wasserstein bound for sampling on manifolds via the geometric EM discretization of Riemannian Langevin dynamics.
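As a rough illustration of what a geometric EM step can look like, the sketch below discretizes Riemannian Langevin dynamics on the unit sphere by taking an Euler-Maruyama increment in the tangent space and mapping it back with the exponential map. The choice of manifold, the potential $U(x) = -\kappa \langle \mu, x \rangle$, and all function names are assumptions made for this example; the abstract does not specify the paper's exact scheme or conditions.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def project_to_tangent(x, v):
    """Orthogonal projection of v onto the tangent space of the unit sphere at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]


def sphere_exp(x, v):
    """Exponential map on the unit sphere: follow the geodesic from x along v."""
    norm_v = math.sqrt(dot(v, v))
    if norm_v < 1e-12:
        return list(x)
    return [math.cos(norm_v) * xi + math.sin(norm_v) * vi / norm_v
            for xi, vi in zip(x, v)]


def geometric_em_step(x, step, kappa, mu, rng):
    """One step: tangent-space Euler-Maruyama increment, then exponential map."""
    # Minus the Riemannian gradient of the assumed potential U(x) = -kappa * <mu, x>.
    drift = project_to_tangent(x, [kappa * m for m in mu])
    # Standard Gaussian on the tangent space, obtained by projecting ambient noise.
    noise = project_to_tangent(x, [rng.gauss(0.0, 1.0) for _ in x])
    v = [step * d + math.sqrt(2.0 * step) * n for d, n in zip(drift, noise)]
    return sphere_exp(x, v)


def run_chain(n_steps=1000, step=0.01, kappa=5.0, seed=0):
    rng = random.Random(seed)
    mu = [0.0, 0.0, 1.0]
    x = [1.0, 0.0, 0.0]
    for _ in range(n_steps):
        x = geometric_em_step(x, step, kappa, mu, rng)
    return x


x_final = run_chain()
print(math.sqrt(dot(x_final, x_final)))  # stays on the sphere up to rounding
```

Because every update goes through the exponential map, the iterates remain on the manifold without any ad hoc renormalization, which is the main difference from applying the Euclidean EM scheme in ambient coordinates.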

#8 Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

Authors: Saptarshi Chakraborty, Quentin Berthet, Peter L. Bartlett

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03700

Abstract:
Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $\mu$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $\mu$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d. samples from $\mu$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hat{\mu}$ and $\mu$ scales as $\mathbb{E}\, \mathbb{W}_p(\hat{\mu},\mu) = \widetilde{O}\!\left(n^{-1 / d^\ast_{p,q}(\mu)}\right)$, where $d^\ast_{p,q}(\mu)$ is the $(p,q)$-Wasserstein dimension of $\mu$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(\mu)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends classical Wasserstein dimension notions to distributions with unbounded support, which may be of independent theoretical interest.

#9 Observationally Informed Adaptive Causal Experimental Design

Authors: Erdun Gao, Liang Zhang, Jake Fawkes, Aoqi Zuo, Wenqin Liu, Haoxuan Li, Mingming Gong, Dino Sejdinovic

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03785

Abstract:
Randomized Controlled Trials (RCTs) represent the gold standard for causal inference yet remain a scarce resource. While large-scale observational data is often available, it is utilized only for retrospective fusion, and remains discarded in prospective trial design due to bias concerns. We argue this "tabula rasa" data acquisition strategy is fundamentally inefficient. In this work, we propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. To operationalize this, we introduce the R-Design framework. Theoretically, we establish two key advantages: (1) a structural efficiency gap, proving that estimating smooth residual contrasts admits strictly faster convergence rates than reconstructing full outcomes; and (2) information efficiency, where we quantify the redundancy in standard parameter-based acquisition (e.g., BALD), demonstrating that such baselines waste budget on task-irrelevant nuisance uncertainty. We propose R-EPIG (Residual Expected Predictive Information Gain), a unified criterion that directly targets the causal estimand, minimizing residual uncertainty for estimation or clarifying decision boundaries for policy. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines, confirming that repairing a biased model is far more efficient than learning one from scratch.

#10 Invariance-Based Dynamic Regret Minimization

Authors: Margherita Lazzaretto, Jonas Peters, Niklas Pfister

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03843

Abstract:
We consider stochastic non-stationary linear bandits where the linear parameter connecting contexts to the reward changes over time. Existing algorithms in this setting localize the policy by gradually discarding or down-weighting past data, effectively shrinking the time horizon over which learning can occur. However, in many settings historical data may still carry partial information about the reward model. We propose to leverage such data while adapting to changes, by assuming the reward model decomposes into stationary and non-stationary components. Based on this assumption, we introduce ISD-linUCB, an algorithm that uses past data to learn invariances in the reward model and subsequently exploits them to improve online performance. We show both theoretically and empirically that leveraging invariance reduces the problem dimensionality, yielding significant regret improvements in fast-changing environments when sufficient historical data is available.

#11 Exploiting Subgradient Sparsity in Max-Plus Neural Networks

Authors: Ikhlas Enaieh (LTCI, S2A), Olivier Fercoq (S2A, LTCI)

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04133

Abstract:
Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and multiplication are replaced with maximum and summation operations respectively. This is a promising architecture in terms of interpretability, but its training is challenging. A particular feature is that this algebraic structure naturally induces sparsity in the subgradients, as only neurons that contribute to the maximum affect the loss. However, standard backpropagation fails to exploit this sparsity, leading to unnecessary computations. In this work, we focus on the minimization of the worst sample loss which transfers this sparsity to the optimization loss. To address this, we propose a sparse subgradient algorithm that explicitly exploits the algebraic sparsity. By tailoring the optimization procedure to the non-smooth nature of Max-Plus models, our method achieves more efficient updates while retaining theoretical guarantees. This highlights a principled path toward bridging algebraic structure and scalable learning.
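The sparsity mentioned in this abstract can be seen in a minimal sketch of a single Max-Plus layer, assuming the form $y_i = \max_j (W_{ij} + x_j)$. The function names and the dictionary representation of the subgradient are hypothetical, and this is only an illustration of where the sparsity comes from, not the authors' sparse subgradient algorithm.

```python
def maxplus_forward(W, x):
    """Max-Plus layer: y_i = max_j (W[i][j] + x[j]); also returns the winning j per row."""
    outputs, winners = [], []
    for row in W:
        scores = [w + xj for w, xj in zip(row, x)]
        j_star = max(range(len(scores)), key=scores.__getitem__)  # first maximizer on ties
        outputs.append(scores[j_star])
        winners.append(j_star)
    return outputs, winners


def sparse_subgradient(winners, grad_out):
    """One subgradient of the loss w.r.t. W, stored sparsely.

    Only entry (i, winners[i]) of each row can be nonzero, so at most one
    weight per output neuron needs to be touched in an update.
    """
    return {(i, j): g
            for i, (j, g) in enumerate(zip(winners, grad_out))
            if g != 0.0}


W = [[0.0, -1.0, 2.0],
     [1.0, 0.5, -3.0]]
x = [1.0, 4.0, 0.5]

y, winners = maxplus_forward(W, x)
grad_W = sparse_subgradient(winners, grad_out=[1.0, -2.0])
print(y, winners, grad_W)
```

A dense backward pass would allocate a full 2-by-3 gradient here even though only two entries are nonzero; storing the winners during the forward pass is what lets the update skip the rest.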

#12 Stable and Steerable Sparse Autoencoders with Weight Regularization

Authors: Piotr Jedryszek, Oliver M. Crook

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04198

Abstract:
Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.

#13 Beyond Mixtures and Products for Ensemble Aggregation: A Likelihood Perspective on Generalized Means

Authors: Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Damien Garreau, Pierre-Alexandre Mattei

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04204

Abstract:
Density aggregation is a central problem in machine learning, for instance when combining predictions from a Deep Ensemble. The choice of aggregation remains an open question with two commonly proposed approaches being linear pooling (probability averaging) and geometric pooling (logit averaging). In this work, we address this question by studying the normalized generalized mean of order $r \in \mathbb{R} \cup \{-\infty,+\infty\}$ through the lens of log-likelihood, the standard evaluation criterion in machine learning. This provides a unifying aggregation formalism and shows different optimal configurations for different situations. We show that the regime $r \in [0,1]$ is the only range ensuring systematic improvements relative to individual distributions, thereby providing a principled justification for the reliability and widespread practical use of linear ($r=1$) and geometric ($r=0$) pooling. In contrast, we show that aggregation rules with $r \notin [0,1]$ may fail to provide consistent gains with explicit counterexamples. Finally, we corroborate our theoretical findings with empirical evaluations using Deep Ensembles on image and text classification benchmarks.
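To make the aggregation family concrete, here is a minimal sketch of a normalized generalized mean over class-probability vectors. It assumes uniform weights across ensemble members and strictly positive probabilities; the function name `generalized_mean_pool` is hypothetical, and the paper's exact definition may differ (for example, by allowing non-uniform weights).

```python
import math


def generalized_mean_pool(dists, r):
    """Normalized generalized mean of order r over a list of probability vectors.

    r = 1 gives linear pooling, r = 0 geometric pooling, and r = +inf / -inf
    the normalized class-wise max / min. Assumes all probabilities are > 0.
    """
    n_classes = len(dists[0])
    pooled = []
    for y in range(n_classes):
        probs = [d[y] for d in dists]
        if r == 0:
            value = math.exp(sum(math.log(p) for p in probs) / len(probs))
        elif r == math.inf:
            value = max(probs)
        elif r == -math.inf:
            value = min(probs)
        else:
            value = (sum(p ** r for p in probs) / len(probs)) ** (1.0 / r)
        pooled.append(value)
    total = sum(pooled)
    return [v / total for v in pooled]  # renormalize to a distribution


ensemble = [[0.7, 0.2, 0.1],
            [0.5, 0.3, 0.2]]
linear = generalized_mean_pool(ensemble, 1)      # probability averaging
geometric = generalized_mean_pool(ensemble, 0)   # log-probability averaging
print(linear, geometric)
```

Sweeping `r` between 0 and 1 interpolates between the two standard pooling rules, which is the regime the abstract identifies as reliably improving on individual members.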

#14 Semi-Supervised Generative Learning via Latent Space Distribution Matching

Authors: Kwong Yu Chong, Long Feng

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04223

Abstract:
We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and unpaired data, and (ii) performing joint distribution matching in this space via the 1-Wasserstein distance, using only paired data. This two-step approach minimizes an upper bound on the 1-Wasserstein distance between joint distributions, reducing reliance on scarce paired samples while enabling fast one-step generation. Theoretically, we establish non-asymptotic error bounds and demonstrate a key benefit of unpaired data: enhanced geometric fidelity in generated outputs. Furthermore, by extending the scope of its two core steps, LSDM provides a coherent statistical perspective that connects to a broad class of latent-space approaches. Notably, Latent Diffusion Models (LDMs) can be viewed as a variant of LSDM, in which joint distribution matching is achieved indirectly via score matching. Consequently, our results also provide theoretical insights into the consistency of LDMs. Empirical evaluations on real-world image tasks, including class-conditional generation and image super-resolution, demonstrate the effectiveness of LSDM in leveraging unpaired data to enhance generation quality.

#15 Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

Authors: Harin Lee, Kevin Jamieson

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03480

Abstract:
We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.

#16 Solving adversarial examples requires solving exponential misalignment

Authors: Alessandro Salvatore, Stanislav Fort, Surya Ganguli

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03507

Abstract:
Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in machine learning, and a phenomenon with mysterious origins. To shed light, we define and analyze a network's perceptual manifold (PM) for a class concept as the space of all inputs confidently assigned to that class by the network. We find, strikingly, that the dimensionalities of neural network PMs are orders of magnitude higher than those of natural human concepts. Since volume typically grows exponentially with dimension, this suggests exponential misalignment between machines and humans, with exponentially many inputs confidently assigned to concepts by machines but not humans. Furthermore, this provides a natural geometric hypothesis for the origin of adversarial examples: because a network's PM fills such a large region of input space, any input will be very close to any class concept's PM. Our hypothesis thus suggests that adversarial robustness cannot be attained without dimensional alignment of machine and human PMs, and therefore makes strong predictions: both robust accuracy and distance to any PM should be negatively correlated with the PM dimension. We confirmed these predictions across 18 different networks of varying robust accuracy. Crucially, we find even the most robust networks are still exponentially misaligned, and only the few PMs whose dimensionality approaches that of human concepts exhibit alignment to human perception. Our results connect the fields of alignment and adversarial examples, and suggest the curse of high dimensionality of machine PMs is a major impediment to adversarial robustness.

#17 Controllable Generative Sandbox for Causal Inference

Authors: Qi Zhang, Harsh Parikh, Ashley Naimi, Razieh Nabi, Christopher Kim, Timothy Lash

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03587

Abstract:
Method validation and study design in causal inference rely on synthetic data with known counterfactuals. Existing simulators trade off distributional realism, the ability to capture mixed-type and multimodal tabular data, against causal controllability, including explicit control over overlap, unmeasured confounding, and treatment effect heterogeneity. We introduce CausalMix, a variational generative framework that closes this gap by coupling a mixture of Gaussian latent priors with data-type-specific decoders for continuous, binary, and categorical variables. The model incorporates explicit causal controls: an overlap regularizer shaping propensity-score distributions, alongside direct parameterizations of confounding strength and effect heterogeneity. This unified objective preserves fidelity to the observed data while enabling factorial manipulation of causal mechanisms, allowing overlap, confounding strength, and treatment effect heterogeneity to be varied independently at design time. Across benchmarks, CausalMix achieves state-of-the-art distributional metrics on mixed-type tables while providing stable, fine-grained causal control. We demonstrate practical utility in a comparative safety study of metastatic castration-resistant prostate cancer treatments, using CausalMix to compare estimators under calibrated data-generating processes, tune hyperparameters, and conduct simulation-based power analyses under targeted treatment effect heterogeneity scenarios.

#18 Extending Neural Operators: Robust Handling of Functions Beyond the Training Set

Authors: Blaine Quackenbush, Paul J. Atzberger

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03621

Abstract:
We develop a rigorous framework for extending neural operators to handle out-of-distribution input functions. We leverage kernel approximation techniques and provide theory for characterizing the input-output function spaces in terms of Reproducing Kernel Hilbert Spaces (RKHSs). We provide theorems on the requirements for reliable extensions and their predicted approximation accuracy. We also establish formal relationships between specific kernel choices and their corresponding Sobolev Native Spaces. This connection further allows the extended neural operators to reliably capture not only function values but also their derivatives. Our methods are empirically validated through the solution of elliptic partial differential equations (PDEs) involving operators on manifolds having point-cloud representations and handling geometric contributions. We report results on key factors impacting the accuracy and computational performance of the extension approaches.

#19 A Stein Identity for q-Gaussians with Bounded Support

Authors: Sophia Sklaviadis, Thomas Moellenhoff, Andre F. T. Martins, Mario A. T. Figueiredo, Mohammad Emtiyaz Khan

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03673

Abstract:
Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian expectations. Here, we consider the class of bounded-support $q$-Gaussians and derive a new Stein identity leading to gradient estimators which have nearly identical forms to the Gaussian ones, and which are similarly easy to implement. We do this by extending the previous results of Landsman, Vanduffel, and Yao (2013) to prove new Bonnet- and Price-type theorems for q-Gaussians. We also simplify their forms by using escort distributions. Our experiments show that bounded-support distributions can reduce the variance of gradient estimators, which can potentially be useful for Bayesian deep learning and sharpness-aware minimization. Overall, our work simplifies the application of Stein's identity for an important class of non-Gaussian distributions.
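For orientation, the Gaussian results that this abstract takes as its starting point can be written as follows. These are the standard statements for $x \sim \mathcal{N}(\mu, \Sigma)$ and a sufficiently smooth $f$; they are not the new q-Gaussian identities, whose exact form the abstract does not give.

```latex
% Standard Gaussian identities, x ~ N(mu, Sigma), f sufficiently smooth.
\begin{align}
  \mathbb{E}\big[(x-\mu)\, f(x)\big] &= \Sigma\, \mathbb{E}\big[\nabla_x f(x)\big] && \text{(Stein)} \\
  \nabla_\mu\, \mathbb{E}\big[f(x)\big] &= \mathbb{E}\big[\nabla_x f(x)\big] && \text{(Bonnet)} \\
  \nabla_\Sigma\, \mathbb{E}\big[f(x)\big] &= \tfrac{1}{2}\, \mathbb{E}\big[\nabla_x^2 f(x)\big] && \text{(Price)}
\end{align}
```

The Bonnet and Price forms are what turn gradients of an expectation with respect to the distribution's parameters into expectations of derivatives of $f$, which is why they yield simple Monte Carlo gradient estimators.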

#20 Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation

Authors: Yuqi Kong, Xiao Zhang, Weiran Shen

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03778

Abstract:
We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, who cannot access the learner's rewards and only observes actions, aims to recover the underlying problem parameters. During the learning process, the learner's behavior naturally transitions from exploration to exploitation, resulting in non-stationary action data that poses significant challenges for the observer. To address this issue, we propose a simple and effective framework called Two-Phase Suffix Imitation. The framework discards data from an initial burn-in phase and performs empirical risk minimization using only data from a subsequent imitation phase. We derive a predictive decision loss bound that explicitly characterizes the bias-variance trade-off induced by the choice of burn-in length. Despite the severe information deficit, we show that a reward-free observer can achieve a convergence rate of $\tilde O(1/\sqrt{N})$, matching the asymptotic efficiency of a fully reward-aware learner. This result demonstrates that a passive observer can effectively uncover the optimal policy from actions alone, attaining performance comparable to that of the learner itself.

#21 Direct Bayesian Additive Regression Trees for Conditional Average Treatment Effects in Regression Discontinuity Designs

Authors: Daisuke Kondo, Shonosuke Sugasawa

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03819

Abstract:
Regression discontinuity designs (RDD) are widely used for causal inference. In many empirical applications, treatment effects vary substantially with covariates, and ignoring such heterogeneity can lead to misleading conclusions, which motivates flexible modeling of heterogeneous treatment effects in RDD. To this end, we propose a Bayesian nonparametric approach to estimating heterogeneous treatment effects based on Bayesian Additive Regression Trees (BART). The key feature of our method lies in adopting a general Bayesian framework using a pseudo-model defined through a loss function for fitting local linear models around the cutoff, which gives direct modeling of heterogeneous treatment effects by BART. Optimal selection of the bandwidth parameter for the local model is implemented using the Hyvärinen score. Through numerical experiments, we demonstrate that the proposed approach flexibly captures complicated structures of heterogeneous treatment effects as a function of covariates.

#22 Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs

Authors: Pengyu Zhang, Arnaud Vadeboncoeur, Alex Glyn-Davies, Mark Girolami

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.03922

Abstract:
Inverse problems are the task of calibrating models to match data. They play a pivotal role in diverse engineering applications by allowing practitioners to align models with reality. In many applications, engineers and scientists do not have a complete picture of i) the detailed properties of a system (such as material properties, geometry, initial conditions, etc.); ii) the complete laws describing all dynamics at play (such as friction laws, complicated damping phenomena, and general nonlinear interactions). In this paper, we develop a principled methodology for leveraging data from collections of distinct yet related physical systems to jointly estimate the individual model parameters of each system, and learn the shared unknown dynamics in the form of an ML-based closure model. To robustly infer the unknown parameters for each system, we employ a hierarchical Bayesian framework, which allows for the joint inference of multiple systems and their population-level statistics. To learn the closures, we use a maximum marginal likelihood estimate of a neural network embedded within the ODE/PDE formulation of the problem. To realize this framework, we utilize the ensemble Metropolis-Adjusted Langevin Algorithm (MALA) for stable and efficient sampling. To mitigate the computational bottleneck of repetitive forward evaluations in solving inverse problems, we introduce a bilevel optimization strategy to simultaneously train a surrogate forward model alongside the inference. Within this framework, we evaluate and compare distinct surrogate architectures, specifically Fourier Neural Operators (FNOs) and parametric Physics-Informed Neural Networks (PINNs).

#23 Fixed-Budget Constrained Best Arm Identification in Grouped Bandits

Authors: Raunak Mukherjee (Indian Institute of Technology, Bombay), Sharayu Moharir (Indian Institute of Technology, Bombay)

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04007

Abstract:
We study fixed-budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic rewards. An arm is considered feasible only if all its attributes' means are above a given threshold. The aim is to find the feasible arm with the largest overall mean. We first derive a lower bound on the error probability for any algorithm in this setting. We then propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility. We show it attains optimal dependence on problem parameters up to constant factors in the exponent. Empirically, FCSR outperforms natural baselines while preserving feasibility guarantees.
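
The feasibility rule and the objective described in this abstract can be illustrated with a small sketch. It is not the FCSR algorithm: it assumes the attribute means are known exactly (a bandit learner only sees noisy samples), it assumes the "overall mean" of an arm is the average of its attribute means, and the names `best_feasible_arm`, `attribute_means`, and `threshold` are hypothetical.

```python
from typing import Optional, Sequence


def best_feasible_arm(
    attribute_means: Sequence[Sequence[float]], threshold: float
) -> Optional[int]:
    """Return the index of the feasible arm with the largest overall mean.

    An arm is feasible only if every attribute mean is above the threshold.
    Assumption: the overall mean is the average of the attribute means.
    Returns None when no arm is feasible.
    """
    best_index, best_value = None, float("-inf")
    for index, means in enumerate(attribute_means):
        if not all(m > threshold for m in means):
            continue  # at least one attribute fails the threshold
        overall = sum(means) / len(means)
        if overall > best_value:
            best_index, best_value = index, overall
    return best_index


arms = [
    [0.9, 0.2],  # high first attribute, but the second one is infeasible
    [0.6, 0.5],  # feasible
    [0.7, 0.6],  # feasible, largest overall mean among feasible arms
]
print(best_feasible_arm(arms, threshold=0.4))
```

The first arm shows why the constraint matters: its attributes are not all above the threshold, so it is excluded regardless of how large any single attribute mean is.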

#24 Testing Full Mediation of Treatment Effects and the Identifiability of Causal Mechanisms

Authors: Martin Huber, Kevin Kloiber, Lukáš Lafférs

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04109

Abstract:
In causal analysis, understanding the causal mechanisms through which an intervention or treatment affects an outcome is often of central interest. We propose a test to evaluate (i) whether the causal effect of a treatment that is randomly assigned conditional on covariates is fully mediated by, or operates exclusively through, observed intermediate outcomes (referred to as mediators or surrogate outcomes), and (ii) whether the various causal mechanisms operating through different mediators are identifiable conditional on covariates. We demonstrate that if both full mediation and identification of causal mechanisms hold, then the conditionally random treatment is conditionally independent of the outcome given the mediators and covariates. Furthermore, we extend our framework to settings with non-randomly assigned treatments. We show that, in this case, full mediation remains testable, while identification of causal mechanisms is no longer guaranteed. We propose a double machine learning framework for implementing the test that can incorporate high-dimensional covariates and is root-n consistent and asymptotically normal under specific regularity conditions. We also present a simulation study demonstrating good finite-sample performance of our method, along with two empirical applications revisiting randomized experiments on maternal mental health and social norms.

#25 Statistical Inference for Score Decompositions

Authors: Timo Dimitriadis, Marius Puke

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04275

Abstract:
We introduce inference methods for score decompositions, which partition scoring functions for predictive assessment into three interpretable components: miscalibration, discrimination, and uncertainty. Our estimation and inference rely on a linear recalibration of the forecasts, which is applicable to general multi-step ahead point forecasts such as means and quantiles due to its validity for both smooth and non-smooth scoring functions. This approach ensures desirable finite-sample properties, enables asymptotic inference, and establishes a direct connection to the classical Mincer-Zarnowitz regression. The resulting inference framework facilitates tests for equal forecast calibration or discrimination, which yield three key advantages. They enhance the information content of predictive ability tests by decomposing scores, deliver higher statistical power in certain scenarios, and formally connect scoring-function-based evaluation to traditional calibration tests, such as financial backtests. Applications demonstrate the method's utility. We find that for survey inflation forecasts, discrimination abilities can differ significantly even when overall predictive ability does not. In an application to financial risk models, our tests provide deeper insights into the calibration and information content of volatility and Value-at-Risk forecasts. By disentangling forecast accuracy from backtest performance, the method exposes critical shortcomings in current banking regulation.

#26 PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology

privacy

Authors: Kelly L Vomo-Donfack, Adryel Hoszu, Grégory Ginot, Ian Morilla

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.04323

Abstract:
Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade aggregation quality. We introduce PTOPOFL, a framework that addresses both challenges simultaneously by replacing gradient communication with topological descriptors derived from persistent homology (PH). Clients transmit only 48-dimensional PH feature vectors (compact shape summaries whose many-to-one structure makes inversion provably ill-posed) rather than model gradients. The server performs topology-guided personalised aggregation: clients are clustered by Wasserstein similarity between their PH diagrams, intra-cluster models are topology-weighted, and clusters are blended with a global consensus. We prove an information-contraction theorem showing that PH descriptors leak strictly less mutual information per sample than gradients under strongly convex loss functions, and we establish linear convergence of the Wasserstein-weighted aggregation scheme with an error floor strictly smaller than FedAvg. Evaluated against FedAvg, FedProx, SCAFFOLD, and pFedMe on a non-IID healthcare scenario (8 hospitals, 2 adversarial) and a pathological benchmark (10 clients), PTOPOFL achieves AUC 0.841 and 0.910, respectively (the highest in both settings), while reducing reconstruction risk by a factor of 4.5 relative to gradient sharing. Code is publicly available at https://github.com/MorillaLab/TopoFederatedL and data at https://doi.org/10.5281/zenodo.18827595.

#27 Sample-Optimal Locally Private Hypothesis Selection and the Provable Benefits of Interactivity

privacy

Authors: Alireza F. Pour, Hassan Ashtiani, Shahab Asoodeh

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2312.05645

Abstract:
We study the problem of hypothesis selection under the constraint of local differential privacy. Given a class $\mathcal{F}$ of $k$ distributions and a set of i.i.d. samples from an unknown distribution $h$, the goal of hypothesis selection is to pick a distribution $\hat{f}$ whose total variation distance to $h$ is comparable with the best distribution in $\mathcal{F}$ (with high probability). We devise an $\varepsilon$-locally-differentially-private ($\varepsilon$-LDP) algorithm that uses $\Theta\left(\frac{k}{\alpha^2\min \{\varepsilon^2,1\}}\right)$ samples to guarantee that $d_{TV}(h,\hat{f})\leq \alpha + 9 \min_{f\in \mathcal{F}}d_{TV}(h,f)$ with high probability. This sample complexity is optimal for $\varepsilon<1$, matching the lower bound of Gopi et al. (2020). All previously known algorithms for this problem required $\Omega\left(\frac{k\log k}{\alpha^2\min \{ \varepsilon^2 ,1\}} \right)$ samples to work. Moreover, our result demonstrates the power of interaction for $\varepsilon$-LDP hypothesis selection. Namely, it breaks the known lower bound of $\Omega\left(\frac{k\log k}{\alpha^2\min \{ \varepsilon^2 ,1\}} \right)$ for the sample complexity of non-interactive hypothesis selection. Our algorithm breaks this barrier using only $\Theta(\log \log k)$ rounds of interaction. To prove our results, we define the notion of \emph{critical queries} for a Statistical Query Algorithm (SQA) which may be of independent interest. Informally, an SQA is said to use a small number of critical queries if its success relies on the accuracy of only a small number of queries it asks. We then design an LDP algorithm that uses a smaller number of critical queries.

#28 Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback

Authors: Seong Jin Lee, Will Wei Sun, Yufeng Liu

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2412.19436

Abstract:
Reinforcement learning from human feedback (RLHF) has become a cornerstone for aligning large language models with human preferences. However, the heterogeneity of human feedback, driven by diverse individual contexts and preferences, poses significant challenges for reward learning. To address this, we propose a Low-rank Contextual RLHF (LoCo-RLHF) framework that integrates contextual information to better model heterogeneous feedback while maintaining computational efficiency. Our approach builds on a contextual preference model, leveraging the intrinsic low-rank structure of the interaction between user contexts and query-answer pairs to mitigate the high dimensionality of feature representations. Furthermore, we address the challenge of distributional shifts in feedback through our Pessimism in Reduced Subspace (PRS) policy, inspired by pessimistic offline reinforcement learning techniques. We theoretically demonstrate that our policy achieves a tighter sub-optimality gap compared to existing methods. Extensive experiments validate the effectiveness of LoCo-RLHF, showcasing its superior performance in personalized RLHF settings and its robustness to distribution shifts.

#29 Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

Authors: Korel Gundem, Juncheng Dong, Dennis Zhang, Vahid Tarokh, Zhengling Qi

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2505.23783

Abstract:
In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration techniques are proposed to mitigate these biases, we show that, in the logit space, many of these methods are equivalent to merely shifting the LLM's decision boundary without having the ability to alter its orientation. This proves inadequate when biases cause the LLM to be severely misaligned. To address these limitations and provide a unifying framework, we propose Supervised Calibration (SC), a loss-minimization-based framework, which learns an optimal, per-class affine transformation of LLM's predictive probabilities in the logit space without requiring external data beyond the context. By using a more expressive functional class, SC not only subsumes many existing calibration methods in ICL as special cases but also enables the ability of altering and even completely reversing the orientation of the LLM's decision boundary. Furthermore, SC's loss-based nature facilitates the seamless integration of two purpose-built regularization techniques, context-invariance and directional trust-region regularizers. The former is designed to tackle the instability issue in ICL, while the latter is to control the degree of calibration. Finally, SC delivers state-of-the-art performance over calibration baselines in the 4-shot, 8-shot, and 16-shot settings across all nine datasets for Mistral-7B-Instruct-v0.3, Llama-2-7B-chat, and Qwen2-7B-Instruct.

#30 Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

Authors: Krishnakumar Balasubramanian, Nathan Ross

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2507.12686

Abstract:
We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-({1}/{6})^{L-1} + \epsilon}$, for any $\epsilon > 0$.
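
To make the depth dependence of the stated rate concrete, the exponent $(1/6)^{L-1}$ can be evaluated for a few small depths. This is plain arithmetic on the formula in the abstract, reading the exponent as $-(1/6)^{L-1} + \epsilon$; nothing beyond the abstract is assumed.

```latex
\begin{align*}
L = 2 \ (\text{one hidden layer}) &: \quad n^{-1/6 + \epsilon}, \\
L = 3 \ (\text{two hidden layers}) &: \quad n^{-1/36 + \epsilon}, \\
L = 4 \ (\text{three hidden layers}) &: \quad n^{-1/216 + \epsilon}.
\end{align*}
```

The guaranteed rate therefore weakens geometrically with each additional hidden layer.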

#31 Best-of-$\infty$ -- Asymptotic Performance of Test-Time LLM Ensembling

Authors: Junpei Komiyama, Daisuke Oba, Masafumi Oyamada

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2509.21091

Abstract:
We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.
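
The idea of choosing $N$ adaptively from answer agreement can be sketched as follows. This is a minimal illustration, not the scheme from the paper: the stopping rule (stop once the leading answer is ahead of the runner-up by `lead_margin` votes) is a hypothetical stand-in, and `sample_answer` is a hypothetical callable that would return one final answer from an LLM generation.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def adaptive_majority_vote(
    sample_answer: Callable[[], Hashable],
    min_samples: int = 3,
    max_samples: int = 50,
    lead_margin: int = 3,
) -> Tuple[Hashable, int]:
    """Sample answers until the majority answer leads clearly, then return it.

    Returns the majority answer and the number of samples actually drawn.
    The lead-based stopping rule is an assumption made for illustration.
    """
    counts: Counter = Counter()
    drawn = 0
    for drawn in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        ranked = counts.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        lead = ranked[0][1] - runner_up
        if drawn >= min_samples and lead >= lead_margin:
            break  # agreement is strong enough; stop spending compute
    return counts.most_common(1)[0][0], drawn


# Toy stand-in for an LLM: a fixed stream of answers.
stream = iter(["42", "41", "42", "42", "42", "41", "42"])
answer, used = adaptive_majority_vote(lambda: next(stream))
print(answer, used)
```

Questions on which the samples agree early stop after a few generations, while contested questions consume more of the budget, up to `max_samples`.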

#32 Implicit Bias of the JKO Scheme

Authors: Peter Halmos, Boris Hanin

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2511.14827

Abstract:
Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $\eta>0$ a sequence of probability distributions $\rho_k^\eta$ that approximate to first order in $\eta$ Wasserstein gradient flow on $J$. But the JKO scheme also has many other remarkable properties not shared by other first order integrators, e.g. it preserves energy dissipation and exhibits unconditional stability for $\lambda$-geodesically convex functionals $J$. To better understand the JKO scheme we characterize its implicit bias at second order in $\eta$. We show that $\rho_k^\eta$ are approximated to order $\eta^2$ by Wasserstein gradient flow on a modified energy \[ J^{\eta}(\rho) = J(\rho) - \frac{\eta}{4}\int_M \Big\lVert \nabla_g \frac{\delta J}{\delta \rho} (\rho) \Big\rVert_{2}^{2} \,\rho(dx), \] obtained by subtracting from $J$ the squared metric curvature of $J$ times $\eta/4$. The JKO scheme therefore adds at second order in $\eta$ a deceleration in directions where the metric curvature of $J$ is rapidly changing. This corresponds to canonical implicit biases for common functionals: for entropy the implicit bias is the Fisher information, for KL-divergence it is the Fisher-Hyvärinen divergence, and for Riemannian gradient descent it is the kinetic energy in the metric $g$. To understand the differences between minimizing $J$ and $J^\eta$ we study JKO-Flow, Wasserstein gradient flow on $J^\eta$, in several simple numerical examples. These include exactly solvable Langevin dynamics on the Bures-Wasserstein space and Langevin sampling from a quartic potential in 1D.
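
The entropy case mentioned in the abstract can be checked directly against the modified-energy formula. The short derivation below assumes $J(\rho) = \int_M \rho \log \rho \, dx$, with $\rho$ identified with its density with respect to the Riemannian volume measure; this normalization is an assumption and is not stated in the abstract.

```latex
\begin{align*}
\frac{\delta J}{\delta \rho}(\rho) &= \log \rho + 1,
\qquad
\nabla_g \frac{\delta J}{\delta \rho}(\rho) = \nabla_g \log \rho, \\
J^{\eta}(\rho) &= J(\rho) - \frac{\eta}{4} \int_M \big\lVert \nabla_g \log \rho \big\rVert_{2}^{2} \, \rho(dx)
= J(\rho) - \frac{\eta}{4}\, I(\rho),
\end{align*}
```

where $I(\rho) = \int_M \lVert \nabla_g \log \rho \rVert_2^2 \, \rho(dx)$ is the Fisher information, matching the implicit bias named in the abstract for entropy.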

#33 Synthetic Augmentation in Imbalanced Learning: When It Helps, When It Hurts, and How Much to Add

Authors: Zhengchi Ma, Anru R. Zhang

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2601.16120

Abstract:
Imbalanced classification often causes standard training procedures to prioritize the majority class and perform poorly on rare but important cases. A classic and widely used remedy is to augment the minority class with synthetic samples, but two basic questions remain under-resolved: when does synthetic augmentation actually help, and how many synthetic samples should be generated? We develop a unified statistical framework for synthetic augmentation in imbalanced learning, studying models trained on imbalanced data augmented with synthetic minority samples. Our theory shows that synthetic data is not always beneficial. In a "local symmetry" regime, imbalance is not the dominant source of error, so adding synthetic samples cannot improve learning rates and can even degrade performance by amplifying generator mismatch. When augmentation can help ("local asymmetry"), the optimal synthetic size depends on generator accuracy and on whether the generator's residual mismatch is directionally aligned with the intrinsic majority-minority shift. This structure can make the best synthetic size deviate from naive full balancing. Practically, we recommend Validation-Tuned Synthetic Size (VTSS): select the synthetic size by minimizing balanced validation loss over a range centered near the fully balanced baseline, while allowing meaningful departures. Extensive simulations and real data analysis further support our findings.
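
The VTSS recommendation amounts to a one-dimensional search over the synthetic sample size. The sketch below is an illustration under stated assumptions, not the authors' implementation: `balanced_val_loss` is a hypothetical callback that would train a model with the given number of synthetic minority samples and return its balanced validation loss, and the grid of `ratios` around the fully balanced size is an arbitrary choice.

```python
from typing import Callable, Iterable, Tuple


def select_synthetic_size(
    n_majority: int,
    n_minority: int,
    balanced_val_loss: Callable[[int], float],
    ratios: Iterable[float] = (0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0),
) -> Tuple[int, float]:
    """Pick the synthetic size with the lowest balanced validation loss.

    Candidates are multiples of the size that would fully balance the classes
    (ratio 1.0), so the search is centered near naive full balancing while
    still allowing departures from it in both directions.
    """
    full_balance = max(n_majority - n_minority, 0)
    candidates = sorted({int(round(r * full_balance)) for r in ratios})
    losses = {size: balanced_val_loss(size) for size in candidates}
    best = min(candidates, key=lambda size: losses[size])
    return best, losses[best]


# Toy loss whose minimum sits below full balancing (800 samples here).
def toy_loss(size: int) -> float:
    return (size - 600) ** 2 / 1e6 + 0.1


print(select_synthetic_size(1000, 200, toy_loss))
```

In the toy run the selected size is smaller than the fully balanced one, which mirrors the abstract's point that the best synthetic size can deviate from naive full balancing.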

#34 Conformal Graph Prediction with Z-Gromov Wasserstein Distances

Authors: Gabriel Melo, Thibaut de Saivre, Anna Calissano, Florence d'Alché-Buc

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.02460

Abstract:
Supervised graph prediction addresses regression problems where the outputs are structured graphs. Although several approaches exist for graph-valued prediction, principled uncertainty quantification remains limited. We propose a conformal prediction framework for graph-valued outputs, providing distribution-free coverage guarantees in structured output spaces. Our method defines nonconformity via the Z-Gromov-Wasserstein distance, instantiated in practice through Fused Gromov-Wasserstein (FGW), enabling permutation invariant comparison between predicted and candidate graphs. To obtain adaptive prediction sets, we introduce Score Conformalized Quantile Regression (SCQR), an extension of Conformalized Quantile Regression (CQR) to handle complex output spaces such as graph-valued outputs. We evaluate the proposed approach on a synthetic task and a real problem of molecule identification.
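
The coverage guarantee rests on the standard split conformal recipe, which can be sketched independently of the graph metric. This is plain split conformal prediction, not SCQR, and the edge-set symmetric difference used as `distance` is a hypothetical stand-in for the Fused Gromov-Wasserstein distance; all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, TypeVar

G = TypeVar("G")


def conformal_threshold(calibration_scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(calibration_scores)[rank - 1]


def prediction_set(
    prediction: G,
    candidates: Sequence[G],
    distance: Callable[[G, G], float],
    threshold: float,
) -> List[G]:
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [c for c in candidates if distance(prediction, c) <= threshold]


def edge_distance(a: frozenset, b: frozenset) -> float:
    """Stand-in graph distance: number of edges present in exactly one graph."""
    return float(len(a ^ b))


# Nonconformity scores on a held-out calibration set (toy values).
scores = [1, 0, 3, 2, 1, 4, 2]
tau = conformal_threshold(scores, alpha=0.25)

predicted = frozenset({(0, 1), (1, 2)})
candidates = [
    frozenset({(0, 1), (1, 2)}),
    frozenset({(0, 1)}),
    frozenset({(2, 3), (3, 4), (4, 5)}),
]
print(tau, len(prediction_set(predicted, candidates, edge_distance, tau)))
```

A permutation-invariant distance such as FGW would replace `edge_distance` when node labelings are not aligned between graphs.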

#35 A Robust Multi-Item Auction Design with Statistical Learning

Authors: Jiale Han, Xiaowu Dai

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2302.00941

Abstract:
We propose a novel statistical learning method for multi-item auctions that incorporates credible intervals. Our approach employs nonparametric density estimation to estimate credible intervals for bidder types based on historical data. We introduce two new strategies that leverage these credible intervals to reduce the time cost of implementing auctions. The first strategy screens potential winners' value regions within the credible intervals, while the second strategy simplifies the type distribution when the length of the interval is below a threshold value. These strategies are easy to implement and ensure fairness, dominant-strategy incentive compatibility, and dominant-strategy individual rationality with a high probability, while simultaneously reducing implementation costs. We demonstrate the effectiveness of our strategies using the Vickrey-Clarke-Groves mechanism and evaluate their performance through simulation experiments. Our results show that the proposed strategies consistently outperform alternative methods, achieving both revenue maximization and cost reduction objectives.

#36 List Sample Compression and Uniform Convergence

Authors: Steve Hanneke, Shay Moran, Tom Waknine

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2403.10889

Abstract:
List learning is a variant of supervised classification where the learner outputs multiple plausible labels for each instance rather than just one. We investigate classical principles related to generalization within the context of list learning. Our primary goal is to determine whether classical principles in the PAC setting retain their applicability in the domain of list PAC learning. We focus on uniform convergence (which is the basis of Empirical Risk Minimization) and on sample compression (which is a powerful manifestation of Occam's Razor). In classical PAC learning, both uniform convergence and sample compression satisfy a form of 'completeness': whenever a class is learnable, it can also be learned by a learning rule that adheres to these principles. We ask whether the same completeness holds true in the list learning setting. We show that uniform convergence remains equivalent to learnability in the list PAC learning setting. In contrast, our findings reveal surprising results regarding sample compression: we prove that when the label space is $Y=\{0,1,2\}$, then there are 2-list-learnable classes that cannot be compressed. This refutes the list version of the sample compression conjecture by Littlestone and Warmuth (1986). We prove an even stronger impossibility result, showing that there are $2$-list-learnable classes that cannot be compressed even when the reconstructed function can work with lists of arbitrarily large size. We prove a similar result for (1-list) PAC learnable classes when the label space is unbounded. This generalizes a recent result by arXiv:2308.06424.

#37 Parameter identification in linear non-Gaussian causal models under general confounding

Authors: Daniele Tramontano, Mathias Drton, Jalal Etesami

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2405.20856

Abstract:
Linear non-Gaussian causal models postulate that each random variable is a linear function of parent variables and non-Gaussian exogenous error terms. We study identification of the linear coefficients when such models contain latent variables. Our focus is on the commonly studied acyclic setting, where each model corresponds to a directed acyclic graph (DAG). For this case, prior literature has demonstrated that connections to overcomplete independent component analysis yield effective criteria to decide parameter identifiability in latent variable models. However, this connection is based on the assumption that the observed variables linearly depend on the latent variables. Departing from this assumption, we treat models that allow for arbitrary non-linear latent confounding. Our main result is a graphical criterion that is necessary and sufficient for deciding the generic identifiability of direct causal effects. Moreover, we provide an algorithmic implementation of the criterion with a run time that is polynomial in the number of observed variables. Finally, we report on estimation heuristics based on the identification result and explore a generalization to models with feedback loops.

#38 Tracking solutions of time-varying variational inequalities

Authors: Hédi Hadiji (PUC), Sarah Sachs (PUC), Cristóbal Guzmán (PUC)

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2406.14059

Abstract:
Tracking the solution of time-varying variational inequalities is an important problem with applications in game theory, optimization, and machine learning. Existing work considers time-varying games or time-varying optimization problems. For strongly convex optimization problems or strongly monotone games, these results provide tracking guarantees under the assumption that the variation of the time-varying problem is restrained, that is, problems with a sublinear solution path. In this work we extend existing results in two ways: In our first result, we provide tracking bounds for (1) variational inequalities with a sublinear solution path but not necessarily monotone functions, and (2) for periodic time-varying variational inequalities that do not necessarily have a sublinear solution path-length. Our second main contribution is an extensive study of the convergence behavior and trajectory of discrete dynamical systems of periodic time-varying VI. We show that these systems can exhibit provably chaotic behavior or can converge to the solution. Finally, we illustrate our theoretical results with experiments.

#39 DCENWCNet: A Deep CNN Ensemble Network for White Blood Cell Classification with LIME-Based Explainability

Authors: Sibasish Dhibar

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2502.05459

Abstract:
White blood cells (WBC) are important parts of our immune system, and they protect our body against infections by eliminating viruses, bacteria, parasites and fungi. The number of WBC types and the total number of WBCs provide important information about our health status. Convolutional neural networks (CNNs), a traditional deep learning architecture, can classify blood cells from a part of an object and perform object recognition. Various CNN models exhibit potential; however, their development often involves ad-hoc processes that neglect unnecessary layers, leading to issues with unbalanced datasets and insufficient data augmentation. To address these challenges, we propose a novel ensemble approach that integrates three CNN architectures, each uniquely configured with different dropout and max-pooling layer settings to enhance feature learning. This ensemble model, named DCENWCNet, effectively balances the bias-variance trade-off. When evaluated on the widely recognized Rabbin-WBC dataset, our model outperforms existing state-of-the-art networks, achieving the highest mean accuracy. Additionally, it demonstrates superior performance in precision, recall, F1-score, and Area Under the ROC Curve (AUC) across all categories. To delve deeper into the interpretability of classifiers, we employ reliable post-hoc explanation techniques, including Local Interpretable Model-Agnostic Explanations (LIME). These methods approximate the behavior of a black-box model by elucidating the relationships between feature values and predictions. Interpretable results enable users to comprehend and validate the model's predictions, thereby increasing their confidence in the automated diagnosis.

#40 Scalable physics-informed deep generative model for solving forward and inverse stochastic differential equations

privacy

Authors: Shaoqian Zhou, Wen You, Ling Guo, Xuhui Meng

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2503.18012

Abstract:
Physics-informed deep learning approaches have been developed to solve forward and inverse stochastic differential equation (SDE) problems with high-dimensional stochastic space. However, the existing deep learning models have difficulties solving SDEs with high-dimensional spatial space. In the present study, we propose a scalable physics-informed deep generative model (sPI-GeM), which is capable of solving SDE problems with both high-dimensional stochastic and spatial space. The sPI-GeM consists of two deep learning models, i.e., (1) physics-informed basis networks (PI-BasisNet), which are used to learn the basis functions as well as the coefficients given data on a certain stochastic process or random field, and (2) physics-informed deep generative model (PI-GeM), which learns the distribution over the coefficients obtained from the PI-BasisNet. The new samples for the learned stochastic process can then be obtained using the inner product between the output of the generator and the basis functions from the trained PI-BasisNet. The sPI-GeM addresses the scalability in the spatial space in a similar way as in the widely used dimensionality reduction technique, i.e., principal component analysis (PCA). A series of numerical experiments, including approximation of Gaussian and non-Gaussian stochastic processes, forward and inverse SDE problems, are performed to demonstrate the accuracy of the proposed model. Furthermore, we also show the scalability of the sPI-GeM in both the stochastic and spatial space using an example of a forward SDE problem with 38- and 20-dimension stochastic and spatial space, respectively.

#41 Simulation-based inference for stochastic nonlinear mixed-effects models with applications in systems biology

Authors: Henrik Häggström, Sebastian Persson, Marija Cvijovic, Umberto Picchini

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2504.11279

Abstract:
The analysis of data from multiple experiments, such as observations of several individuals, is commonly approached using mixed-effects models, which account for variation between individuals through hierarchical representations. This makes mixed-effects models widely applied in fields such as biology, pharmacokinetics, and sociology. In this work, we propose a novel methodology for scalable Bayesian inference in hierarchical mixed-effects models. Our framework first constructs amortized approximations of the likelihood and the posterior distribution, which are then rapidly refined for each individual dataset, to ultimately approximate the parameter posterior across many individuals. The framework is easily trainable, as it uses mixtures of experts but without neural networks, leading to parsimonious yet expressive surrogate models of the likelihood and the posterior. We demonstrate the effectiveness of our methodology using challenging stochastic models, such as mixed-effects stochastic differential equations emerging in systems biology-driven problems. However, the approach is broadly applicable and can accommodate both stochastic and deterministic models. We show that our approach can seamlessly handle inference for many parameters. Additionally, we applied our method to a real-data case study of mRNA transfection. When compared to exact pseudomarginal Bayesian inference, our approach proved to be both fast and competitive in terms of statistical accuracy.

#42 Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima

Authors: Lan V. Truong

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2505.15643

Abstract:
We study best-arm identification in stochastic multi-armed bandits under the fixed-confidence setting, focusing on instances with multiple optimal arms. Unlike prior work that addresses the unknown-number-of-optimal-arms case, we consider the setting where the number of optimal arms is known in advance. We derive a new information-theoretic lower bound on the expected sample complexity that leverages this structural knowledge and is strictly tighter than previous bounds. Building on the Track-and-Stop algorithm, we propose a modified, tie-aware stopping rule and prove that it achieves asymptotic instance-optimality, matching the new lower bound. Our results provide the first formal guarantee of optimality for Track-and-Stop in multi-optimal settings with known cardinality, offering both theoretical insights and practical guidance for efficiently identifying any optimal arm.

#43 Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD

Authors: Dmitry Dudukalov, Artem Logachov, Vladimir Lotov, Timofei Prasolov, Evgeny Prokopenko, Anton Tarasenko

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2505.18535

Abstract:
We study the convergence properties and escape dynamics of Stochastic Gradient Descent (SGD) in one-dimensional landscapes, separately considering infinite- and finite-variance noise. Our main focus is to identify the time scales on which SGD reliably moves from an initial point to the local minimum in the same "basin". Under suitable conditions on the noise distribution, we prove that SGD converges to the basin's minimum unless the initial point lies too close to a local maximum. In that near-maximum scenario, we show that SGD can linger for a long time in its neighborhood. For initial points near a "sharp" maximum, we show that SGD does not remain stuck there, and we provide results to estimate the probability that it will reach each of the two neighboring minima. Overall, our findings present a nuanced view of SGD's transitions between local maxima and minima, influenced by both noise characteristics and the underlying function geometry.

#44 Honesty in Causal Forests: When It Helps and When It Hurts

Authors: Yanfang Hou, Carlos Fernández-Loría

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2506.13107

Abstract:
Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard modeling practice with this method is honest estimation: dividing the data into two samples, one to define subgroups and another to estimate treatment effects within them. This is intended to reduce overfitting and is the default in many software packages. But is it the right choice? In this paper, we show that honest estimation can reduce the accuracy of individual-level treatment effect estimates, especially when there are substantial differences in how individuals respond to treatment, and the data is rich enough to uncover those differences. The core issue is a classic bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting, because it limits the data available to detect and model heterogeneity. Across 7,500 benchmark datasets, we find that the cost of using honesty by default can be as high as requiring 25% more data to match the performance of models trained without it. We argue that honesty is best understood as a form of regularization and its use should be guided by application goals and empirical evaluation, not adopted reflexively.
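
The honest-estimation practice described in this abstract can be illustrated with a deliberately small sketch. It is not a causal forest: it uses a single threshold split on one covariate and difference-in-means effect estimates, assumes randomized treatment, and all names (`effect`, `choose_split`, `honest_effects`) are hypothetical. One half of the data defines the subgroups and the other half estimates the effects within them.

```python
import random
from statistics import mean
from typing import List, Tuple

Row = Tuple[float, int, float]  # (covariate x, treatment indicator, outcome y)


def effect(rows: List[Row]) -> float:
    """Difference in mean outcomes between treated and control rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def choose_split(rows: List[Row], candidates: List[float]) -> float:
    """Pick the threshold on x that maximizes the gap between subgroup effects."""
    def gap(thr: float) -> float:
        left = [r for r in rows if r[0] <= thr]
        right = [r for r in rows if r[0] > thr]
        return abs(effect(left) - effect(right))
    return max(candidates, key=gap)


def honest_effects(
    rows: List[Row], candidates: List[float], seed: int = 0
) -> Tuple[float, float, float]:
    """Define subgroups on one half of the data, estimate effects on the other."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    split_sample, estimation_sample = shuffled[:half], shuffled[half:]
    thr = choose_split(split_sample, candidates)
    left = [r for r in estimation_sample if r[0] <= thr]
    right = [r for r in estimation_sample if r[0] > thr]
    return thr, effect(left), effect(right)


# Toy data: the treatment effect is 2.0 when x > 0.5 and 0.0 otherwise.
rng = random.Random(1)
data: List[Row] = []
for _ in range(2000):
    x = rng.random()
    t = rng.randint(0, 1)
    y = (2.0 if x > 0.5 else 0.0) * t + rng.gauss(0.0, 0.1)
    data.append((x, t, y))

threshold, left_effect, right_effect = honest_effects(data, [0.3, 0.5, 0.7])
print(threshold, round(left_effect, 2), round(right_effect, 2))
```

The trade-off discussed in the abstract is visible in the structure: each step sees only half of the rows, which guards against overfitting but leaves less data for detecting heterogeneity.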

#45 Federated ADMM from Bayesian Duality

Authors: Thomas Möllenhoff, Siddharth Swaroop, Finale Doshi-Velez, Mohammad Emtiyaz Khan

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2506.13150

Abstract:
We propose a new Bayesian approach to generalize the federated Alternating Direction Method of Multipliers (ADMM). We show that the solutions of variational-Bayesian (VB) objectives are associated with a duality structure that not only resembles the structure of ADMM's fixed-points but also generalizes it. For example, ADMM-like updates are recovered when the VB objective is optimized over the isotropic-Gaussian family, and new non-trivial extensions are obtained for other exponential-family distributions. These extensions include a Newton-like variant that converges in one step on quadratic objectives and an Adam-like variant that yields up to 7% accuracy boosts for deep heterogeneous cases. Our work opens a new Bayesian way to generalize ADMM and other primal-dual methods.
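
For context, the classical consensus ADMM iteration that federated ADMM builds on can be sketched on scalar quadratic client objectives. This is the textbook scheme, not the Bayesian generalization proposed in the paper; the client objectives f_i(x) = (a_i / 2)(x - b_i)^2 and the penalty `rho` are assumptions made for illustration.

```python
def consensus_admm(a, b, rho=1.0, n_iters=500):
    """Consensus ADMM for minimizing sum_i (a[i] / 2) * (x - b[i])**2."""
    n = len(a)
    u = [0.0] * n  # scaled dual variables, one per client
    z = 0.0        # shared (server) variable
    for _ in range(n_iters):
        # Local step: closed-form minimizer of
        # f_i(x) + (rho / 2) * (x - z + u_i)**2 for each client.
        x = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho) for i in range(n)]
        # Server step: average the clients' shifted iterates.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each client's disagreement with the server.
        u = [u[i] + x[i] - z for i in range(n)]
    return z


a = [1.0, 2.0, 3.0]
b = [1.0, 0.0, -1.0]
z_admm = consensus_admm(a, b)
# The exact minimizer of the summed quadratics is the a-weighted mean of b.
z_exact = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
print(z_admm, z_exact)
```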

#46 Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making

Authors: Emmanuelle Claeys, Elena Kerjean, Jean-Michel Loubes

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2510.16462

Abstract:
This work introduces MAYA, a sequential imitation learning model based on multi-armed bandits, designed to reproduce and predict individual bees' decisions in contextualized foraging tasks. The model accounts for bees' limited memory through a temporal window $\tau$, whose optimal value is around 7 trials, with a slight dependence on weather conditions. Experimental results on real, simulated, and complementary (mice) datasets show that MAYA (particularly with the Wasserstein distance) outperforms imitation baselines and classical statistical models, while providing interpretability of individual learning strategies and enabling the inference of realistic trajectories for prospective ecological applications.

#47 Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime

Authors: Beomhan Baek, Minhak Song, Chulhee Yun

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2510.26303

Abstract:
Adam [Kingma & Ba, 2015] is the de facto optimizer in deep learning, yet its theoretical understanding remains limited. Prior analyses show that Adam favors solutions aligned with $\ell_\infty$-geometry, but these results are restricted to the full-batch regime. In this work, we study the implicit bias of incremental Adam (using one sample per step) for logistic regression on linearly separable data, and show that its bias can deviate from the full-batch behavior. As an extreme example, we construct datasets on which incremental Adam provably converges to the $\ell_2$-max-margin classifier, in contrast to the $\ell_\infty$-max-margin bias of full-batch Adam. For general datasets, we characterize its bias using a proxy algorithm for the $\beta_2 \to 1$ limit. This proxy maximizes a data-adaptive Mahalanobis-norm margin, whose associated covariance matrix is determined by a data-dependent dual fixed-point formulation. We further present concrete datasets where this bias reduces to the standard $\ell_2$- and $\ell_\infty$-max-margin classifiers. As a counterpoint, we prove that Signum [Bernstein et al., 2018] converges to the $\ell_\infty$-max-margin classifier for any batch size. Overall, our results highlight that the implicit bias of Adam crucially depends on both the batching scheme and the dataset, while Signum remains invariant.
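
For reference, the max-margin classifiers named above can be written in standard form. The notation here (data $(x_i, y_i)$ with $y_i \in \{\pm 1\}$, a positive definite matrix $M$) is assumed for illustration and is not taken from the paper.

```latex
% l_p max-margin direction on linearly separable data, for p in {2, infinity}:
\hat{w}_p \in \arg\max_{\|w\|_p \le 1} \; \min_{i} \; y_i \langle w, x_i \rangle .

% Mahalanobis-norm analogue for a positive definite matrix M:
\hat{w}_M \in \arg\max_{\|w\|_M \le 1} \; \min_{i} \; y_i \langle w, x_i \rangle ,
\qquad \|w\|_M := \sqrt{w^\top M w} .
```

Taking $M = I$ recovers the $\ell_2$ case, so the Mahalanobis family contains the $\ell_2$-max-margin classifier as a special case.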

#48 Learning under Distributional Drift: Prequential Reproducibility as an Intrinsic Statistical Resource

Author: Sofiya Zaichyk

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2512.13506

Abstract:
Statistical learning under distributional drift remains poorly characterized, especially in closed-loop settings where learning alters the data-generating law. We introduce an intrinsic drift budget $C_T$ that quantifies the cumulative information-geometric motion of the data distribution along the realized learner-environment trajectory, measured in Fisher-Rao distance (the Riemannian metric induced by Fisher information on a statistical manifold of data-generating laws). The budget decomposes this motion into exogenous change (environmental drift that would occur without intervention) and policy-sensitive feedback contributions (drift induced by the learner's actions through the closed loop). This yields a rate-based characterization: in prequential reproducibility bounds -- where performance on the realized stream is used to predict one-step-ahead performance under the next distribution -- the drift contribution enters through the average drift rate $C_T/T$, i.e., normalized cumulative Fisher-Rao motion per time step. We prove a drift--feedback bound of order $T^{-1/2} + C_T/T$ (up to a controlled second-order remainder) and establish a matching minimax lower bound on a canonical subclass, showing this dependence is tight up to constants. Consequently, when $C_T/T$ is nonnegligible, one-step-ahead reproducibility admits an irreducible accuracy floor of the same order. Finally, the framework places exogenous drift, adaptive data analysis, and performative feedback within a common geometric account of distributional motion.
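
To see how the two terms of the stated rate trade off, one can plug in a hypothetical power-law drift budget $C_T = c\,T^{\alpha}$; the constants $c > 0$ and $\alpha \in [0, 1]$ are assumptions for this worked example, not quantities from the paper.

```latex
T^{-1/2} + \frac{C_T}{T} \;=\; T^{-1/2} + c\,T^{\alpha - 1}
\quad\Longrightarrow\quad
\begin{cases}
\alpha < \tfrac12: & \text{the } T^{-1/2} \text{ term dominates as } T \to \infty, \\[2pt]
\tfrac12 < \alpha < 1: & \text{the drift term dominates, and the bound decays like } T^{\alpha - 1}, \\[2pt]
\alpha = 1: & \text{the bound tends to the constant } c \text{ (a non-vanishing floor).}
\end{cases}
```

The last case corresponds to a constant average drift rate $C_T / T = c$, matching the accuracy floor described in the abstract.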

#49 Transfer learning for functional linear regression via control variates

Authors: Yuping Yang, Zhiyang Zhou

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2601.17217

Abstract:
Transfer learning (TL) has emerged as a powerful tool for improving estimation and prediction performance by leveraging information from related datasets, with the offset TL (O-TL) being a prevailing implementation. In this paper, we adapt the control-variates (CVS) method for TL and develop CVS-based estimators for scalar-on-function regression, one of the most fundamental models in functional data analysis. These estimators rely exclusively on dataset-specific summary statistics, thereby avoiding the pooling of subject-level data and remaining applicable in privacy-restricted or decentralized settings. We establish, for the first time, a theoretical connection between O-TL and CVS-based TL, showing that these two seemingly distinct TL strategies adjust local estimators in fundamentally similar ways. We further derive convergence rates that explicitly account for the unavoidable but typically overlooked smoothing error arising from discretely observed functional predictors, and clarify how similarity among covariance functions across datasets governs the performance of TL. Numerical studies support the theoretical findings and demonstrate that the proposed methods achieve competitive estimation and prediction performance compared with existing alternatives.
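
The classical control-variates idea that the paper adapts can be sketched on a toy Monte Carlo problem. This is the textbook estimator only, not the authors' functional-regression method; the target E[exp(U)] with U uniform on [0, 1] and the control U (known mean 0.5) are assumptions for illustration.

```python
import math
import random


def control_variate_mean(f_vals, g_vals, g_mean):
    """Estimate E[f] by adjusting the sample mean of f with a control g
    whose true mean g_mean is known."""
    n = len(f_vals)
    f_bar = sum(f_vals) / n
    g_bar = sum(g_vals) / n
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals)) / (n - 1)
    var = sum((g - g_bar) ** 2 for g in g_vals) / (n - 1)
    beta = cov / var  # coefficient estimated from the same sample
    return f_bar - beta * (g_bar - g_mean)


rng = random.Random(0)
u = [rng.random() for _ in range(10000)]
f_vals = [math.exp(v) for v in u]

plain = sum(f_vals) / len(f_vals)
adjusted = control_variate_mean(f_vals, u, 0.5)
print(plain, adjusted, math.e - 1)  # the true value is e - 1
```

The adjustment uses only summary statistics of the control (its sample mean, variance, and covariance with the target), which mirrors the abstract's point that subject-level data need not be pooled.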

#50 Universal Coefficients and Mayer-Vietoris Sequence for Groupoid Homology

Author: Luciano Melodia

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2602.08998

Abstract:
We study homology of ample groupoids via the compactly supported Moore complex of the nerve. Let $A$ be a topological abelian group. For $n\ge 0$ set $C_n(\mathcal G;A) := C_c(\mathcal G_n,A)$ and define $\partial_n^A=\sum_{i=0}^n(-1)^i(d_i)_*$. This defines $H_n(\mathcal G;A)$. The theory is functorial for continuous étale homomorphisms. It is compatible with standard reductions, including restriction to saturated clopen subsets. In the ample setting it is invariant under Kakutani equivalence. We reprove Matui-type long exact sequences and identify the comparison maps at the chain level. For discrete $A$ we prove a natural universal coefficient short exact sequence $$0\to H_n(\mathcal G)\otimes_{\mathbb Z}A\xrightarrow{\ \iota_n^{\mathcal G}\ }H_n(\mathcal G;A)\xrightarrow{\ \kappa_n^{\mathcal G}\ }\operatorname{Tor}_1^{\mathbb Z}\bigl(H_{n-1}(\mathcal G),A\bigr)\to 0.$$ The key input is the chain-level isomorphism $C_c(\mathcal G_n,\mathbb Z)\otimes_{\mathbb Z}A\cong C_c(\mathcal G_n,A)$, which reduces the groupoid statement to the classical algebraic UCT for the free complex $C_c(\mathcal G_\bullet,\mathbb Z)$. We also isolate the obstruction for non-discrete coefficients. For a locally compact totally disconnected Hausdorff space $X$ with a basis of compact open sets, the image of $\Phi_X:C_c(X,\mathbb Z)\otimes_{\mathbb Z}A\to C_c(X,A)$ is exactly the compactly supported functions with finite image. Thus $\Phi_X$ is surjective if and only if every $f\in C_c(X,A)$ has finite image, and for suitable $X$ one can produce compactly supported continuous maps $X\to A$ with infinite image. Finally, for a clopen saturated cover $\mathcal G_0=U_1\cup U_2$ we construct a short exact sequence of Moore complexes and derive a Mayer-Vietoris long exact sequence for $H_\bullet(\mathcal G;A)$ for use in explicit computations.
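
As a worked instance of the universal coefficient sequence, suppose purely hypothetically that $H_n(\mathcal G)\cong\mathbb Z$ and $H_{n-1}(\mathcal G)\cong\mathbb Z/m$, and take discrete coefficients $A=\mathbb Z/k$. These groups are assumptions for illustration, not computations from the paper.

```latex
H_n(\mathcal G)\otimes_{\mathbb Z}\mathbb Z/k \;\cong\; \mathbb Z/k,
\qquad
\operatorname{Tor}_1^{\mathbb Z}\bigl(\mathbb Z/m,\ \mathbb Z/k\bigr) \;\cong\; \mathbb Z/\gcd(m,k),

0 \to \mathbb Z/k \to H_n(\mathcal G;\mathbb Z/k) \to \mathbb Z/\gcd(m,k) \to 0 .
```

Under these assumptions $H_n(\mathcal G;\mathbb Z/k)$ is a finite abelian group of order $k\cdot\gcd(m,k)$; when $\gcd(m,k)=1$ the Tor term vanishes and $H_n(\mathcal G;\mathbb Z/k)\cong\mathbb Z/k$.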

#51 Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization

Authors: Felipe Maia Polo, Aida Nematzadeh, Virginia Aglietti, Adam Fisch, Isabela Albuquerque

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.02029

Abstract:
Moving beyond evaluations that collapse performance across heterogeneous prompts toward fine-grained evaluation at the prompt level, or within relatively homogeneous subsets, is necessary to diagnose generative models' strengths and weaknesses. Such fine-grained evaluations, however, suffer from a data bottleneck: human gold-standard labels are too costly at this scale, while automated ratings are often misaligned with human judgment. To resolve this challenge, we propose a novel statistical model based on tensor factorization that merges cheap autorater data with a limited set of human gold-standard labels. Specifically, our approach uses autorater scores to pretrain latent representations of prompts and generative models, and then aligns those pretrained representations to human preferences using a small calibration set. This sample-efficient methodology is robust to autorater quality, more accurately predicts human preferences on a per-prompt basis than standard baselines, and provides tight confidence intervals for key statistical parameters of interest. We also showcase the practical utility of our method by constructing granular leaderboards based on prompt qualities and by estimating model performance solely from autorater scores, eliminating the need for additional human annotations.

#52 Causal Learning Should Embrace the Wisdom of the Crowd

Authors: Ryan Feng Lin, Yuantao Wei, Huiling Liao, Xiaoning Qian, Shuai Huang

Published: Thu, 05 Mar 2026 00:00:00 -0500

Link: https://arxiv.org/abs/2603.02678

Abstract:
Learning causal structures, typically represented by directed acyclic graphs (DAGs), from observational data is notoriously challenging due to the combinatorial explosion of possible graphs and inherent ambiguities in observations. This paper argues that causal learning is now ready for the emergence of a new paradigm supported by rapidly advancing technologies, fulfilling the long-standing vision of leveraging human causal knowledge. This paradigm integrates scalable crowdsourcing platforms for data collection, interactive knowledge elicitation for expert opinion modeling, robust aggregation techniques for expert reconciliation, and large language model (LLM)-based simulation for augmenting AI-driven information acquisition. In this paper, we focus on DAG learning for causal discovery and frame the problem as a distributed decision-making task, recognizing that each participant (human expert or LLM agent) possesses fragmented and imperfect knowledge about different subsets of the variables of interest in the causal graph. By proposing a systematic framework to synthesize these insights, we aim to enable the recovery of a global causal structure unachievable by any individual agent alone. We advocate for a new research frontier and outline a comprehensive framework for new research thrusts that range from eliciting and modeling to aggregating and optimizing human causal knowledge contributions.

stat.ML updates on arXiv.org

📋 Paper title list