arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Unbounded Density Ratio Estimation and Its Application to Covariate Shift Adaptation

著者: Ren-Rui Liu, Jun Fan, Lei Shi, Zheng-Chu Guo

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29725

要約:
This paper focuses on the problem of unbounded density ratio estimation -- an understudied yet critical challenge in statistical learning -- and its application to covariate shift adaptation. Much of the existing literature assumes that the density ratio is either uniformly bounded or unbounded but known exactly. These conditions are often violated in practice, creating a gap between theoretical guarantees and real-world applicability. In contrast, this work directly addresses unbounded density ratios and integrates them into importance weighting for effective covariate shift adaptation. We propose a three-step estimation method that leverages unlabeled data from both the source and target distributions: (1) estimating a relative density ratio; (2) applying a truncation operation to control its unboundedness; and (3) transforming the truncated estimate back into the standard density ratio. The estimated density ratio is then employed as importance weights for regression under covariate shift. We establish rigorous, non-asymptotic convergence guarantees for both the proposed density ratio estimator and the resulting regression function estimator, demonstrating optimal or near-optimal convergence rates. Our findings offer new theoretical insights into density ratio estimation and learning under covariate shift, extending classical learning theory to more practical and challenging scenarios.

#2 mlr3mbo: Bayesian Optimization in R

著者: Marc Becker, Lennart Schneider, Martin Binder, Lars Kotthoff, Bernd Bischl

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29730

要約:
We present mlr3mbo, a comprehensive and modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, input and output transformations, and robust error handling. While it can be used for many standard Bayesian optimization variants in applied settings, researchers can also construct custom BO algorithms from its flexible building blocks. In addition to an introduction to the software, its design principles, and its building blocks, the paper presents two extensive empirical evaluations of the software on the surrogate-based benchmark suite YAHPO Gym. To identify robust default configurations for both numeric and mixed-hierarchical optimization regimes, and to gain further insights into the respective impacts of individual settings, we run a coordinate descent search over the mlr3mbo configuration space and analyze its results. Furthermore, we demonstrate that mlr3mbo achieves state-of-the-art performance by benchmarking it against a wide range of optimizers, including HEBO, SMAC3, Ax, and Optuna.

#3 Symmetrizing Bregman Divergence on the Cone of Positive Definite Matrices: Which Mean to Use and Why

著者: Tushar Sial, Abhishek Halder

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.28917

要約:
This work uncovers variational principles behind symmetrizing the Bregman divergences induced by generic mirror maps over the cone of positive definite matrices. We show that computing the canonical means for this symmetrization can be posed as minimizing the desired symmetrized divergences over a set of mean functionals defined axiomatically to satisfy certain properties. For the forward symmetrization, we prove that the arithmetic mean over the primal space is canonical for any mirror map over the positive definite cone. For the reverse symmetrization, we show that the canonical mean is the arithmetic mean over the dual space, pulled back to the primal space. Applying this result to three common mirror maps used in practice, we show that the canonical means for reverse symmetrization, in those cases, turn out to be the arithmetic, log-Euclidean and harmonic means. Our results improve understanding of existing symmetrization practices in the literature, and can be seen as a navigational chart to help decide which mean to use when.

#4 Multi-fidelity approaches for general constrained Bayesian optimization with application to aircraft design

著者: Oihan Cordelier, Youssef Diouane, Nathalie Bartoli, Eric Laurendeau

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.28987

要約:
Aircraft design relies heavily on solving challenging and computationally expensive Multidisciplinary Design Optimization problems. In this context, there has been growing interest in multi-fidelity models for Bayesian optimization to improve the MDO process by balancing computational cost and accuracy through the combination of high- and low-fidelity simulation models, enabling efficient exploration of the design process at a minimal computational effort. In the existing literature, fidelity selection focuses only on the objective function to decide how to integrate multiple fidelity levels, balancing precision and computational cost using variance reduction criteria. In this work, we propose novel multi-fidelity selection strategies. Specifically, we demonstrate how incorporating information from both the objective and the constraints can further reduce computational costs without compromising the optimality of the solution. We validate the proposed multi-fidelity optimization strategy by applying it to four analytical test cases, showcasing its effectiveness. The proposed method is used to efficiently solve a challenging aircraft wing aero-structural design problem. The proposed setting uses a linear vortex lattice method and a finite element method for the aerodynamic and structural analysis respectively. We show that employing our proposed multi-fidelity approach leads to $86\%$ to $200\%$ more constraint compliant solutions given a limited budget compared to the state-of-the-art approach.

#5 Transfer Learning in Bayesian Optimization for Aircraft Design

著者: Ali Tfaily, Youssef Diouane, Nathalie Bartoli, Michael Kokkolaras

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.28999

要約:
The use of transfer learning within Bayesian optimization addresses the disadvantages of the so-called \textit{cold start} problem by using source data to aid in the optimization of a target problem. We present a method that leverages an ensemble of surrogate models using transfer learning and integrates it in a constrained Bayesian optimization framework. We identify challenges particular to aircraft design optimization related to heterogeneous design variables and constraints. We propose the use of a partial-least-squares dimension reduction algorithm to address design space heterogeneity, and a \textit{meta} data surrogate selection method to address constraint heterogeneity. Numerical benchmark problems and an aircraft conceptual design optimization problem are used to demonstrate the proposed methods. Results show significant improvement in convergence in early optimization iterations compared to standard Bayesian optimization, with improved prediction accuracy for both objective and constraint surrogate models.

#6 Concept frustration: Aligning human concepts and machine representations

著者: Enrico Parisini, Christopher J. Soelistyo, Ahab Isaac, Alessandro Barp, Christopher R. S. Banerji

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29654

要約:
Aligning human-interpretable concepts with the internal representations learned by modern machine learning systems remains a central challenge for interpretable AI. We introduce a geometric framework for comparing supervised human concepts with unsupervised intermediate representations extracted from foundation model embeddings. Motivated by the role of conceptual leaps in scientific discovery, we formalise the notion of concept frustration: a contradiction that arises when an unobserved concept induces relationships between known concepts that cannot be made consistent within an existing ontology. We develop task-aligned similarity measures that detect concept frustration between supervised concept-based models and unsupervised representations derived from foundation models, and show that the phenomenon is detectable in task-aligned geometry while conventional Euclidean comparisons fail. Under a linear-Gaussian generative model we derive a closed-form expression for Bayes-optimal concept-based classifier accuracy, decomposing predictive signal into known-known, known-unknown and unknown-unknown contributions and identifying analytically where frustration affects performance. Experiments on synthetic data and real language and vision tasks demonstrate that frustration can be detected in foundation model representations and that incorporating a frustrating concept into an interpretable model reorganises the geometry of learned concept representations, to better align human and machine reasoning. These results suggest a principled framework for diagnosing incomplete concept ontologies and aligning human and machine conceptual reasoning, with implications for the development and validation of safe interpretable AI for high-risk applications.

#7 Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

著者: Giovanni Seraghiti, K\'evin Dubrulle, Arnaud Vandaele, Nicolas Gillis

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29715

要約:
Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, too sparse solutions might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our new proposed model (wL1-NMF) and algorithm (sCD).

#8 Penalized GMM Framework for Inference on Functionals of Nonparametric Instrumental Variable Estimators

著者: Edvard Bakhitov

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29889

要約:
This paper develops a penalized GMM (PGMM) framework for automatic debiased inference on functionals of nonparametric instrumental variable estimators. We derive convergence rates for the PGMM estimator and provide conditions for root-n consistency and asymptotic normality of debiased functional estimates, covering both linear and nonlinear functionals. Monte Carlo experiments on average derivative show that the PGMM-based debiased estimator performs on par with the analytical debiased estimator that uses the known closed-form Riesz representer, achieving 90-96% coverage while the plug-in estimator falls below 5%. We apply our procedure to estimate mean own-price elasticities in a semiparametric demand model for differentiated products. Simulations confirm near-nominal coverage while the plug-in severely undercovers. Applied to IRI scanner data on carbonated beverages, debiased semiparametric estimates are approximately 20% more elastic compared to the logit benchmark, and debiasing corrections are heterogeneous across products, ranging from negligible to several times the standard error.

#9 Do covariates explain why these groups differ? The choice of reference group can reverse conclusions in the Oaxaca-Blinder decomposition

著者: Manuel Quintero, Advik Shreekumar, William T. Stephenson, Tamara Broderick

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29972

要約:
Scientists often want to explain why an outcome is different in two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in medical care (outcomes given covariates). The Oaxaca--Blinder decomposition (OBD) is a standard tool to tease apart these factors. It is well known that the OBD requires choosing one of the groups as a reference, and the numerical answer can vary with the reference. To the best of our knowledge, there has not been a systematic investigation into whether the choice of OBD reference can yield different substantive conclusions and how common this issue is. In the present paper, we give existence proofs in real and simulated data that the OBD references can yield substantively different conclusions and that these differences are not entirely driven by model misspecification or small data. We prove that substantively different conclusions occur in up to half of the parameter space, but find these discrepancies rare in the real-data analyses we study. We explain this empirical rarity by examining how realistic data-generating processes can be biased towards parameters that do not change conclusions under the OBD.

#10 Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction

著者: Alexander Brenning, Thomas Suesse

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29981

要約:
Cross-validation (CV) is commonly used to estimate predictive risk when independent test data are unavailable. Its validity depends on the assumption that validation tasks are sampled from the same distribution as prediction tasks encountered during deployment. In spatial prediction and other settings with structured data, this assumption is frequently violated, leading to biased estimates of deployment risk. We propose Target-Weighted CV (TWCV), an estimator of deployment risk that accounts for discrepancies between validation and deployment task distributions, thus accounting for (1) covariate shift and (2) task-difficulty shift. We characterize prediction tasks by descriptors such as covariates and spatial configuration. TWCV assigns weights to validation losses such that the weighted empirical distribution of validation tasks matches the corresponding distribution over a target domain. The weights are obtained via calibration weighting, yielding an importance-weighted estimator that targets deployment risk. Since TWCV requires adequate coverage of the deployment distribution's support, we combine it with spatially buffered resampling that diversifies the task difficulty distribution. In a simulation study, conventional as well as spatial estimators exhibit substantial bias depending on sampling, whereas buffered TWCV remains approximately unbiased across scenarios. A case study in environmental pollution mapping further confirms that discrepancies between validation and deployment task distributions can affect performance assessment, and that buffered TWCV better reflects the prediction task over the target domain. These results establish task distribution mismatch as a primary source of CV bias in spatial prediction and show that calibration weighting combined with a suitable validation task generator provides a viable approach to estimating predictive risk under dataset shift.

#11 Refined Detection for Gumbel Watermarking

intellectual property

著者: Tor Lattimore

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.30017

要約:
We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.

#12 Cheap Bootstrap for Fast Uncertainty Quantification of Stochastic Gradient Descent

著者: Henry Lam, Zitong Wang

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2310.11065

要約:
Stochastic gradient descent (SGD) or stochastic approximation has been widely used in model training and stochastic optimization. While there is a huge literature on analyzing its convergence, inference on the obtained solutions from SGD has only been recently studied, yet it is important due to the growing need for uncertainty quantification. We investigate two computationally cheap resampling-based methods to construct confidence intervals for SGD solutions. One uses multiple, but few, SGDs in parallel via resampling with replacement from the data, and another operates this in an online fashion. Our methods can be regarded as enhancements of established bootstrap schemes to substantially reduce the computation effort in terms of resampling requirements, while bypassing the intricate mixing conditions in existing batching methods. We achieve these via a recent so-called cheap bootstrap idea and refinement of a Berry-Esseen-type bound for SGD.

#13 DeepRV: Accelerating Spatiotemporal Inference with Pre-trained Neural Priors

著者: Jhonathan Navott, Daniel Jenson, Seth Flaxman, Elizaveta Semenova

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2503.21473

要約:
Gaussian Processes (GPs) provide a flexible and statistically principled foundation for modelling spatiotemporal phenomena, but their $O(N^3)$ scaling makes them intractable for large datasets. Approximate methods such as variational inference (VI), inducing-point (sparse) GPs, low-rank kernel approximations (e.g., Nystrom methods and random Fourier features), and approximations such as INLA improve scalability but typically trade off accuracy, calibration, or modelling flexibility. We introduce DeepRV, a neural-network surrogate that replaces GP prior sampling, while closely matching full GP accuracy at inference including hyperparameter estimates, and reducing computational complexity to $O(N^2)$, increasing scalability and inference speed. DeepRV serves as a drop-in replacement for GP prior realisations in e.g. MCMC-based probabilistic programming pipelines, preserving full model flexibility. Across simulated benchmarks, non-separable spatiotemporal GPs, and a real-world application to education deprivation in London (n = 4,994 locations), DeepRV achieves the highest fidelity to exact GPs while substantially accelerating inference. Code is provided in the dl4bi Python package, with all experiments run on a single consumer-grade GPU to ensure accessibility for practitioners.

#14 Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

著者: Brian Liu, Rahul Mazumder, Peter Radchenko

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.20114

要約:
Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.

#15 Bayesian Modeling and Estimation of Linear Time-Varying Systems using Neural Networks and Gaussian Processes

著者: Yaniv Shulman

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.12878

要約:
The identification of Linear Time-Varying (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system's impulse response, $h(t, \tau)$, as a stochastic process. We decompose the response into a posterior mean and a random fluctuation term, a formulation that provides a principled approach for quantifying uncertainty, unifies intrinsic channel variability and epistemic uncertainty through a common posterior representation, and naturally defines a new, useful system class we term Linear Time-Invariant in Expectation (LTIE). To perform inference, we leverage modern machine learning techniques, including Bayesian neural networks and Gaussian Processes, using scalable variational inference. We demonstrate through a series of experiments that our framework can infer the properties of an LTI system from a single noisy input-output pair, including under deliberate additive-noise misspecification, achieve a lower overall error floor than the classical CCF stacking baseline in a simulated ambient noise tomography setting, and track a continuously varying LTV impulse response by using a structured Gaussian Process prior. This work provides a flexible and robust methodology for uncertainty-aware system identification in dynamic environments.

#16 On computing and the complexity of computing higher-order $U$-statistics, exactly

著者: Xingyu Chen, Ruiqi Zhang, Lin Liu

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.12627

要約:
Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper aims to fill this gap by presenting several results related to the computational aspect of $U$-statistics. First, we derive a useful decomposition from a $m$-th order $U$-statistic to a linear combination of $V$-statistics with orders not exceeding $m$, which are generally more feasible to compute. Second, we explore the connection between exactly computing $V$-statistics and Einstein summation, a tool often used in computational mathematics and quantum computing to accelerate tensor computations. Third, we provide an optimistic estimate of the time complexity for exactly computing $U$-statistics, based on the treewidth of a particular graph associated with the $U$-statistic kernel. The above ingredients lead to (1) a new, much more runtime-efficient algorithm to exactly compute general higher-order $U$-statistics, and (2) a more streamlined characterization of runtime complexity of computing $U$-statistics. We develop an accompanying open-source package called \texttt{u-stats} in both Python (https://github.com/zrq1706/U-Statistics-Python) and R (https://github.com/cxy0714/U-Statistics-R). We demonstrate through three examples in statistics that \texttt{u-stats} achieves impressive runtime performance compared to existing benchmarks. This paper also aspires to achieve two goals: (1) to capture the interest of researchers in both statistics and other related areas to further advance the algorithmic development of $U$-statistics and (2) to lift the burden of implementing higher-order $U$-statistics from practitioners.

#17 Bayesian Additive Regression Trees for functional ANOVA model

著者: Seokhun Park, Insung Kong, Yongdai Kim

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.03317

要約:
Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at the cost of interpretability. To address this limitation, we propose ANOVA Bayesian Additive Regression Trees (ANOVA-BART), a novel extension of BART based on the functional ANOVA decomposition, which is used to decompose the variability of a function into different interactions, each representing the contribution of a different set of covariates or factors. Our proposed ANOVA-BART enhances interpretability, preserves and extends the theoretical guarantees of BART, and achieves comparable prediction performance. Specifically, we establish that the posterior concentration rate of ANOVA-BART is nearly minimax optimal, and further provides the same convergence rates for each interaction that are not available for BART. Moreover, comprehensive experiments confirm that ANOVA-BART is comparable to BART in both accuracy and uncertainty quantification, while also demonstrating its effectiveness in component selection. These results suggest that ANOVA-BART offers a compelling alternative to BART by balancing predictive accuracy, interpretability, and theoretical consistency.

#18 Local Causal Discovery for Statistically Efficient Causal Inference

著者: M\'aty\'as Schubert, Tom Claassen, Sara Magliacane

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.14582

要約:
Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests if the causal effect is identifiable by using only local information. If it is identifiable, it finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection. Otherwise, it returns the locally valid parent adjustment sets. In our experiments on synthetic and realistic data LOAD outperforms global methods in scalability, while providing more accurate effect estimation than local methods.

#19 Drift Estimation for Diffusion Processes Using Neural Networks Based on Discretely Observed Independent Paths

diffusion

著者: Yuzhen Zhao, Yating Liu, Marc Hoffmann

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.11161

要約:
This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$ independent trajectories. We propose a neural network-based estimator and derive a non-asymptotic convergence rate, decomposed into a training error, an approximation error, and a diffusion-related term scaling as ${\log N}/{N}$. For compositional drift functions, we establish an explicit rate. In the numerical experiments, we consider a drift function with local fluctuations generated by a double-layer compositional structure featuring local oscillations, and show that the empirical convergence rate becomes independent of the input dimension $d$. Compared to the $B$-spline method, the neural network estimator achieves better convergence rates and more effectively captures local features, particularly in higher-dimensional settings.

#20 Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

著者: Tao Han, Zhibin Wen, Zhenghao Chen, Dazhao Du, Song Guo, Lei Bai

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2406.14399

要約:
The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

#21 Bayesian model-averaging stochastic item selection for adaptive testing

著者: Tina Su, Edison Choe, Joshua C. Chang

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2504.15543

要約:
Computer Adaptive Testing (CAT) aims to accurately estimate an individual's ability using only a subset of an Item Response Theory (IRT) instrument. Many applications also require diverse item exposure across testing sessions, preventing any single item from being over- or underutilized. In CAT, items are selected sequentially based on a running estimate of a respondent's ability. Prior methods almost universally see item selection through an optimization lens, motivating greedy item selection procedures. While efficient, these deterministic methods tend to have poor item exposure. Existing stochastic methods for item selection are ad-hoc, with item sampling weights that lack theoretical justification. We formulate stochastic CAT as a Bayesian model averaging problem. We seek item sampling probabilities, treated in the long-run frequentist sense, that perform optimal model averaging for the ability estimate in a Bayesian sense. The derivation yields an information criterion for optimal stochastic mixing: the expected entropy of the next posterior. We tested our method on seven publicly available psychometric instruments spanning personality, social attitudes, narcissism, and work preferences, in addition to the eight scales of the Work Disability Functional Assessment Battery. Across all instruments, accuracy differences between selection methods at a given test length are varied but minimal relative to the natural noise in ability estimation; however, the stochastic selector achieves full item bank exposure, resolving the longstanding tradeoff between measurement efficiency and item security at negligible accuracy cost.

#22 When fractional quasi p-norms concentrate

著者: Ivan Y. Tyukin, Bogdan Grechuk, Evgeny M. Mirkes, Alexander N. Gorban

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.19635

要約:
Concentration of distances in high dimension is an important factor for the development and design of stable and reliable data analysis algorithms. In this paper, we address the fundamental long-standing question about the concentration of distances in high dimension for fractional quasi $p$-norms, $p\in(0,1)$. The topic has been at the centre of various theoretical and empirical controversies. Here we, for the first time, identify conditions when fractional quasi $p$-norms concentrate and when they don't. We show that contrary to some earlier suggestions, for broad classes of distributions, fractional quasi $p$-norms admit exponential and uniform in $p$ concentration bounds. For these distributions, the results effectively rule out previously proposed approaches to alleviate concentration by "optimal" setting the values of $p$ in $(0,1)$. At the same time, we specify conditions and the corresponding families of distributions for which one can still control concentration rates by appropriate choices of $p$. We also show that in an arbitrarily small vicinity of a distribution from a large class of distributions for which uniform concentration occurs, there are uncountably many other distributions featuring anti-concentration properties. Importantly, this behavior enables devising relevant data encoding or representation schemes favouring or discouraging distance concentration. The results shed new light on this long-standing problem and resolve the tension around the topic in both theory and empirical evidence reported in the literature.

#23 Epistemic Errors of Imperfect Multitask Learners When Distributions Shift

著者: Sabina J. Sloman, Michele Caprio, Samuel Kaski

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.23496

要約:
Uncertainty-aware machine learners, such as Bayesian neural networks, output a quantification of uncertainty instead of a point prediction. We provide uncertainty-aware learners with a principled framework to characterize, and identify ways to eliminate, errors that arise from reducible (epistemic) uncertainty. We introduce a principled definition of epistemic error, and provide a decompositional epistemic error bound which operates in the very general setting of imperfect multitask learning under distribution shift. In this setting, the training (source) data may arise from multiple tasks, the test (target) data may differ systematically from the source data tasks, and/or the learner may not arrive at an accurate characterization of the source data. Our bound separately attributes epistemic errors to each of multiple aspects of the learning procedure and environment. As corollaries of the general result, we provide epistemic error bounds specialized to the settings of Bayesian transfer learning and distribution shift within $\epsilon$-neighborhoods.

#24 Fuzzy Prediction Sets: Conformal Prediction with E-values

著者: Nick W. Koning, Sam van Meer

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.13130

要約:
Prediction sets offer a binary inclusion/exclusion for each element at the same fixed confidence level. We generalize to fuzzy prediction sets, which exclude elements at their own data-driven confidence level. Our key insight is that a fuzzy prediction set \emph{is} an e-value, capturing precisely what e-values bring to predictive inference. Fuzzy prediction sets inherit the merging properties of their e-value, offer richer guarantees to decision-makers. We also show in what sense optimal e-values give rise to optimal (fuzzy) prediction sets. We apply our results to conformal prediction, deriving optimal fuzzy conformal prediction sets, and characterizing in what sense classical conformal prediction is optimal.

#25 Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling

著者: Jialin Liu, Lisang Ding, Stanley Osher, Wotao Yin

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.03638

要約:
Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. This architecture realizes an infinite-depth, weight-tied network that trains with constant memory, significantly reducing memory needs for the same level of performance compared to explicit models. While it is empirically known that these compact models can often match or even exceed the accuracy of larger explicit networks by allocating more test-time compute, the underlying mechanism remains poorly understood. We study this gap through a nonparametric analysis of expressive power. We provide a strict mathematical characterization, showing that a simple and regular implicit operator can, through iteration, progressively express more complex mappings. We prove that for a broad class of implicit models, this process lets the model's expressive power scale with test-time compute, ultimately matching a much richer function class. The theory is validated across four domains: image reconstruction, scientific computing, operations research, and LLM reasoning, demonstrating that as test-time iterations increase, the complexity of the learned mapping rises, while the solution quality simultaneously improves and stabilizes.

#26 From Moments to Models: Graphon-Mixture Learning for Mixup and Contrastive Learning

著者: Ali Azizpour, Reza Ramezanpour, Santiago Segarra

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.03690

要約:
Real-world graph datasets often arise from mixtures of populations, where graphs are generated by multiple distinct underlying distributions. In this work, we propose a unified framework that explicitly models graph data as a mixture of probabilistic graph generative models represented by graphons. To characterize and estimate these graphons, we leverage graph moments (motif densities) to cluster graphs generated from the same underlying model. We establish a novel theoretical guarantee, deriving a tighter bound showing that graphs sampled from structurally similar graphons exhibit similar motif densities with high probability. This result enables principled estimation of graphon mixture components. We show how incorporating estimated graphon mixture components enhances two widely used downstream paradigms: graph data augmentation via mixup and graph contrastive learning. By conditioning these methods on the underlying generative models, we develop graphon-mixture-aware mixup (GMAM) and model-aware graph contrastive learning (MGCL). Extensive experiments on both simulated and real-world datasets demonstrate strong empirical performance. In supervised learning, GMAM outperforms existing augmentation strategies, achieving new state-of-the-art accuracy on 6 out of 7 datasets. In unsupervised learning, MGCL performs competitively across seven benchmark datasets and achieves the lowest average rank overall.

#27 The Effect of Attention Head Count on Transformer Approximation

著者: Penghao Yu, Haotian Jiang, Zeyu Bao, Ruoxi Yu, Qianxiao Li

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.06662

要約:
Transformer has become the dominant architecture for sequence modeling, yet a detailed understanding of how its structural parameters influence expressive power remains limited. In this work, we study the approximation properties of transformers, with particular emphasis on the role of the number of attention heads. Our analysis begins with the introduction of a generalized $D$-retrieval task, which we prove to be dense in the space of continuous functions, thereby providing the basis for our theoretical framework. We then establish both upper and lower bounds on the parameter complexity required for $\epsilon$-approximation. Specifically, we show that transformers with sufficiently many heads admit efficient approximation, whereas with too few heads, the number of parameters must scale at least as $O(1/\epsilon^{cT})$, for some constant $c$ and sequence length $T$. To the best of our knowledge, this constitutes the first rigorous lower bound of this type in a nonlinear and practically relevant setting. We further examine the single-head case and demonstrate that an embedding dimension of order $O(T)$ allows complete memorization of the input, where approximation is entirely achieved by the feed-forward block. Finally, we validate our theoretical findings with experiments on both synthetic data and real-world tasks, illustrating the practical relevance of our results.

#28 Design Stability in Adaptive Experiments: Implications for Treatment Effect Estimation

著者: Saikat Sengupta, Koulik Khamaru, Suvrojit Ghosh, Tirthankar Dasgupta

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.22351

要約:
We study the problem of estimating the average treatment effect (ATE) under sequentially adaptive treatment assignment mechanisms. In contrast to classical completely randomized designs, we consider a setting in which the probability of assigning treatment to each experimental unit may depend on prior assignments and observed outcomes. Within the potential outcomes framework, we propose and analyze two natural estimators for the ATE: the inverse propensity weighted (IPW) estimator and an augmented IPW (AIPW) estimator. The cornerstone of our analysis is the concept of design stability, which requires that as the number of units grows, either the assignment probabilities converge, or sample averages of the inverse propensity scores and of the inverse complement propensity scores converge in probability to fixed, non-random limits. Our main results establish central limit theorems for both the IPW and AIPW estimators under design stability and provide explicit expressions for their asymptotic variances. We further propose estimators for these variances, enabling the construction of asymptotically valid confidence intervals. Finally, we illustrate our theoretical results in the context of Wei's adaptive coin design and Efron's biased coin design, highlighting the applicability of the proposed methods to sequential experimentation with adaptive randomization.

#29 Provably Extracting the Features from a General Superposition

著者: Allen Liu

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.15987

要約:
It is widely believed that complex machine learning models generally encode features through linear representations. This is the foundational hypothesis behind a vast body of work on interpretability. A key challenge toward extracting interpretable features, however, is that they exist in superposition. In this work, we study the question of extracting features in superposition from a learning theoretic perspective. We start with the following fundamental setting: we are given query access to a function \[ f(x)=\sum_{i=1}^n \sigma_i(v_i^\top x), \] where each unit vector $v_i$ encodes a feature direction and $\sigma_i:\R\to\R$ is an arbitrary response function and our goal is to recover the $v_i$ and the function $f$. In learning-theoretic terms, superposition refers to the \emph{overcomplete regime}, when the number of features is larger than the underlying dimension (i.e. $n > d$), which has proven especially challenging for typical algorithmic approaches. Our main result is an efficient query algorithm that, from noisy oracle access to $f$, identifies all feature directions whose responses are non-degenerate and reconstructs the function $f$. Crucially, our algorithm works in a significantly more general setting than all related prior results. We allow for essentially arbitrary superpositions, only requiring that $v_i, v_j$ are not nearly identical for $i \neq j$, and allowing for general response functions $\sigma_i$. At a high level, our algorithm introduces an approach for searching in Fourier space by iteratively refining the search space to locate the hidden directions $v_i$.

#30 Joint Embedding Variational Bayes

著者: Amin Oji, Paul Fieguth

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.05639

要約:
We introduce Variational Joint Embedding (VJE), a reconstruction-free latent-variable framework for non-contrastive self-supervised learning in representation space. VJE maximizes a symmetric conditional evidence lower bound (ELBO) on paired encoder embeddings by defining a conditional likelihood directly on target representations, rather than optimizing a pointwise compatibility objective. The likelihood is instantiated as a heavy-tailed Student--$t$ distribution on a polar representation of the target embedding, where a directional--radial decomposition separates angular agreement from magnitude consistency and mitigates norm-induced pathologies. The directional factor operates on the unit sphere, yielding a valid variational bound for the associated spherical subdensity model. An amortized inference network parameterizes a diagonal Gaussian posterior whose feature-wise variances are shared with the directional likelihood, yielding anisotropic uncertainty without auxiliary projection heads. Across ImageNet-1K, CIFAR-10/100, and STL-10, VJE is competitive with standard non-contrastive baselines under linear and $k$-NN evaluation, while providing probabilistic semantics directly in representation space for downstream uncertainty-aware applications. We validate these semantics through out-of-distribution detection, where representation-space likelihoods yield strong empirical performance. These results position the framework as a principled variational formulation of non-contrastive learning, in which structured feature-wise uncertainty is represented directly in the learned embedding space.

#31 Variational inference via radial transport

著者: Luca Ghafourpour, Sinho Chewi, Alessio Figalli, Aram-Alexandre Pooladian

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.17525

要約:
In variational inference (VI), the practitioner approximates a high-dimensional distribution $\pi$ with a simple surrogate one, often a (product) Gaussian distribution. However, in many cases of practical interest, Gaussian distributions might not capture the correct radial profile of $\pi$, resulting in poor coverage. In this work, we approach the VI problem from the perspective of optimizing over these radial profiles. Our algorithm radVI is a cheap, effective add-on to many existing VI schemes, such as Gaussian (mean-field) VI and Laplace approximation. We provide theoretical convergence guarantees for our algorithm, owing to recent developments in optimization over the Wasserstein space--the space of probability distributions endowed with the Wasserstein distance--and new regularity properties of radial transport maps in the style of Caffarelli (2000).

#32 Kinetic Langevin Splitting Schemes for Constrained Sampling

著者: Neil K. Chada, Lu Yu

公開日: Wed, 01 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.23397

要約:
Constrained sampling is an important and challenging task in computational statistics, concerned with generating samples from a distribution under certain constraints. There are numerous types of algorithm aimed at this task, ranging from general Markov chain Monte Carlo, to unadjusted Langevin methods. In this article we propose a series of new sampling algorithms based on the latter of these, specifically the kinetic Langevin dynamics. Our series of algorithms are motivated on advanced numerical methods which are splitting order schemes, which include the BU and BAO families of splitting schemes.Their advantage lies in the fact that they have favorable strong order (bias) rates and computationally efficiency. In particular we provide a number of theoretical insights which include a Wasserstein contraction and convergence results. We are able to demonstrate favorable results, such as improved complexity bounds over existing non-splitting methodologies. Our results are verified through numerical experiments on a range of models with constraints, which include a toy example and Bayesian linear regression.

stat.ML updates on arXiv.org

📋 論文タイトル一覧