arXiv論文一覧 - stat.ML updates on arXiv.org

#1 SCOPE-FE: Structured Control of Operator and Pairwise Exploration for Feature Engineering

著者: Minhee Park, Seongyeon Son, Yonghyun Lee, Eunchan Kim

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27025

要約:
Automatic feature engineering is an effective approach for improving predictive performance in tabular learning. However, expand-and-reduce methods, such as OpenFE, become increasingly computationally expensive as the input dimensionality grows. This limitation arises primarily from the combinatorial explosion of candidate features generated through operator-feature combinations. To address this issue, we propose SCOPE-FE, a structured search space control framework that improves efficiency by reducing the candidate space prior to feature generation. SCOPE-FE jointly regulates two major sources of combinatorial growth: the operator space and feature-pair space. First, OperatorProbing estimates the dataset-specific utility of candidate operators and eliminates low-contribution operators in advance. Second, FeatureClustering employs spectral embedding and fuzzy c-means clustering to group structurally related features, thereby restricting candidate generation to relevant within-cluster combinations. In addition, we introduce ReliabilityScoring, which incorporates variance across subsamples to stabilize pruning decisions. Experiments on ten benchmark datasets demonstrate that SCOPE-FE substantially reduces feature engineering time while maintaining competitive predictive performance relative to existing baselines. The efficiency gains are particularly pronounced for high-dimensional datasets. These results indicate that structured control of the search space is an effective strategy for scalable automatic feature engineering. The code will be made publicly available upon acceptance.

#2 A Novel Computational Framework for Causal Inference: Tree-Based Discretization with ILP-Based Matching

著者: Tianyu Yang, Md. Noor-E-Alam

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27307

要約:
Causal inference is essential for data-driven decision-making, as it aims to uncover causal relationships from observational data. However, identifying causality remains challenging due to the potential for confounding and the distinction between correlation and causation. While recent advances in causal machine learning and matching algorithms have improved estimation accuracy, these methods often face trade-offs between interpretability and computational efficiency. This paper proposes a novel approach that combines a tree-based discretization technique, tailored for causal inference, with an integer linear programming-based matching algorithm. The discretization ensures approximately linear relationships for control datasets within strata, enabling effective matching, while the optimization framework optimizes for global balance. The resulting algorithm yields computational efficiency and less biased ATT estimates compared to state-of-the-art algorithms. Empirical evaluations demonstrate the proposed method's practical advantages over existing techniques in causal inference scenarios.

#3 Bayesian X-Learner: Calibrated Posterior Inference for Heterogeneous Treatment Effects under Heavy-Tailed Outcomes

著者: Eichi Uehara

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27394

要約:
Conditional Average Treatment Effect (CATE) estimation in practice demands three properties simultaneously: heterogeneous effects $\tau(x)$, calibrated uncertainty over them, and robustness to the heavy tails that contaminate real outcome data. Meta-learners (K\"unzel et al., 2019) give (i); causal forests and BART give (i)-(ii) with Gaussian-tail assumptions; no widely used tool gives all three. We present Bayesian X-Learner, an X-Learner built on cross-fitted doubly robust pseudo-outcomes (Kennedy, 2020) with a full MCMC posterior over $\tau(x)$ via a Welsch redescending pseudo-likelihood. On Hill's IHDP benchmark the default configuration attains mean $\sqrt{\varepsilon_{\mathrm{PEHE}}} = 0.56$ on 5 replications (lowest mean; differences from S-/T-/X-learners, full-config Causal BART, and a causal forest baseline are not significant at $\alpha=0.05$, and rank ordering is unstable at 10 replications -- IHDP comparisons are competitive rather than dominant). On contaminated "whale" DGPs with up to 20-25% tail density, a one-flag extension (contamination_severity) that selects a Huber-$\delta$ nuisance loss per Huber's minimax-$\delta$ relation recovers RMSE $\approx 0.13$ with tight credible intervals (single-cross-fit 30-seed coverage 83% [Wilson 66%, 93%] at 20% density; modular-Bayes pooling with Bayesian-bootstrap nuisance draws restores nominal 95% coverage).

#4 Prediction-powered Inference by Mixture of Experts

著者: Yanwu Gu, Linglong Kong, Dong Xia

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27892

要約:
The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-supervised inference, in which labeled data are limited and expensive to obtain, whereas unlabeled data are abundant and widely available. Given a collection of predictors, we treat them as a mixture of experts (MOE) and introduce an MOE-powered semi-supervised inference framework built upon prediction-powered inference (PPI). Motivated by the variance reduction principle underlying PPI, the proposed framework seeks the mixture of experts that achieves the smallest possible variance. Compared with standard PPI, the MOE-powered inference framework adapts to the unknown performance of individual predictors, benefits from their collective predictive power, and enjoys a best-expert guarantee. The framework is flexible and applies to mean estimation, linear regression, quantile estimation, and general M-estimation. We develop non-asymptotic theory for the MOE-powered inference framework and establish upper bounds on the coverage error of the resulting confidence intervals. Numerical experiments demonstrate the practical effectiveness of MOE-powered inference and corroborate our theoretical findings.

#5 Value-Aware Product Recommendation by Customer Segmentation using a suitable High-Dimensional Similarity Measure

著者: Mar\'ia Florencia Acosta, Rodrigo Garc\'ia Arancibia, Pamela Llop, Mariel Lovatto, Lucas Mansilla

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.26983

要約:
This paper presents a novel value-aware approach to product recommendation that simultaneously addresses the high dimensionality and sparsity of user-item data while explicitly incorporating the contribution of each product and user to overall sales revenue. The proposed framework encodes revenue contributions in the user-item matrix and computes customer similarity directly on this basis using suitable distance measures. This enables the segmentation of users according to the revenue-based similarity of their purchase baskets and supports recommendations aligned with profitability objectives. We compare conventional similarity metrics with a novel alternative tailored to high-dimensional contexts and propose three recommendation strategies based on revenue share, product popularity, and expected profit generation. The effectiveness of the proposed method is validated through simulation experiments and a real-world application using the UCI Online Retail dataset.

#6 Adaptive Robust Confidence Intervals in Efron's Gaussian Two-Groups Model

著者: Qiaosen Wang, Shuwen Chai, Chao Gao

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.26992

要約:
Robust uncertainty quantification is increasingly important in modern data analysis and is often formalized under Huber's model, which allows an $\varepsilon$-fraction of arbitrary corruptions. In many experimental sciences, however, the measurement protocol is well controlled, and contamination is more plausibly introduced upstream. Motivated by this noise-oblivious nature of adversaries, we study confidence intervals for the null location parameter $\theta$ in Efron's Gaussian two-groups model, where an unknown fraction $\varepsilon$ of observations have arbitrarily shifted means, but all samples share the same law of additive Gaussian measurement noise with variance $\sigma^2$. We characterize the minimax-optimal length among confidence intervals with a prescribed coverage level uniformly over the unknown contamination proportion and all noise-oblivious adversaries. Although prior work has shown that the minimax point estimation rate of theta does not deteriorate when $\varepsilon$ becomes unknown, our results reveal that, with a given $\sigma^2$, the minimax-optimal length of confidence intervals that are adaptive to unknown $\varepsilon$ is of order $\sigma (n^{-1/4}+\varepsilon^{1/2}/\max\{1, \log(en \varepsilon^2)\}^{1/2})$, which is polynomially worse than the optimal length when $\varepsilon$ is known. When the variance $\sigma^2$ is also unknown, we show a further degradation: no adaptive robust confidence interval can be shorter than $\Omega(\sigma n^{-1/8})$. Algorithmically, we introduce a Fourier-based certification procedure built on Carath\'{e}odory's positive-semidefiniteness constraints. By scanning candidate points and accepting those whose residual characteristic function is certifiably consistent with a Gaussian location mixture, our algorithm attains the minimax lower bound in the known-variance setting and is computable in polynomial time.

#7 Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution

privacy

著者: Karol Dobiczek, Maciej Mozolewski, Szymon Bobek, Micha{\l} Szafarczyk, Peter van Dam, Grzegorz J. Nalepa

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27017

要約:
Deep learning models for 12-lead electrocardiogram (ECG) analysis achieve high diagnostic performance but lack the intuitive interpretability required for clinical integration. Standard feature attribution methods are limited by the inherent difficulty in mapping abstract waveform fluctuations to physical anatomical pathologies. To resolve this, we propose a cross-modal method that projects feature attributions from high-performance 12-lead ECG models onto the CineECG 3D anatomical space. Our study reveals that while models trained directly on CineECG signals suffer from reduced accuracy and incoherent attributions, the proposed mapping mechanism effectively recovers clinically relevant feature rankings. Validated against a ground-truth dataset of 20 cases annotated by domain experts, the mapped explanations yield a Dice score of 0.56, significantly outperforming the 0.47 baseline of standard 12-lead attributions. These findings indicate that cross-modal averaging mapping effectively filters attribution instability and improves the localization of pathological features, combining the diagnostic expressiveness of standard ECG with the intuitive clarity of anatomical visualization.

#8 Learning Rate Transfer in Normalized Transformers

著者: Boris Shigida, Boris Hanin, Andrey Gromov

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27077

要約:
The Normalized Transformer, or nGPT (arXiv:2410.01131) achieves impressive training speedups and does not require weight decay or learning rate warmup. However, despite having hyperparameters that explicitly scale with model size, we observe that nGPT does not exhibit learning rate transfer across model dimension and token horizon. To rectify this, we combine numerical experiments with a principled use of alignment exponents (arXiv:2407.05872) to revisit and modify the $\mu$P approach to hyperparameter transfer (arXiv:2011.14522). The result is a novel nGPT parameterization we call $\nu$GPT. Through extensive empirical validation, we find $\nu$GPT exhibits learning rate transfer across width, depth, and token horizon.

#9 Linear Models, Variable Selection, Artificial Intelligence

著者: By Riyadh Alrawkan, Edward Boone, Ryad Ghanam, Anton Westveld

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27191

要約:
Variable selection in linear regression models has been a problem since hypothesis testing began. Which variables to include or exclude from a model is not an easy task. Techniques such as Forward, Back ward, Stepwise Regression sequentially add or delete variables from a model. Penalized likelihood methods such as AIC, BIC, etc. seek to choose variables that have a significant contribution to the likelihood. Penalized sum of square methods such as LASSO and Elastic Net have been used to penalize small coefficients to only allow variables with large coefficients in the model. This work introduces an Artificial Intelligence approach to model selection where an ANN is trained to determine the significance of the variables based on OLS estimates. A simulation study shows the accuracy across various sample sizes and variances. Furthermore, a simulation study is conducted to compare the performance of the approach against Forward, Backward, AIC, BIC and LASSO. The approach is illustrated using a dataset from the World Health Organization regarding Life Expectancy. A github link is provided to the pretrained ANN that can handle up to 100 predictor variables, the original WHO dataset and the subset used in this work.

#10 The Bernstein-von Mises theorem for Bayesian one-pass online learning

著者: Jeyong Lee, Junhyeok Choi, Dongguen Kim, Minwoo Chae

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27442

要約:
Bayesian online learning provides a coherent framework for sequential inference. However, its theoretical understanding remains limited, particularly in the one-pass setting. Existing theoretical guarantees typically require the mini-batch sample size to diverge, a condition that fails in the one-pass regime. In this paper, we propose a new Bayesian online learning algorithm tailored to the one-pass setting, which incorporates a warm-start phase to ensure stable sequential updates. For this algorithm, we show that the sequentially updated posterior attains the optimal convergence rate. Building on this, we establish an online analogue of the Bernstein-von Mises theorem, which guarantees valid uncertainty quantification without diverging mini-batch sample sizes. Our analysis is based on a novel theoretical framework that differs fundamentally from existing approaches in the online learning literature. Numerical experiments on generalized linear models show that the proposed method matches the performance of the batch estimator while outperforming existing online procedures.

#11 FoReco and FoRecoML: A Unified Toolbox for Forecast Reconciliation in R

著者: Daniele Girolimetto, Jeroen Rombouts, Ines Wilms, Yangzhuoran Fin Yang

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27696

要約:
Forecast reconciliation has become key to improving the accuracy and coherence of forecasts for linearly constrained multiple time series, such as hierarchical and grouped series. Yet, comprehensive software that jointly covers cross-sectional, temporal, and cross-temporal reconciliation has so far been lacking. The R packages FoReco and FoRecoML address this gap by offering a comprehensive and unified framework. The packages respectively implement classical and regression-based linear reconciliation approaches, and non-linear approaches based on machine learning for cross-sectional, temporal and cross-temporal frameworks. Designed for accessibility and flexibility, these packages provide sensible default options that allow new users to apply reconciliation methods with minimal effort, while still giving expert users full control to explore state-of-the-art extensions through customized settings. With this dual focus, FoReco and FoRecoML are versatile tools for practitioners and researchers working on forecast reconciliation.

#12 Optimized Deferral for Imbalanced Settings

著者: Corinna Cortes, Anqi Mao, Mehryar Mohri, Yutao Zhong

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27723

要約:
Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.

#13 Mind the Gap: Structure-Aware Consistency in Preference Learning

著者: Mehryar Mohri, Yutao Zhong

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27733

要約:
Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization guarantees. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. We derive rigorous $H$-consistency bounds that depend on enforcing a separation margin $\gamma$. Crucially, we extend this to Structure-Aware $H$-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and model limitations via the Margin-Capacity Profile, proving that heavy-tailed surrogates (such as the Polynomial Hinge family) offer superior consistency guarantees for capacity-bounded models compared to the standard logistic loss used in DPO.

#14 Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction

著者: Mehryar Mohri, Yutao Zhong

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27742

要約:
The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yield slow square-root consistency bounds, while piecewise-linear losses (like Hinge) offer fast linear consistency rates but suffer from non-differentiability. We propose Linear-Core (LC) Surrogates, a new family of convex loss functions that resolve this tension by stitching a linear core to a smooth tail. We prove that these surrogates are differentiable everywhere while retaining strict linear $H$-consistency bounds, effectively combining the optimization benefits of smoothness with the statistical efficiency of margin-based losses. In the structured prediction setting, we show that this smoothness unlocks a massive computational and energy advantage: it allows for an unbiased stochastic gradient estimator that bypasses the quadratic complexity $O(|\mathscr{Y}|^2)$ of exact inference (e.g., Viterbi). Empirically, our method achieves a 23$\times$ speedup over Structured SVMs on large-vocabulary sequence tagging tasks and demonstrates superior robustness to instance-dependent label noise, outperforming Cross-Entropy by 2.6% on corrupted CIFAR-10.

#15 Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing

著者: Max Lovig

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.27883

要約:
In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact realization of training data; this drives the systematic ``generalization gap'', where the train error becomes an unreliable proxy for test error. Existing approaches either argue this gap is benign through complex analysis or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity -- enforcing the train error to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a low-dimensional state evolution recursion, rendering the dynamics of the algorithm transparent and tractable. We validate DD on XOR classification, yielding superior performance compared to GD; additionally, we implement noisy MNIST and non-linear probing of CIFAR-10, demonstrating that even when our stylized assumptions are relaxed, DD narrows the generalization gap compared to GD.

#16 Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

著者: Shijin Gong, Kai Ye, Jin Zhu, Xinyu Zhang, Hongyi Zhou, Chengchun Shi

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.28005

要約:
Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three approaches have been widely adopted: (i) Proximal policy optimization and advantage actor-critic rely on a deep neural network to estimate the value function of the learning policy in order to reduce the variance of the policy gradient. However, estimating and maintaining such a value network incurs substantial computational and memory overhead. (ii) Group relative policy optimization (GRPO) avoids training a value network by approximating the value function using sample averages. However, GRPO samples a large number of reasoning traces per prompt to achieve accurate value function approximation, making it computationally expensive. (iii) REINFORCE-type algorithms sample only a single reasoning trajectory per prompt, which reduces computational cost but suffers from poor sample efficiency. In this work, we focus on a practical, resource-constrained setting in which only a small number of reasoning traces can be sampled per prompt, while low-variance gradient estimation remains essential for high-quality policy learning. To address this challenge, we bring classical nonparametric statistical methods, which are both computationally and statistically efficient, to LLM reasoning. We employ kernel smoothing as a concrete example for value function estimation and the subsequent policy optimization. Numerical and theoretical results demonstrate that our proposal achieves accurate value and gradient estimation, leading to improved policy optimization.

#17 Sequential Inference for Gaussian Processes: A Signal Processing Perspective

著者: Daniel Waxman, Fernando Llorente, Petar M. Djuri\'c

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.28163

要約:
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in its nearly 100-year history. ML models support the development of SP systems that represent complex, nonlinear relationships with high predictive accuracy. Adapting these models often requires sequential inference, which differs both theoretically and methodologically from the usual paradigm of ML, where data are often assumed independent and identically distributed. Gaussian processes (GPs) are a flexible yet principled framework for modeling random functions, and they have become increasingly relevant to SP as statistical and ML methods assume a more prominent role. We provide a self-contained, tutorial-style overview of GPs, with a particular focus on recent methodological advances in sequential, incremental, or streaming inference. We introduce these techniques from a signal-processing perspective while bridging them to recent advances in ML. Many of the developments we survey have direct applications to state-space modeling, sequential regression and forecasting, anomaly detection in time series, sequential Bayesian optimization, adaptive and active sensing, and sequential detection and decision-making. By organizing these advances from a signal-processing perspective, we intend to equip practitioners with practical tools and a coherent roadmap for deploying sequential GP models in real-world systems.

#18 Modeling Spatial Extremal Dependence of Precipitation Using Distributional Neural Networks

著者: Christopher B\"ulte, Lisa Leimenstoll, Melanie Schienle

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2407.08668

要約:
In this work, we propose a simulation-based estimation approach using generative neural networks to determine dependencies of precipitation maxima and their underlying uncertainty in time and space. Within the common framework of max-stable processes for extremes under temporal and spatial dependence, our methodology allows estimating the process parameters and their respective uncertainty, but also delivers an explicit nonparametric estimate of the spatial dependence through the pairwise extremal coefficient function. We illustrate the effectiveness and robustness of our approach in a thorough finite sample study where we obtain good performance in complex settings for which closed-form likelihood estimation becomes intractable. We use the technique for studying monthly rainfall maxima in Western Germany for the period 2021-2023, which is of particular interest since it contains an extreme precipitation and consecutive flooding event in July 2021 that had a massive deadly impact. Beyond the considered setting, the presented methodology and its main generative ideas also have great potential for other applications.

#19 The Polynomial Stein Discrepancy for Assessing Moment Convergence

著者: Narayan Srinivasan, Matthew Sutton, Christopher Drovandi, Leah F South

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2412.05135

要約:
We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality like the effective sample size are not appropriate for scalable Bayesian sampling algorithms, such as stochastic gradient Langevin dynamics, that are asymptotically biased. Instead, the gold standard is to use the kernel Stein Discrepancy (KSD), which is itself not scalable given its quadratic cost in the number of samples. The KSD and its faster extensions also typically suffer from the curse of dimensionality and can require extensive tuning. To address these limitations, we develop the polynomial Stein discrepancy (PSD) and an associated goodness-of-fit test. While the new test is not fully convergence-determining, we prove that it detects differences in the first r moments for Gaussian targets. We empirically show that the test has higher power than its competitors in several examples, and at a lower computational cost. Finally, we demonstrate that the PSD can assist practitioners to select hyper-parameters of Bayesian sampling algorithms more efficiently than competitors.

#20 Foreclassing: A new machine learning perspective on human decision making with temporal data

著者: Daniel Andrew Coulson, Martin T. Wells

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2503.04956

要約:
Time series forecasts are widely used to inform decisions. Human decision-makers interpret these forecasts, incorporate prior experience and uncertainty about future outcomes, and then make a decision. In this paper, we propose a new machine learning problem, which we call Foreclassing, which addresses settings in which the aim is to automate human involvement in such decision-making processes. Our aim is to develop a unified end-to-end model that takes a time series as input, produces a forecast, accounts for its predictive uncertainty, and makes a downstream classification decision, enabling models to support or automate such temporal decision-making tasks. Related problems arise across a range of applications, yet the literature lacks both a unified methodology and a formal problem statement. By formalizing the task, we aim to stimulate research on such models and encourage cross-domain collaboration. To solve the Foreclassing problem, we propose a deep Bayesian neural network, ForeClassNet. As part of this framework, we introduce a new type of neural network layer, Boltzmann convolutions, which enable probabilistic learning of kernel sizes in convolutional layers. We evaluate the Foreclassing framework against standard time series classification methods and demonstrate the efficacy of ForeClassNet on real-world Foreclassing datasets from the weather, energy, and finance domains, achieving superior performance relative to state-of-the-art time series classifiers.

#21 Contextual Online Uncertainty-Aware Preference Learning for Human Feedback

著者: Nan Lu, Ethan Lee, Ethan X. Fang, Junwei Lu

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2504.19342

要約:
Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence to align large models with human preferences. In this paper, we propose a novel statistical framework to simultaneously conduct the online decision-making and statistical inference on the optimal model using human preference data based on dynamic contextual information. Our approach introduces an efficient decision strategy that achieves both the optimal regret bound and the asymptotic distribution of the estimators. A key challenge in RLHF is handling the dependent online human preference outcomes with dynamic contexts. To address this, in the methodological aspect, we propose a two-stage algorithm starting with $\epsilon$-greedy followed by exploitations; in the theoretical aspect, we tailor anti-concentration inequalities and matrix martingale concentration techniques to derive the uniform estimation rate and asymptotic normality of the estimators using dependent samples from both stages. Extensive simulation results demonstrate that our method outperforms state-of-the-art strategies. We apply the proposed framework to analyze the human preference data for ranking large language models on the Massive Multitask Language Understanding dataset, yielding insightful results on the performance of different large language models for medical anatomy knowledge.

#22 Signature Kernel Scoring Rule: A Spatio-Temporal Diagnostic for Probabilistic Weather Forecasting

著者: Archer Dodson, Ritabrata Dutta

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.19110

要約:
Modern weather forecasting has increasingly transitioned from numerical weather prediction (NWP) to data-driven machine learning forecasting techniques. While these new models produce probabilistic forecasts to quantify uncertainty, their training and evaluation may remain hindered by conventional scoring rules, primarily MSE, which are designed for single time point predictions and ignore the highly correlated data structures present in weather behaviour. This work introduces the signature kernel scoring rule to the domain of weather forecasting, which reframes weather variables as continuous paths to encode temporal and spatial dependencies through iterated integrals. Validated as strictly proper through the use of path augmentations to guarantee uniqueness, the signature kernel provides a theoretically robust metric for forecast verification and model training. Empirical evaluations through weather scorecards on WeatherBench 2 models demonstrate the signature kernel scoring rule's high discriminative power and unique capacity to capture path-dependent interactions. Following previous demonstration of successful adversarial-free probabilistic training, we train sliding window generative neural networks using a predictive-sequential scoring rule on ERA5 reanalysis weather data. Using a lightweight model, we demonstrate that signature kernel based training outperforms climatology for forecast paths of up to fifteen timesteps.

#23 Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

著者: Parsa Rangriz

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.02258

要約:
This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD). Building on the recent work of Ben Arous, Gheissari, and Jagannath on the effective dynamics of SGD, we study the critical scaling regime of the step size for single-layer networks. Below this critical regime, the effective dynamics are governed by deterministic (ballistic) limits, whereas at the critical scale, a new correction term emerges that changes the phase diagram. In this regime, near fixed points, the corresponding diffusive (SDE) limits of the effective dynamics reduce to an Ornstein-Uhlenbeck process under certain conditions. These results highlight how the information exponent controls sample complexity and illustrate the limitations of deterministic scaling limits in capturing stochastic fluctuations in high-dimensional learning dynamics.

#24 Bayesian Hierarchical Models and the Maximum Entropy Principle

著者: Brendon J. Brewer

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.10252

要約:
Bayesian hierarchical models are frequently used in practical data analysis contexts. One interpretation of these models is that they provide an indirect way of assigning a prior for unknown parameters, through the introduction of hyperparameters. The resulting marginal prior for the parameters (integrating over the hyperparameters) is usually dependent, so that learning one parameter provides some information about the others. In this contribution, I will demonstrate that, when the prior given the hyperparameters is a canonical distribution (a maximum entropy distribution with moment constraints), the dependent marginal prior also has a maximum entropy property, with a different constraint. This constraint is on the marginal distribution of some function of the unknown quantities. The results shed light on what information is actually being assumed when we assign a hierarchical model.

#25 EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

diffusion

著者: En-Ya Kuo, Sebastien Motsch

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.13566

要約:
Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privacy protection and preserving feature correlations present in the original data.

#26 DDO-RM: Distribution-Level Policy Improvement after Reward Learning

著者: Tiantian Zhang, Jierui Zuo, Michael Chen, Wenping Wang

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.11119

要約:
Recent theory suggests that reward-model-first methods can be more sample-efficient than direct policy fitting when the reward function is statistically simpler than the induced policy. We propose DDO-RM, a finite-candidate decision-optimization method that converts reward scores into an explicit target distribution. Unlike PPO-based RLHF or DPO, DDO-RM performs a KL-regularized mirror-descent update to project the policy toward a reward-improved distribution over a candidate set. Preliminary experiments on Pythia-410M show that DDO-RM outperforms DPO in pair accuracy (0.52 to 0.56) and mean margin (0.13 to 0.53). Our framework provides a principled connection between reward learning and mirror-descent policy improvement.

#27 Sliced-Regularized Optimal Transport

著者: Khai Nguyen

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.23944

要約:
We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothened sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT is better than both EOT and SOT in approximating exact OT. Additional experiments on gradient flows further highlight the advantages of SROT divergence.

#28 Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

privacy

著者: Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2302.03286

要約:
In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the proposed approach we combine efficient classical numerical approximation techniques with deep operator learning methodologies. Specifically, we introduce customized adaptions of existing ANN architectures together with specialized initializations for these ANN architectures so that at initialization we have that the ANNs closely mimic a chosen efficient classical numerical algorithm for the considered approximation problem. The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature and in that sense we refer to the introduced ANNs in conjunction with their tailor-made initialization schemes as Algorithmically Designed Artificial Neural Networks (ADANNs). We numerically test the proposed ADANN methodology in the case of several parametric PDEs. In the tested numerical examples the ADANN methodology significantly outperforms existing classical approximation algorithms as well as existing deep operator learning methodologies from the literature.

#29 A Framework for Variational Inference of Lightweight Bayesian Neural Networks with Heteroscedastic Uncertainties

著者: David J. Schodt, Ryan Brown, Michael Merritt, Samuel Park, Delsin Menolascino, Mark A. Peot

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2402.14532

要約:
Obtaining heteroscedastic predictive uncertainties from a Bayesian Neural Network (BNN) is vital to many applications. Often, heteroscedastic aleatoric uncertainties are learned as outputs of the BNN in addition to the predictive means, however doing so may necessitate adding more learnable parameters to the network. In this work, we demonstrate that both the heteroscedastic aleatoric and epistemic variance can be embedded into the variances of learned BNN parameters, improving predictive performance for lightweight networks. By complementing this approach with a moment propagation approach to inference, we introduce a relatively simple framework for sampling-free variational inference suitable for lightweight BNNs.

#30 Minimax Regret Estimation for Generalizing Heterogeneous Treatment Effects with Multisite Data

著者: Yi Zhang, Melody Huang, Kosuke Imai

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2412.11136

要約:
To test scientific theories and develop individualized treatment rules, researchers often wish to learn heterogeneous treatment effects that can be consistently found across diverse populations and contexts. We consider the problem of generalizing heterogeneous treatment effects (HTE) based on data from multiple sites. A key challenge is that a target population may differ from the source sites in unknown and unobservable ways. This means that the estimates from site-specific models lack external validity, and a simple pooled analysis risks bias. We develop a robust CATE (conditional average treatment effect) estimation methodology with multisite data from heterogeneous populations. We propose a minimax-regret framework that learns a generalizable CATE model by minimizing the worst-case regret over a class of target populations whose CATE can be represented as convex combinations of site-specific CATEs. Using robust optimization, the proposed methodology accounts for distribution shifts in both individual covariates and treatment effect heterogeneity across sites. We show that the resulting CATE model has an interpretable closed-form solution, expressed as a weighted average of site-specific CATE models. Thus, researchers can utilize a flexible CATE estimation method within each site and aggregate site-specific estimates to produce the final model. Through simulations and a real-world application, we show that the proposed methodology improves the robustness and generalizability of existing approaches.

#31 Exploring Vision Neural Network Pruning via Screening Methodology

著者: Mingyuan Wang, Yangzi Guo, Sida Liu, Yuhang Liu

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2502.07189

要約:
The remarkable performance of modern deep neural networks (DNNs) is largely driven by their massive scale, often comprising tens to hundreds of millions-or even billions-of parameters. However, such a scale incurs substantial storage and computational costs, hindering deployment on platforms such as edge devices that require energy-efficient and real-time processing. In this paper, we propose a network pruning framework that reduces both storage and computation requirements by an order of magnitude while preserving model accuracy. Our approach eliminates non-essential parameters through a statistical analysis of component significance across classification categories. Specifically, we employ a F-statistic-based screening technique combined with a weighted evaluation scheme to quantify the contributions of connections and channels, enabling both unstructured and structured pruning within a unified framework. Extensive experiments on real-world vision datasets, covering both fully connected neural networks (FNNs) and convolutional neural networks (CNNs), demonstrate that the proposed framework produces compact and efficient models that are highly competitive with the state of art apporoaches.

#32 General Uncertainty Estimation with Delta Variances

著者: Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2502.14698

要約:
Decision makers may suffer from uncertainty induced by limited data. This may be mitigated by accounting for epistemic uncertainty, which is however challenging to estimate efficiently for large neural networks. To this extent we investigate Delta Variances, a family of algorithms for epistemic uncertainty quantification, that is computationally efficient and convenient to implement. It can be applied to neural networks and more general functions composed of neural networks. As an example we consider a weather simulator with a neural-network-based step function inside -- here Delta Variances empirically obtain competitive results at the cost of a single gradient computation. The approach is convenient as it requires no changes to the neural network architecture or training procedure. We discuss multiple ways to derive Delta Variances theoretically noting that special cases recover popular techniques and present a unified perspective on multiple related methods. Finally we observe that this general perspective gives rise to a natural extension and empirically show its benefit.

#33 Stereographic Multiple-Try Metropolis

著者: Zhihao Wang, Jun Yang

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.12487

要約:
Multiple-proposal MCMC algorithms have recently gained attention for their potential to improve performance, especially through parallel implementation on modern hardware. We introduce Stereographic Multiple-Try Metropolis (SMTM), a novel family of gradient-free algorithms designed for sampling high-dimensional distributions. By integrating multiple-try Metropolis (MTM) with the stereographic MCMC framework, SMTM overcomes the traditional limitations of MTM, particularly its pathological convergence behavior often observed in high dimensions. For both light-tailed and heavy-tailed targets, SMTM not only outperforms classical MTM and the existing stereographic random-walk Metropolis but also demonstrates strong robustness to tuning. These advantages are supported by high-dimensional scaling analysis and validated through extensive simulation studies.

#34 Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks

著者: Francesco D'Amico, Dario Bocchi, Matteo Negri

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.13230

要約:
Scaling laws in deep learning -- empirical power-law relationships linking model performance to resource growth -- have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training. In this work, we describe a richer picture by analyzing the entire training dynamics: we identify two novel \textit{dynamical} scaling laws that govern how performance evolves as function of different norm-based complexity measures. Combined, our new laws recover the well-known scaling for test error at convergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10 and CIFAR-100. Furthermore, we provide analytical support using a single-layer perceptron trained with logistic loss, where we derive the new dynamical scaling laws, and we explain them through the implicit bias induced by gradient-based training.

#35 Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case

著者: Ian Bounos, Pablo Groisman, Mariela Sued, Esteban Tabak

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.20914

要約:
A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.

#36 DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights

著者: Saumya Gupta, Scott Biggs, Moritz Laber, Zohair Shafi, Robin Walters, Ayan Paul

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.05052

要約:
Building efficient and effective generative models for neural network weights has been a research focus of significant interest that faces challenges posed by the high-dimensional weight spaces of modern neural networks and their symmetries. Several prior generative models are limited to generating partial neural network weights, particularly for larger models, such as ResNet and ViT. Those that do generate complete weights struggle with generation speed or require finetuning of the generated models. In this work, we present DeepWeightFlow, a Flow Matching model that operates directly in weight space to generate diverse and high-accuracy neural network weights for a variety of architectures, neural network sizes, and data modalities. The neural networks generated by DeepWeightFlow do not require fine-tuning to perform well and can scale to large networks. We apply Git Re-Basin and TransFusion for neural network canonicalization in the context of generative weight models to account for the impact of neural network permutation symmetries and to improve generation efficiency for larger model sizes. The generated networks excel at transfer learning, and ensembles of hundreds of neural networks can be generated in minutes, far exceeding the efficiency of diffusion-based methods. DeepWeightFlow models pave the way for more efficient and scalable generation of diverse sets of neural networks.

#37 Constrained Policy Optimization with Cantelli-Bounded Value-at-Risk

著者: Rohan Tangri, Jan-Peter Calliess

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.22993

要約:
We introduce the Value-at-Risk Constrained Policy Optimization algorithm (VaR-CPO), a sample efficient and conservative method designed to optimize Value-at-Risk (VaR) constrained reinforcement learning (RL) problems. Empirically, we demonstrate that VaR-CPO is capable of safe exploration, achieving zero constraint violations during training in feasible environments, a critical property that baseline methods fail to uphold. To overcome the inherent non-differentiability of the VaR constraint, we employ Cantelli's inequality to obtain a tractable approximation based on the first two moments of the cost return. Additionally, by extending the trust-region framework of the Constrained Policy Optimization (CPO) method, we provide worst-case bounds for both policy improvement and constraint violation during the training process.

#38 CausalCompass: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios

著者: Huiyang Yi, Xiaojian Shen, Yonggang Wu, Duxin Chen, He Wang, Wenwu Yu

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.07915

要約:
Causal discovery from time series is a fundamental task in machine learning. However, its widespread adoption is hindered by a reliance on untestable causal assumptions and by the lack of robustness-oriented evaluation in existing benchmarks. To address these challenges, we propose CausalCompass, a flexible and extensible benchmark framework designed to assess the robustness of time-series causal discovery (TSCD) methods under violations of modeling assumptions. To demonstrate the practical utility of CausalCompass, we conduct extensive benchmarking of representative TSCD algorithms across eight assumption-violation scenarios. Our experimental results indicate that no single method consistently attains optimal performance across all settings. Nevertheless, the methods exhibiting superior overall performance across diverse scenarios are almost invariably deep learning-based approaches. We further provide hyperparameter sensitivity analyses to deepen the understanding of these findings. We additionally conduct ablation experiments to explain the strong performance of deep learning-based methods under assumption violations. We also find, somewhat surprisingly, that NTS-NOTEARS relies heavily on standardized preprocessing in practice, performing poorly in the vanilla setting but exhibiting strong performance after standardization. Finally, our work aims to provide a comprehensive and systematic evaluation of TSCD methods under assumption violations, thereby facilitating their broader adoption in real-world applications. The user-friendly implementation, documentation and datasets are available at https://anonymous.4open.science/r/CausalCompass-anonymous-5B4F/.

#39 Making Conformal Predictors Robust in Healthcare Settings: a Case Study on EEG Classification

著者: Arjun Chatterjee, Sayeed Sajjad Razin, John Wu, Siddhartha Laghuvarapu, Jathurshan Pradeepkumar, Jimeng Sun

公開日: Fri, 01 May 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.19483

要約:
Quantifying uncertainty in clinical predictions is critical for high-stakes diagnosis tasks. Conformal prediction offers a principled approach by providing prediction sets with theoretical coverage guarantees. However, in practice, patient distribution shifts violate the i.i.d. assumptions underlying standard conformal methods, leading to poor coverage in healthcare settings. In this work, we evaluate several conformal prediction approaches on EEG seizure classification, a task with known distribution shift challenges and label uncertainty. We demonstrate that personalized calibration strategies can improve coverage by over 20 percentage points while maintaining comparable prediction set sizes. Our implementation is available via PyHealth, an open-source healthcare AI framework: https://github.com/sunlabuiuc/PyHealth.

stat.ML updates on arXiv.org

📋 論文タイトル一覧