arXiv論文一覧 - stat.ML updates on arXiv.org

#1 Interval Fisher's Discriminant Analysis and Visualisation

著者: Diogo Pinheiro, M. Ros\'ario Oliveira, Igor Kravchenko, Lina Oliveira

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11945

要約:
In Data Science, entities are typically represented by single valued measurements. Symbolic Data Analysis extends this framework to more complex structures, such as intervals and histograms, that express internal variability. We propose an extension of multiclass Fisher's Discriminant Analysis to interval-valued data, using Moore's interval arithmetic and the Mallows' distance. Fisher's objective function is generalised to consider simultaneously the contributions of the centres and the ranges of intervals and is numerically maximised. The resulting discriminant directions are then used to classify interval-valued observations.To support visual assessment, we adapt the class map, originally introduced for conventional data, to classifiers that assign labels through minimum distance rules. We also extend the silhouette plot to this setting and use stacked mosaic plots to complement the visual display of class assignments. Together, these graphical tools provide insight into classifier performance and the strength of class membership. Applications to real datasets illustrate the proposed methodology and demonstrate its value in interpreting classification results for interval-valued data.

#2 Hellinger loss function for Generative Adversarial Networks

著者: Giovanni Saraceno, Anand N. Vidyashankar, Claudio Agostinelli

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12267

要約:
We propose Hellinger-type loss functions for training Generative Adversarial Networks (GANs), motivated by the boundedness, symmetry, and robustness properties of the Hellinger distance. We define an adversarial objective based on this divergence and study its statistical properties within a general parametric framework. We establish the existence, uniqueness, consistency, and joint asymptotic normality of the estimators obtained from the adversarial training procedure. In particular, we analyze the joint estimation of both generator and discriminator parameters, offering a comprehensive asymptotic characterization of the resulting estimators. We introduce two implementations of the Hellinger-type loss and we evaluate their empirical behavior in comparison with the classic (Maximum Likelihood-type) GAN loss. Through a controlled simulation study, we demonstrate that both proposed losses yield improved estimation accuracy and robustness under increasing levels of data contamination.

#3 Towards a pretrained deep learning estimator of the Linfoot informational correlation

著者: St\'ephanie M. van den Berg, Ulrich Halekoh, S\"oren M\"oller, Andreas Kryger Jensen, Jacob von Bornemann Hjelmborg

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12358

要約:
We develop a supervised deep-learning approach to estimate mutual information between two continuous random variables. As labels, we use the Linfoot informational correlation, a transformation of mutual information that has many important properties. Our method is based on ground truth labels for Gaussian and Clayton copulas. We compare our method with estimators based on kernel density, k-nearest neighbours and neural estimators. We show generally lower bias and lower variance. As a proof of principle, future research could look into training the model with a more diverse set of examples from other copulas for which ground truth labels are available.

#4 Co-Hub Node Based Multiview Graph Learning with Theoretical Guarantees

著者: Bisakh Banerjee, Mohammad Alwardat, Tapabrata Maiti, Selin Aviyente

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12435

要約:
Identifying the graphical structure underlying the observed multivariate data is essential in numerous applications. Current methodologies are predominantly confined to deducing a singular graph under the presumption that the observed data are uniform. However, many contexts involve heterogeneous datasets that feature multiple closely related graphs, typically referred to as multiview graphs. Previous research on multiview graph learning promotes edge-based similarity across layers using pairwise or consensus-based regularizers. However, multiview graphs frequently exhibit a shared node-based architecture across different views, such as common hub nodes. Such commonalities can enhance the precision of learning and provide interpretive insight. In this paper, we propose a co-hub node model, positing that different views share a common group of hub nodes. The associated optimization framework is developed by enforcing structured sparsity on the connections of these co-hub nodes. Moreover, we present a theoretical examination of layer identifiability and determine bounds on estimation error. The proposed methodology is validated using both synthetic graph data and fMRI time series data from multiple subjects to discern several closely related graphs.

#5 Efficient Level-Crossing Probability Calculation for Gaussian Process Modeled Data

著者: Haoyu Li, Isaac J Michaud, Ayan Biswas, Han-Wei Shen

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12442

要約:
Almost all scientific data have uncertainties originating from different sources. Gaussian process regression (GPR) models are a natural way to model data with Gaussian-distributed uncertainties. GPR also has the benefit of reducing I/O bandwidth and storage requirements for large scientific simulations. However, the reconstruction from the GPR models suffers from high computation complexity. To make the situation worse, classic approaches for visualizing the data uncertainties, like probabilistic marching cubes, are also computationally very expensive, especially for data of high resolutions. In this paper, we accelerate the level-crossing probability calculation efficiency on GPR models by subdividing the data spatially into a hierarchical data structure and only reconstructing values adaptively in the regions that have a non-zero probability. For each region, leveraging the known GPR kernel and the saved data observations, we propose a novel approach to efficiently calculate an upper bound for the level-crossing probability inside the region and use this upper bound to make the subdivision and reconstruction decisions. We demonstrate that our value occurrence probability estimation is accurate with a low computation cost by experiments that calculate the level-crossing probability fields on different datasets.

#6 Understanding Overparametrization in Survival Models through Double-Descent

著者: Yin Liu, Jianwen Cai, Didong Li

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12463

要約:
Classical statistical learning theory predicts a U-shaped relationship between test loss and model capacity, driven by the bias-variance trade-off. Recent advances in modern machine learning have revealed a more complex pattern, double-descent, in which test loss, after peaking near the interpolation threshold, decreases again as model capacity continues to grow. While this behavior has been extensively analyzed in regression and classification, its manifestation in survival analysis remains unexplored. This study investigates double-descent in four representative survival models: DeepSurv, PC-Hazard, Nnet-Survival, and N-MTLR. We rigorously define interpolation and finite-norm interpolation, two key characteristics of loss-based models to understand double-descent. We then show the existence (or absence) of (finite-norm) interpolation of all four models. Our findings clarify how likelihood-based losses and model implementation jointly determine the feasibility of interpolation and show that overfitting should not be regarded as benign for survival models. All theoretical results are supported by numerical experiments that highlight the distinct generalization behaviors of survival models.

#7 Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization

著者: Jie Wang

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12550

要約:
Distributionally robust optimization (DRO) has emerged as a powerful paradigm for reliable decision-making under uncertainty. This paper focuses on DRO with ambiguity sets defined via the Sinkhorn discrepancy: an entropy-regularized Wasserstein distance, referred to as Sinkhorn DRO. Existing work primarily addresses Sinkhorn DRO from a dual perspective, leveraging its formulation as a conditional stochastic optimization problem, for which many stochastic gradient methods are applicable. However, the theoretical analyses of such methods often rely on the boundedness of the loss function, and it is indirect to obtain the worst-case distribution associated with Sinkhorn DRO. In contrast, we study Sinkhorn DRO from the primal perspective, by reformulating it as a bilevel program with several infinite-dimensional lower-level subproblems over probability space. This formulation enables us to simultaneously obtain the optimal robust decision and the worst-case distribution, which is valuable in practical settings, such as generating stress-test scenarios or designing robust learning algorithms. We propose both double-loop and single-loop sampling-based algorithms with theoretical guarantees to solve this bilevel program. Finally, we demonstrate the effectiveness of our approach through a numerical study on adversarial classification.

#8 Mind the Jumps: A Scalable Robust Local Gaussian Process for Multidimensional Response Surfaces with Discontinuities

著者: Isaac Adjetey, Yiyuan She

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12574

要約:
Modeling response surfaces with abrupt jumps and discontinuities remains a major challenge across scientific and engineering domains. Although Gaussian process models excel at capturing smooth nonlinear relationships, their stationarity assumptions limit their ability to adapt to sudden input-output variations. Existing nonstationary extensions, particularly those based on domain partitioning, often struggle with boundary inconsistencies, sensitivity to outliers, and scalability issues in higher-dimensional settings, leading to reduced predictive accuracy and unreliable parameter estimation. To address these challenges, this paper proposes the Robust Local Gaussian Process (RLGP) model, a framework that integrates adaptive nearest-neighbor selection with a sparsity-driven robustification mechanism. Unlike existing methods, RLGP leverages an optimization-based mean-shift adjustment after a multivariate perspective transformation combined with local neighborhood modeling to mitigate the influence of outliers. This approach improves predictive accuracy near discontinuities while enhancing robustness to data heterogeneity. Comprehensive evaluations on real-world datasets show that RLGP consistently delivers high predictive accuracy and maintains competitive computational efficiency, especially in scenarios with sharp transitions and complex response structures. Scalability tests further confirm RLGP's stability and reliability in higher-dimensional settings, where other methods struggle. These results establish RLGP as an effective and practical solution for modeling nonstationary and discontinuous response surfaces across a wide range of applications.

#9 Limits To (Machine) Learning

著者: Zhimin Chen, Bryan Kelly, Semyon Malamud

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12735

要約:
Machine learning (ML) methods are highly flexible, but their ability to approximate the true data-generating process is fundamentally constrained by finite samples. We characterize a universal lower bound, the Limits-to-Learning Gap (LLG), quantifying the unavoidable discrepancy between a model's empirical fit and the population benchmark. Recovering the true population $R^2$, therefore, requires correcting observed predictive performance by this bound. Using a broad set of variables, including excess returns, yields, credit spreads, and valuation ratios, we find that the implied LLGs are large. This indicates that standard ML approaches can substantially understate true predictability in financial data. We also derive LLG-based refinements to the classic Hansen and Jagannathan (1991) bounds, analyze implications for parameter learning in general-equilibrium settings, and show that the LLG provides a natural mechanism for generating excess volatility.

#10 Transport Reversible Jump Markov Chain Monte Carlo with proposals generated by Variational Inference with Normalizing Flows

著者: Pingping Yin, Xiyun Jiao

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12742

要約:
We present a framework using variational inference with normalizing flows (VI-NFs) to generate proposals of reversible jump Markov chain Monte Carlo (RJMCMC) for efficient trans-dimensional Bayesian inference. Unlike transport reversible jump methods relying on forward KL minimization with pilot MCMC samples, our approach minimizes the reverse KL divergence which requires only samples from a base distribution, eliminating costly target sampling. The method employs RealNVP-based flows to learn model-specific transport maps, enabling construction of both between-model and within-model proposals. Our framework provides accurate marginal likelihood estimates from the variational approximation. This facilitates efficient model comparison and proposal adaptation in RJMCMC. Experiments on illustrative example, factor analysis and variable selection tasks in linear regression show that TRJ designed by VI-NFs achieves faster mixing and more efficient model space exploration compared to existing baselines. The proposed algorithm can be extended to conditional flows for amortized vairiational inference across models. Code is available at https://github.com/YinPingping111/TRJ_VINFs.

#11 PAC-Bayes Bounds for Multivariate Linear Regression and Linear Autoencoders

著者: Ruixin Guo, Ruoming Jin, Xinyu Li, Yang Zhou

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12905

要約:
Linear Autoencoders (LAEs) have shown strong performance in state-of-the-art recommender systems. However, this success remains largely empirical, with limited theoretical understanding. In this paper, we investigate the generalizability -- a theoretical measure of model performance in statistical learning -- of multivariate linear regression and LAEs. We first propose a PAC-Bayes bound for multivariate linear regression, extending the earlier bound for single-output linear regression by Shalaeva et al., and establish sufficient conditions for its convergence. We then show that LAEs, when evaluated under a relaxed mean squared error, can be interpreted as constrained multivariate linear regression models on bounded data, to which our bound adapts. Furthermore, we develop theoretical methods to improve the computational efficiency of optimizing the LAE bound, enabling its practical evaluation on large models and real-world datasets. Experimental results demonstrate that our bound is tight and correlates well with practical ranking metrics such as Recall@K and NDCG@K.

#12 Evaluating Singular Value Thresholds for DNN Weight Matrices based on Random Matrix Theory

著者: Kohei Nishikawa, Koki Shimizu, Hashiguchi Hiroki

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12911

要約:
This study evaluates thresholds for removing singular values from singular value decomposition-based low-rank approximations of deep neural network weight matrices. Each weight matrix is modeled as the sum of signal and noise matrices. The low-rank approximation is obtained by removing noise-related singular values using a threshold based on random matrix theory. To assess the adequacy of this threshold, we propose an evaluation metric based on the cosine similarity between the singular vectors of the signal and original weight matrices. The proposed metric is used in numerical experiments to compare two threshold estimation methods.

#13 General OOD Detection via Model-aware and Subspace-aware Variable Priority

著者: Min Lu, Hemant Ishwaran

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13003

要約:
Out-of-distribution (OOD) detection is essential for determining when a supervised model encounters inputs that differ meaningfully from its training distribution. While widely studied in classification, OOD detection for regression and survival analysis remains limited due to the absence of discrete labels and the challenge of quantifying predictive uncertainty. We introduce a framework for OOD detection that is simultaneously model aware and subspace aware, and that embeds variable prioritization directly into the detection step. The method uses the fitted predictor to construct localized neighborhoods around each test case that emphasize the features driving the model's learned relationship and downweight directions that are less relevant to prediction. It produces OOD scores without relying on global distance metrics or estimating the full feature density. The framework is applicable across outcome types, and in our implementation we use random forests, where the rule structure yields transparent neighborhoods and effective scoring. Experiments on synthetic and real data benchmarks designed to isolate functional shifts show consistent improvements over existing methods. We further demonstrate the approach in an esophageal cancer survival study, where distribution shifts related to lymphadenectomy identify patterns relevant to surgical guidelines.

#14 A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees

著者: Junye Du, Zhenghao Li, Zhutong Gu, Long Feng

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13565

要約:
This paper tackles the problem of feature selection in a highly challenging setting: $\mathbb{E}(y | \boldsymbol{x}) = G(\boldsymbol{x}_{\mathcal{S}_0})$, where $\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown, potentially nonlinear function subject to mild smoothness conditions. Our approach begins with feature selection in deep neural networks, then generalizes the results to H{\"o}lder smooth functions by exploiting the strong approximation capabilities of neural networks. Unlike conventional optimization-based deep learning methods, we reformulate neural networks as index models and estimate $\mathcal{S}_0$ using the second-order Stein's formula. This gradient-descent-free strategy guarantees feature selection consistency with a sample size requirement of $n = \Omega(p^2)$, where $p$ is the feature dimension. To handle high-dimensional scenarios, we further introduce a screening-and-selection mechanism that achieves nonlinear selection consistency when $n = \Omega(s \log p)$, with $s$ representing the sparsity level. Additionally, we refit a neural network on the selected features for prediction and establish performance guarantees under a relaxed sparsity assumption. Extensive simulations and real-data analyses demonstrate the strong performance of our method even in the presence of complex feature interactions.

#15 Universality of high-dimensional scaling limits of stochastic gradient descent

著者: Reza Gheissari, Aukosh Jagannath

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13634

要約:
We consider statistical tasks in high dimensions whose loss depends on the data only through its projection into a fixed-dimensional subspace spanned by the parameter vectors and certain ground truth vectors. This includes classifying mixture distributions with cross-entropy loss with one and two-layer networks, and learning single and multi-index models with one and two-layer networks. When the data is drawn from an isotropic Gaussian mixture distribution, it is known that the evolution of a finite family of summary statistics under stochastic gradient descent converges to an autonomous ordinary differential equation (ODE), as the dimension and sample size go to $\infty$ and the step size goes to $0$ commensurately. Our main result is that these ODE limits are universal in that this convergence occurs even when the data is drawn from mixtures of product measures provided the first two moments match the corresponding Gaussian distribution and the initialization and ground truth vectors are sufficiently coordinate-delocalized. We complement this by proving two corresponding non-universality results. We provide a simple example where the ODE limits are non-universal if the initialization is coordinate aligned. We also show that the stochastic differential equation limits arising as fluctuations of the summary statistics around their ODE's fixed points are not universal.

#16 Active Inference with Reusable State-Dependent Value Profiles

著者: Jacob Poschl

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11829

要約:
Adaptive behavior in volatile environments requires agents to switch among value-control regimes across latent contexts, but maintaining separate preferences, policy biases, and action-confidence parameters for every situation is intractable. We introduce value profiles: a small set of reusable bundles of value-related parameters (outcome preferences, policy priors, and policy precision) assigned to hidden states in a generative model. As posterior beliefs over states evolve trial by trial, effective control parameters arise via belief-weighted mixing, enabling state-conditional strategy recruitment without requiring independent parameters for each context. We evaluate this framework in probabilistic reversal learning, comparing static-precision, entropy-coupled dynamic-precision, and profile-based models using cross-validated log-likelihood and information criteria. Model comparison favors the profile-based model over simpler alternatives (about 100-point AIC differences), and parameter-recovery analyses support structural identifiability even when context must be inferred from noisy observations. Model-based inference further suggests that adaptive control in this task is driven primarily by modulation of policy priors rather than policy precision, with gradual belief-dependent profile recruitment consistent with state-conditional (not purely uncertainty-driven) control. Overall, reusable value profiles provide a tractable computational account of belief-conditioned value control in volatile environments and yield testable signatures of belief-dependent control and behavioral flexibility.

#17 Amortized Causal Discovery with Prior-Fitted Networks

著者: Mateusz Sypniewski, Mateusz Olko, Mateusz Gajewski, Piotr Mi{\l}o\'s

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11840

要約:
In recent years, differentiable penalized likelihood methods have gained popularity, optimizing the causal structure by maximizing its likelihood with respect to the data. However, recent research has shown that errors in likelihood estimation, even on relatively large sample sizes, disallow the discovery of proper structures. We propose a new approach to amortized causal discovery that addresses the limitations of likelihood estimator accuracy. Our method leverages Prior-Fitted Networks (PFNs) to amortize data-dependent likelihood estimation, yielding more reliable scores for structure learning. Experiments on synthetic, simulated, and real-world datasets show significant gains in structure recovery compared to standard baselines. Furthermore, we demonstrate directly that PFNs provide more accurate likelihood estimates than conventional neural network-based approaches.

#18 Exploring Topological Bias in Heterogeneous Graph Neural Networks

著者: Yihan Zhang

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11846

要約:
Graph Neural Networks (GNNs) are characterized by their capacity of processing graph-structured data. However, due to the sparsity of labels under semi-supervised learning, they have been found to exhibit biased performance on specific nodes. This kind of bias has been validated to correlate with topological structure and is considered as a bottleneck of GNNs' performance. Existing work focuses on the study of homogeneous GNNs and little attention has been given to topological bias in Heterogeneous Graph Neural Networks (HGNNs). In this work, firstly, in order to distinguish distinct meta relations, we apply meta-weighting to the adjacency matrix of a heterogeneous graph. Based on the modified adjacency matrix, we leverage PageRank along with the node label information to construct a projection. The constructed projection effectively maps nodes to values that strongly correlated with model performance when using datasets both with and without intra-type connections, which demonstrates the universal existence of topological bias in HGNNs. To handle this bias, we propose a debiasing structure based on the difference in the mapped values of nodes and use it along with the original graph structure for contrastive learning. Experiments on three public datasets verify the effectiveness of the proposed method in improving HGNNs' performance and debiasing.

#19 Adaptive Path Integral Diffusion: AdaPID

diffusion

著者: Michael Chertkov (University of Arizona), Hamidreza Behjoo (University of Arizona)

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11858

要約:
Diffusion-based samplers -- Score Based Diffusions, Bridge Diffusions and Path Integral Diffusions -- match a target at terminal time, but the real leverage comes from choosing the schedule that governs the intermediate-time dynamics. We develop a path-wise schedule -- selection gramework for Harmonic PID with a time-varying stiffness, exploiting Piece-Wise-Constant(PWC) parametrizations and a simple hierarchical refinement. We introduce schedule-sensitive Quality-of-Sampling (QoS) diagnostics. Assuming a Gaussian-Mixture (GM) target, we retain closed-form Green functions' ration and numerically stable, Neural-Network free oracles for predicted-state maps and score. Experiments in 2D show that QoS driven PWC schedules consistently improve early-exit fidelity, tail accuracy, conditioning of the dynamics, and speciation (label-selection) timing at fixed integration budgets.

#20 Generative Stochastic Optimal Transport: Guided Harmonic Path-Integral Diffusion

diffusion

著者: Michael Chertkov (University of Arizona)

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11859

要約:
We introduce Guided Harmonic Path-Integral Diffusion (GH-PID), a linearly-solvable framework for guided Stochastic Optimal Transport (SOT) with a hard terminal distribution and soft, application-driven path costs. A low-dimensional guidance protocol shapes the trajectory ensemble while preserving analytic structure: the forward and backward Kolmogorov equations remain linear, the optimal score admits an explicit Green-function ratio, and Gaussian-Mixture Model (GMM) terminal laws yield closed-form expressions. This enables stable sampling and differentiable protocol learning under exact terminal matching. We develop guidance-centric diagnostics -- path cost, centerline adherence, variance flow, and drift effort -- that make GH-PID an interpretable variational ansatz for empirical SOT. Three navigation scenarios illustrated in 2D: (i) Case A: hand-crafted protocols revealing how geometry and stiffness shape lag, curvature effects, and mode evolution; (ii) Case B: single-task protocol learning, where a PWC centerline is optimized to minimize integrated cost; (iii) Case C: multi-expert fusion, in which a commander reconciles competing expert/teacher trajectories and terminal beliefs through an exact product-of-experts law and learns a consensus protocol. Across all settings, GH-PID generates geometry-aware, trust-aware trajectories that satisfy the prescribed terminal distribution while systematically reducing integrated cost.

#21 Data-Driven Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations

著者: Pramudita Satria Palar, Paul Saves, Rommel G. Regis, Koji Shimoyama, Shigeru Obayashi, Nicolas Verstaevel, Joseph Morlier

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11946

要約:
Explainable machine learning techniques have gained increasing attention in engineering applications, especially in aerospace design and analysis, where understanding how input variables influence data-driven models is essential. Partial Dependence Plots (PDPs) are widely used for interpreting black-box models by showing the average effect of an input variable on the prediction. However, their global sensitivity metric can be misleading when strong interactions are present, as averaging tends to obscure interaction effects. To address this limitation, we propose a global sensitivity metric based on Individual Conditional Expectation (ICE) curves. The method computes the expected feature importance across ICE curves, along with their standard deviation, to more effectively capture the influence of interactions. We provide a mathematical proof demonstrating that the PDP-based sensitivity is a lower bound of the proposed ICE-based metric under truncated orthogonal polynomial expansion. In addition, we introduce an ICE-based correlation value to quantify how interactions modify the relationship between inputs and the output. Comparative evaluations were performed on three cases: a 5-variable analytical function, a 5-variable wind-turbine fatigue problem, and a 9-variable airfoil aerodynamics case, where ICE-based sensitivity was benchmarked against PDP, SHapley Additive exPlanations (SHAP), and Sobol' indices. The results show that ICE-based feature importance provides richer insights than the traditional PDP-based approach, while visual interpretations from PDP, ICE, and SHAP complement one another by offering multiple perspectives.

#22 Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning

著者: Vittorio Giammarino, Ahmed H. Qureshi

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12046

要約:
Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.

#23 SigTime: Learning and Visually Explaining Time Series Signatures

著者: Yu-Chia Huang, Juntong Chen, Dongyu Liu, Kwan-Liu Ma

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12076

要約:
Understanding and distinguishing temporal patterns in time series data is essential for scientific discovery and decision-making. For example, in biomedical research, uncovering meaningful patterns in physiological signals can improve diagnosis, risk assessment, and patient outcomes. However, existing methods for time series pattern discovery face major challenges, including high computational complexity, limited interpretability, and difficulty in capturing meaningful temporal structures. To address these gaps, we introduce a novel learning framework that jointly trains two Transformer models using complementary time series representations: shapelet-based representations to capture localized temporal structures and traditional feature engineering to encode statistical properties. The learned shapelets serve as interpretable signatures that differentiate time series across classification labels. Additionally, we develop a visual analytics system -- SigTIme -- with coordinated views to facilitate exploration of time series signatures from multiple perspectives, aiding in useful insights generation. We quantitatively evaluate our learning framework on eight publicly available datasets and one proprietary clinical dataset. Additionally, we demonstrate the effectiveness of our system through two usage scenarios along with the domain experts: one involving public ECG data and the other focused on preterm labor analysis.

#24 Neural CDEs as Correctors for Learned Time Series Models

著者: Muhammad Bilal Shahid, Prajwal Koirla, Cody Fleming

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12116

要約:
Learned time-series models, whether continuous- or discrete-time, are widely used to forecast the states of a dynamical system. Such models generate multi-step forecasts either directly, by predicting the full horizon at once, or iteratively, by feeding back their own predictions at each step. In both cases, the multi-step forecasts are prone to errors. To address this, we propose a Predictor-Corrector mechanism where the Predictor is any learned time-series model and the Corrector is a neural controlled differential equation. The Predictor forecasts, and the Corrector predicts the errors of the forecasts. Adding these errors to the forecasts improves forecast performance. The proposed Corrector works with irregularly sampled time series and continuous- and discrete-time Predictors. Additionally, we introduce two regularization strategies to improve the extrapolation performance of the Corrector with accelerated training. We evaluate our Corrector with diverse Predictors, e.g., neural ordinary differential equations, Contiformer, and DLinear, on synthetic, physics simulation, and real-world forecasting datasets. The experiments demonstrate that the Predictor-Corrector mechanism consistently improves the performance compared to Predictor alone.

#25 Scalable branch-and-bound model selection with non-monotonic criteria including AIC, BIC and Mallows's $\mathit{C_p}$

著者: Jakob Vanhoefer (Life and Medical Sciences), Antonia K\"orner (Life and Medical Sciences), Domagoj Doresic (Life and Medical Sciences), Jan Hasenauer (Life and Medical Sciences), Dilan Pathirana (Life and Medical Sciences)

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12221

要約:
Model selection is a pivotal process in the quantitative sciences, where researchers must navigate between numerous candidate models of varying complexity. Traditional information criteria, such as the corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), and Mallows's $\mathit{C_p}$, are valuable tools for identifying optimal models. However, the exponential increase in candidate models with each additional model parameter renders the evaluation of these criteria for all models -- a strategy known as exhaustive, or brute-force, searches -- computationally prohibitive. Consequently, heuristic approaches like stepwise regression are commonly employed, albeit without guarantees of finding the globally-optimal model. In this study, we challenge the prevailing notion that non-monotonicity in information criteria precludes bounds on the search space. We introduce a simple but novel bound that enables the development of branch-and-bound algorithms tailored for these non-monotonic functions. We demonstrate that our approach guarantees identification of the optimal model(s) across diverse model classes, sizes, and applications, often with orders of magnitude computational speedups. For instance, in one previously-published model selection task involving $2^{32}$ (approximately 4 billion) candidate models, our method achieves a computational speedup exceeding 6,000. These findings have broad implications for the scalability and effectiveness of model selection in complex scientific domains.

#26 Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data

著者: Shubhada Agrawal, Aaditya Ramdas

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12325

要約:
We prove that a classic sub-Gaussian mixture proposed by Robbins in a stochastic setting actually satisfies a path-wise (deterministic) regret bound. For every path in a natural ``Ville event'' $E_\alpha$, this regret till time $T$ is bounded by $\ln^2(1/\alpha)/V_T + \ln (1/\alpha) + \ln \ln V_T$ up to universal constants, where $V_T$ is a nonnegative, nondecreasing, cumulative variance process. (The bound reduces to $\ln(1/\alpha) + \ln \ln V_T$ if $V_T \geq \ln(1/\alpha)$.) If the data were stochastic, then one can show that $E_\alpha$ has probability at least $1-\alpha$ under a wide class of distributions (eg: sub-Gaussian, symmetric, variance-bounded, etc.). In fact, we show that on the Ville event $E_0$ of probability one, the regret on every path in $E_0$ is eventually bounded by $\ln \ln V_T$ (up to constants). We explain how this work helps bridge the world of adversarial online learning (which usually deals with regret bounds for bounded data), with game-theoretic statistics (which can handle unbounded data, albeit using stochastic assumptions). In short, conditional regret bounds serve as a bridge between stochastic and adversarial betting.

#27 Uncertainty Quantification for Machine Learning: One Size Does Not Fit All

著者: Paul Hofman, Yusuf Sale, Eyke H\"ullermeier

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12341

要約:
Proper quantification of predictive uncertainty is essential for the use of machine learning in safety-critical applications. Various uncertainty measures have been proposed for this purpose, typically claiming superiority over other measures. In this paper, we argue that there is no single best measure. Instead, uncertainty quantification should be tailored to the specific application. To this end, we use a flexible family of uncertainty measures that distinguishes between total, aleatoric, and epistemic uncertainty of second-order distributions. These measures can be instantiated with specific loss functions, so-called proper scoring rules, to control their characteristics, and we show that different characteristics are useful for different tasks. In particular, we show that, for the task of selective prediction, the scoring rule should ideally match the task loss. On the other hand, for out-of-distribution detection, our results confirm that mutual information, a widely used measure of epistemic uncertainty, performs best. Furthermore, in an active learning setting, epistemic uncertainty based on zero-one loss is shown to consistently outperform other uncertainty measures.

#28 Optimized Architectures for Kolmogorov-Arnold Networks

著者: James Bagrow, Josh Bongard

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12448

要約:
Efforts to improve Kolmogorov-Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable sparsification, turning architecture search into an end-to-end optimization problem. Across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks, we demonstrate competitive or superior accuracy while discovering substantially smaller models. Overprovisioning and sparsification are synergistic, with the combination outperforming either alone. The result is a principled path toward models that are both more expressive and more interpretable, addressing a key tension in scientific machine learning.

#29 Optimal Mistake Bounds for Transductive Online Learning

著者: Zachary Chase, Steve Hanneke, Shay Moran, Jonathan Shafer

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12567

要約:
We resolve a 30-year-old open problem concerning the power of unlabeled data in online learning by tightly quantifying the gap between transductive and standard online learning. In the standard setting, the optimal mistake bound is characterized by the Littlestone dimension $d$ of the concept class $H$ (Littlestone 1987). We prove that in the transductive setting, the mistake bound is at least $\Omega(\sqrt{d})$. This constitutes an exponential improvement over previous lower bounds of $\Omega(\log\log d)$, $\Omega(\sqrt{\log d})$, and $\Omega(\log d)$, due respectively to Ben-David, Kushilevitz, and Mansour (1995, 1997) and Hanneke, Moran, and Shafer (2023). We also show that this lower bound is tight: for every $d$, there exists a class of Littlestone dimension $d$ with transductive mistake bound $O(\sqrt{d})$. Our upper bound also improves upon the best known upper bound of $(2/3)d$ from Ben-David, Kushilevitz, and Mansour (1997). These results establish a quadratic gap between transductive and standard online learning, thereby highlighting the benefit of advance access to the unlabeled instance sequence. This contrasts with the PAC setting, where transductive and standard learning exhibit similar sample complexities.

#30 On the Accuracy of Newton Step and Influence Function Data Attributions

著者: Ittai Rubinstein, Samuel B. Hopkins

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12572

要約:
Data attribution aims to explain model predictions by estimating how they would change if certain training points were removed, and is used in a wide range of applications, from interpretability and credit assignment to unlearning and privacy. Even in the relatively simple case of linear regressions, existing mathematical analyses of leading data attribution methods such as Influence Functions (IF) and single Newton Step (NS) remain limited in two key ways. First, they rely on global strong convexity assumptions which are often not satisfied in practice. Second, the resulting bounds scale very poorly with the number of parameters ($d$) and the number of samples removed ($k$). As a result, these analyses are not tight enough to answer fundamental questions such as "what is the asymptotic scaling of the errors of each method?" or "which of these methods is more accurate for a given dataset?" In this paper, we introduce a new analysis of the NS and IF data attribution methods for convex learning problems. To the best of our knowledge, this is the first analysis of these questions that does not assume global strong convexity and also the first explanation of [KATL19] and [RH25a]'s observation that NS data attribution is often more accurate than IF. We prove that for sufficiently well-behaved logistic regression, our bounds are asymptotically tight up to poly-logarithmic factors, yielding scaling laws for the errors in the average-case sample removals. \[ \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{\theta}_T - \hat{\theta}_T^{\mathrm{NS}}\|_2 \bigr] = \widetilde{\Theta}\!\left(\frac{k d}{n^2}\right), \qquad \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{\theta}_T^{\mathrm{NS}} - \hat{\theta}_T^{\mathrm{IF}}\|_2 \bigr] = \widetilde{\Theta}\!\left( \frac{(k + d)\sqrt{k d}}{n^2} \right). \]

#31 Robust Variational Bayes by Min-Max Median Aggregation

著者: Jiawei Yan, Ju Liu, Weidong Liu, Jiyuan Tu

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12676

要約:
We propose a robust and scalable variational Bayes (VB) framework designed to effectively handle contamination and outliers in dataset. Our approach partitions the data into $m$ disjoint subsets and formulates a joint optimization problem based on robust aggregation principles. A key insight is that the full posterior distribution is equivalent to the minimizer of the mean Kullback-Leibler (KL) divergence from the $m$-powered local posterior distributions. To enhance robustness, we replace the mean KL divergence with a min-max median formulation. The min-max formulation not only ensures consistency between the KL minimizer and the Evidence Lower Bound (ELBO) maximizer but also facilitates the establishment of improved statistical rates for the mean of variational posterior. We observe a notable discrepancy in the $m$-powered marginal log likelihood function contingent on the presence of local latent variables. To address this, we treat these two scenarios separately to guarantee the consistency of the aggregated variational posterior. Specifically, when local latent variables are present, we introduce an aggregate-and-rescale strategy. Theoretically, we provide a non-asymptotic analysis of our proposed posterior, incorporating a refined analysis of Bernstein-von Mises (BvM) theorem to accommodate a diverging number of subsets $m$. Our findings indicate that the two-stage approach yields a smaller approximation error compared to directly aggregating the $m$-powered local posteriors. Furthermore, we establish a nearly optimal statistical rate for the mean of the proposed posterior, advancing existing theories related to min-max median estimators. The efficacy of our method is demonstrated through extensive simulation studies.

#32 Complexity of Markov Chain Monte Carlo for Generalized Linear Models

著者: Martin Chak, Giacomo Zanella

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12748

要約:
Markov Chain Monte Carlo (MCMC), Laplace approximation (LA) and variational inference (VI) methods are popular approaches to Bayesian inference, each with trade-offs between computational cost and accuracy. However, a theoretical understanding of these differences is missing, particularly when both the sample size $n$ and the dimension $d$ are large. LA and Gaussian VI are justified by Bernstein-von Mises (BvM) theorems, and recent work has derived the characteristic condition $n\gg d^2$ for their validity, improving over the condition $n\gg d^3$. In this paper, we show for linear, logistic and Poisson regression that for $n\gtrsim d$, MCMC attains the same complexity scaling in $n$, $d$ as first-order optimization algorithms, up to sub-polynomial factors. Thus MCMC is competitive with LA and Gaussian VI in complexity, under a scaling between $n$ and $d$ more general than BvM regimes. Our complexities apply to appropriately scaled priors that are not necessarily Gaussian-tailed, including Student-$t$ and flat priors, with log-posteriors that are not necessarily globally concave or gradient-Lipschitz.

#33 Flow-matching Operators for Residual-Augmented Probabilistic Learning of Partial Differential Equations

privacy

著者: Sahil Bhola, Karthik Duraisamy

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12749

要約:
Learning probabilistic surrogates for PDEs remains challenging in data-scarce regimes: neural operators require large amounts of high-fidelity data, while generative approaches typically sacrifice resolution invariance. We formulate flow matching in an infinite-dimensional function space to learn a probabilistic transport that maps low-fidelity approximations to the manifold of high-fidelity PDE solutions via learned residual corrections. We develop a conditional neural operator architecture based on feature-wise linear modulation for flow-matching vector fields directly in function space, enabling inference at arbitrary spatial resolutions without retraining. To improve stability and representational control of the induced neural ODE, we parameterize the flow vector field as a sum of a linear operator and a nonlinear operator, combining lightweight linear components with a conditioned Fourier neural operator for expressive, input-dependent dynamics. We then formulate a residual-augmented learning strategy where the flow model learns probabilistic corrections from inexpensive low-fidelity surrogates to high-fidelity solutions, rather than learning the full solution mapping from scratch. Finally, we derive tractable training objectives that extend conditional flow matching to the operator setting with input-function-dependent couplings. To demonstrate the effectiveness of our approach, we present numerical experiments on a range of PDEs, including the 1D advection and Burgers' equation, and a 2D Darcy flow problem for flow through a porous medium. We show that the proposed method can accurately learn solution operators across different resolutions and fidelities and produces uncertainty estimates that appropriately reflect model confidence, even when trained on limited high-fidelity data.

#34 Diffusion Model-Based Posterior Sampling in Full Waveform Inversion

diffusion

著者: Mohammad H. Taufik, Tariq Alkhalifah

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12797

要約:
Bayesian full waveform inversion (FWI) offers uncertainty-aware subsurface models; however, posterior sampling directly on observed seismic shot records is rarely practical at the field scale because each sample requires numerous wave-equation solves. We aim to make such sampling feasible for large surveys while preserving calibration, that is, high uncertainty in less illuminated areas. Our approach couples diffusion-based posterior sampling with simultaneous-source FWI data. At each diffusion noise level, a network predicts a clean velocity model. We then apply a stochastic refinement step in model space using Langevin dynamics under the wave-equation likelihood and reintroduce noise to decouple successive levels before proceeding. Simultaneous-source batches reduce forward and adjoint solves approximately in proportion to the supergather size, while an unconditional diffusion prior trained on velocity patches and volumes helps suppress source-related numerical artefacts. We evaluate the method on three 2D synthetic datasets (SEG/EAGE Overthrust, SEG/EAGE Salt, SEAM Arid), a 2D field line, and a 3D upscaling study. Relative to a particle-based variational baseline, namely Stein variational gradient descent without a learned prior and with single-source (non-simultaneous-source) FWI, our sampler achieves lower model error and better data fit at a substantially reduced computational cost. By aligning encoded-shot likelihoods with diffusion-based sampling and exploiting straightforward parallelization over samples and source batches, the method provides a practical path to calibrated posterior inference on observed shot records that scales to large 2D and 3D problems.

#35 Unsupervised learning of multiscale switching dynamical system models from multimodal neural data

著者: DongKyu Kim, Han-Lin Hsieh, Maryam M. Shanechi

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12881

要約:
Neural population activity often exhibits regime-dependent non-stationarity in the form of switching dynamics. Learning accurate switching dynamical system models can reveal how behavior is encoded in neural activity. Existing switching approaches have primarily focused on learning models from a single neural modality, either continuous Gaussian signals or discrete Poisson signals. However, multiple neural modalities are often recorded simultaneously to measure different spatiotemporal scales of brain activity, and all these modalities can encode behavior. Moreover, regime labels are typically unavailable in training data, posing a significant challenge for learning models of regime-dependent switching dynamics. To address these challenges, we develop a novel unsupervised learning algorithm that learns the parameters of switching multiscale dynamical system models using only multiscale neural observations. We demonstrate our method using both simulations and two distinct experimental datasets with multimodal spike-LFP observations during different motor tasks. We find that our switching multiscale dynamical system models more accurately decode behavior than switching single-scale dynamical models, showing the success of multiscale neural fusion. Further, our models outperform stationary multiscale models, illustrating the importance of tracking regime-dependent non-stationarity in multimodal neural data. The developed unsupervised learning framework enables more accurate modeling of complex multiscale neural dynamics by leveraging information in multimodal recordings while incorporating regime switches. This approach holds promise for improving the performance and robustness of brain-computer interfaces over time and for advancing our understanding of the neural basis of behavior.

#36 Interpretable Hypothesis-Driven Trading:A Rigorous Walk-Forward Validation Framework for Market Microstructure Signals

著者: Gagan Deep, Akash Deep, William Lamptey

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12924

要約:
We develop a rigorous walk-forward validation framework for algorithmic trading designed to mitigate overfitting and lookahead bias. Our methodology combines interpretable hypothesis-driven signal generation with reinforcement learning and strict out-of-sample testing. The framework enforces strict information set discipline, employs rolling window validation across 34 independent test periods, maintains complete interpretability through natural language hypothesis explanations, and incorporates realistic transaction costs and position constraints. Validating five market microstructure patterns across 100 US equities from 2015 to 2024, the system yields modest annualized returns (0.55%, Sharpe ratio 0.33) with exceptional downside protection (maximum drawdown -2.76%) and market-neutral characteristics (beta = 0.058). Performance exhibits strong regime dependence, generating positive returns during high-volatility periods (0.60% quarterly, 2020-2024) while underperforming in stable markets (-0.16%, 2015-2019). We report statistically insignificant aggregate results (p-value 0.34) to demonstrate a reproducible, honest validation protocol that prioritizes interpretability and extends naturally to advanced hypothesis generators, including large language models. The key empirical finding reveals that daily OHLCV-based microstructure signals require elevated information arrival and trading activity to function effectively. The framework provides complete mathematical specifications and open-source implementation, establishing a template for rigorous trading system evaluation that addresses the reproducibility crisis in quantitative finance research. For researchers, practitioners, and regulators, this work demonstrates that interpretable algorithmic trading strategies can be rigorously validated without sacrificing transparency or regulatory compliance.

#37 Understanding When Graph Convolutional Networks Help: A Diagnostic Study on Label Scarcity and Structural Properties

著者: Nischal Subedi, Ember Kerstetter, Winnie Li, Silo Murphy

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12947

要約:
Graph Convolutional Networks (GCNs) have become a standard approach for semi-supervised node classification, yet practitioners lack clear guidance on when GCNs provide meaningful improvements over simpler baselines. We present a diagnostic study using the Amazon Computers co-purchase data to understand when and why GCNs help. Through systematic experiments with simulated label scarcity, feature ablation, and per-class analysis, we find that GCN performance depends critically on the interaction between graph homophily and feature quality. GCNs provide the largest gains under extreme label scarcity, where they leverage neighborhood structure to compensate for limited supervision. Surprisingly, GCNs can match their original performance even when node features are replaced with random noise, suggesting that structure alone carries sufficient signal on highly homophilous graphs. However, GCNs hurt performance when homophily is low and features are already strong, as noisy neighbors corrupt good predictions. Our quadrant analysis reveals that GCNs help in three of four conditions and only hurt when low homophily meets strong features. These findings offer practical guidance for practitioners deciding whether to adopt graph-based methods.

#38 Cycles Communities from the Perspective of Dendrograms and Gradient Sampling

著者: Sixtus Dakurah

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12974

要約:
Identifying and comparing topological features, particularly cycles, across different topological objects remains a fundamental challenge in persistent homology and topological data analysis. This work introduces a novel framework for constructing cycle communities through two complementary approaches. First, a dendrogram-based methodology leverages merge-tree algorithms to construct hierarchical representations of homology classes from persistence intervals. The Wasserstein distance on merge trees is introduced as a metric for comparing dendrograms, establishing connections to hierarchical clustering frameworks. Through simulation studies, the discriminative power of dendrogram representations for identifying cycle communities is demonstrated. Second, an extension of Stratified Gradient Sampling simultaneously learns multiple filter functions that yield cycle barycenter functions capable of faithfully reconstructing distinct sets of cycles. The set of cycles each filter function can reconstruct constitutes cycle communities that are non-overlapping and partition the space of all cycles. Together, these approaches transform the problem of cycle matching into both a hierarchical clustering and topological optimization framework, providing principled methods to identify similar topological structures both within and across groups of topological objects.

#39 CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks

著者: Jonathan Wensh{\o}j, Tong Chen, Bob Pepin, Raghavendra Selvan

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12981

要約:
While joint pruning--quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop for finding compression parameters. This reliance adds engineering complexity and hyperparameter tuning, while also lacking a direct data-driven gradient signal, which might result in sub-optimal compression. In this paper, we introduce CoDeQ, a simple, fully differentiable method for joint pruning--quantization. Our approach builds on a key observation: the dead-zone of a scalar quantizer is equivalent to magnitude pruning, and can be used to induce sparsity directly within the quantization operator. Concretely, we parameterize the dead-zone width and learn it via backpropagation, alongside the quantization parameters. This design provides explicit control of sparsity, regularized by a single global hyperparameter, while decoupling sparsity selection from bit-width selection. The result is a method for Compression with Dead-zone Quantizer (CoDeQ) that supports both fixed-precision and mixed-precision quantization (controlled by an optional second hyperparameter). It simultaneously determines the sparsity pattern and quantization parameters in a single end-to-end optimization. Consequently, CoDeQ does not require any auxiliary procedures, making the method architecture-agnostic and straightforward to implement. On ImageNet with ResNet-18, CoDeQ reduces bit operations to ~5% while maintaining close to full precision accuracy in both fixed and mixed-precision regimes.

#40 A Bayesian approach to learning mixtures of nonparametric components

著者: Yilei Zhang, Yun Wei, Aritra Guha, XuanLong Nguyen

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12988

要約:
Mixture models are widely used in modeling heterogeneous data populations. A standard approach of mixture modeling is to assume that the mixture component takes a parametric kernel form, while the flexibility of the model can be obtained by using a large or possibly unbounded number of such parametric kernels. In many applications, making parametric assumptions on the latent subpopulation distributions may be unrealistic, which motivates the need for nonparametric modeling of the mixture components themselves. In this paper we study finite mixtures with nonparametric mixture components, using a Bayesian nonparametric modeling approach. In particular, it is assumed that the data population is generated according to a finite mixture of latent component distributions, where each component is endowed with a Bayesian nonparametric prior such as the Dirichlet process mixture. We present conditions under which the individual mixture component's distributions can be identified, and establish posterior contraction behavior for the data population's density, as well as densities of the latent mixture components. We develop an efficient MCMC algorithm for posterior inference and demonstrate via simulation studies and real-world data illustrations that it is possible to efficiently learn complex distributions for the latent subpopulations. In theory, the posterior contraction rate of the component densities is nearly polynomial, which is a significant improvement over the logarithm convergence rate of estimating mixing measures via deconvolution.

#41 Multi-fidelity aerodynamic data fusion by autoencoder transfer learning

著者: Javier Nieto-Centenero, Esther Andr\'es, Rodrigo Castellanos

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13069

要約:
Accurate aerodynamic prediction often relies on high-fidelity simulations; however, their prohibitive computational costs severely limit their applicability in data-driven modeling. This limitation motivates the development of multi-fidelity strategies that leverage inexpensive low-fidelity information without compromising accuracy. Addressing this challenge, this work presents a multi-fidelity deep learning framework that combines autoencoder-based transfer learning with a newly developed Multi-Split Conformal Prediction (MSCP) strategy to achieve uncertainty-aware aerodynamic data fusion under extreme data scarcity. The methodology leverages abundant Low-Fidelity (LF) data to learn a compact latent physics representation, which acts as a frozen knowledge base for a decoder that is subsequently fine-tuned using scarce HF samples. Tested on surface-pressure distributions for NACA airfoils (2D) and a transonic wing (3D) databases, the model successfully corrects LF deviations and achieves high-accuracy pressure predictions using minimal HF training data. Furthermore, the MSCP framework produces robust, actionable uncertainty bands with pointwise coverage exceeding 95%. By combining extreme data efficiency with uncertainty quantification, this work offers a scalable and reliable solution for aerodynamic regression in data-scarce environments.

#42 Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

著者: Liviu Aolaritei, Michael I. Jordan

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13123

要約:
We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergence guarantees in expectation or at a fixed horizon, but offer no statistically valid way to assess, at an arbitrary time, how close the current iterate is to the optimum. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD, constructed via nonnegative supermartingales and requiring no smoothness or strong convexity. This confidence sequence yields a simple stopping rule that is provably $\varepsilon$-optimal with probability at least $1-\alpha$ and is almost surely finite under standard stochastic approximation stepsizes. To the best of our knowledge, these are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates for projected SGD with general convex objectives, based solely on observable trajectory quantities.

#43 Enhancing Node-Level Graph Domain Adaptation by Alleviating Local Dependency

著者: Xinwei Tai, Dongmian Zou, Hongfei Wang

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13149

要約:
Recent years have witnessed significant advancements in machine learning methods on graphs. However, transferring knowledge effectively from one graph to another remains a critical challenge. This highlights the need for algorithms capable of applying information extracted from a source graph to an unlabeled target graph, a task known as unsupervised graph domain adaptation (GDA). One key difficulty in unsupervised GDA is conditional shift, which hinders transferability. In this paper, we show that conditional shift can be observed only if there exists local dependencies among node features. To support this claim, we perform a rigorous analysis and also further provide generalization bounds of GDA when dependent node features are modeled using markov chains. Guided by the theoretical findings, we propose to improve GDA by decorrelating node features, which can be specifically implemented through decorrelated GCN layers and graph transformer layers. Our experimental results demonstrate the effectiveness of this approach, showing not only substantial performance enhancements over baseline GDA methods but also clear visualizations of small intra-class distances in the learned representations. Our code is available at https://github.com/TechnologyAiGroup/DFT

#44 Policy-Aligned Estimation of Conditional Average Treatment Effects

著者: Artem Timoshenko, Caio Waisman

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13400

要約:
Firms often develop targeting policies to personalize marketing actions and improve incremental profits. Effective targeting depends on accurately separating customers with positive versus negative treatment effects. We propose an approach to estimate the conditional average treatment effects (CATEs) of marketing actions that aligns their estimation with the firm's profit objective. The method recognizes that, for many customers, treatment effects are so extreme that additional accuracy is unlikely to change the recommended actions. However, accuracy matters near the decision boundary, as small errors can alter targeting decisions. By modifying the firm's objective function in the standard profit maximization problem, our method yields a near-optimal targeting policy while simultaneously estimating CATEs. This introduces a new perspective on CATE estimation, reframing it as a problem of profit optimization rather than prediction accuracy. We establish the theoretical properties of the proposed method and demonstrate its performance and trade-offs using synthetic data.

#45 Multiclass Graph-Based Large Margin Classifiers: Unified Approach for Support Vectors and Neural Networks

著者: V\'itor M. Hanriot, Luiz C. B. Torres, Ant\^onio P. Braga

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13410

要約:
While large margin classifiers are originally an outcome of an optimization framework, support vectors (SVs) can be obtained from geometric approaches. This article presents advances in the use of Gabriel graphs (GGs) in binary and multiclass classification problems. For Chipclass, a hyperparameter-less and optimization-less GG-based binary classifier, we discuss how activation functions and support edge (SE)-centered neurons affect the classification, proposing smoother functions and structural SV (SSV)-centered neurons to achieve margins with low probabilities and smoother classification contours. We extend the neural network architecture, which can be trained with backpropagation with a softmax function and a cross-entropy loss, or by solving a system of linear equations. A new subgraph-/distance-based membership function for graph regularization is also proposed, along with a new GG recomputation algorithm that is less computationally expensive than the standard approach. Experimental results with the Friedman test show that our method was better than previous GG-based classifiers and statistically equivalent to tree-based models.

#46 Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource

著者: Sofiya Zaichyk

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13506

要約:
Statistical learning under distributional drift remains insufficiently characterized: when each observation alters the data-generating law, classical generalization bounds can collapse. We introduce a new statistical primitive, the reproducibility budget $C_T$, which quantifies a system's finite capacity for statistical reproducibility - the extent to which its sampling process can remain governed by a consistent underlying distribution in the presence of both exogenous change and endogenous feedback. Formally, $C_T$ is defined as the cumulative Fisher-Rao path length of the coupled learner-environment evolution, measuring the total distributional motion accumulated during learning. From this construct we derive a drift-feedback generalization bound of order $O(T^{-1/2} + C_T/T)$, and we prove a matching minimax lower bound showing that this rate is minimax-optimal. Consequently, the results establish a reproducibility speed limit: no algorithm can achieve smaller worst-case generalization error than that imposed by the average Fisher-Rao drift rate $C_T/T$ of the data-generating process. The framework situates exogenous drift, adaptive data analysis, and performative prediction within a common geometric structure, with $C_T$ emerging as the intrinsic quantity measuring distributional motion across these settings.

#47 A Class of Accelerated Fixed-Point-Based Methods with Delayed Inexact Oracles and Its Applications

著者: Nghia Nguyen-Trung, Quoc Tran-Dinh

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13547

要約:
In this paper, we develop a novel accelerated fixed-point-based framework using delayed inexact oracles to approximate a fixed point of a nonexpansive operator (or equivalently, a root of a co-coercive operator), a central problem in scientific computing. Our approach leverages both Nesterov's acceleration technique and the Krasnosel'skii-Mann (KM) iteration, while accounting for delayed inexact oracles, a key mechanism in asynchronous algorithms. We also introduce a unified approximate error condition for delayed inexact oracles, which can cover various practical scenarios. Under mild conditions and appropriate parameter updates, we establish both $\mathcal{O}(1/k^2)$ non-asymptotic and $o(1/k^2)$ asymptotic convergence rates in expectation for the squared norm of residual. Our rate significantly improves the $\mathcal{O}(1/k)$ rates in classical KM-type methods, including their asynchronous variants. We also establish $o(1/k^2)$ almost sure convergence rates and the almost sure convergence of iterates to a solution of the problem. Within our framework, we instantiate three settings for the underlying operator: (i) a deterministic universal delayed oracle; (ii) a stochastic delayed oracle; and (iii) a finite-sum structure with asynchronous updates. For each case, we instantiate our framework to obtain a concrete algorithmic variant for which our convergence results still apply, and whose iteration complexity depends linearly on the maximum delay. Finally, we verify our algorithms and theoretical results through two numerical examples on both matrix game and shallow neural network training problems.

#48 Machine learning to optimize precision in the analysis of randomized trials: A journey in pre-specified, yet data-adaptive learning

著者: Laura B. Balzer, Mark J. van der Laan, Maya L. Petersen

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13610

要約:
Covariate adjustment is an approach to improve the precision of trial analyses by adjusting for baseline variables that are prognostic of the primary endpoint. Motivated by the SEARCH Universal HIV Test-and-Treat Trial (2013-2017), we tell our story of developing, evaluating, and implementing a machine learning-based approach for covariate adjustment. We provide the rationale for as well as the practical concerns with such an approach for estimating marginal effects. Using schematics, we illustrate our procedure: targeted machine learning estimation (TMLE) with Adaptive Pre-specification. Briefly, sample-splitting is used to data-adaptively select the combination of estimators of the outcome regression (i.e., the conditional expectation of the outcome given the trial arm and covariates) and known propensity score (i.e., the conditional probability of being randomized to the intervention given the covariates) that minimizes the cross-validated variance estimate and, thereby, maximizes empirical efficiency. We discuss our approach for evaluating finite sample performance with parametric and plasmode simulations, pre-specifying the Statistical Analysis Plan, and unblinding in real-time on video conference with our colleagues from around the world. We present the results from applying our approach in the primary, pre-specified analysis of 8 recently published trials (2022-2024). We conclude with practical recommendations and an invitation to implement our approach in the primary analysis of your next trial.

#49 From Many Models, One: Macroeconomic Forecasting with Reservoir Ensembles

著者: Giovanni Ballarin, Lyudmila Grigoryeva, Yui Ching Li

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13642

要約:
Model combination is a powerful approach to achieve superior performance with a set of models than by just selecting any single one. We study both theoretically and empirically the effectiveness of ensembles of Multi-Frequency Echo State Networks (MFESNs), which have been shown to achieve state-of-the-art macroeconomic time series forecasting results (Ballarin et al., 2024a). Hedge and Follow-the-Leader schemes are discussed, and their online learning guarantees are extended to the case of dependent data. In applications, our proposed Ensemble Echo State Networks show significantly improved predictive performance compared to individual MFESN models.

#50 Advancing Machine Learning Optimization of Chiral Photonic Metasurface: Comparative Study of Neural Network and Genetic Algorithm Approaches

著者: Davide Filippozzi, Alexandre Mayer, Nicolas Roy, Wei Fang, Arash Rahimi-Iman

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13656

要約:
Chiral photonic metasurfaces provide unique capabilities for tailoring light-matter interactions, which are essential for next-generation photonic devices. Here, we report an advanced optimization framework that combines deep learning and evolutionary algorithms to significantly improve both the design and performance of chiral photonic nanostructures. Building on previous work utilizing a three-layer perceptron reinforced learning and stochastic evolutionary algorithm with decaying changes and mass extinction for chiral photonic optimization, our study introduces a refined pipeline featuring a two-output neural network architecture to reduce the trade-off between high chiral dichroism (CD) and reflectivity. Additionally, we use an improved fitness function, and efficient data augmentation techniques. A comparative analysis between a neural network (NN)-based approach and a genetic algorithm (GA) is presented for structures of different interface pattern depth, material combinations, and geometric complexity. We demonstrate a twice higher CD and the impact of both the corner number and the refractive index contrast at the example of a GaP/air and PMMA/air metasurface as a result of superior optimization performance. Additionally, a substantial increase in the number of structures explored within limited computational resources is highlighted, with tailored spectral reflectivity suggested by our electromagnetic simulations, paving the way for chiral mirrors applicable to polarization-selective light-matter interaction studies.

#51 Quantum oracles give an advantage for identifying classical counterfactuals

著者: Ciar\'an M. Gilligan-Lee, Y\`il\`e Y\=ing, Jonathan Richens, David Schmid

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.13692

要約:
We show that quantum oracles provide an advantage over classical oracles for answering classical counterfactual questions in causal models, or equivalently, for identifying unknown causal parameters such as distributions over functional dependences. In structural causal models with discrete classical variables, observational data and even ideal interventions generally fail to answer all counterfactual questions, since different causal parameters can reproduce the same observational and interventional data while disagreeing on counterfactuals. Using a simple binary example, we demonstrate that if the classical variables of interest are encoded in quantum systems and the causal dependence among them is encoded in a quantum oracle, coherently querying the oracle enables the identification of all causal parameters -- hence all classical counterfactuals. We generalize this to arbitrary finite cardinalities and prove that coherent probing 1) allows the identification of all two-way joint counterfactuals p(Y_x=y, Y_{x'}=y'), which is not possible with any number of queries to a classical oracle, and 2) provides tighter bounds on higher-order multi-way counterfactuals than with a classical oracle. This work can also be viewed as an extension to traditional quantum oracle problems such as Deutsch--Jozsa to identifying more causal parameters beyond just, e.g., whether a function is constant or balanced. Finally, we raise the question of whether this quantum advantage relies on uniquely non-classical features like contextuality. We provide some evidence against this by showing that in the binary case, oracles in some classically-explainable theories like Spekkens' toy theory also give rise to a counterfactual identifiability advantage over strictly classical oracles.

#52 Improved Generalization Bounds for Transductive Learning by Transductive Local Complexity and Its Applications

著者: Yingzhen Yang

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2309.16858

要約:
We introduce Transductive Local Complexity (TLC) as a new tool for analyzing the generalization performance of transductive learning methods. Our work extends the classical Local Rademacher Complexity (LRC) to the transductive setting, incorporating substantial and novel components. Although LRC has been used to obtain sharp generalization bounds and minimax rates for inductive tasks such as classification and nonparametric regression, it has remained an open problem whether a localized Rademacher complexity framework can be effectively adapted to transductive learning to achieve sharp or nearly sharp bounds consistent with inductive results. We provide an affirmative answer via TLC. TLC is constructed by first deriving a new concentration inequality in Theorem 4.1 for the supremum of empirical processes capturing the gap between test and training losses, termed the test-train process, under uniform sampling without replacement, which leverages a novel combinatorial property of the test-train process and a new proof strategy applying the exponential Efron-Stein inequality twice. A subsequent peeling strategy and a new surrogate variance operator then yield excess risk bounds in the transductive setting that are nearly consistent with classical LRC-based inductive bounds up to a logarithmic gap. We further advance transductive learning through two applications: (1) for realizable transductive learning over binary-valued classes with finite VC dimension of $\dVC$ and $u \ge m \ge \dVC$, where $u$ and $m$ are the number of test features and training features, our Theorem 6.1 gives a nearly optimal bound $\Theta(\dVC \log(me/\dVC)/m)$ matching the minimax rate $\Theta(\dVC/m)$ up to $\log m$, resolving a decade-old open question; and (2) Theorem 6.2 presents a sharper excess risk bound for transductive kernel learning compared to the current state-of-the-art.

#53 An Anytime Algorithm for Good Arm Identification

著者: Marc Jourdan, Andr\'ee Delahaye-Duriez, Cl\'emence R\'eda

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2310.10359

要約:
In good arm identification (GAI), the goal is to identify one arm whose average performance exceeds a given threshold, referred to as a good arm, if it exists. Few works have studied GAI in the fixed-budget setting when the sampling budget is fixed beforehand, or in the anytime setting, when a recommendation can be asked at any time. We propose APGAI, an anytime and parameter-free sampling rule for GAI in stochastic bandits. APGAI can be straightforwardly used in fixed-confidence and fixed-budget settings. First, we derive upper bounds on its probability of error at any time. They show that adaptive strategies can be more efficient in detecting the absence of good arms than uniform sampling in several diverse instances. Second, when APGAI is combined with a stopping rule, we prove upper bounds on the expected sampling complexity, holding at any confidence level. Finally, we show the good empirical performance of APGAI on synthetic and real-world data. Our work offers an extensive overview of the GAI problem in all settings.

#54 Markov Chain Gradient Descent in Hilbert Spaces

著者: Priyanka Roy, Susanne Saminger-Platz

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.08361

要約:
In this paper, we study a Markov chain-based stochastic gradient algorithm in general Hilbert spaces, aiming at approximating the optimal solution of a quadratic loss function. We establish probabilistic upper bounds on its convergence. We further extend these results to an online regularized learning algorithm in reproducing kernel Hilbert spaces, where the samples are drawn along a Markov chain trajectory.

#55 Self-test loss functions for learning weak-form operators and gradient flows

著者: Yuan Gao, Quanjun Lang, Fei Lu

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2412.03506

要約:
The construction of loss functions presents a major challenge in data-driven modeling involving weak-form operators in PDEs and gradient flows, particularly due to the need to select test functions appropriately. We address this challenge by introducing self-test loss functions, which employ test functions that depend on the unknown parameters, specifically for cases where the operator depends linearly on the unknowns. The proposed self-test loss function conserves energy for gradient flows and coincides with the expected log-likelihood ratio for stochastic differential equations. Importantly, it is quadratic, facilitating theoretical analysis of identifiability and well-posedness of the inverse problem, while also leading to efficient parametric or nonparametric regression algorithms. It is computationally simple, requiring only low-order derivatives or even being entirely derivative-free, and numerical experiments demonstrate its robustness against noisy and discrete data.

#56 Multi-View Oriented GPLVM: Expressiveness and Efficiency

著者: Zi Yang, Ying Li, Zhidi Lin, Michael Minyi Zhang, Pablo M. Olmos

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.08253

要約:
The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral density and the kernel function. By modeling the spectral density with a bivariate Gaussian mixture, we then derive a generic and expressive kernel termed Next-Gen Spectral Mixture (NG-SM) for MV-GPLVMs. To address the inherent computational inefficiency of the NG-SM kernel, we design a new form of random Fourier feature approximation. Combined with a tailored reparameterization trick, this approximation enables scalable variational inference for both the model and the unified latent representations. Numerical evaluations across a diverse range of multi-view datasets demonstrate that our proposed method consistently outperforms state-of-the-art models in learning meaningful latent representations.

#57 Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability

著者: Brian Hu Zhang, Ioannis Anagnostides, Emanuel Tewolde, Ratip Emin Berker, Gabriele Farina, Vincent Conitzer, Tuomas Sandholm

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.18582

要約:
$\Phi$-equilibria -- and the associated notion of $\Phi$-regret -- are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $\Phi$ begets stronger notions of rationality. Recently, Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '24) -- abbreviated as DFFPS -- settled the existence of efficient algorithms when $\Phi$ contains only linear maps under a general, $d$-dimensional convex constraint set $\mathcal{X}$. In this paper, we significantly extend their work by resolving the case where $\Phi$ is $k$-dimensional; degree-$\ell$ polynomials constitute a canonical such example with $k = d^{O(\ell)}$. In particular, positing only oracle access to $\mathcal{X}$, we obtain two main positive results: i) a $\text{poly}(n, d, k, \text{log}(1/\epsilon))$-time algorithm for computing $\epsilon$-approximate $\Phi$-equilibria in $n$-player multilinear games, and ii) an efficient online algorithm that incurs average $\Phi$-regret at most $\epsilon$ using $\text{poly}(d, k)/\epsilon^2$ rounds. We also show nearly matching lower bounds in the online learning setting, thereby obtaining for the first time a family of deviations that captures the learnability of $\Phi$-regret. From a technical standpoint, we extend the framework of DFFPS from linear maps to the more challenging case of maps with polynomial dimension. At the heart of our approach is a polynomial-time algorithm for computing an expected fixed point of any $\phi : \mathcal{X} \to \mathcal{X}$ based on the ellipsoid against hope (EAH) algorithm of Papadimitriou and Roughgarden (JACM '08). In particular, our algorithm for computing $\Phi$-equilibria is based on executing EAH in a nested fashion -- each step of EAH itself being implemented by invoking a separate call to EAH.

#58 CRPS-Based Targeted Sequential Design with Application in Chemical Space

著者: Lea Friedli, Ath\'ena\"is Gautier, Anna Broccard, David Ginsbourger

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.11250

要約:
Sequential design of real and computer experiments via Gaussian Process (GP) models has proven useful for parsimonious, goal-oriented data acquisition purposes. In this work, we focus on acquisition strategies for a GP model that needs to be accurate within a predefined range of the response of interest. Such an approach is useful in various fields including synthetic chemistry, where finding molecules with particular properties is essential for developing useful materials and effective medications. GP modeling and sequential design of experiments have been successfully applied to a plethora of domains, including molecule research. Our main contribution here is to use the threshold-weighted Continuous Ranked Probability Score (CRPS) as a basic building block for acquisition functions employed within sequential design. We study pointwise and integral criteria relying on two different weighting measures and benchmark them against competitors, demonstrating improved performance with respect to considered goals. The resulting acquisition strategies are applicable to a wide range of fields and pave the way to further developing sequential design relying on scoring rules.

#59 Credal Prediction based on Relative Likelihood

著者: Timo L\"ohr, Paul Hofman, Felix Mohr, Eyke H\"ullermeier

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.22332

要約:
Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner's epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction based on the statistical notion of relative likelihood: The target of prediction is the set of all (conditional) probability distributions produced by the collection of plausible models, namely those models whose relative likelihood exceeds a specified threshold. This threshold has an intuitive interpretation and allows for controlling the trade-off between correctness and precision of credal predictions. We tackle the problem of approximating credal sets defined in this way by means of suitably modified ensemble learning techniques. To validate our approach, we illustrate its effectiveness by experiments on benchmark datasets demonstrating superior uncertainty representation without compromising predictive performance. We also compare our method against several state-of-the-art baselines in credal prediction.

#60 Extreme mass distributions for quasi-copulas

著者: Matja\v{z} Omladi\v{c}, Martin Vuk, Alja\v{z} Zalar

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.20672

要約:
A recent survey, nicknamed "Hitchhiker's Guide", J.J. Arias-Garc{\i}a, R. Mesiar, and B. De Baets, A hitchhiker's guide to quasi-copulas, Fuzzy Sets and Systems 393 (2020) 1-28, has raised the rating of quasi-copula problems in the dependence modeling community in spite of the lack of statistical interpretation of quasi-copulas. In our previous work (Fuzzy Sets and Systems 517 (2025) 109457), we addressed the question of extreme values of the mass distribution associated with multivariate quasi-copulas. Using a linear programming approach, we were able to solve Open Problem 5 of the "Guide" up to dimension d = 17 and disprove a recent conjecture on the solution to that problem. In this paper, we use an analytical approach to provide a complete answer to the original question.

#61 Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data: "One Map, Many Trials" in Satellite-Driven Poverty Analysis

著者: Markus B. Pettersson, Connor T. Jerzak, Adel Daoud

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.01341

要約:
Machine learning models trained on Earth observation data, such as satellite imagery, have demonstrated significant promise in predicting household-level wealth indices, enabling the creation of high-resolution wealth maps that can be leveraged across multiple causal trials while addressing chronic data scarcity in global development research. However, because standard training objectives prioritize overall predictive accuracy, these predictions often suffer from shrinkage toward the mean, leading to attenuated estimates of causal treatment effects and limiting their utility in policy evaluations. Existing debiasing methods, such as Prediction-Powered Inference (PPI), can handle this attenuation bias but require additional fresh ground-truth data at the downstream stage of causal inference, which restricts their applicability in data-scarce environments. We introduce and evaluate two post-hoc correction methods -- Linear Calibration Correction (LCC) and a Tweedie's correction approach -- that substantially reduce shrinkage-induced prediction bias without relying on newly collected labeled data. LCC applies a simple linear transformation estimated on a held-out calibration split; Tweedie's method locally de-shrink predictions using density score estimates and a noise scale learned upstream. We provide practical diagnostics for when a correction is warranted and discuss practical limitations. Across analytical results, simulations, and experiments with Demographic and Health Surveys (DHS) data, both approaches reduce attenuation; Tweedie's correction yields nearly unbiased treatment-effect estimates, enabling a "one map, many trials" paradigm. Although we demonstrate on EO-ML wealth mapping, the methods are not geospatial-specific: they apply to any setting where imputed outcomes are reused downstream (e.g., pollution indices, population density, or LLM-derived indicators).

#62 A PyTorch Framework for Scalable Non-Crossing Quantile Regression

著者: Kaihua Chang

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.22419

要約:
Quantile regression is fundamental to distributional modeling, yet independent estimation of multiple quantiles frequently produces crossing -- where estimated quantile functions violate monotonicity, implying impossible negative probability densities. While Constrained Joint Quantile Regression (CJQR) elegantly enforces non-crossing by construction, existing formulations via Linear Programming exhibit $O((qn)^3)$ complexity, rendering them impractical for large-scale applications. We present the first scalable solution using PyTorch automatic differentiation: \textbf{CJQR-ALM}, combining the \textbf{Augmented Lagrangian Method} with \textbf{differentiable pinball loss} and \textbf{L-BFGS} optimization. Our approach reduces computational complexity to $O(n)$, achieving near-zero crossing rates on datasets exceeding 70,000 observations within minutes. The differentiable formulation naturally extends to neural network architectures for non-linear conditional quantile estimation. Application to Student Growth Percentile calculations demonstrates practical utility for educational assessment, while simulation studies show negligible accuracy cost (RMSE increase $\approx 2.4$ points) relative to unconstrained estimation -- a favorable trade-off for applications requiring valid probability statements across finance, healthcare, and engineering.

#63 Optimal Convergence Analysis of DDPM for General Distributions

著者: Yuchen Jiao, Yuchen Zhou, Gen Li

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.27562

要約:
Score-based diffusion models have achieved remarkable empirical success in generating high-quality samples from target data distributions. Among them, the Denoising Diffusion Probabilistic Model (DDPM) is one of the most widely used samplers, generating samples via estimated score functions. Despite its empirical success, a tight theoretical understanding of DDPM -- especially its convergence properties -- remains limited. In this paper, we provide a refined convergence analysis of the DDPM sampler and establish near-optimal convergence rates under general distributional assumptions. Specifically, we introduce a relaxed smoothness condition parameterized by a constant $L$, which is small for many practical distributions (e.g., Gaussian mixture models). We prove that the DDPM sampler with accurate score estimates achieves a convergence rate of $$\widetilde{O}\left(\frac{d\min\{d,L^2\}}{T^2}\right)~\text{in Kullback-Leibler divergence},$$ where $d$ is the data dimension, $T$ is the number of iterations, and $\widetilde{O}$ hides polylogarithmic factors in $T$. This result substantially improves upon the best-known $d^2/T^2$ rate when $L < \sqrt{d}$. By establishing a matching lower bound, we show that our convergence analysis is tight for a wide array of target distributions. Moreover, it reveals that DDPM and DDIM share the same dependence on $d$, raising an interesting question of why DDIM often appears empirically faster.

#64 Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

著者: Lukas Billera, Hedwig Nora Nordlinder, Jack Collier Ryder, Anton Oresten, Aron St{\aa}lmarck, Theodor Mosetti Bj\"ork, Ben Murrell

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.09465

要約:
Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and `multimodal' product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.

#65 The Optimal Approximation Factor in Density Estimation

著者: Olivier Bousquet, Daniel Kane, Shay Moran

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/1902.05876

要約:
Consider the following problem: given two arbitrary densities $q_1,q_2$ and a sample-access to an unknown target density $p$, find which of the $q_i$'s is closer to $p$ in total variation. A remarkable result due to Yatracos shows that this problem is tractable in the following sense: there exists an algorithm that uses $O(\epsilon^{-2})$ samples from $p$ and outputs~$q_i$ such that with high probability, $TV(q_i,p) \leq 3\cdot\mathsf{opt} + \epsilon$, where $\mathsf{opt}= \min\{TV(q_1,p),TV(q_2,p)\}$. Moreover, this result extends to any finite class of densities $\mathcal{Q}$: there exists an algorithm that outputs the best density in $\mathcal{Q}$ up to a multiplicative approximation factor of 3. We complement and extend this result by showing that: (i) the factor 3 can not be improved if one restricts the algorithm to output a density from $\mathcal{Q}$, and (ii) if one allows the algorithm to output arbitrary densities (e.g.\ a mixture of densities from $\mathcal{Q}$), then the approximation factor can be reduced to 2, which is optimal. In particular this demonstrates an advantage of improper learning over proper in this setup. We develop two approaches to achieve the optimal approximation factor of 2: an adaptive one and a static one. Both approaches are based on a geometric point of view of the problem and rely on estimating surrogate metrics to the total variation. Our sample complexity bounds exploit techniques from {\it Adaptive Data Analysis}.

#66 A Kernel-Based Approach for Modelling Gaussian Processes with Functional Information

著者: D. Andrew Brown, Peter Kiessler, John Nicholson

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2201.11023

要約:
Gaussian processes (GPs) are ubiquitous tools for modeling and predicting continuous processes in physical and engineering sciences. This is partly due to the fact that one may employ a Gaussian process as an interpolator while facilitating straightforward uncertainty quantification at other locations. In addition to training data, it is sometimes the case that available information is not in the form of a finite collection of points. For example, boundary value problems contain information on the boundary of a domain, or underlying physics lead to known behavior on an entire uncountable subset of the domain of interest. While an approximation to such known information may be obtained via pseudo-training points in the known subset, such a procedure is ad hoc with little guidance on the number of points to use, nor the behavior as the number of pseudo-observations grows large. We propose and construct Gaussian processes that unify, via reproducing kernel Hilbert space, the typical finite training data case with the case of having uncountable information by exploiting the equivalence of conditional expectation and orthogonal projections in Hilbert space. We show existence of the proposed process and establish that it is the limit of a conventional GP conditioned on an increasing number of training points. We illustrate the flexibility and advantages of our proposed approach via numerical experiments.

#67 Statistical inference for pairwise comparison models

著者: Ruijian Han, Wenlu Tang, Yiming Xu

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2401.08463

要約:
Pairwise comparison models have been widely used for utility evaluation and rank aggregation across various fields. The increasing scale of modern problems underscores the need to understand statistical inference in these models when the number of subjects diverges, a topic that is currently underexplored in the literature. To address this gap, this paper establishes a near-optimal asymptotic normality result for the maximum likelihood estimator in a broad class of pairwise comparison models. The key idea lies in identifying the Fisher information matrix as a weighted graph Laplacian, which can be studied via a meticulous spectral analysis. Our findings provide theoretical foundations for performing statistical inference in a wide range of pairwise comparison models beyond the Bradley--Terry model. Simulations utilizing synthetic data are conducted to validate the asymptotic normality result, followed by a hypothesis test using a tennis competition dataset.

#68 Imprecise Markov Semigroups and their Ergodicity

著者: Michele Caprio

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2405.00081

要約:
We introduce the concept of an imprecise Markov semigroup $\mathbf{Q}$. It is a tool that allows us to represent ambiguity around both the initial and the transition probabilities of a continuous-time Markov process via a compact collection of Markov semigroups, each associated with a (possibly different) Markov process. We use techniques from topology, geometry, and probability to study the ergodic behavior of $\mathbf{Q}$. We show that, under some conditions that also involve the geometry of the state space, eventually the ambiguity fades. We call this property ergodicity of the imprecise Markov semigroup, and we relate it to the classical notion of ergodicity. We prove ergodicity both when the state space is Euclidean or a Riemannian manifold, and when it is an arbitrary measurable space. The importance of our findings for the fields of artificial intelligence and computer vision is also discussed, in particular in the study of how the probability of an output evolves over time as we perturb the input of a convolutional autoencoder.

#69 Neural stochastic Volterra equations: learning path-dependent dynamics

著者: Martin Bergerhausen, David J. Pr\"omel, David Scheffels

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2407.19557

要約:
Stochastic Volterra equations (SVEs) serve as mathematical models for the time evolutions of random systems with memory effects and irregular behaviour. We introduce neural stochastic Volterra equations as a physics-inspired architecture, generalizing the class of neural stochastic differential equations, and provide some theoretical foundation. Numerical experiments on various SVEs, like the disturbed pendulum equation, the generalized Ornstein--Uhlenbeck process, the rough Heston model and a monetary reserve dynamics, are presented, comparing the performance of neural SVEs, neural SDEs and Deep Operator Networks (DeepONets).

#70 The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels

backdoor

著者: Yonatan Slutzky, Yotam Alexander, Noam Razin, Nadav Cohen

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.10473

要約:
Neural networks are powered by an implicit bias: a tendency of gradient descent to fit training data in a way that generalizes to unseen data. A recent class of neural network models gaining increasing popularity is structured state space models (SSMs), regarded as an efficient alternative to transformers. Prior work argued that the implicit bias of SSMs leads to generalization in a setting where data is generated by a low dimensional teacher. In this paper, we revisit the latter setting, and formally establish a phenomenon entirely undetected by prior work on the implicit bias of SSMs. Namely, we prove that while implicit bias leads to generalization under many choices of training data, there exist special examples whose inclusion in training completely distorts the implicit bias, to a point where generalization fails. This failure occurs despite the special training examples being labeled by the teacher, i.e. having clean labels! We empirically demonstrate the phenomenon, with SSMs trained independently and as part of non-linear neural networks. In the area of adversarial machine learning, disrupting generalization with cleanly labeled training examples is known as clean-label poisoning. Given the proliferation of SSMs, we believe that delineating their susceptibility to clean-label poisoning, and developing methods for overcoming this susceptibility, are critical research directions to pursue.

#71 Defending Collaborative Filtering Recommenders via Adversarial Robustness Based Edge Reweighting

著者: Yongyu Wang

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2412.10850

要約:
User based collaborative filtering (CF) relies on a user and user similarity graph, making it vulnerable to profile injection (shilling) attacks that manipulate neighborhood relations to promote (push) or demote (nuke) target items. In this work, we propose an adversarial robustness based edge reweighting defense for CF. We first assign each user and user edge a non robustness score via spectral adversarial robustness evaluation, which quantifies the edge sensitivity to adversarial perturbations. We then attenuate the influence of non robust edges by reweighting similarities during prediction. Extensive experiments demonstrate that the proposed method effectively defends against various types of attacks.

#72 Compact Neural Network Algorithm for Electrocardiogram Classification

著者: Mateo Frausto-Avila, Jos\'e Pablo Manriquez-Amavizca, Ana Karen Susana Rocha-Robledo, Mario A. Quiroz-Juarez, Alfred U'Ren

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2412.17852

要約:
In this paper, we present a powerful, compact electrocardiogram (ECG) classification algorithm for cardiac arrhythmia diagnosis that addresses the current reliance on deep learning and convolutional neural networks (CNNs) in ECG analysis. This work aims to reduce the demand for deep learning, which often requires extensive computational resources and large labeled datasets. Our approach introduces an artificial neural network (ANN) with a simple architecture combined with advanced feature engineering techniques. A key contribution of this work is the incorporation of 17 engineered features that enable the extraction of critical patterns from raw ECG signals. By integrating mathematical transformations, signal processing methods, and data extraction algorithms, our model captures the morphological and physiological characteristics of ECG signals with high efficiency, without requiring deep learning. Our method demonstrates a similar performance to other state-of-the-art models in classifying 4 types of arrhythmias, including atrial fibrillation, sinus tachycardia, sinus bradycardia, and ventricular flutter. Our algorithm achieved an accuracy of 97.36% on the MIT-BIH and St. Petersburg INCART arrhythmia databases. Our approach offers a practical and feasible solution for real-time diagnosis of cardiac disorders in medical applications, particularly in resource-limited environments.

#73 Random Reshuffling for Stochastic Gradient Langevin Dynamics

著者: Luke Shaw, Peter A. Whalley

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.16055

要約:
We examine the use of different randomisation policies for stochastic gradient algorithms used in sampling, based on first-order (or overdamped) Langevin dynamics, the most popular of which is known as Stochastic Gradient Langevin Dynamics. Conventionally, this algorithm is combined with a specific stochastic gradient strategy, called Robbins-Monro. In this work, we study an alternative strategy, Random Reshuffling, and show convincingly that it leads to improved performance via: a) a proof of reduced bias in the Wasserstein metric for strongly convex, gradient Lipschitz potentials; b) an analytical demonstration of reduced bias for a Gaussian model problem; and c) an empirical demonstration of reduced bias in numerical experiments for some logistic regression problems. This is especially important since Random Reshuffling is typically more efficient due to memory access and cache reasons. Such acceleration for the Random Reshuffling policy is familiar from the optimisation literature on stochastic gradient descent.

#74 An Algorithm to perform Covariance-Adjusted Support Vector Classification in Non-Euclidean Spaces

著者: Satyajeet Sahoo, Jhareswar Maiti

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.04371

要約:
Traditional Support Vector Machine (SVM) classification is carried out by finding the max-margin classifier for the training data that divides the mar-gin space into two equal sub-spaces. This study demonstrates limitations of performing Support Vector Classification in non-Euclidean spaces by estab-lishing that the underlying principle of max-margin classification and Karush Kuhn Tucker (KKT) boundary conditions are valid only in the Eu-clidean vector spaces, while in non-Euclidean spaces the principle of maxi-mum margin is a function of intra-class data covariance. The study estab-lishes a methodology to perform Support Vector Classification in Non-Euclidean Spaces by incorporating data covariance into the optimization problem using the transformation matrix obtained from Cholesky Decompo-sition of respective class covariance matrices, and shows that the resulting classifier obtained separates the margin space in ratio of respective class pop-ulation covariance. The study proposes an algorithm to iteratively estimate the population covariance-adjusted SVM classifier in non-Euclidean space from sample covariance matrices of the training data. The effectiveness of this SVM classification approach is demonstrated by applying the classifier on multiple datasets and comparing the performance with traditional SVM kernels and whitening algorithms. The Cholesky-SVM model shows marked improvement in the accuracy, precision, F1 scores and ROC performance compared to linear and other kernel SVMs.

#75 RGNMR: A Gauss-Newton method for robust matrix completion with theoretical guarantees

著者: Eilon Vaknin Laufer, Boaz Nadler

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.12919

要約:
Recovering a low rank matrix from a subset of its entries, some of which may be corrupted, is known as the robust matrix completion (RMC) problem. Existing RMC methods have several limitations: they require a relatively large number of observed entries; they may fail under overparametrization, when their assumed rank is higher than the correct one; and many of them fail to recover even mildly ill-conditioned matrices. In this paper we propose a novel RMC method, denoted $\texttt{RGNMR}$, which overcomes these limitations. $\texttt{RGNMR}$ is a simple factorization-based iterative algorithm, which combines a Gauss-Newton linearization with removal of entries suspected to be outliers. On the theoretical front, we prove that under suitable assumptions, $\texttt{RGNMR}$ is guaranteed exact recovery of the underlying low rank matrix. Our theoretical results improve upon the best currently known for factorization-based methods. On the empirical front, we show via several simulations the advantages of $\texttt{RGNMR}$ over existing RMC methods, and in particular its ability to handle a small number of observed entries, overparameterization of the rank and ill-conditioned matrices.

#76 Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

著者: Christian Walder, Deep Karkhanis

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.15201

要約:
Reinforcement Learning (RL) algorithms sample multiple n>1 solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strength of isolated samples at the expense of the diversity and collective utility of sets of samples. This under-utilizes the sampling capacity, limiting exploration and eventual improvement on harder examples. As a fix, we propose Pass-at-k Policy Optimization (PKPO), a transformation on the final rewards which leads to direct optimization of pass@k performance, thus optimizing for sets of samples that maximize reward when considered jointly. Our contribution is to derive novel low variance unbiased estimators for pass@k and its gradient, in both the binary and continuous reward settings. We show optimization with our estimators reduces to standard RL with rewards that have been jointly transformed by a stable and efficient transformation function. While previous efforts are restricted to k=n, ours is the first to enable robust optimization of pass@k for any arbitrary k <= n. Moreover, instead of trading off pass@1 performance for pass@k gains, our method allows annealing k during training, optimizing both metrics and often achieving strong pass@1 numbers alongside significant pass@k gains. We validate our reward transformations on toy experiments, which reveal the variance reducing properties of our formulations. We also include real-world examples using the open-source LLM, GEMMA-2. We find that our transformation effectively optimizes for the target k. Furthermore, higher k values enable solving more and harder problems, while annealing k boosts both the pass@1 and pass@k . Crucially, for challenging task sets where conventional pass@1 optimization stalls, our pass@k approach unblocks learning, likely due to better exploration by prioritizing joint utility over the utility of individual samples.

#77 Meta-reinforcement learning with minimum attention

著者: Shashank Gupta, Pilhwa Lee

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.16741

要約:
Minimum attention applies the least action principle in the changes of control concerning state and time, first proposed by Brockett. The involved regularization is highly relevant in emulating biological control, such as motor learning. We apply minimum attention in reinforcement learning (RL) as part of the rewards and investigate its connection to meta-learning and stabilization. Specifically, model-based meta-learning with minimum attention is explored in high-dimensional nonlinear dynamics. Ensemble-based model learning and gradient-based meta-policy learning are alternately performed. Empirically, the minimum attention does show outperforming competence in comparison to the state-of-the-art algorithms of model-free and model-based RL, i.e., fast adaptation in few shots and variance reduction from the perturbations of the model and environment. Furthermore, the minimum attention demonstrates an improvement in energy efficiency.

#78 Variational Learning of Disentangled Representations

著者: Yuli Slavutsky, Ozgur Beker, David Blei, Bianca Dumitrascu

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.17182

要約:
Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific. This separation is essential in domains such as biomedical data analysis, where generalization to new treatments, patients, or species depends on isolating stable biological signals from context-dependent effects. While extensions of the variational autoencoder (VAE) framework have been proposed to address this problem, they frequently suffer from leakage between latent representations, limiting their ability to generalize to unseen conditions. Here, we introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors. DISCoVeR integrates three key components: (i) a dual-latent architecture that models shared and specific factors separately; (ii) two parallel reconstructions that ensure both representations remain informative; and (iii) a novel max-min objective that encourages clean separation without relying on handcrafted priors, while making only minimal assumptions. Theoretically, we show that this objective maximizes data likelihood while promoting disentanglement, and that it admits a unique equilibrium. Empirically, we demonstrate that DISCoVeR achieves improved disentanglement on synthetic datasets, natural images, and single-cell RNA-seq data. Together, these results establish DISCoVeR as a principled approach for learning disentangled representations in multi-condition settings.

#79 Reliability-Adjusted Prioritized Experience Replay

著者: Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.18482

要約:
Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.

#80 A Collectivist, Economic Perspective on AI

著者: Michael I. Jordan

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.06268

要約:
Information technology is in the midst of a revolution in which omnipresent data collection and machine learning are impacting the human world as never before. The word ``intelligence'' is being used as a North Star for the development of this technology, with human cognition viewed as a baseline. This view neglects the fact that humans are social animals and that much of our intelligence is social and cultural in origin. Moreover, failing to properly situate aspects of intelligence at the social level contributes to the treatment of the societal consequences of technology as an afterthought. The path forward is not merely more data and compute, and not merely more attention paid to cognitive or symbolic representations, but a thorough blending of economic and social concepts with computational and inferential concepts at the level of algorithm design.

#81 Reduced Order Modeling of Energetic Materials Using Physics-Aware Recurrent Convolutional Neural Networks in a Latent Space (LatentPARC)

著者: Zo\"e J. Gray, Joseph B. Choi, Youngsoo Choi, H. Keo Springer, H. S. Udaykumar, Stephen S. Baek

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.12401

要約:
Physics-aware deep learning (PADL) has gained popularity for use in complex spatiotemporal dynamics (field evolution) simulations, such as those that arise frequently in computational modeling of energetic materials (EM). Here, we show that the challenge PADL methods face while learning complex field evolution problems can be simplified and accelerated by decoupling it into two tasks: learning complex geometric features in evolving fields and modeling dynamics over these features in a lower dimensional feature space. To accomplish this, we build upon our previous work on physics-aware recurrent convolutions (PARC). PARC embeds knowledge of underlying physics into its neural network architecture for more robust and accurate prediction of evolving physical fields. PARC was shown to effectively learn complex nonlinear features such as the formation of hotspots and coupled shock fronts in various initiation scenarios of EMs, as a function of microstructures, serving effectively as a microstructure-aware burn model. In this work, we further accelerate PARC and reduce its computational cost by projecting the original dynamics onto a lower-dimensional invariant manifold, or 'latent space.' The projected latent representation encodes the complex geometry of evolving fields (e.g. temperature and pressure) in a set of data-driven features. The reduced dimension of this latent space allows us to learn the dynamics during the initiation of EM with a lighter and more efficient model. We observe a significant decrease in training and inference time while maintaining results comparable to PARC at inference. This work takes steps towards enabling rapid prediction of EM thermomechanics at larger scales and characterization of EM structure-property-performance linkages at a full application scale.

#82 Mamba Modulation: On the Length Generalization of Mamba

著者: Peng Lu, Jerry Huang, Qiuhao Zeng, Xinyu Wang, Boxing Chen, Philippe Langlais, Yufei Cui

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.19633

要約:
The quadratic complexity of the attention mechanism in Transformer models has motivated the development of alternative architectures with sub-quadratic scaling, such as state-space models. Among these, Mamba has emerged as a leading architecture, achieving state-of-the-art results across a range of language modeling tasks. However, Mamba's performance significantly deteriorates when applied to contexts longer than those seen during pre-training, revealing a sharp sensitivity to context length extension. Through detailed analysis, we attribute this limitation to the out-of-distribution behaviour of its state-space dynamics, particularly within the parameterization of the state transition matrix $\mathbf{A}$. Unlike recent works which attribute this sensitivity to the vanished accumulation of discretization time steps, $\exp(-\sum_{t=1}^N\Delta_t)$, we establish a connection between state convergence behavior as the input length approaches infinity and the spectrum of the transition matrix $\mathbf{A}$, offering a well-founded explanation of its role in length extension. Next, to overcome this challenge, we propose an approach that applies spectrum scaling to pre-trained Mamba models to enable robust long-context generalization by selectively modulating the spectrum of $\mathbf{A}$ matrices in each layer. We show that this can significantly improve performance in settings where simply modulating $\Delta_t$ fails, validating our insights and providing avenues for better length generalization of state-space models with structured transition matrices.

#83 Fixed Horizon Linear Quadratic Covariance Steering in Continuous Time with Hilbert-Schmidt Terminal Cost

著者: Tushar Sial, Abhishek Halder

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.21944

要約:
We formulate and solve the fixed horizon linear quadratic covariance steering problem in continuous time with a terminal cost measured in Hilbert-Schmidt (i.e., Frobenius) norm error between the desired and the controlled terminal covariances. For this problem, the necessary conditions of optimality become a coupled matrix ODE two-point boundary value problem. To solve this system of equations, we design a matricial recursive algorithm and prove its convergence. The proposed algorithm and its analysis make use of the linear fractional transforms parameterized by the state transition matrix of the associated Hamiltonian matrix. To illustrate the results, we provide two numerical examples: one with a two dimensional and another with a six dimensional state space.

#84 Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory

著者: Akira Tamamori

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.23083

要約:
High-capacity kernel Hopfield networks exhibit a \textit{Ridge of Optimization} characterized by extreme stability. While previously linked to \textit{Spectral Concentration}, its origin remains elusive. Here, we analyze the network dynamics on a statistical manifold, revealing that the Ridge corresponds to the Edge of Stability, a critical boundary where the Fisher Information Matrix becomes singular. We demonstrate that the apparent Euclidean force antagonism is a manifestation of \textit{Dual Equilibrium} in the Riemannian space. This unifies learning dynamics and capacity via the Minimum Description Length principle, offering a geometric theory of self-organized criticality.

#85 GraphBench: Next-generation graph learning benchmarking

著者: Timo Stoll, Chendi Qian, Ben Finkelshtein, Ali Parviz, Darius Weber, Fabrizio Frasca, Hadar Shavit, Antoine Siraudin, Arman Mielke, Marie Anastacio, Erik M\"uller, Maya Bechler-Speicher, Michael Bronstein, Mikhail Galkin, Holger Hoos, Mathias Niepert, Bryan Perozzi, Jan T\"onshoff, Christopher Morris

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.04475

要約:
Machine learning on graphs has recently achieved impressive progress in various domains, including molecular property prediction and chip design. However, benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, which hampers reproducibility and broader progress. To address this, we introduce GraphBench, a comprehensive benchmarking suite that spans diverse domains and prediction tasks, including node-level, edge-level, graph-level, and generative settings. GraphBench provides standardized evaluation protocols -- with consistent dataset splits and performance metrics that account for out-of-distribution generalization -- as well as a unified hyperparameter tuning framework. Additionally, we benchmark GraphBench using message-passing neural networks and graph transformer models, providing principled baselines and establishing a reference performance. See www.graphbench.io for further details.

#86 Autotune: fast, accurate, and automatic tuning parameter selection for Lasso

著者: Tathagata Sadhukhan, Ines Wilms, Stephan Smeekes, Sumanta Basu

公開日: Tue, 16 Dec 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.11139

要約:
Least absolute shrinkage and selection operator (Lasso), a popular method for high-dimensional regression, is now used widely for estimating high-dimensional time series models such as the vector autoregression (VAR). Selecting its tuning parameter efficiently and accurately remains a challenge, despite the abundance of available methods for doing so. We propose $\mathsf{autotune}$, a strategy for Lasso to automatically tune itself by optimizing a penalized Gaussian log-likelihood alternately over regression coefficients and noise standard deviation. Using extensive simulation experiments on regression and VAR models, we show that $\mathsf{autotune}$ is faster, and provides better generalization and model selection than established alternatives in low signal-to-noise regimes. In the process, $\mathsf{autotune}$ provides a new estimator of noise standard deviation that can be used for high-dimensional inference, and a new visual diagnostic procedure for checking the sparsity assumption on regression coefficients. Finally, we demonstrate the utility of $\mathsf{autotune}$ on a real-world financial data set. An R package based on C++ is also made publicly available on Github.

stat.ML updates on arXiv.org

📋 論文タイトル一覧