arXiv paper list - stat.ML updates on arXiv.org

#1 BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates

Authors: Kyla D. Jones, Alexander W. Dowling

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.16815

Abstract:
We introduce the Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS) framework to emulate latent components in hybrid physical systems. BITS for GAPS supports serial hybrid modeling, where known physics governs part of the system and residual dynamics are represented as a latent function inferred from data. A Gaussian process prior is placed over the latent function, with hierarchical priors on its hyperparameters to encode physically meaningful structure in the predictive posterior. To guide data acquisition, we derive entropy-based acquisition functions that quantify expected information gain from candidate input locations, identifying samples most informative for training the surrogate. Specifically, we obtain a closed-form expression for the differential entropy of the predictive posterior and establish a tractable lower bound for efficient evaluation. These derivations approximate the predictive posterior as a finite, uniformly weighted mixture of Gaussian processes. We demonstrate the framework's utility by modeling activity coefficients in vapor-liquid equilibrium systems, embedding the surrogate into extended Raoult's law for distillation design. Numerical results show that entropy-guided sampling improves sample efficiency by targeting regions of high uncertainty and potential information gain. This accelerates surrogate convergence, enhances predictive accuracy in non-ideal regimes, and preserves physical consistency. Overall, BITS for GAPS provides an efficient, interpretable, and uncertainty-aware framework for hybrid modeling of complex physical systems.
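The abstract approximates the predictive posterior as a finite, uniformly weighted mixture of Gaussian processes and evaluates a tractable entropy lower bound. The paper's exact bound is not reproduced here. As a minimal one-dimensional sketch, the snippet below evaluates a standard Jensen-type pairwise lower bound for the entropy of a uniformly weighted Gaussian mixture, $H \ge -\sum_i w_i \log \sum_j w_j \mathcal{N}(\mu_i; \mu_j, \sigma_i^2 + \sigma_j^2)$, and checks it against numerical integration. The choice of bound and all function names are illustrative assumptions.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_entropy(var):
    """Closed-form differential entropy (nats) of a univariate Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def mixture_entropy_lower_bound(means, variances):
    """Jensen-type pairwise lower bound for a uniformly weighted Gaussian mixture:
    H >= -sum_i w_i * log(sum_j w_j * N(mu_i; mu_j, var_i + var_j)), w = 1/K."""
    w = 1.0 / len(means)
    bound = 0.0
    for mi, vi in zip(means, variances):
        overlap = sum(w * gauss_pdf(mi, mj, vi + vj) for mj, vj in zip(means, variances))
        bound -= w * math.log(overlap)
    return bound


def mixture_entropy_numeric(means, variances, lo=-30.0, hi=30.0, n=60_000):
    """Reference entropy of the same mixture by midpoint-rule integration of -p log p."""
    w = 1.0 / len(means)
    dx = (hi - lo) / n
    h = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        p = sum(w * gauss_pdf(x, m, v) for m, v in zip(means, variances))
        if p > 0.0:
            h -= p * math.log(p) * dx
    return h


lb = mixture_entropy_lower_bound([0.0, 3.0], [1.0, 0.5])
ref = mixture_entropy_numeric([0.0, 3.0], [1.0, 0.5])
print(f"lower bound {lb:.4f} <= numerical entropy {ref:.4f}")
```

The bound needs only pairwise Gaussian overlaps, so its cost grows with the square of the number of mixture components rather than requiring any integration.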

#2 Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition

Authors: Liuyuan Jiang, Quan Xiao, Lisha Chen, Tianyi Chen

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.16796

Abstract:
Penalty-based methods have become popular for solving bilevel optimization (BLO) problems, thanks to their effective first-order nature. However, they often require inner-loop iterations to solve the lower-level (LL) problem and small outer-loop step sizes to handle the increased smoothness induced by large penalty terms, leading to suboptimal complexity. This work considers the general BLO problems with coupled constraints (CCs) and leverages a novel penalty reformulation that decouples the upper- and lower-level variables. This yields an improved analysis of the smoothness constant, enabling larger step sizes and reduced iteration complexity for Penalty-Based Gradient Descent algorithms in ALTernating fashion (ALT-PBGD). Building on the insight of reduced smoothness, we propose PBGD-Free, a novel fully single-loop algorithm that avoids inner loops for the uncoupled constraint BLO. For BLO with CCs, PBGD-Free employs an efficient inner-loop with substantially reduced iteration complexity. Furthermore, we propose a novel curvature condition describing the "flatness" of the upper-level objective with respect to the LL variable. This condition relaxes the traditional upper-level Lipschitz requirement, enables smaller penalty constant choices, and results in a negligible penalty gradient term during upper-level variable updates. We provide rigorous convergence analysis and validate the method's efficacy through hyperparameter optimization for support vector machines and fine-tuning of large language models.
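To make the penalty reformulation concrete, here is a toy sketch (not ALT-PBGD or PBGD-Free as specified in the paper) of alternating gradient descent on a value-function penalty $F(x, y) = f(x, y) + \gamma\,(g(x, y) - \min_z g(x, z))$. The quadratic objectives are hypothetical choices. With $g(x, y) = (y - x)^2$ the value function is identically zero and the bilevel solution is $x = y = 1.5$, while for finite $\gamma$ the penalized minimizer is $x = (1 + 3\gamma)/(1 + 2\gamma)$, $y = 3 - x$. The step size $1/(2 + 4\gamma)$ shrinks as $\gamma$ grows, which illustrates the smoothness issue the abstract describes.

```python
def penalty_alt_gd(gamma=100.0, steps=20_000):
    """Alternating gradient descent on a toy penalized bilevel objective
        F(x, y) = f(x, y) + gamma * (g(x, y) - v(x)),
    with hypothetical choices
        f(x, y) = (x - 1)^2 + (y - 2)^2   (upper level)
        g(x, y) = (y - x)^2               (lower level, so v(x) = min_y g = 0).
    """
    # The largest Hessian eigenvalue of F is 2 + 4*gamma, so the step must shrink with gamma.
    eta = 1.0 / (2.0 + 4.0 * gamma)
    x, y = 0.0, 0.0
    for _ in range(steps):
        # Update x first, then update y using the new x (alternating scheme).
        x -= eta * (2.0 * (x - 1.0) - 2.0 * gamma * (y - x))
        y -= eta * (2.0 * (y - 2.0) + 2.0 * gamma * (y - x))
    return x, y


x, y = penalty_alt_gd()
print(x, y)  # near the penalized minimizer, slightly off the bilevel solution (1.5, 1.5)
```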

#3 Diffusion-Inversion-Net (DIN): An End-to-End Direct Probabilistic Framework for Characterizing Hydraulic Conductivities and Quantifying Uncertainty

diffusion

Authors: Xun Zhang, Weijie Yang, Jiangjiang Zhang, Simin Jiang

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.16926

Abstract:
We propose the Diffusion-Inversion-Net (DIN) framework for inverse modeling of groundwater flow and solute transport processes. DIN utilizes an offline-trained Denoising Diffusion Probabilistic Model (DDPM) as a powerful prior learner, which flexibly incorporates sparse, multi-source observational data, including hydraulic head, solute concentration, and hard conductivity data, through conditional injection mechanisms. These conditioning inputs subsequently guide the generative inversion process during sampling. Bypassing iterative forward simulations, DIN leverages stochastic sampling and probabilistic modeling mechanisms to directly generate ensembles of posterior parameter fields by repeatedly executing the reverse denoising process. Two representative posterior scenarios, Gaussian and non-Gaussian, are investigated. The results demonstrate that DIN can produce multiple constraint-satisfying realizations under identical observational conditions, accurately estimate hydraulic-conductivity fields, and achieve reliable uncertainty quantification. The framework exhibits strong generalization capability across diverse data distributions, offering a robust and unified alternative to conventional multi-stage inversion methodologies.

#4 Gradient flow for deep equilibrium single-index models

Authors: Sanjit Dandapanthula, Aaditya Ramdas

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.16976

Abstract:
Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state-of-the-art performance across many modern machine learning tasks. Despite their practical success, theoretically understanding the gradient descent dynamics for training DEQs remains an area of active research. In this work, we rigorously study the gradient descent dynamics for DEQs in the simple setting of linear models and single-index models, filling several gaps in the literature. We prove a conservation law for linear DEQs which implies that the parameters remain trapped on spheres during training and use this property to show that gradient flow remains well-conditioned for all time. We then prove linear convergence of gradient descent to a global minimizer for linear DEQs and deep equilibrium single-index models under appropriate initialization and with a sufficiently small step size. Finally, we validate our theoretical findings through experiments.
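A DEQ layer outputs the equilibrium of a weight-tied map instead of running a fixed number of layers. As a minimal sketch, assuming a linear DEQ of the form $z^\star = W z^\star + U x$ (the parameterization in the paper may differ), the equilibrium can be found by fixed-point iteration, which converges when the spectral radius of $W$ is below 1, and compared against the closed form $z^\star = (I - W)^{-1} U x$. A deep equilibrium single-index model would additionally pass a linear readout of $z^\star$ through a link function; that step and the training dynamics studied in the paper are not shown. The weights below are illustrative.

```python
import numpy as np


def deq_forward(W, U, x, tol=1e-12, max_iter=10_000):
    """Equilibrium z* = W z* + U x of a linear weight-tied layer, found by
    fixed-point iteration (converges when the spectral radius of W is < 1)."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = W @ z + U @ x
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


W = np.array([[0.5, 0.1], [0.0, 0.3]])  # illustrative weights, spectral radius 0.5
U = np.eye(2)
x = np.array([1.0, 2.0])
z_iter = deq_forward(W, U, x)
z_closed = np.linalg.solve(np.eye(2) - W, U @ x)  # (I - W)^{-1} U x
```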

#5 DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing

diffusion

Authors: Hao Chen, Renzheng Zhang, Scott S. Howard

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.17038

Abstract:
From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its practical behavior: the prior offers limited guidance, while reconstruction is largely driven by the measurement-consistency term, leading to an inference process that is effectively decoupled from the diffusion dynamics. To clarify this structure, we reinterpret the role of diffusion in inverse problem solving as an initialization stage within an expectation-maximization (EM)-style framework, where the diffusion stage and the data-driven refinement are fully decoupled. We introduce DAPS++, which allows the likelihood term to guide inference more directly while maintaining numerical stability and providing insight into why unified diffusion trajectories remain effective in practice. By requiring fewer function evaluations (NFEs) and measurement-optimization steps, DAPS++ achieves high computational efficiency and robust reconstruction performance across diverse image restoration tasks.

#6 SAVeD: Semantic Aware Version Discovery

Authors: Artem Frenk, Roee Shraga

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.17298

Abstract:
Our work introduces SAVeD (Semantically Aware Version Detection), a contrastive learning-based framework for identifying versions of structured datasets without relying on metadata, labels, or integration-based assumptions. SAVeD addresses a common challenge in data science: repeated labor caused by the difficulty of identifying similar work or transformations already performed on datasets. SAVeD employs a modified SimCLR pipeline, generating augmented table views through random transformations (e.g., row deletion, encoding perturbations). These views are embedded via a custom transformer encoder and contrasted in latent space to optimize semantic similarity. Our model learns to minimize distances between augmented views of the same dataset and maximize those between unrelated tables. We evaluate performance using validation accuracy and separation, defined respectively as the proportion of correctly classified version/non-version pairs on a hold-out set, and the difference between average similarities of versioned and non-versioned tables (defined by a benchmark, and not provided to the model). Our experiments span five canonical datasets from the Semantic Versioning in Databases Benchmark and demonstrate substantial gains post-training. SAVeD achieves significantly higher accuracy on completely unseen tables and a significant boost in separation scores, confirming its capability to distinguish semantically altered versions. Compared to untrained baselines and prior state-of-the-art dataset-discovery methods like Starmie, our custom encoder achieves competitive or superior results.
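SAVeD is described as a modified SimCLR pipeline. The snippet below is a plain-Python sketch of the standard SimCLR NT-Xent contrastive loss on toy 2-D embeddings, assuming cosine similarity and a temperature of 0.5. SAVeD's table augmentations, its transformer encoder, and whatever modifications it makes to the loss are not reproduced, and the embeddings are made up. Matching views of the same table should give a lower loss than mismatched ones.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(view1, view2, temperature=0.5):
    """SimCLR NT-Xent loss. view1[i] and view2[i] embed two augmentations of
    item i (a positive pair); all other embeddings in the batch are negatives."""
    z = list(view1) + list(view2)
    n = len(view1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same item
        denom = sum(math.exp(cosine(z[i], z[k]) / temperature) for k in range(2 * n) if k != i)
        total -= math.log(math.exp(cosine(z[i], z[pos]) / temperature) / denom)
    return total / (2 * n)


tables_a = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings of two tables (first view)
aligned = nt_xent(tables_a, [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent(tables_a, [[0.0, 1.0], [1.0, 0.0]])
```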

#7 Is Phase Really Needed for Weakly-Supervised Dereverberation?

Authors: Marius Rodrigues (IDS, S2A), Louis Bahrman (IDS, S2A), Roland Badeau (IDS, S2A), Gaël Richard (S2A, IDS)

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.17346

Abstract:
In unsupervised or weakly-supervised approaches for speech dereverberation, the target clean (dry) signals are considered to be unknown during training. In that context, evaluating to what extent information can be retrieved from the reverberant (wet) speech alone becomes critical. This work investigates the role of the reverberant (wet) phase in the time-frequency domain. Based on Statistical Wave Field Theory, we show that late reverberation perturbs phase components with white, uniformly distributed noise, except at low frequencies. Consequently, the wet phase carries limited useful information and is not essential for weakly-supervised dereverberation. To validate this finding, we train dereverberation models under a recent weak supervision framework and demonstrate that performance can be significantly improved by excluding the reverberant phase from the loss function.

#8 Self-Supervised Learning by Curvature Alignment

Authors: Benyamin Ghojogh, M. Hadi Sepanj, Paul Fieguth

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.17426

Abstract:
Self-supervised learning (SSL) has recently advanced through non-contrastive methods that couple an invariance term with variance, covariance, or redundancy-reduction penalties. While such objectives shape first- and second-order statistics of the representation, they largely ignore the local geometry of the underlying data manifold. In this paper, we introduce CurvSSL, a curvature-regularized self-supervised learning framework, and its RKHS extension, kernel CurvSSL. Our approach retains a standard two-view encoder-projector architecture with a Barlow Twins-style redundancy-reduction loss on projected features, but augments it with a curvature-based regularizer. Each embedding is treated as a vertex whose $k$ nearest neighbors define a discrete curvature score via cosine interactions on the unit hypersphere; in the kernel variant, curvature is computed from a normalized local Gram matrix in an RKHS. These scores are aligned and decorrelated across augmentations by a Barlow-style loss on a curvature-derived matrix, encouraging both view invariance and consistency of local manifold bending. Experiments on MNIST and CIFAR-10 datasets with a ResNet-18 backbone show that curvature-regularized SSL yields competitive or improved linear evaluation performance compared to Barlow Twins and VICReg. Our results indicate that explicitly shaping local geometry is a simple and effective complement to purely statistical SSL regularizers.
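CurvSSL keeps a Barlow Twins-style redundancy-reduction loss on projected features. The sketch below implements only that standard Barlow Twins loss in NumPy: standardize each feature over the batch, form the cross-correlation matrix between views, and penalize its deviation from the identity. The weight `lam = 5e-3` is an assumed value, and the curvature-based regularizer from the paper is not reproduced because the abstract does not give its exact formula.

```python
import numpy as np


def barlow_twins_loss(za, zb, lam=5e-3):
    """Barlow Twins redundancy-reduction loss for two views.
    za, zb: arrays of shape (batch, dim)."""
    # Standardize each feature over the batch (population std).
    za = (za - za.mean(axis=0)) / za.std(axis=0)
    zb = (zb - zb.mean(axis=0)) / zb.std(axis=0)
    c = za.T @ zb / za.shape[0]  # (dim, dim) cross-correlation matrix
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy-reduction term
    return on_diag + lam * off_diag


rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))
same_view = barlow_twins_loss(z, z)
unrelated = barlow_twins_loss(z, rng.standard_normal((256, 8)))
```

In CurvSSL, a loss of this shape is additionally applied to a curvature-derived matrix rather than only to the projected features.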

#9 Minimax Statistical Estimation under Wasserstein Contamination

Authors: Patrick Chao, Edgar Dobriban

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2308.01853

Abstract:
Contaminations are a key concern in modern statistical learning, as small but systematic perturbations of all datapoints can substantially alter estimation results. Here, we study Wasserstein-$r$ contaminations ($r\ge 1$) in an $\ell_q$ norm ($q\in [1,\infty]$), in which each observation may undergo an adversarial perturbation with bounded cost, complementing the classical Huber model, corresponding to total variation norm, where only a fraction of observations is arbitrarily corrupted. We study both independent and joint (coordinated) contaminations and develop a minimax theory under $\ell_q^r$ losses. Our analysis encompasses several fundamental problems: location estimation, linear regression, and pointwise nonparametric density estimation. For joint contaminations in location estimation and for prediction in linear regression, we obtain the exact minimax risk, identify least favorable contaminations, and show that the sample mean and least squares predictor are respectively minimax optimal. For location estimation under independent contaminations, we give sharp upper and lower bounds, including exact minimaxity in the Euclidean Wasserstein contamination case, when $q=r=2$. For pointwise density estimation in any dimension, we derive the optimal rate, showing that it is achieved by kernel density estimation with a bandwidth that is possibly larger than the classical one. Our proofs leverage powerful tools from optimal transport developed over the last 20 years, including the dynamic Benamou-Brenier formulation. Taken together, our results suggest that in contrast to the Huber contamination model, for norm-based Wasserstein contaminations, classical estimators may be nearly optimally robust.

#10 (De)-regularized Maximum Mean Discrepancy Gradient Flow

Authors: Zonghao Chen, Aratrika Mustafi, Pierre Glaser, Anna Korba, Arthur Gretton, Bharath K. Sriperumbudur

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2409.14980

Abstract:
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from a source distribution to a target distribution using only target samples either lack tractable numerical implementation ($f$-divergence flows) or require strong assumptions and modifications, such as noise injection, to ensure convergence (Maximum Mean Discrepancy flows). In contrast, the DrMMD flow can simultaneously (i) guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and (ii) be implemented in closed form using only samples. The former is achieved by leveraging the connection between the DrMMD and the $\chi^2$-divergence, while the latter comes from treating DrMMD as MMD with a de-regularized kernel. Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime. The potential application of the DrMMD flow is demonstrated across several numerical experiments, including a large-scale setting of training student/teacher networks.
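The DrMMD flow builds on the Maximum Mean Discrepancy, which can be estimated from samples alone. As background only, the sketch below computes the standard unbiased estimate of squared MMD between two 1-D samples with a Gaussian kernel. The de-regularized kernel and the gradient flow itself are not implemented; the bandwidth, sample sizes, and helper names are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples xs ~ P and ys ~ Q."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def sample_gaussian(mean, size, seed):
    """Draw `size` samples from N(mean, 1) with a fixed seed (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0) for _ in range(size)]


source = sample_gaussian(0.0, 200, seed=1)
same = mmd2_unbiased(source, sample_gaussian(0.0, 200, seed=2))
shifted = mmd2_unbiased(source, sample_gaussian(3.0, 200, seed=3))
```

Because the unbiased estimator drops the diagonal kernel terms, it can be slightly negative when the two samples come from the same distribution.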

#11 Optimal Convergence Rates of Deep Neural Network Classifiers

Authors: Zihan Zhang, Lei Shi, Ding-Xuan Zhou

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2506.14899

Abstract:
In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a Hölder-$\beta$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is $\left( \frac{1}{n} \right)^{\frac{\beta\cdot(1\wedge\beta)^q}{\frac{d_*}{s+1}+\left(1+\frac{1}{s+1}\right)\cdot\beta\cdot(1\wedge\beta)^q}}$, which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The generalized approach is of independent interest.
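The stated rate can be evaluated directly. The helper below (a hypothetical name) computes the exponent of $1/n$ and makes two sanity checks easy: with $\beta = 1$, $q = 0$, $s = 0$ the exponent is $1/(d_* + 2)$, and as $s \to \infty$ it tends to 1 regardless of $d_*$. The input dimension $d$ never appears as an argument.

```python
import math


def optimal_rate_exponent(beta, q, d_star, s):
    """Exponent e in the rate (1/n)^e stated in the abstract, with
    b = beta * min(1, beta)^q:  e = b / (d_star/(s+1) + (1 + 1/(s+1)) * b).
    s may be math.inf; the input dimension d does not enter."""
    b = beta * min(1.0, beta) ** q
    return b / (d_star / (s + 1.0) + (1.0 + 1.0 / (s + 1.0)) * b)


print(optimal_rate_exponent(1.0, 0, 5, 0))         # 1/7 for beta = 1, q = 0, s = 0, d_* = 5
print(optimal_rate_exponent(1.0, 0, 5, math.inf))  # 1.0 in the limit s -> inf
```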

#12 Estimating Bidirectional Causal Effects with Large Scale Online Kernel Learning

Authors: Masahiro Tanaka

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.05050

Abstract:
In this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Traditional causal inference often focuses on unidirectional effects, overlooking the common bidirectional relationships in real-world phenomena. Building on heteroskedasticity-based identification, the proposed method integrates a quasi-maximum likelihood estimator for simultaneous equation models with large-scale online kernel learning. It employs random Fourier feature approximations to flexibly model nonlinear conditional means and variances, while an adaptive online gradient descent algorithm ensures computational efficiency for streaming and high-dimensional data. Results from extensive simulations demonstrate that the proposed method achieves higher accuracy and stability than single-equation and polynomial approximation baselines, exhibiting lower bias and root mean squared error across various data-generating processes. These results confirm that the proposed approach effectively captures complex bidirectional causal effects with near-linear computational scaling. By combining econometric identification with modern machine learning techniques, the proposed framework offers a practical, scalable, and theoretically grounded solution for large-scale causal inference in natural/social science, policy making, business, and industrial applications.
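The method relies on random Fourier features to approximate kernel functions with a finite feature map. Below is a minimal sketch of the classic construction for the 1-D Gaussian kernel, $\phi(x) = \sqrt{2/D}\,\cos(\omega x + b)$ with $\omega \sim \mathcal{N}(0, 1/\sigma^2)$ and $b \sim \mathcal{U}[0, 2\pi]$, so that $\phi(x)^\top \phi(y) \approx k(x, y)$. The quasi-maximum likelihood estimator and the adaptive online gradient updates from the paper are not shown; the feature count and seed are arbitrary choices.

```python
import math
import random


def make_rff(num_features, bandwidth=1.0, seed=0):
    """Random Fourier features for the 1-D Gaussian kernel
    k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2)).
    Returns phi such that dot(phi(x), phi(y)) approximates k(x, y)."""
    rng = random.Random(seed)
    omegas = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]  # spectral samples
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]  # random offsets
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]

    return phi


phi = make_rff(20_000)
approx = sum(a * b for a, b in zip(phi(0.3), phi(1.1)))
exact = math.exp(-((0.3 - 1.1) ** 2) / 2.0)
```

A linear model on $\phi(x)$ costs time linear in the number of samples, which is what makes the feature map compatible with streaming updates.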

#13 Toward Super-polynomial Quantum Speedup of Equivariant Quantum Algorithms with SU($d$) Symmetry

Authors: Han Zheng, Zimu Li, Sergii Strelchuk, Risi Kondor, Junyu Liu

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2207.07250

Abstract:
We introduce a framework of equivariant convolutional quantum algorithms tailored for a number of machine-learning tasks on physical systems with arbitrary SU$(d)$ symmetries. It allows us to enhance a natural model of quantum computation, permutational quantum computing (PQC), and define a more powerful model: PQC+. While PQC was shown to be efficiently classically simulatable, we exhibit a problem which can be efficiently solved on a PQC+ machine, whereas no classical polynomial-time algorithm is known, thus providing evidence against PQC+ being classically simulatable. We further discuss practical quantum machine learning algorithms which can be carried out in the paradigm of PQC+.

#14 A New Causal Rule Learning Approach to Interpretable Estimation of Heterogeneous Treatment Effect

Authors: Ying Wu, Hanzhong Liu, Kai Ren, Shujie Ma, Xiangyu Chang

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2310.06746

Abstract:
Interpretability plays a crucial role in the application of statistical learning to estimate heterogeneous treatment effects (HTE) in complex diseases. In this study, we leverage a rule-based workflow, namely causal rule learning (CRL), to estimate and improve our understanding of HTE for atrial septal defect, addressing an overlooked question in the previous literature: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The CRL process consists of three steps: rule discovery, which generates a set of causal rules with corresponding subgroup average treatment effects; rule selection, which identifies a subset of these rules to deconstruct individual-level treatment effects as a linear combination of subgroup-level effects; and rule analysis, which presents a detailed procedure for further analyzing each selected rule from multiple perspectives to identify the most promising rules for validation. Extensive simulation studies and real-world data analysis demonstrate that CRL outperforms other methods in providing interpretable estimates of HTE, especially when dealing with complex ground truth and sufficient sample sizes.

#15 A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport

Authors: Yacouba Kaloga, Shashi Kumar, Petr Motlicek, Ina Kodrasi

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2502.01588

Abstract:
Accurate sequence-to-sequence (seq2seq) alignment is critical for applications like medical speech analysis and language learning tools relying on automatic speech recognition (ASR). State-of-the-art end-to-end (E2E) ASR systems, such as the Connectionist Temporal Classification (CTC) and transducer-based models, suffer from peaky behavior and alignment inaccuracies. In this paper, we propose a novel differentiable alignment framework based on one-dimensional optimal transport, enabling the model to learn a single alignment and perform ASR in an E2E manner. We introduce a pseudo-metric, called Sequence Optimal Transport Distance (SOTD), over the sequence space and discuss its theoretical properties. Based on the SOTD, we propose Optimal Temporal Transport Classification (OTTC) loss for ASR and contrast its behavior with CTC. Experimental results on the TIMIT, AMI, and LibriSpeech datasets show that our method considerably improves alignment performance compared to CTC and the more recently proposed Consistency-Regularized CTC, though with a trade-off in ASR performance. We believe this work opens new avenues for seq2seq alignment research, providing a solid foundation for further exploration and development within the community. Our code is publicly available at: https://github.com/idiap/OTTC
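A useful property of one-dimensional optimal transport is that, for convex costs such as $|x - y|$ or $(x - y)^2$, the monotone coupling obtained by sweeping both sorted supports from left to right (the north-west corner rule) is optimal, so mass is never transported "backwards" in order. The sketch below computes that coupling and its $|x - y|$ cost between two discrete distributions. It illustrates generic 1-D OT only and is not the SOTD pseudo-metric or the OTTC loss from the paper; the frame and token positions are illustrative.

```python
def ot_1d(xs, a, ys, b, eps=1e-12):
    """Monotone (north-west corner) coupling between discrete distributions
    sum_i a[i] delta(xs[i]) and sum_j b[j] delta(ys[j]) on the real line,
    with xs and ys sorted ascending and equal total mass. For convex costs
    this coupling is optimal; here we report its |x - y| transport cost."""
    a, b = list(a), list(b)  # remaining mass, copied so inputs are untouched
    i = j = 0
    plan, cost = [], 0.0
    while i < len(a) and j < len(b):
        mass = min(a[i], b[j])
        plan.append((i, j, mass))
        cost += mass * abs(xs[i] - ys[j])
        a[i] -= mass
        b[j] -= mass
        if a[i] <= eps:
            i += 1
        if b[j] <= eps:
            j += 1
    return plan, cost


frames = [0.0, 1.0, 2.0]  # illustrative frame positions
tokens = [0.5, 1.5, 2.5]  # illustrative token positions
plan, cost = ot_1d(frames, [1 / 3] * 3, tokens, [1 / 3] * 3)
```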

#16 Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds

Authors: Ke Sun

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2505.13614

Abstract:
The high-dimensional parameter space of modern deep neural networks, the neuromanifold, is endowed with a unique metric tensor defined by the Fisher information, the estimation of which is crucial for both theory and practical methods in deep learning. To analyze this tensor for classification networks, we return to a low-dimensional space of probability distributions, the core space, and carefully analyze the spectrum of its Riemannian metric. We extend our discoveries there into deterministic bounds of the metric tensor on the neuromanifold. We introduce an unbiased random estimate of the metric tensor and its bounds based on Hutchinson's trace estimator. It can be evaluated efficiently through a single backward pass, with a standard deviation bounded by the true value up to scaling.
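Hutchinson's estimator recovers a trace from matrix-vector products alone: for a random vector $z$ with independent Rademacher entries, $\mathbb{E}[z^\top A z] = \operatorname{tr}(A)$. In the paper the product comes from a backward pass through the network; in this plain-Python sketch, `matvec` is a hypothetical stand-in for that computation, and the sample count is arbitrary.

```python
import random


def hutchinson_trace(matvec, dim, num_samples=1000, seed=0):
    """Estimate tr(A) from products z -> A z only, using Rademacher probes:
    E[z^T A z] = tr(A) when the entries of z are independent +/-1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)  # stand-in for a backward-pass vector product
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_samples


def matvec_example(z):
    """Product with the symmetric matrix [[2, 1], [1, 3]] (trace 5)."""
    return [2.0 * z[0] + 1.0 * z[1], 1.0 * z[0] + 3.0 * z[1]]


estimate = hutchinson_trace(matvec_example, dim=2, num_samples=5000)
```

For a diagonal matrix every Rademacher probe returns the exact trace; the estimator's variance comes entirely from the off-diagonal entries.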

#17 The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks

Authors: João Manoel Herrera Pinheiro, Suzana Vilas Boas de Oliveira, Thiago Henrique Segreto Silva, Pedro Antonio Rabelo Saraiva, Enzo Ferreira de Souza, Ricardo V. Godoy, Leonardo André Ambrosio, Marcelo Becker

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2506.08274

Abstract:
This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques (including several less common transformations) across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers crucial model-specific guidance to practitioners on the need to select feature scaling techniques carefully.
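For readers new to the terminology, the sketch below shows two of the most common scalers, z-score standardization and min-max scaling, in plain Python. The abstract does not list which 12 techniques were evaluated, so these two are illustrative. Statistics are fitted on training values and then reused on new values, the usual practice for avoiding leakage of test information; as the last line shows, new values can then fall outside the training range.

```python
def fit_standard_scaler(train_values):
    """Z-score scaling: learn mean and population std on training data."""
    n = len(train_values)
    mean = sum(train_values) / n
    std = (sum((v - mean) ** 2 for v in train_values) / n) ** 0.5
    return lambda v: (v - mean) / std


def fit_min_max_scaler(train_values):
    """Min-max scaling to [0, 1] using the training range."""
    lo, hi = min(train_values), max(train_values)
    return lambda v: (v - lo) / (hi - lo)


standardize = fit_standard_scaler([1.0, 2.0, 3.0])
min_max = fit_min_max_scaler([2.0, 4.0, 6.0])
print(standardize(2.0), [min_max(v) for v in (2.0, 4.0, 6.0)])
print(min_max(8.0))  # an unseen value above the training maximum maps past 1
```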

#18 Online selective conformal inference: adaptive scores, convergence rate and optimality

Authors: Pierre Humbert, Ulysse Gazin, Ruth Heller, Etienne Roquain

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2508.10336

Abstract:
In a supervised online setting, a method for quantifying uncertainty was proposed in the seminal work of Gibbs and Candès (2021). For any given point-prediction algorithm, their method (ACI) produces a conformal prediction set whose average missed coverage approaches a pre-specified level $\alpha$ over a long time horizon. We introduce an extended version of this algorithm, called OnlineSCI, which additionally allows the user to select the times at which such an inference should be made. OnlineSCI encompasses several prominent online selective tasks, such as building prediction intervals for extreme outcomes, classification with abstention, and online testing. While OnlineSCI controls the average missed coverage on the selected times in an adversarial setting, our theoretical results also show that it controls the instantaneous error rate (IER) at the selected times, up to a non-asymptotic remainder term. Importantly, our theory covers the case where OnlineSCI updates the point-prediction algorithm at each time step, a property which we refer to as adaptive capability. We show that the adaptive versions of OnlineSCI can converge to an optimal solution and provide an explicit convergence rate in each of the aforementioned application cases, under specific mild conditions. Finally, the favorable behavior of OnlineSCI in practice is illustrated by numerical experiments.
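As background for OnlineSCI, the sketch below simulates the ACI update of Gibbs and Candès, $\alpha_{t+1} = \alpha_t + \gamma(\alpha - \mathrm{err}_t)$, without any selection step. The data model is a deliberately simple assumption: nonconformity scores are i.i.d. Uniform(0, 1), so the prediction set at level $\alpha_t$ covers scores up to $1 - \alpha_t$. The guarantee shown by Gibbs and Candès, $\left|\frac{1}{T}\sum_t \mathrm{err}_t - \alpha\right| \le \frac{\max(\alpha_1, 1 - \alpha_1) + \gamma}{\gamma T}$, holds for any score sequence, so it must hold in this toy run as well.

```python
import random


def aci_miss_rate(alpha=0.1, gamma=0.01, horizon=10_000, seed=0):
    """Long-run miss rate of the ACI update alpha_{t+1} = alpha_t + gamma * (alpha - err_t).
    Toy assumption: nonconformity scores are i.i.d. Uniform(0, 1), so the
    prediction set at level alpha_t contains scores up to 1 - alpha_t."""
    rng = random.Random(seed)
    alpha_t = alpha
    misses = 0
    for _ in range(horizon):
        score = rng.random()
        err = 1 if score > 1.0 - alpha_t else 0  # 1 when the set misses the outcome
        misses += err
        alpha_t += gamma * (alpha - err)
    return misses / horizon


rate = aci_miss_rate()
bound = (max(0.1, 1 - 0.1) + 0.01) / (0.01 * 10_000)  # Gibbs and Candès (2021) guarantee
```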

#19 ResCP: Reservoir Conformal Prediction for Time Series Forecasting

Authors: Roberto Neglia, Andrea Cini, Michael M. Bronstein, Filippo Maria Bianchi

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2510.05060

Abstract:
Conformal prediction offers a powerful framework for building distribution-free prediction intervals for exchangeable data. Existing methods that extend conformal prediction to sequential data rely on fitting a relatively complex model to capture temporal dependencies. However, these methods can fail if the sample size is small and often require expensive retraining when the underlying data distribution changes. To overcome these limitations, we propose Reservoir Conformal Prediction (ResCP), a novel training-free conformal prediction method for time series. Our approach leverages the efficiency and representation learning capabilities of reservoir computing to dynamically reweight conformity scores. In particular, we compute similarity scores among reservoir states and use them to adaptively reweight the observed residuals at each step. With this approach, ResCP enables us to account for local temporal dynamics when modeling the error distribution without compromising computational scalability. We prove that, under reasonable assumptions, ResCP achieves asymptotic conditional coverage, and we empirically demonstrate its effectiveness across diverse forecasting tasks.

#20 Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information

Authors: Antoine Ledent, Mun Chong Soo, Nong Minh Hieu

Published: Mon, 24 Nov 2025 00:00:00 -0500

Link: https://arxiv.org/abs/2511.13049

Abstract:
We study a matrix completion problem where both the ground-truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank matrices and share a common subspace. We assume that a large amount $M$ of unlabeled data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data drawn from the same distribution and noisy estimates of the corresponding ground-truth entries. This setting is inspired by recommender-system scenarios where the unlabeled data corresponds to "implicit feedback" (consisting of interactions such as purchases, clicks, etc.) and the labeled data corresponds to "explicit feedback", consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $R$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimation of $P$ and of the ground-truth matrix $R$, respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, showing that our assumptions provide a valid toy theoretical setting for studying the interaction between explicit and implicit feedback in recommender systems.

📋 Paper title list