arXiv論文一覧 - cs.CR updates on arXiv.org

#1 CanaryBench: Stress Testing Privacy Leakage in Cluster-Level Conversation Summaries

privacy

著者: Deep Mehta

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.18834

要約:
Aggregate analytics over conversational data are increasingly used for safety monitoring, governance, and product analysis in large language model systems. A common practice is to embed conversations, cluster them, and publish short textual summaries describing each cluster. While raw conversations may never be exposed, these derived summaries can still pose privacy risks if they contain personally identifying information (PII) or uniquely traceable strings copied from individual conversations. We introduce CanaryBench, a simple and reproducible stress test for privacy leakage in cluster-level conversation summaries. CanaryBench generates synthetic conversations with planted secret strings ("canaries") that simulate sensitive identifiers. Because canaries are known a priori, any appearance of these strings in published summaries constitutes a measurable leak. Using TF-IDF embeddings and k-means clustering on 3,000 synthetic conversations (24 topics) with a canary injection rate of 0.60, we evaluate an intentionally extractive example snippet summarizer that models quote-like reporting. In this configuration, we observe canary leakage in 50 of 52 canary-containing clusters (cluster-level leakage rate 0.961538), along with nonzero regex-based PII indicator counts. A minimal defense combining a minimum cluster-size publication threshold (k-min = 25) and regex-based redaction eliminates measured canary leakage and PII indicator hits in the reported run while maintaining a similar cluster-coherence proxy. We position this work as a societal impacts contribution centered on privacy risk measurement for published analytics artifacts rather than raw user data.

#2 GUIGuard: Toward a General Framework for Privacy-Preserving GUI Agents

privacyagent

著者: Yanxi Wang, Zhiling Zhang, Wenbo Zhou, Weiming Zhang, Jie Zhang, Qiannan Zhu, Yu Shi, Shuxin Zheng, Jiyan He

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.18842

要約:
GUI agents enable end-to-end automation through direct perception of and interaction with on-screen interfaces. However, these agents frequently access interfaces containing sensitive personal information, and screenshots are often transmitted to remote models, creating substantial privacy risks. These risks are particularly severe in GUI workflows: GUIs expose richer, more accessible private information, and privacy risks depend on interaction trajectories across sequential scenes. We propose GUIGuard, a three-stage framework for privacy-preserving GUI agents: (1) privacy recognition, (2) privacy protection, and (3) task execution under protection. We further construct GUIGuard-Bench, a cross-platform benchmark with 630 trajectories and 13,830 screenshots, annotated with region-level privacy grounding and fine-grained labels of risk level, privacy category, and task necessity. Evaluations reveal that existing agents exhibit limited privacy recognition, with state-of-the-art models achieving only 13.3% accuracy on Android and 1.4% on PC. Under privacy protection, task-planning semantics can still be maintained, with closed-source models showing stronger semantic consistency than open-source ones. Case studies on MobileWorld show that carefully designed protection strategies achieve higher task accuracy while preserving privacy. Our results highlight privacy recognition as a critical bottleneck for practical GUI agents. Project: https://futuresis.github.io/GUIGuard-page/

#3 Proactive Hardening of LLM Defenses with HASTE

著者: Henry Chen, Victor Aranda, Samarth Keshari, Ryan Heartfield, Nicole Nichols

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19051

要約:
Prompt-based attack techniques are one of the primary challenges in securely deploying and protecting LLM-based AI systems. LLM inputs are an unbounded, unstructured space. Consequently, effectively defending against these attacks requires proactive hardening strategies capable of continuously generating adaptive attack vectors to optimize LLM defense at runtime. We present HASTE (Hard-negative Attack Sample Training Engine): a systematic framework that iteratively engineers highly evasive prompts, within a modular optimization process, to continuously enhance detection efficacy for prompt-based attack techniques. The framework is agnostic to synthetic data generation methods, and can be generalized to evaluate prompt-injection detection efficacy, with and without fuzzing, for any hard-negative or hard-positive iteration strategy. Experimental evaluation of HASTE shows that hard negative mining successfully evades baseline detectors, reducing malicious prompt detection for baseline detectors by approximately 64%. However, when integrated with detection model re-training, it optimizes the efficacy of prompt detection models with significantly fewer iteration loops compared to relative baseline strategies. The HASTE framework supports both proactive and reactive hardening of LLM defenses and guardrails. Proactively, developers can leverage HASTE to dynamically stress-test prompt injection detection systems; efficiently identifying weaknesses and strengthening defensive posture. Reactively, HASTE can mimic newly observed attack types and rapidly bridge detection coverage by teaching HASTE-optimized detection models to identify them.

#4 Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

backdoor

著者: Harsh Chaudhari, Ethan Rathbum, Hanna Foerster, Jamie Hayes, Matthew Jagielski, Milad Nasr, Ilia Shumailov, Alina Oprea

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19061

要約:
Chain-of-Thought (CoT) reasoning has emerged as a powerful technique for enhancing large language models' capabilities by generating intermediate reasoning steps for complex tasks. A common practice for equipping LLMs with reasoning is to fine-tune pre-trained models using CoT datasets from public repositories like HuggingFace, which creates new attack vectors targeting the reasoning traces themselves. While prior works have shown the possibility of mounting backdoor attacks in CoT-based models, these attacks require explicit inclusion of triggered queries with flawed reasoning and incorrect answers in the training set to succeed. Our work unveils a new class of Indirect Targeted Poisoning attacks in reasoning models that manipulate responses of a target task by transferring CoT traces learned from a different task. Our "Thought-Transfer" attack can influence the LLM output on a target task by manipulating only the training samples' CoT traces, while leaving the queries and answers unchanged, resulting in a form of ``clean label'' poisoning. Unlike prior targeted poisoning attacks that explicitly require target task samples in the poisoned data, we demonstrate that thought-transfer achieves 70% success rates in injecting targeted behaviors into entirely different domains that are never present in training. Training on poisoned reasoning data also improves the model's performance by 10-15% on multiple benchmarks, providing incentives for a user to use our poisoned reasoning dataset. Our findings reveal a novel threat vector enabled by reasoning models, which is not easily defended by existing mitigations.

#5 A Security Analysis of CheriBSD and Morello Linux

著者: Dariy Guzairov, Alex Potanin, Stephen Kell, Alwen Tiu

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19074

要約:
Memory corruption attacks have been prevalent in software for a long time. Some mitigation strategies against these attacks do exist, but they are not as far-reaching or as efficient as the CHERI architecture. CHERI uses capabilities to restrict pointers to certain regions of memory and with certain access restrictions. These capabilities are also used to implement "compartmentalisation": dividing a binary into smaller components with limited privilege, while adhering to the principle of least privilege. However, while this architecture successfully mitigates memory corruption attacks, the compartmentalisation mechanisms in place are less effective in containing malicious code to a separate compartment. This paper details four ways to bypass compartmentalisation, with a focus on Linux and BSD operating systems ported to this architecture. We find that although compartmentalisation is implemented in these two operating systems, simple bugs and attacks can still allow malicious code to bypass it. We conclude with mitigation measures to prevent these attacks, a proof-of-concept demonstrating their use, and recommendations for further securing Linux and BSD against unknown attacks.

#6 Evaluating Nova 2.0 Lite model under Amazon's Frontier Model Safety Framework

著者: Satyapriya Krishna, Matteo Memelli, Tong Wang, Abhinav Mohanty, Claire O'Brien Rajkumar, Payal Motwani, Rahul Gupta, Spyros Matsoukas

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19134

要約:
Amazon published its Frontier Model Safety Framework (FMSF) as part of the Paris AI summit, following which we presented a report on Amazon's Premier model. In this report, we present an evaluation of Nova 2.0 Lite. Nova 2.0 Lite was made generally available from amongst the Nova 2.0 series and is one of its most capable reasoning models. The model processes text, images, and video with a context length of up to 1M tokens, enabling analysis of large codebases, documents, and videos in a single prompt. We present a comprehensive evaluation of Nova 2.0 Lite's critical risk profile under the FMSF. Evaluations target three high-risk domains-Chemical, Biological, Radiological and Nuclear (CBRN), Offensive Cyber Operations, and Automated AI R&D-and combine automated benchmarks, expert red-teaming, and uplift studies to determine whether the model exceeds release thresholds. We summarize our methodology and report core findings. We will continue to enhance our safety evaluation and mitigation pipelines as new risks and capabilities associated with frontier models are identified.

#7 AgenticSCR: An Autonomous Agentic Secure Code Review for Immature Vulnerabilities Detection

agent

著者: Wachiraphan Charoenwet, Kla Tantithamthavorn, Patanamon Thongtanunam, Hong Yi Lin, Minwoo Jeong, Ming Wu

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19138

要約:
Secure code review is critical at the pre-commit stage, where vulnerabilities must be caught early under tight latency and limited-context constraints. Existing SAST-based checks are noisy and often miss immature, context-dependent vulnerabilities, while standalone Large Language Models (LLMs) are constrained by context windows and lack explicit tool use. Agentic AI, which combine LLMs with autonomous decision-making, tool invocation, and code navigation, offer a promising alternative, but their effectiveness for pre-commit secure code review is not yet well understood. In this work, we introduce AgenticSCR, an agentic AI for secure code review for detecting immature vulnerabilities during the pre-commit stage, augmented by security-focused semantic memories. Using our own curated benchmark of immature vulnerabilities, tailored to the pre-commit secure code review, we empirically evaluate how accurate is our AgenticSCR for localizing, detecting, and explaining immature vulnerabilities. Our results show that AgenticSCR achieves at least 153% relatively higher percentage of correct code review comments than the static LLM-based baseline, and also substantially surpasses SAST tools. Moreover, AgenticSCR generates more correct comments in four out of five vulnerability types, consistently and significantly outperforming all other baselines. These findings highlight the importance of Agentic Secure Code Review, paving the way towards an emerging research area of immature vulnerability detection.

#8 SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks

agent

著者: Nirhoshan Sivaroopan, Kanchana Thilakarathna, Albert Zomaya, Manu, Yi Guo, Jo Plested, Tim Lynar, Jack Yang, Wangli Yang

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19174

要約:
Sponge attacks increasingly threaten LLM systems by inducing excessive computation and DoS. Existing defenses either rely on statistical filters that fail on semantically meaningful attacks or use static LLM-based detectors that struggle to adapt as attack strategies evolve. We introduce SHIELD, a multi-agent, auto-healing defense framework centered on a three-stage Defense Agent that integrates semantic similarity retrieval, pattern matching, and LLM-based reasoning. Two auxiliary agents, a Knowledge Updating Agent and a Prompt Optimization Agent, form a closed self-healing loop, when an attack bypasses detection, the system updates an evolving knowledgebase, and refines defense instructions. Extensive experiments show that SHIELD consistently outperforms perplexity-based and standalone LLM defenses, achieving high F1 scores across both non-semantic and semantic sponge attacks, demonstrating the effectiveness of agentic self-healing against evolving resource-exhaustion threats.

#9 LLMs Can Unlearn Refusal with Only 1,000 Benign Samples

著者: Yangyang Guo, Ziwei Xu, Si Liu, Zhiming Zheng, Mohan Kankanhalli

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19231

要約:
This study reveals a previously unexplored vulnerability in the safety alignment of Large Language Models (LLMs). Existing aligned LLMs predominantly respond to unsafe queries with refusals, which often begin with a fixed set of prefixes (I'm sorry). We demonstrate that this rigid refusal pattern is a vulnerability and introduce a novel \textbf{refusal unlearning} technique that exploits it. Specifically, we fine-tune LLMs using merely 1,000 benign samples, where each response is prepended with a refusal prefix. The underlying intuition is to disrupt the refusal completion pathway, thereby driving the model to forget how to refuse while following harmful instructions. This intuition is further supported by theoretical proofs. We apply this approach to a total of 16 LLMs, including various open-source models from Llama, Qwen, and Gemma families, as well as closed-source models such as Gemini and GPT. Experimental results show that the safety scores of previously aligned LLMs degrade both consistently and substantially. Importantly, we verify that the observed gain cannot be attributed to plain fine-tuning or random prefix effects. Our findings suggest that current safety alignment may rely heavily on token sequence memorization rather than reasoning, motivating future work beyond simple refusal mechanisms. Code has been released: https://github.com/guoyang9/refusal-unlearning.

#10 AI-driven Intrusion Detection for UAV in Smart Urban Ecosystems: A Comprehensive Survey

著者: Abdullah Khanfor (College of Computer Science and Information Systems, Najran University, Najran, KSA), Raby Hamadi (Saudi Technology and Security Comprehensive Control Company), Noureddine Lasla (National School of Artificial Intelligence), Hakim Ghazzai (Computer, Electrical, and Mathematical Sciences and Engineering)

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19345

要約:
UAVs have the potential to revolutionize urban management and provide valuable services to citizens. They can be deployed across diverse applications, including traffic monitoring, disaster response, environmental monitoring, and numerous other domains. However, this integration introduces novel security challenges that must be addressed to ensure safe and trustworthy urban operations. This paper provides a structured, evidence-based synthesis of UAV applications in smart cities and their associated security challenges as reported in the literature over the last decade, with particular emphasis on developments from 2019 to 2025. We categorize these challenges into two primary classes: 1) cyber-attacks targeting the communication infrastructure of UAVs and 2) unwanted or unauthorized physical intrusions by UAVs themselves. We examine the potential of Artificial Intelligence (AI) techniques in developing intrusion detection mechanisms to mitigate these security threats. We analyze how AI-based methods, such as machine/deep learning for anomaly detection and computer vision for object recognition, can play a pivotal role in enhancing UAV security through unified detection systems that address both cyber and physical threats. Furthermore, we consolidate publicly available UAV datasets across network traffic and vision modalities suitable for Intrusion Detection Systems (IDS) development and evaluation. The paper concludes by identifying ten key research directions, including scalability, robustness, explainability, data scarcity, automation, hybrid detection, large language models, multimodal approaches, federated learning, and privacy preservation. Finally, we discuss the practical challenges of implementing UAV IDS solutions in real-world smart city environments.

#11 CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations

著者: Bilel Sefsaf, Abderraouf Dandani, Abdessamed Seddiki, Arab Mohammed, Eduardo Chielle, Michail Maniatakos, Riyadh Baghdadi

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19367

要約:
Fully Homomorphic Encryption (FHE) enables computations directly on encrypted data, but its high computational cost remains a significant barrier. Writing efficient FHE code is a complex task requiring cryptographic expertise, and finding the optimal sequence of program transformations is often intractable. In this paper, we propose CHEHAB RL, a novel framework that leverages deep reinforcement learning (RL) to automate FHE code optimization. Instead of relying on predefined heuristics or combinatorial search, our method trains an RL agent to learn an effective policy for applying a sequence of rewriting rules to automatically vectorize scalar FHE code while reducing instruction latency and noise growth. The proposed approach supports the optimization of both structured and unstructured code. To train the agent, we synthesize a diverse dataset of computations using a large language model (LLM). We integrate our proposed approach into the CHEHAB FHE compiler and evaluate it on a suite of benchmarks, comparing its performance against Coyote, a state-of-the-art vectorizing FHE compiler. The results show that our approach generates code that is $5.3\times$ faster in execution, accumulates $2.54\times$ less noise, while the compilation process itself is $27.9\times$ faster than Coyote (geometric means).

#12 Reuse of Public Keys Across UTXO and Account-Based Cryptocurrencies

著者: Rainer St\"utz (Complexity Science Hub), Nicholas Stifter (SBA Research), Melitta Dragaschnig (AIT Austrian Institute of Technology), Bernhard Haslhofer (Complexity Science Hub), Aljosha Judmayer (University of Vienna)

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19500

要約:
It is well known that reusing cryptocurrency addresses undermines privacy. This also applies if the same addresses are used in different cryptocurrencies. Nevertheless, cross-chain address reuse appears to be a recurring phenomenon, especially in EVM-based designs. Previous works performed either direct address matching, or basic format conversion, to identify such cases. However, seemingly incompatible address formats e.g., in Bitcoin and Ethereum, can also be derived from the same public keys, since they rely on the same cryptographic primitives. In this paper, we therefore focus on the underlying public keys to discover reuse within, as well as across, different cryptocurrency networks, enabling us to also match incompatible address formats. Specifically, we analyze key reuse across Bitcoin, Ethereum, Litecoin, Dogecoin, Zcash and Tron. Our results reveal that cryptographic keys are extensively and actively reused across these networks, negatively impacting both privacy and security of their users. We are hence the first to expose and quantify cross-chain key reuse between UTXO and account-based cryptocurrencies. Moreover, we devise novel clustering methods across these different cryptocurrency networks that do not rely on heuristics and instead link entities by their knowledge of the underlying secret key.

#13 How to Serve Your Sandwich? MEV Attacks in Private L2 Mempools

privacy

著者: Krzysztof Gogol, Manvir Schneider, Jan Gorzny, Claudio Tessone

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19570

要約:
We study the feasibility, profitability, and prevalence of sandwich attacks on Ethereum rollups with private mempools. First, we extend a formal model of optimal front- and back-run sizing, relating attack profitability to victim trade volume, liquidity depth, and slippage bounds. We complement it with an execution-feasibility model that quantifies co-inclusion constraints under private mempools. Second, we examine execution constraints in the absence of builder markets: without guaranteed atomic inclusion, attackers must rely on sequencer ordering, redundant submissions, and priority fee placement, which renders sandwiching probabilistic rather than deterministic. Third, using transaction-level data from major rollups, we show that naive heuristics overstate sandwich activity. We find that the majority of flagged patterns are false positives and that the median net return for these attacks is negative. Our results suggest that sandwiching, while endemic and profitable on Ethereum L1, is rare, unprofitable, and largely absent in rollups with private mempools. These findings challenge prevailing assumptions, refine measurement of MEV in L2s, and inform the design of sequencing policies.

#14 LLM-Assisted Authentication and Fraud Detection

著者: Emunah S-S. Chan, Aldar C-F. Chan

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19684

要約:
User authentication and fraud detection face growing challenges as digital systems expand and adversaries adopt increasingly sophisticated tactics. Traditional knowledge-based authentication remains rigid, requiring exact word-for-word string matches that fail to accommodate natural human memory and linguistic variation. Meanwhile, fraud-detection pipelines struggle to keep pace with rapidly evolving scam behaviors, leading to high false-positive rates and frequent retraining cycles required. This work introduces two complementary LLM-enabled solutions, namely, an LLM-assisted authentication mechanism that evaluates semantic correctness rather than exact wording, supported by document segmentation and a hybrid scoring method combining LLM judgement with cosine-similarity metrics and a RAG-based fraud-detection pipeline that grounds LLM reasoning in curated evidence to reduce hallucinations and adapt to emerging scam patterns without model retraining. Experiments show that the authentication system accepts 99.5% of legitimate non-exact answers while maintaining a 0,1% false-acceptance rate, and that the RAG-enhanced fraud detection reduces false positives from 17.2% to 35%. Together, these findings demonstrate that LLMs can significantly improve both usability and robustness in security workflows, offering a more adaptive , explainable, and human-aligned approach to authentication and fraud detection.

#15 RvB: Automating AI System Hardening via Iterative Red-Blue Games

著者: Lige Huang, Zicheng Liu, Jie Zhang, Lewen Yan, Dongrui Liu, Jing Shao

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19726

要約:
The dual offensive and defensive utility of Large Language Models (LLMs) highlights a critical gap in AI security: the lack of unified frameworks for dynamic, iterative adversarial adaptation hardening. To bridge this gap, we propose the Red Team vs. Blue Team (RvB) framework, formulated as a training-free, sequential, imperfect-information game. In this process, the Red Team exposes vulnerabilities, driving the Blue Team to learning effective solutions without parameter updates. We validate our framework across two challenging domains: dynamic code hardening against CVEs and guardrail optimization against jailbreaks. Our empirical results show that this interaction compels the Blue Team to learn fundamental defensive principles, leading to robust remediations that are not merely overfitted to specific exploits. RvB achieves Defense Success Rates of 90\% and 45\% across the respective tasks while maintaining near 0\% False Positive Rates, significantly surpassing baselines. This work establishes the iterative adversarial interaction framework as a practical paradigm that automates the continuous hardening of AI systems.

#16 Self-Sovereign Identity and eIDAS 2.0: An Analysis of Control, Privacy, and Legal Implications

privacy

著者: Nacereddine Sitouah, Marco Esposito, Francesco Bruschi

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19837

要約:
European digital identity initiatives are grounded in regulatory frameworks designed to ensure interoperability and robust, harmonized security standards. The evolution of these frameworks culminates in eIDAS 2.0, whose origins trace back to the Electronic Signatures Directive 1999/93/EC, the first EU-wide legal foundation for the use of electronic signatures in cross-border electronic transactions. As technological capabilities advanced, the initial eIDAS 1.0 framework was increasingly criticized for its limitations and lack of comprehensiveness. Emerging decentralized approaches further exposed these shortcomings and introduced the possibility of integrating innovative identity paradigms, such as Self-Sovereign Identity (SSI) models. In this article, we analyse key provisions of the eIDAS 2.0 Regulation and its accompanying recitals, drawing on existing literature to identify legislative gaps and implementation challenges. Furthermore, we examine the European Digital Identity Architecture and Reference Framework (ARF), assessing its proposed guidelines and evaluating the extent to which its emerging implementations align with SSI principles.

#17 Attention-Enhanced Graph Filtering for False Data Injection Attack Detection and Localization

著者: Ruslan Abdulin, Mohammad Rasoul Narimani

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.18981

要約:
The increasing deployment of Internet-of-Things (IoT)-enabled measurement devices in modern power systems has expanded the cyberattack surface of the grid. As a result, this critical infrastructure is increasingly exposed to cyberattacks, including false data injection attacks (FDIAs) that compromise measurement integrity and threaten reliable system operation. Existing FDIA detection methods primarily exploit spatial correlations and network topology using graph-based learning; however, these approaches often rely on high-dimensional representations and shallow classifiers, limiting their ability to capture local structural dependencies and global contextual relationships. Moreover, naively incorporating Transformer architectures can result in overly deep models that struggle to model localized grid dynamics. This paper proposes a joint FDIA detection and localization framework that integrates auto-regressive moving average (ARMA) graph convolutional filters with an Encoder-Only Transformer architecture. The ARMA-based graph filters provide robust, topology-aware feature extraction and adaptability to abrupt spectral changes, while the Transformer encoder leverages self-attention to capture long-range dependencies among grid elements without sacrificing essential local context. The proposed method is evaluated using real-world load data from the New York Independent System Operator (NYISO) applied to the IEEE 14- and 300-bus systems. Numerical results demonstrate that the proposed model effectively exploits both the state and topology of the power grid, achieving high accuracy in detecting FDIA events and localizing compromised nodes.

#18 Average-Case Reductions for $k$-XOR and Tensor PCA

著者: Guy Bresler, Alina Harbuzova

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19016

要約:
We study two canonical planted average-case problems -- noisy $k\mathsf{\text{-}XOR}$ and Tensor PCA -- and relate their computational properties via poly-time average-case reductions. In fact, we consider a \emph{family of problems} that interpolates between $k\mathsf{\text{-}XOR}$ and Tensor PCA, allowing intermediate densities and signal levels. We introduce two \emph{densifying} reductions that increase the number of observed entries while controlling the decrease in signal, and, in particular, reduce any $k\mathsf{\text{-}XOR}$ instance at the computational threshold to Tensor PCA at the computational threshold. Additionally, we give new order-reducing maps (e.g., $5\to 4$ $k\mathsf{\text{-}XOR}$ and $7\to 4$ Tensor PCA) at fixed entry density.

#19 Analysis of Shuffling Beyond Pure Local Differential Privacy

privacy

著者: Shun Takagi, Seng Pei Liew

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19154

要約:
Shuffling is a powerful way to amplify privacy of a local randomizer in private distributed data analysis, but existing analyses mostly treat the local differential privacy (DP) parameter $\varepsilon_0$ as the only knob and give generic upper bounds that can be loose and do not even characterize how shuffling amplifies privacy for basic mechanisms such as the Gaussian mechanism. We revisit the privacy blanket bound of Balle et al. (the blanket divergence) and develop an asymptotic analysis that applies to a broad class of local randomizers under mild regularity assumptions, without requiring pure local DP. Our key finding is that the leading term of the blanket divergence depends on the local mechanism only through a single scalar parameter $\chi$, which we call the shuffle index. By applying this asymptotic analysis to both upper and lower bounds, we obtain a tight band for $\delta_n$ in the shuffled mechanism's $(\varepsilon_n,\delta_n)$-DP guarantee. Moreover, we derive a simple structural necessary and sufficient condition on the local randomizer under which the blanket-divergence-based upper and lower bounds coincide asymptotically. $k$-RR families with $k\ge3$ satisfy this condition, while for generalized Gaussian mechanisms the condition may not hold but the resulting band remains tight. Finally, we complement the asymptotic theory with an FFT-based algorithm for computing the blanket divergence at finite $n$, which offers rigorously controlled relative error and near-linear running time in $n$, providing a practical numerical analysis for shuffle DP.

#20 Modeling Behavioral Signals in Job Scams: A Human-Centered Security Study

著者: Goni Anagha, Vishakha Dasi Agrawal, Gargi Sarkar, Kavita Vemuri, Sandeep Kumar Shukla

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19342

要約:
Job scams have emerged as a rapidly growing form of cybercrime that manipulates human decision-making processes. Existing countermeasures primarily focus on scam typologies or post-loss indicators, offering limited support for early-stage intervention. In this study, we examine how behavioral decision signals can be operationalized as computational features for identifying vulnerability-associated signals in job fraud. Using anonymous survey data collected from a university population, we analyze two dominant job scam pathways: payment-based scams that require upfront fees and task-based scams that begin with small rewards before escalating to financial demands. Drawing on behavioral economics, we operationalize sunk cost influence, urgency/time-pressure cues, and social proof as measurable behavioral signals, and analyze their association with payment behavior using exact inference under sparsity and uncertainty-aware estimation, with social proof treated as a context-dependent legitimacy cue rather than a standalone predictor. Our results show that urgency/time-pressure cues are significantly associated with payment behavior, consistent with their role as proximal compliance triggers during escalation. In contrast, opportunity-loss/FOMO cues were not reliably identifiable under the current operationalization in our encounter subset, highlighting the importance of measurement fidelity and cue-definition consistency. We further observe that emotional tone in victim narratives and selective non-response to sensitive questions vary systematically with financial loss and reporting behavior, suggesting that missingness may reflect a combination of survey fatigue and selective non-disclosure for sensitive items rather than purely random noise.

#21 From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense

backdoor

著者: Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19448

要約:
Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis methods like model repairing or input robustness, yet these approaches are often fragile under advanced attacks as they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that effective defense requires decoupling safety from the victim model via an independent, semantically grounded auditor. To this end, we present a framework harnessing Universal Vision-Language Models (VLMs) as evolving semantic gatekeepers. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which overcomes the domain gap of general VLMs through two key mechanisms: a Hybrid VLM Teacher that dynamically refines visual prototypes online, and an Adaptive Router powered by statistical margin monitoring to calibrate gating thresholds in real-time. Extensive evaluation across 17 datasets and 11 attack types demonstrates that PRISM achieves state-of-the-art performance, suppressing Attack Success Rate to <1% on CIFAR-10 while improving clean accuracy, establishing a new standard for model-agnostic, externalized security.

#22 VisGuardian: A Lightweight Group-based Privacy Control Technique For Front Camera Data From AR Glasses in Home Environments

privacy

著者: Shuning Zhang, Qucheng Zang, Yongquan `Owen' Hu, Jiachen Du, Xueyang Wang, Yan Kong, Xinyi Fu, Suranga Nanayakkara, Xin Yi, Hewu Li

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19502

要約:
Always-on sensing of AI applications on AR glasses makes traditional permission techniques ill-suited for context-dependent visual data, especially within home environments. The home presents a highly challenging privacy context due to the high density of sensitive objects, and the frequent presence of non-consenting family members, and the intimate nature of daily routines, making it a critical focus area for scalable privacy control mechanisms. Existing fine-grained controls, while offering nuanced choices, are inefficient for managing multiple private objects. We propose VisGuardian, a fine-grained content-based visual permission technique for AR glasses. VisGuardian features a group-based control mechanism that enables users to efficiently manage permissions for multiple private objects. VisGuardian detects objects using YOLO and adopts a pre-classified schema to group them. By selecting a single object, users can efficiently obscure groups of related objects based on criteria including privacy sensitivity, object category, or spatial proximity. A technical evaluation shows VisGuardian achieves mAP50 of 0.6704 with only 14.0 ms latency and a 1.7% increase in battery consumption per hour. Furthermore, a user study (N=24) comparing VisGuardian to slider-based and object-based baselines found it to be significantly faster for setting permissions and was preferred by users for its efficiency, effectiveness, and ease of use.

#23 GAVEL: Towards rule-based safety through activation monitoring

著者: Shir Rozenfeld, Rahul Pankajakshan, Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.19768

要約:
Large language models (LLMs) are increasingly paired with activation-based monitoring to detect and prevent harmful behaviors that may not be apparent at the surface-text level. However, existing activation safety approaches, trained on broad misuse datasets, struggle with poor precision, limited flexibility, and lack of interpretability. This paper introduces a new paradigm: rule-based activation safety, inspired by rule-sharing practices in cybersecurity. We propose modeling activations as cognitive elements (CEs), fine-grained, interpretable factors such as ''making a threat'' and ''payment processing'', that can be composed to capture nuanced, domain-specific behaviors with higher precision. Building on this representation, we present a practical framework that defines predicate rules over CEs and detects violations in real time. This enables practitioners to configure and update safeguards without retraining models or detectors, while supporting transparency and auditability. Our results show that compositional rule-based activation safety improves precision, supports domain customization, and lays the groundwork for scalable, interpretable, and auditable AI governance. We will release GAVEL as an open-source framework and provide an accompanying automated rule creation tool.

#24 Differential privacy for symmetric log-concave mechanisms

privacy

著者: Staal A. Vinterbo

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2202.11393

要約:
Adding random noise to database query results is an important tool for achieving privacy. A challenge is to minimize this noise while still meeting privacy requirements. Recently, a sufficient and necessary condition for $(\epsilon, \delta)$-differential privacy for Gaussian noise was published. This condition allows the computation of the minimum privacy-preserving scale for this distribution. We extend this work and provide a sufficient and necessary condition for $(\epsilon, \delta)$-differential privacy for all symmetric and log-concave noise densities. Our results allow fine-grained tailoring of the noise distribution to the dimensionality of the query result. We demonstrate that this can yield significantly lower mean squared errors than those incurred by the currently used Laplace and Gaussian mechanisms for the same $\epsilon$ and $\delta$.

#25 CNN-based IoT Device Identification: A Comparative Study on Payload vs. Fingerprint

著者: Kahraman Kostas

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2304.13894

要約:
The proliferation of the Internet of Things (IoT) has introduced a massive influx of devices into the market, bringing with them significant security vulnerabilities. In this diverse ecosystem, robust IoT device identification is a critical preventive measure for network security and vulnerability management. This study proposes a deep learning-based method to identify IoT devices using the Aalto dataset. We employ Convolutional Neural Networks (CNN) to classify devices by converting network packet payloads into pseudo-images. Furthermore, we compare the performance of this payload-based approach against a feature-based fingerprinting method. Our results indicate that while the fingerprint-based method is significantly faster (approximately 10x), the payload-based image classification achieves comparable accuracy, highlighting the trade-offs between computational efficiency and data granularity in IoT security.

#26 Watermark-based Attribution of AI-Generated Content

intellectual property

著者: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Yupu Wang, Neil Zhenqiang Gong

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2404.04254

要約:
Several companies have deployed watermark-based detection to identify AI-generated content. However, attribution--the ability to trace back to the user of a generative AI (GenAI) service who created a given AI-generated content--remains largely unexplored despite its growing importance. In this work, we aim to bridge this gap by conducting the first systematic study on watermark-based, user-level attribution of AI-generated content. Our key idea is to assign a unique watermark to each user of the GenAI service and embed this watermark into the AI-generated content created by that user. Attribution is then performed by identifying the user whose watermark best matches the one extracted from the given content. This approach, however, faces a key challenge: How should watermarks be selected for users to maximize attribution performance? To address the challenge, we first theoretically derive lower bounds on detection and attribution performance through rigorous probabilistic analysis for any given set of user watermarks. Then, we select watermarks for users to maximize these lower bounds, thereby optimizing detection and attribution performance. Our theoretical and empirical results show that watermark-based attribution inherits both the accuracy and (non-)robustness properties of the underlying watermark. Specifically, attribution remains highly accurate when the watermarked AI-generated content is either not post-processed or subjected to common post-processing such as JPEG compression, as well as black-box adversarial post-processing with limited query budgets.

#27 Integer Factorization via Tensor Network Schnorr's Sieving

著者: Marco Tesoro, Ilaria Siloi, Daniel Jaschke, Giuseppe Magnifico, Simone Montangero

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2410.16355

要約:
Classical public-key cryptography standards rely on the Rivest-Shamir-Adleman (RSA) encryption protocol. The security of this protocol is based on the exponential computational complexity of the most efficient classical algorithms for factoring large semiprime numbers into their two prime components. Here, we address RSA factorization building on Schnorr's mathematical framework where factorization translates into a combinatorial optimization problem. We solve the optimization task via tensor network methods, a quantum-inspired classical numerical technique. This tensor network Schnorr's sieving algorithm displays numerical evidence of polynomial scaling of resources with the bit-length of the semiprime. We factorize RSA numbers up to 100 bits and assess how computational resources scale through numerical simulations up to 130 bits, encoding the optimization problem in quantum systems with up to 256 qubits. Only the high-order polynomial scaling of the required resources limits the factorization of larger numbers. Although these results do not currently undermine the security of the present communication infrastructure, they strongly highlight the urgency of implementing post-quantum cryptography or quantum key distribution.

#28 SWA-LDM: Toward Stealthy Watermarks for Latent Diffusion Models

intellectual propertydiffusion

著者: Zhonghao Yang, Linye Lyu, Xuanhang Chang, Daojing He, YU LI

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.10495

要約:
Latent Diffusion Models (LDMs) have established themselves as powerful tools in the rapidly evolving field of image generation, capable of producing highly realistic images. However, their widespread adoption raises critical concerns about copyright infringement and the misuse of generated content. Watermarking techniques have emerged as a promising solution, enabling copyright identification and misuse tracing through imperceptible markers embedded in generated images. Among these, latent-based watermarking techniques are particularly promising, as they embed watermarks directly into the latent noise without altering the underlying LDM architecture. In this work, we demonstrate that such latent-based watermarks are practically vulnerable to detection and compromise through systematic analysis of output images' statistical patterns for the first time. To counter this, we propose SWA-LDM (Stealthy Watermark for LDM), a lightweight framework that enhances stealth by dynamically randomizing the embedded watermarks using the Gaussian-distributed latent noise inherent to diffusion models. By embedding unique, pattern-free signatures per image, SWA-LDM eliminates detectable artifacts while preserving image quality and extraction robustness. Experiments demonstrate an average of 20% improvement in stealth over state-of-the-art methods, enabling secure deployment of watermarked generative AI in real-world applications.

#29 MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

agent

著者: Lukas Aichberger, Alasdair Paren, Guohao Li, Philip Torr, Yarin Gal, Adel Bibi

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.10809

要約:
Recent advances in operating system (OS) agents have enabled vision-language models (VLMs) to directly control a user's computer. Unlike conventional VLMs that passively output text, OS agents autonomously perform computer-based tasks in response to a single user prompt. OS agents do so by capturing, parsing, and analysing screenshots and executing low-level actions via application programming interfaces (APIs), such as mouse clicks and keyboard inputs. This direct interaction with the OS significantly raises the stakes, as failures or manipulations can have immediate and tangible consequences. In this work, we uncover a novel attack vector against these OS agents: Malicious Image Patches (MIPs), adversarially perturbed screen regions that, when captured by an OS agent, induce it to perform harmful actions by exploiting specific APIs. For instance, a MIP can be embedded in a desktop wallpaper or shared on social media to cause an OS agent to exfiltrate sensitive user data. We show that MIPs generalise across user prompts and screen configurations, and that they can hijack multiple OS agents even during the execution of benign instructions. These findings expose critical security vulnerabilities in OS agents that have to be carefully addressed before their widespread deployment.

#30 Can Large Language Models Really Recognize Your Name?

著者: Dzung Pham, Peter Kairouz, Niloofar Mireshghallah, Eugene Bagdasarian, Chau Minh Pham, Amir Houmansadr

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.14549

要約:
Large language models (LLMs) are increasingly being used in privacy pipelines to detect and remedy sensitive data leakage. These solutions often rely on the premise that LLMs can reliably recognize human names, one of the most important categories of personally identifiable information (PII). In this paper, we reveal how LLMs can consistently mishandle broad classes of human names even in short text snippets due to ambiguous linguistic cues in the contexts. We construct AmBench, a benchmark of over 12,000 real yet ambiguous human names based on the name regularity bias phenomenon. Each name appears in dozens of concise text snippets that are compatible with multiple entity types. Our experiments with 12 state-of-the-art LLMs show that the recall of AmBench names drops by 20--40% compared to more recognizable names. This uneven privacy protection due to linguistic properties raises important concerns about the fairness of privacy enforcement. When the contexts contain benign prompt injections -- instruction-like user texts that can cause LLMs to conflate data with commands -- AmBench names can become four times more likely to be ignored in Clio, an LLM-powered enterprise tool used by Anthropic AI to extract supposedly privacy-preserving insights from user conversations with Claude. Our findings showcase blind spots in the performance and fairness of LLM-based privacy solutions and call for a systematic investigation into their privacy failure modes and countermeasures.

#31 ARMS: A Vision for Actor Reputation Metric Systems in the Open-Source Software Supply Chain

著者: Kelechi G. Kalu, Sofia Okorafor, Bet\"ul Durak, Kim Laine, Radames C. Moreno, Santiago Torres-Arias, James C. Davis

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.18760

要約:
Many critical information technology and cyber-physical systems rely on a supply chain of open-source software projects. OSS project maintainers often integrate contributions from external actors. While maintainers can assess the correctness of a pull request, assessing a pull request's cybersecurity implications is challenging. To help maintainers make this decision, we propose that the open-source ecosystem should incorporate Actor Reputation Metrics (ARMS). This capability would enable OSS maintainers to assess a prospective contributor's cybersecurity reputation. To support the future instantiation of ARMS, we identify seven generic security signals from industry standards; map concrete metrics from prior work and available security tools, describe study designs to refine and assess the utility of ARMS, and finally weigh its pros and cons.

#32 Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges

著者: Raj Patel, Himanshu Tripathi, Jasper Stone, Noorbakhsh Amiri Golilarz, Sudip Mittal, Shahram Rahimi, Vini Chaudhary

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.02032

要約:
The rapid adoption of machine learning (ML) technologies has driven organizations across diverse sectors to seek efficient and reliable methods to accelerate model development-to-deployment. Machine Learning Operations (MLOps) has emerged as an integrative approach addressing these requirements by unifying relevant roles and streamlining ML workflows. As the MLOps market continues to grow, securing these pipelines has become increasingly critical. However, the unified nature of MLOps ecosystem introduces vulnerabilities, making them susceptible to adversarial attacks where a single misconfiguration can lead to compromised credentials, severe financial losses, damaged public trust, and the poisoning of training data. Our paper presents a systematic application of the MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) framework, supplemented by reviews of white and grey literature, to systematically assess attacks across different phases of the MLOps ecosystem. We begin by reviewing prior work in this domain, then present our taxonomy and introduce a threat model that captures attackers with different knowledge and capabilities. We then present a structured taxonomy of attack techniques explicitly mapped to corresponding phases of the MLOps ecosystem, supported by examples drawn from red-teaming exercises and real-world incidents. This is followed by a taxonomy of mitigation strategies aligned with these attack categories, offering actionable early-stage defenses to strengthen the security of MLOps ecosystem. Given the gradual evolution and adoption of MLOps, we further highlight key research gaps that require immediate attention. Our work emphasizes the importance of implementing robust security protocols from the outset, empowering practitioners to safeguard MLOps ecosystem against evolving cyber attacks.

#33 SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning

backdoor

著者: Momin Ahmad Khan, Yasra Chandio, Fatima Muhammad Anwar

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.22506

要約:
Federated Prompt Learning has emerged as a communication-efficient and privacy-preserving paradigm for adapting large vision-language models like CLIP across decentralized clients. However, the security implications of this setup remain underexplored. In this work, we present the first study of backdoor attacks in Federated Prompt Learning. We show that when malicious clients inject visually imperceptible, learnable noise triggers into input images, the global prompt learner becomes vulnerable to targeted misclassification while still maintaining high accuracy on clean inputs. Motivated by this vulnerability, we propose SABRE-FL, a lightweight, modular defense that filters poisoned prompt updates using an embedding-space anomaly detector trained offline on out-of-distribution data. SABRE-FL requires no access to raw client data or labels and generalizes across diverse datasets. We show, both theoretically and empirically, that malicious clients can be reliably identified and filtered using an embedding-based detector. Across five diverse datasets and four baseline defenses, SABRE-FL outperforms all baselines by significantly reducing backdoor accuracy while preserving clean accuracy, demonstrating strong empirical performance and underscoring the need for robust prompt learning in future federated systems.

#34 Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models

著者: Shuang Liang, Zhihao Xu, Jiaqi Weng, Jialing Tao, Hui Xue, Xiting Wang

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.09201

要約:
Despite extensive alignment efforts, Large Vision-Language Models (LVLMs) remain vulnerable to jailbreak attacks. To mitigate these risks, existing detection methods are essential, yet they face two major challenges: generalization and accuracy. While learning-based methods trained on specific attacks fail to generalize to unseen attacks, learning-free methods based on hand-crafted heuristics suffer from limited accuracy and reduced efficiency. To address these limitations, we propose Learning to Detect (LoD), a learnable framework that eliminates the need for any attack data or hand-crafted heuristics. LoD operates by first extracting layer-wise safety representations directly from the model's internal activations using Multi-modal Safety Concept Activation Vectors classifiers, and then converting the high-dimensional representations into a one-dimensional anomaly score for detection via a Safety Pattern Auto-Encoder. Extensive experiments demonstrate that LoD consistently achieves state-of-the-art detection performance (AUROC) across diverse unseen jailbreak attacks on multiple LVLMs, while also significantly improving efficiency. Code is available at https://anonymous.4open.science/r/Learning-to-Detect-51CB.

#35 Trustworthy and Confidential SBOM Exchange

著者: Eman Abu Ishgair, Chinenye Okafor, Marcela S. Melara, Santiago Torres-Arias

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.13217

要約:
Software Bills of Materials (SBOMs) have become a regulatory requirement for improving software supply chain security and trust by means of transparency regarding components that make up software artifacts. However, enterprise and regulated software vendors commonly wish to restrict who can view confidential software metadata recorded in their SBOMs due to intellectual property or security vulnerability information. To address this tension between transparency and confidentiality, we propose Petra, an SBOM exchange system that empowers software vendors to interoperably compose and distribute redacted SBOM data using selective encryption. Petra enables software consumers to search redacted SBOMs for answers to specific security questions without revealing information they are not authorized to access. Petra leverages a format-agnostic, tamper-evident SBOM representation to generate efficient and confidentiality-preserving integrity proofs, allowing interested parties to cryptographically audit and establish trust in redacted SBOMs. Exchanging redacted SBOMs in our Petra prototype requires less than 1 extra KB per SBOM, and SBOM decryption accounts for at most 1% of the performance overhead during an SBOM query.

#36 SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling

privacy

著者: Georgi Ganev, Reza Nazari, Rees Davison, Amir Dizche, Xinmin Wu, Ralph Abbey, Jorge Silva, Emiliano De Cristofaro

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.15083

要約:
The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most widely used methods for addressing class imbalance and generating synthetic data. Despite its popularity, little attention has been paid to its privacy implications; yet, it is used in the wild in many privacy-sensitive applications. In this work, we conduct the first systematic study of privacy leakage in SMOTE: We begin by showing that prevailing evaluation practices, i.e., naive distinguishing and distance-to-closest-record metrics, completely fail to detect any leakage and that membership inference attacks (MIAs) can be instantiated with high accuracy. Then, by exploiting SMOTE's geometric properties, we build two novel attacks with very limited assumptions: DistinSMOTE, which perfectly distinguishes real from synthetic records in augmented datasets, and ReconSMOTE, which reconstructs real minority records from synthetic datasets with perfect precision and recall approaching one under realistic imbalance ratios. We also provide theoretical guarantees for both attacks. Experiments on eight standard imbalanced datasets confirm the practicality and effectiveness of these attacks. Overall, our work reveals that SMOTE is inherently non-private and disproportionately exposes minority records, highlighting the need to reconsider its use in privacy-sensitive applications.

#37 DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing

著者: Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.15303

要約:
Large web-scale datasets have driven the rapid advancement of pre-trained language models (PLMs), but unauthorized data usage has raised serious copyright concerns. Existing dataset ownership verification (DOV) methods typically assume that watermarks remain stable during inference; however, this assumption often fails under natural noise and adversary-crafted perturbations. We propose the first certified dataset ownership verification method for PLMs under a gray-box setting (i.e., the defender can only query the suspicious model but is aware of its input representation module), based on dual-space smoothing (i.e., DSSmoothing). To address the challenges of text discreteness and semantic sensitivity, DSSmoothing introduces continuous perturbations in the embedding space to capture semantic robustness and applies controlled token reordering in the permutation space to capture sequential robustness. DSSmoothing consists of two stages: in the first stage, triggers are collaboratively embedded in both spaces to generate norm-constrained and robust watermarked datasets; in the second stage, randomized smoothing is applied in both spaces during verification to compute the watermark robustness (WR) of suspicious models and statistically compare it with the principal probability (PP) values of a set of benign models. Theoretically, DSSmoothing provides provable robustness guarantees for dataset ownership verification by ensuring that WR consistently exceeds PP under bounded dual-space perturbations. Extensive experiments on multiple representative web datasets demonstrate that DSSmoothing achieves stable and reliable verification performance and exhibits robustness against potential adaptive attacks. Our code is available at https://github.com/NcepuQiaoTing/DSSmoothing.

#38 An Evidence-Driven Analysis of Threat Information Sharing Challenges for Industrial Control Systems and Future Directions

著者: Adam Hahn, Rubin Krief, Daniel Rebori-Carretero, Rami Puzis, Aviad Elyashar, Nik Urlaub

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.18714

要約:
The increasing cyber threats to critical infrastructure highlight the importance of private companies and government agencies in detecting and sharing information about threat activities. Although the need for improved threat information sharing is widely recognized, various technical and organizational challenges persist, hindering effective collaboration. In this study, we review the challenges that disturb the sharing of usable threat information to critical infrastructure operators within the ICS domain. We analyze three major incidents: Stuxnet, Industroyer, and Triton. In addition, we perform a systematic analysis of 196 procedure examples across 79 MITRE ATT&CK techniques from 22 ICS-related malware families, utilizing automated natural language processing techniques to systematically extract and categorize threat observables. Additionally, we investigated nine recent ICS vulnerability advisories from the CISA Known Exploitable Vulnerability catalog. Our analysis identified four important limitations in the ICS threat information sharing ecosystem: (i) the lack of coherent representation of artifacts related to ICS adversarial techniques in information sharing language standards (e.g., STIX); (ii) the dependence on undocumented proprietary technologies; (iii) limited technical details provided in vulnerability and threat incident reports; and (iv) the accessibility of technical details for observed adversarial techniques. This study aims to guide the development of future information-sharing standards, including the enhancement of the cyber-observable objects schema in STIX, to ensure accurate representation of artifacts specific to ICS environments.

#39 Real-World Adversarial Attacks on RF-Based Drone Detectors

著者: Omer Gazit, Yael Itzhakev, Yuval Elovici, Asaf Shabtai

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.20712

要約:
Radio frequency (RF) based systems are increasingly used to detect drones by analyzing their RF signal patterns, converting them into spectrogram images which are processed by object detection models. Existing RF attacks against image based models alter digital features, making over-the-air (OTA) implementation difficult due to the challenge of converting digital perturbations to transmittable waveforms that may introduce synchronization errors and interference, and encounter hardware limitations. We present the first physical attack on RF image based drone detectors, optimizing class-specific universal complex baseband (I/Q) perturbation waveforms that are transmitted alongside legitimate communications. We evaluated the attack using RF recordings and OTA experiments with four types of drones. Our results show that modest, structured I/Q perturbations are compatible with standard RF chains and reliably reduce target drone detection while preserving detection of legitimate drones.

#40 Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection

著者: Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.10294

要約:
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective (e.g., from "summarizing emails" to "phishing users"). In this paper, we argue that this perspective is incomplete and highlight a critical vulnerability in Reasoning Alignment. We propose a new adversarial paradigm: Reasoning Hijacking and instantiate it with Criteria Attack, which subverts model judgments by injecting spurious decision criteria without altering the high-level task goal. Unlike Goal Hijacking, which attempts to override the system prompt, Reasoning Hijacking accepts the high-level goal but manipulates the model's decision-making logic by injecting spurious reasoning shortcut. Though extensive experiments on three different tasks (toxic comment, negative review, and spam detection), we demonstrate that even newest models are prone to prioritize injected heuristic shortcuts over rigorous semantic analysis. The results are consistent over different backbones. Crucially, because the model's "intent" remains aligned with the user's instructions, these attacks can bypass defenses designed to detect goal deviation (e.g., SecAlign, StruQ), exposing a fundamental blind spot in the current safety landscape. Data and code are available at https://github.com/Yuan-Hou/criteria_attack

#41 Cutting the Gordian Knot: Detecting Malicious PyPI Packages via a Knowledge-Mining Framework

著者: Wenbo Guo, Chengwei Liu, Ming Kang, Yiran Zhang, Jiahui Wu, Zhengzi Xu, Vinay Sachidananda, Yang Liu

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.16463

要約:
The Python Package Index (PyPI) has become a target for malicious actors, yet existing detection tools generate false positive rates of 15-30%, incorrectly flagging one-third of legitimate packages as malicious. This problem arises because current tools rely on simple syntactic rules rather than semantic understanding, failing to distinguish between identical API calls serving legitimate versus malicious purposes. To address this challenge, we propose PyGuard, a knowledge-driven framework that converts detection failures into useful behavioral knowledge by extracting patterns from existing tools' false positives and negatives. Our method utilizes hierarchical pattern mining to identify behavioral sequences that distinguish malicious from benign code, employs Large Language Models to create semantic abstractions beyond syntactic variations, and combines this knowledge into a detection system that integrates exact pattern matching with contextual reasoning. PyGuard achieves 99.50% accuracy with only 2 false positives versus 1,927-2,117 in existing tools, maintains 98.28% accuracy on obfuscated code, and identified 219 previously unknown malicious packages in real-world deployment. The behavioral patterns show cross-ecosystem applicability with 98.07% accuracy on NPM packages, demonstrating that semantic understanding enables knowledge transfer across programming languages.

#42 A Systemic Evaluation of Multimodal RAG Privacy

privacy

著者: Ali Al-Lawati, Suhang Wang

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.17644

要約:
The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g. visual QA) introduces important privacy challenges. In particular, while mRAG provides a practical capability to connect private datasets to improve model performance, it risks the leakage of private information from these datasets during inference. In this paper, we perform an empirical study to analyze the privacy risks inherent in the mRAG pipeline observed through standard model prompting. Specifically, we implement a case study that attempts to infer the inclusion of a visual asset, e.g. image, in the mRAG, and if present leak the metadata, e.g. caption, related to it. Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy.

#43 The Stateless Pattern: Ephemeral Coordination as the Third Pillar of Digital Sovereignty

著者: Sean Carlin, Kevin Curran

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.17875

要約:
For the past three decades, the architecture of the internet has rested on two primary pillars - communication on the World Wide Web and Value such as Bitcoin/Distributed ledgers. However, a third critical pillar, Private Coordination has remained dependent on centralised intermediaries, effectively creating a surveillance architecture by default. This paper introduces the 'Stateless Pattern', a novel network topology that replaces the traditional 'Fortress' security model (database-centric) with a 'Mist' model (ephemeral relays). By utilising client-side cryptography and self-destructing server instances, we demonstrate a protocol where the server acts as a blind medium rather than a custodian of state. We present empirical data from a live deployment (https://signingroom.io), analysing over 1,900 requests and cache-hit ratios to validate the system's 'Zero-Knowledge' properties and institutional utility. The findings suggest that digital privacy can be commoditised as a utility, technically enforcing specific articles of the universal declaration of human rights not through policy, but through physics.

#44 Route Planning and Online Routing for Quantum Key Distribution Networks

著者: Jorge L\'opez, Charalampos Chatzinakis, Marc Cartigny

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.09735

要約:
Quantum Key Distribution (QKD) networks harness the principles of quantum physics in order to securely transmit cryptographic key material, providing physical guarantees. These networks require traditional management and operational components, such as routing information through the network elements. However, due to the limitations on capacity and the particularities of information handling in these networks, traditional shortest paths algorithms for routing perform poorly on both route planning and online routing, which is counterintuitive. Moreover, due to the scarce resources in such networks, often the expressed demand cannot be met by any assignment of routes. To address both the route planning problem and the need for fair automated suggestions in infeasible cases, we propose to model this problem as a Quadratic Programming (QP) problem. For the online routing problem, we showcase that the shortest (available) paths routing strategy performs poorly in the online setting. Furthermore, we prove that the widest shortest path routing strategy has a competitive ratio greater or equal than $\frac{1}{2}$, efficiently addressing both routing modes in QKD networks.

#45 Assessing metadata privacy in neuroimaging

privacy

著者: Emilie Kibsgaard, Anita Sue Jwa, Christopher J Markiewicz, David Rodriguez Gonzalez, Judith Sainz Pardo, Russell A. Poldrack, Cyril R. Pernet

公開日: Wed, 28 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.15278

要約:
The ethical and legal imperative to share research data without causing harm requires careful attention to privacy risks. While mounting evidence demonstrates that data sharing benefits science, legitimate concerns persist regarding the potential leakage of personal information that could lead to reidentification and subsequent harm. We reviewed metadata accompanying neuroimaging datasets from heterogeneous studies openly available on OpenNeuro, involving participants across the lifespan, from children to older adults, with and without clinical diagnoses, and including associated clinical score data. Using metaprivBIDS (https://github.com/CPernet/metaprivBIDS), a software application for BIDS compliant tsv/json files that computes and reports different privacy metrics (k-anonymity, k-global, l-diversity, SUDA, PIF), we found that privacy is generally well maintained, with serious vulnerabilities being rare. Nonetheless, issues were identified in nearly all datasets and warrant mitigation. Notably, clinical score data (e.g., neuropsychological results) posed minimal reidentification risk, whereas demographic variables: age, sex assigned at birth, sexual orientations, race, income, and geolocation, represented the principal privacy vulnerabilities. We outline practical measures to address these risks, enabling safer data sharing practices.

cs.CR updates on arXiv.org

📋 論文タイトル一覧