arXiv論文一覧 - cs.CR updates on arXiv.org

#1 Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach

privacy

著者: Weidong Zheng, Kongyang Chen, Yao Huang, Yuanwei Guo, Yatie Xiao

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07386

要約:
With the widespread application of artificial intelligence technologies in face recognition and other fields, data privacy security issues have received extensive attention, especially the \textit{right to be forgotten} emphasized by numerous privacy protection laws. Existing technologies have proposed various unlearning methods, but they may inadvertently leak the categories of unlearned data. This paper focuses on the category unlearning scenario, analyzes the potential problems of category leakage of unlearned data in multiple scenarios, and proposes four attack methods from the perspectives of model parameters and model inversion based on attackers with different knowledge backgrounds. At the level of model parameters, we construct discriminative features by computing either dot products or vector differences between the parameters of the target model and those of auxiliary models trained on subsets of retained data and unrelated data, respectively. These features are then processed via k-means clustering, Youden's Index, and decision tree algorithms to achieve accurate identification of the forgotten class. In the model inversion domain, we design a gradient optimization-based white-box attack and a genetic algorithm-based black-box attack to reconstruct class-prototypical samples. The prediction profiles of these synthesized samples are subsequently analyzed using a threshold criterion and an information entropy criterion to infer the forgotten class. We evaluate the proposed attacks on four standard datasets against five state-of-the-art unlearning algorithms, providing a detailed analysis of the strengths and limitations of each method. Experimental results demonstrate that our approach can effectively infer the classes forgotten by the target model.

#2 RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement

backdoor

著者: Ziye Wang, Guanyu Wang, Kailong Wang

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07403

要約:
Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs), but simultaneously exposes a critical vulnerability to knowledge poisoning attacks. Existing attack methods like PoisonedRAG remain detectable due to coarse-grained separate-and-concatenate strategies. To bridge this gap, we propose RefineRAG, a novel framework that treats poisoning as a holistic word-level refinement problem. It operates in two stages: Macro Generation produces toxic seeds guaranteed to induce target answers, while Micro Refinement employs a retriever-in-the-loop optimization to maximize retrieval priority without compromising naturalness. Evaluations on NQ and MSMARCO demonstrate that RefineRAG achieves state-of-the-art effectiveness, securing a 90% Attack Success Rate on NQ, while registering the lowest grammar errors and repetition rates among all baselines. Crucially, our proxy-optimized attacks successfully transfer to black-box victim systems, highlighting a severe practical threat.

#3 Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation

privacysynthetic data

著者: Qian Ma, Sarah Rajtmajer

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07486

要約:
Large language models (LLMs) have emerged as a powerful tool for synthetic data generation. A particularly important use case is producing synthetic replicas of private text, which requires carefully balancing privacy and utility. We propose Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), which leverages privacy-preserving mechanisms, including formal differential privacy (DP); and private seeds, in particular text containing personal information, to generate realistic synthetic data. Comprehensive experiments against state-of-the-art private synthetic data generation methods demonstrate that RPSG achieves high fidelity to private data while providing strong privacy protection.

#4 Differentially Private Modeling of Disease Transmission within Human Contact Networks

privacy

著者: Shlomi Hod, Debanuj Nayak, Jason R. Gantenberg, Iden Kalemaj, Thomas A. Trikalinos, Adam Smith

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07493

要約:
Epidemiologic studies of infectious diseases often rely on models of contact networks to capture the complex interactions that govern disease spread, and ongoing projects aim to vastly increase the scale at which such data can be collected. However, contact networks may include sensitive information, such as sexual relationships or drug use behavior. Protecting individual privacy while maintaining the scientific usefulness of the data is crucial. We propose a privacy-preserving pipeline for disease spread simulation studies based on a sensitive network that integrates differential privacy (DP) with statistical network models such as stochastic block models (SBMs) and exponential random graph models (ERGMs). Our pipeline comprises three steps: (1) compute network summary statistics using \emph{node-level} DP (which corresponds to protecting individuals' contributions); (2) fit a statistical model, like an ERGM, using these summaries, which allows generating synthetic networks reflecting the structure of the original network; and (3) simulate disease spread on the synthetic networks using an agent-based model. We evaluate the effectiveness of our approach using a simple Susceptible-Infected-Susceptible (SIS) disease model under multiple configurations. We compare both numerical results, such as simulated disease incidence and prevalence, as well as qualitative conclusions such as intervention effect size, on networks generated with and without differential privacy constraints. Our experiments are based on egocentric sexual network data from the ARTNet study (a survey about HIV-related behaviors). Our results show that the noise added for privacy is small relative to other sources of error (sampling and model misspecification). This suggests that, in principle, curators of such sensitive data can provide valuable epidemiologic insights while protecting privacy.

#5 TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

backdoor

著者: Hengkai Ye, Zhechang Zhang, Jinyuan Jia, Hong Hu

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07536

要約:
Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration expands LLM capabilities, it also introduces a new prompt-injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing defenses mainly detect anomalous instructions and remain ineffective against implicit TPAs. In this paper, we present TRUSTDESC, the first framework for preventing tool poisoning by automatically generating trusted tool descriptions from implementations. TRUSTDESC derives implementation-faithful descriptions through a three-stage pipeline. SliceMin performs reachability-aware static analysis and LLM-guided debloating to extract minimal tool-relevant code slices. DescGen synthesizes descriptions from these slices while mitigating misleading or adversarial code artifacts. DynVer refines descriptions through dynamic verification by executing synthesized tasks and validating behavioral claims. We evaluate TRUSTDESC on 52 real-world tools across multiple tool ecosystems. Results show that TRUSTDESC produces accurate tool descriptions that improve task completion rates while mitigating implicit TPAs at their root, with minimal time and monetary overhead.

#6 MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

著者: Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan, Mohammad Ghasemigol, Daniel Takabi

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07551

要約:
The Model Context Protocol (MCP) enables large language models (LLMs) to dynamically discover and invoke third-party tools, significantly expanding agent capabilities while introducing a distinct security landscape. Unlike prompt-only interactions, MCP exposes pre-execution artifacts, shared context, multi-turn workflows, and third-party supply chains to adversarial influence across independently operated components. While recent work has identified MCP-specific attacks and evaluated defenses, existing studies are largely attack-centric or benchmark-driven, providing limited guidance on where mitigation responsibility should reside within the MCP architecture. This is problematic given MCP's multi-party design and distributed trust boundaries. We present a defense-placement-oriented security analysis of MCP, introducing a layer-aligned taxonomy that organizes attacks by the architectural component responsible for enforcement. Threats are mapped across six MCP layers, and primary and secondary defense points are identified to support principled defense-in-depth reasoning under adversaries controlling tools, servers, or ecosystem components. A structured mapping of existing academic and industry defenses onto this framework reveals uneven and predominantly tool-centric protection, with persistent gaps at the host orchestration, transport, and supply-chain layers. These findings suggest that many MCP security weaknesses stem from architectural misalignment rather than isolated implementation flaws.

#7 MEV-ACE: Identity-Authenticated Fair Ordering for Proposer-Controlled MEV Mitigation

著者: Jian Sheng Wang

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07568

要約:
Maximal Extractable Value, or MEV, remains a structural threat to blockchain fairness because a block producer can often observe pending transactions and unilaterally decide their ordering or inclusion. Existing mitigations hide transaction contents or outsource ordering, but they often leave two gaps unresolved. First, commitments are not authenticated by slashable identities. Second, inclusion obligations are not backed by transferable evidence that other validators can verify. This paper presents MEV ACE, a fair ordering protocol for proposer controlled ordering MEV. MEV ACE combines three mechanisms. First, it uses registered economic identities whose authentication keys are deterministically derived from the ACE GF framework and bonded on chain. Second, it uses authenticated commit and open messages with validator receipt thresholds, which make admissibility and inclusion obligations independently auditable. Third, it uses verifiable delay based randomness to determine transaction order only after the admissible commitment set is fixed. We formalize the protocol in a Byzantine fault tolerant validator model with threshold receipts and show three properties under standard assumptions: order unpredictability after the admissible set is locked, commitment authenticity under signature unforgeability, and accountable inclusion for transactions that obtain threshold commit and open receipts. Under these conditions, and when producer and user bonds exceed the one slot gain from invalid execution or selective non opening, MEV ACE removes unilateral proposer discretion over front running, sandwich attacks, and censorship against admitted transactions. The protocol remains single slot in structure, requires no threshold decryption committee, and is compatible with post quantum signature schemes such as ML DSA 44.

#8 Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

privacy

著者: Thomas Humphries, Tim Li, Shufan Zhang, Karl Knopf, Xi He

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07581

要約:
It can be difficult for practitioners to interpret the quality of differentially private (DP) statistics due to the added noise. One method to help analysts understand the amount of error introduced by DP is to return a Randomization Interval (RI), along with the statistic. A RI is a type of confidence interval that bounds the error introduced by DP. For queries where the noise distribution depends on the input, such as the median, prior work degrades the quality of the median itself to obtain a high-quality RI. In this work, we propose PostRI, a solution to compute a RI after the median has been estimated. PostRI enables a median estimation with 14%-850% higher utility than related work, while maintaining a narrow RI.

#9 AITH: A Post-Quantum Continuous Delegation Protocol for Human-AI Trust Establishment

著者: Zhaoliang Chen

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07695

要約:
The rapid deployment of AI agents acting autonomously on behalf of human principals has outpaced the development of cryptographic protocols for establishing, bounding, and revoking human-AI trust relationships. Existing frameworks (TLS, OAuth 2.0, Macaroons) assume deterministic software and cannot address probabilistic AI agents operating continuously within variable trust boundaries. We present AITH (AI Trust Handshake), a post-quantum continuous delegation protocol. AITH introduces: (1) a Continuous Delegation Certificate signed once with ML-DSA-87 (FIPS 204, NIST Level 5), replacing per-operation signing with sub-microsecond boundary checks at 4.7M ops/sec; (2) a six-check Boundary Engine enforcing hard constraints, rate limits, and escalation triggers with zero cryptographic overhead on the critical path; (3) a push-based Revocation Protocol propagating invalidation within one second. A three-tier SHA-256 Responsibility Chain provides tamper-evident audit logging. All five security theorems are machine-verified via Tamarin Prover under the Dolev-Yao model. We validate AITH through five rounds of multi-model adversarial auditing, resolving 12 vulnerabilities across four severity layers. Simulation of 100,000 operations shows 79.5% autonomous execution, 6.1% human escalation, and 14.4% blocked.

#10 TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense

著者: Cheng Liu, Xiaolei Liu, Xingyu Li, Bangzhou Xin, Kangyi Ding

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07727

要約:
Existing jailbreak defense paradigms primarily rely on static detection of prompts, outputs, or internal states, often neglecting the dynamic evolution of risk during decoding. This oversight leaves risk signals embedded in decoding trajectories underutilized, constituting a critical blind spot in current defense systems. In this work, we empirically demonstrate that hidden states in critical layers during the decoding phase carry stronger and more stable risk signals than input jailbreak prompts. Specifically, the hidden representations of tokens generated during jailbreak attempts progressively approach high-risk regions in the latent space. Based on this observation, we propose TrajGuard, a training-free, decoding-time defense framework. TrajGuard aggregates hidden-state trajectories via a sliding window to quantify risk in real time, triggering a lightweight semantic adjudication only when risk within a local window persistently exceeds a threshold. This mechanism enables the immediate interruption or constraint of subsequent decoding. Extensive experiments across 12 jailbreak attacks and various open-source LLMs show that TrajGuard achieves an average defense rate of 95%. Furthermore, it reduces detection latency to 5.2 ms/token while maintaining a false positive rate below 1.5%. These results confirm that hidden-state trajectories during decoding can effectively support real-time jailbreak detection, highlighting a promising direction for defenses without model modification.

#11 The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

著者: Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen, Wenbo Jiang, Guowen Xu, Yang Liu, Michael Backes, Yang Zhang

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07754

要約:
The deployment of large language models (LLMs) raises significant ethical and safety concerns. While LLM alignment techniques are adopted to improve model safety and trustworthiness, adversaries can exploit these techniques to undermine safety for malicious purposes, resulting in \emph{misalignment}. Misaligned LLMs may be published on open platforms to magnify harm. To address this, additional safety alignment, referred to as \emph{realignment}, is necessary before deploying untrusted third-party LLMs. This study explores the efficacy of fine-tuning methods in terms of misalignment, realignment, and the effects of their interplay. By evaluating four Supervised Fine-Tuning (SFT) and two Preference Fine-Tuning (PFT) methods across four popular safety-aligned LLMs, we reveal a mechanism asymmetry between attack and defense. While Odds Ratio Preference Optimization (ORPO) is most effective for misalignment, Direct Preference Optimization (DPO) excels in realignment, albeit at the expense of model utility. Additionally, we identify model-specific resistance, residual effects of multi-round adversarial dynamics, and other noteworthy findings. These findings highlight the need for robust safeguards and customized safety alignment strategies to mitigate potential risks in the deployment of LLMs. Our code is available at https://github.com/zhangrui4041/The-Art-of-Mis-alignment.

#12 Anamorphic Encryption with CCA Security: A Standard Model Construction

著者: Shujun Wang, Jianting Ning, Qinyi Li, Leo Yu Zhang

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07771

要約:
Anamorphic encryption serves as a vital tool for covert communication, maintaining secrecy even during post-compromise scenarios. Particularly in the receiver-anamorphic setting, a user can shield hidden messages even when coerced into surrendering their secret keys. However, a major bottleneck in existing research is the reliance on CPA-security, leaving the construction of a generic, CCA-secure anamorphic scheme in the standard model as a persistent open challenge. To bridge this gap, we formalize the Anamorphic Key Encapsulation Mechanism (AKEM), encompassing both Public-Key (PKAKEM) and Symmetric-Key (SKAKEM) variants. We propose generic constructions for these primitives, which can be instantiated using any KEM that facilitates randomness recovery. Notably, our framework achieves strong IND-CCA (sIND-CCA) security for the covert channel. We provide a rigorous formal proof in the standard model, demonstrating resilience against a "dictator" who controls the decapsulation key. The security of our approach is anchored in the injective property of the base KEM, which ensures a unique mapping between ciphertexts and randomness. By integrating anamorphism into the KEM-DEM paradigm, our work significantly enhances the practical utility of covert channels within modern cryptographic infrastructures.

#13 BRASP: Boolean Range Queries over Encrypted Spatial Data with Access and Search Pattern Privacy

privacy

著者: Jing Zhang, Ganxuan Yang, Yifei Yang, Siqi Wen, Zhengyang Qiu

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07797

要約:
Searchable Encryption (SE) enables users to query outsourced encrypted data while preserving data confidentiality. However, most efficient schemes still leak the search pattern and access pattern, which may allow an honest-but-curious cloud server to infer query contents, user interests, or returned records from repeated searches and observed results. Existing pattern-hiding solutions mainly target keyword queries and do not naturally support Boolean range queries over encrypted spatial data. This paper presents BRASP, a searchable encryption scheme for Boolean range queries over encrypted spatial data. BRASP combines Hilbert-curve-based prefix encoding with encrypted prefix--ID and keyword--ID inverted indexes to support efficient spatial range filtering and conjunctive keyword matching. To hide the search pattern and access pattern under a dual-server setting, BRASP integrates index shuffling for encrypted keyword and prefix entries with ID-field redistribution across two non-colluding cloud servers. BRASP also supports dynamic updates and achieves forward security. We formalize the security of BRASP through confidentiality, shuffle indistinguishability, query unforgeability, and forward-security analyses, and we evaluate its performance experimentally on a real-world dataset. The results show that BRASP effectively protects query privacy while incurring relatively low computation and communication overhead. To facilitate reproducibility and further research, the source code of BRASP is publicly available at https://github.com/Egbert-Lannister/BRASP

#14 Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

agent

著者: Wenkui Yang, Chao Jin, Haisu Zhu, Weilin Luo, Derek Yuen, Kun Shao, Huaibo Huang, Junxian Duan, Jie Cao, Ran He

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07831

要約:
Existing red-teaming studies on GUI agents have important limitations. Adversarial perturbations typically require white-box access, which is unavailable for commercial systems, while prompt injection is increasingly mitigated by stronger safety alignment. To study robustness under a more practical threat model, we propose Semantic-level UI Element Injection, a red-teaming setting that overlays safety-aligned and harmless UI elements onto screenshots to misdirect the agent's visual grounding. Our method uses a modular Editor-Overlapper-Victim pipeline and an iterative search procedure that samples multiple candidate edits, keeps the best cumulative overlay, and adapts future prompt strategies based on previous failures. Across five victim models, our optimized attacks improve attack success rate by up to 4.4x over random injection on the strongest victims. Moreover, elements optimized on one source model transfer effectively to other target models, indicating model-agnostic vulnerabilities. After the first successful attack, the victim still clicks the attacker-controlled element in more than 15% of later independent trials, versus below 1% for random injection, showing that the injected element acts as a persistent attractor rather than simple visual clutter.

#15 A Hardware-Anchored Privacy Middleware for PII Sharing Across Heterogeneous Embedded Consumer Devices

privacy

著者: Aditya Sabbineni, Pravin Nagare, Devendra Dahiphale, Preetam Dedu, Willison Lopes

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07839

要約:
The rapid expansion of the Internet of Things (IoT) and smart home ecosystems has led to a fragmented landscape of user data management across consumer electronics (CE) such as Smart TVs, gaming consoles, and set-top boxes. Current onboarding processes on these devices are characterized by high friction due to manual data entry and opaque data-sharing practices. This paper introduces the User Data Sharing System (UDSS), a platform-agnostic framework designed to facilitate secure, privacy-first PII (Personally Identifiable Information) exchange between device platforms and third-party applications. Our system implements a Contextual Scope Enforcement (CSE) mechanism that programmatically restricts data exposure based on user intent - specifically distinguishing between Sign-In and Sign-Up workflows. Unlike cloud-anchored identity standards such as FIDO2/WebAuthn, UDSS is designed for shared, device-centric CE environments where persistent user-to-device binding cannot be assumed. We further propose a tiered access model that balances developer needs with regulatory compliance (GDPR/CCPA). A proof-of-concept implementation on a reference ARMv8 Linux-based middleware demonstrates that UDSS reduces user onboarding latency by 65% and measurably reduces PII over-exposure risk through protocol-enforced data minimization. This framework provides a standardized approach to identity management in the heterogeneous CE market.

#16 xDup: Privacy-Preserving Deduplication for Humanitarian Organizations using Fuzzy PSI

privacy

著者: Tim Rausch, Sylvain Chatel, Wouter Lueks

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08019

要約:
Humanitarian organizations help to ensure people's livelihoods in crisis situations. Typically, multiple organizations operate in the same region. To ensure that the limited budget of these organizations can help as many people as possible, organizations perform cross-organizational deduplication to detect duplicate registrations and ensure recipients receive aid from at most one organization. Current deduplication approaches risk privacy harm to vulnerable aid recipients by sharing their data with other organizations. We analyzed the needs of humanitarian organizations to identify the requirements for privacy-friendly cross-organizational deduplication fit for real-life humanitarian missions. We present xDup, a new practical deduplication system that meets the requirements of humanitarian organizations and is two orders of magnitude faster than current solutions. xDup builds on Fuzzy PSI, and we present otFPSI, a concretely efficient Fuzzy PSI protocol for Hamming Space without input assumptions. We show that it is more efficient than existing Fuzzy PSI protocols.

#17 PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation

privacydiffusion

著者: Soumya Mazumdar, Vineet Kumar Rakesh, Tapas Samanta

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08037

要約:
Talking-head generation has advanced rapidly with diffusion-based generative models, but training usually depends on centralized face-video and speech datasets, raising major privacy concerns. The problem is more acute for personalized talking-head generation, where identity-specific data are highly sensitive and often cannot be pooled across users or devices. PrivFedTalk is presented as a privacy-aware federated framework for personalized talking-head generation that combines conditional latent diffusion with parameter-efficient identity adaptation. A shared diffusion backbone is trained across clients, while each client learns lightweight LoRA identity adapters from local private audio-visual data, avoiding raw data sharing and reducing communication cost. To address heterogeneous client distributions, Identity-Stable Federated Aggregation (ISFA) weights client updates using privacy-safe scalar reliability signals computed from on-device identity consistency and temporal stability estimates. Temporal-Denoising Consistency (TDC) regularization is introduced to reduce inter-frame drift, flicker, and identity drift during federated denoising. To limit update-side privacy risk, secure aggregation and client-level differential privacy are applied to adapter updates. The implementation supports both low-memory GPU execution and multi-GPU client-parallel training on heterogeneous shared hardware. Comparative experiments on the present setup across multiple training and aggregation conditions with PrivFedTalk, FedAvg, and FedProx show stable federated optimization and successful end-to-end training and evaluation under constrained resources. The results support the feasibility of privacy-aware personalized talking-head training in federated environments, while suggesting that stronger component-wise, privacy-utility, and qualitative claims need further standardized evaluation.

#18 TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems

privacy

著者: Labani Halder, Payel Sadhukhan, Sarbani Palit

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08113

要約:
Ensuring reliability in adversarial settings necessitates treating privacy as a foundational component of data-driven systems. While differential privacy and cryptographic protocols offer strong guarantees, existing schemes rely on a fixed privacy budget, leading to a rigid utility-privacy trade-off that fails under heterogeneous user trust. Moreover, noise-only differential privacy preserves geometric structure, which inference attacks exploit, causing privacy leakage. We propose TADP-RME (Trust-Adaptive Differential Privacy with Reverse Manifold Embedding), a framework that enhances reliability under varying levels of user trust. It introduces an inverse trust score in the range [0,1] to adaptively modulate the privacy budget, enabling smooth transitions between utility and privacy. Additionally, Reverse Manifold Embedding applies a nonlinear transformation to disrupt local geometric relationships while preserving formal differential privacy guarantees through post-processing. Theoretical and empirical results demonstrate improved privacy-utility trade-offs, reducing attack success rates by up to 3.1 percent without significant utility degradation. The framework consistently outperforms existing methods against inference attacks, providing a unified approach for reliable learning in adversarial environments.

#19 Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark

著者: Longgang Zhang, Xiaowei Fu, Fuxiang Huang, Lei Zhang

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08140

要約:
Network traffic, as a key media format, is crucial for ensuring security and communications in modern internet infrastructure. While existing methods offer excellent performance, they face two key bottlenecks: (1) They fail to capture multidimensional semantics beyond unimodal sequence patterns. (2) Their black box property, i.e., providing only category labels, lacks an auditable reasoning process. We identify a key factor that existing network traffic datasets are primarily designed for classification and inherently lack rich semantic annotations, failing to generate human-readable evidence report. To address data scarcity, this paper proposes a Byte-Grounded Traffic Description (BGTD) benchmark for the first time, combining raw bytes with structured expert annotations. BGTD provides necessary behavioral features and verifiable chains of evidence for multimodal reasoning towards explainable encrypted traffic interpretation. Built upon BGTD, this paper proposes an end-to-end traffic-language representation framework (mmTraffic), a multimodal reasoning architecture bridging physical traffic encoding and semantic interpretation. In order to alleviate modality interference and generative hallucinations, mmTraffic adopts a jointly-optimized perception-cognition architecture. By incorporating a perception-centered traffic encoder and a cognition-centered LLM generator, mmTraffic achieves refined traffic interpretation with guaranteed category prediction. Extensive experiments demonstrate that mmTraffic autonomously generates high-fidelity, human-readable, and evidence-grounded traffic interpretation reports, while maintaining highly competitive classification accuracy comparing to specialized unimodal model (e.g., NetMamba). The source code is available at https://github.com/lgzhangzlg/Multimodal-Reasoning-with-LLM-for-Encrypted-Traffic-Interpretation-A-Benchmark

#20 Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

著者: Weiwei Qi, Zefeng Wu, Tianhang Zheng, Zikang Zhang, Xiaojun Jia, Zhan Qin, Kui Ren

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08297

要約:
Ensuring Large Language Model (LLM) safety is crucial, yet the lack of a clear understanding about safety mechanisms hinders the development of precise and reliable methodologies for safety intervention across diverse tasks. To better understand and control LLM safety, we propose the Expected Safety Impact (ESI) framework for quantifying how different parameters affect LLM safety. Based on ESI, we reveal distinct safety-critical patterns across different LLM architectures: In dense LLMs, many safety-critical parameters are located in value matrices (V) and MLPs in middle layers, whereas in Mixture-of-Experts (MoE) models, they shift to the late-layer MLPs. Leveraging ESI, we further introduce two targeted intervention paradigms for safety enhancement and preservation, i.e., Safety Enhancement Tuning (SET) and Safety Preserving Adaptation (SPA). SET can align unsafe LLMs by updating only a few safety-critical parameters, effectively enhancing safety while preserving original performance. SPA safeguards well-aligned LLMs during capability-oriented intervention (e.g., instruction tuning) by preventing disruption of safety-critical weights, allowing the LLM to acquire new abilities and maintain safety capabilities. Extensive evaluations on different LLMs demonstrate that SET can reduce the attack success rates of unaligned LLMs by over 50% with only a 100-iteration update on 1% of model weights. SPA can limit the safety degradation of aligned LLMs within 1% after a 1,000-iteration instruction fine-tuning on different tasks. Our code is available at: https://github.com/ZJU-LLM-Safety/SafeWeights-ACL.

#21 Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

著者: Yuming Xu, Mingtao Zhang, Zhuohan Ge, Haoyang Li, Nicole Hu, Jason Chen Zhang, Qing Li, Lei Chen

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08304

要約:
Retrieval-augmented generation (RAG) significantly enhances large language models (LLMs) but introduces novel security risks through external knowledge access. While existing studies cover various RAG vulnerabilities, they often conflate inherent LLM risks with those specifically introduced by RAG. In this paper, we propose that secure RAG is fundamentally about the security of the external knowledge-access pipeline. We establish an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. Guided by this perspective, we abstract the RAG workflow into six stages and organize the literature around three trust boundaries and four primary security surfaces, including pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration. By systematically reviewing the corresponding attacks, defenses, remediation mechanisms, and evaluation benchmarks, we reveal that current defenses remain largely reactive and fragmented. Finally, we discuss these gaps and highlight future directions toward layered, boundary-aware protection across the entire knowledge-access lifecycle.

#22 Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

agent

著者: Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08407

要約:
Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.

#23 Post-Quantum Cryptographic Analysis of Message Transformations Across the Network Stack

著者: Ashish Kundu, Vishal Chakraborty, Ramana Kompella

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08480

要約:
When a user sends a message over a wireless network, the message does not travel as-is. It is encrypted, authenticated, encapsulated, and transformed as it descends the protocol stack from the application layer to the physical medium. Each layer may apply its own cryptographic operations using its own algorithms, and these algorithms differ in their vulnerability to quantum computers. The security of the overall communication depends not on any single layer but on the \emph{composition} of transformations across all layers. We develop a preliminary formal framework for analyzing these cross-layer cryptographic transformations with respect to post-quantum cryptographic (PQC) readiness. We classify every per-layer cryptographic operation into one of four quantum vulnerability categories, define how per-layer PQC statuses compose across the full message transformation chain, and prove that this composition forms a bounded lattice with confidentiality composing via the join (max) operator and authentication via the meet (min). We apply the framework to five communication scenarios spanning Linux and iOS platforms, and identify several research challenges. Among our findings: WPA2-Personal provides strictly better PQC posture than both WPA3-Personal and WPA2-Enterprise; a single post-quantum layer suffices for payload confidentiality but \emph{every} layer must migrate for complete authentication; and metadata protection depends solely on the outermost layer.

#24 PIArena: A Platform for Prompt Injection Evaluation

著者: Runpeng Geng, Chenlong Yin, Yanting Wang, Ying Chen, Jinyuan Jia

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08499

要約:
Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the community faces a critical gap: the lack of a unified platform for prompt injection evaluation. This makes it challenging to reliably compare defenses, understand their true robustness under diverse attacks, or assess how well they generalize across tasks and benchmarks. For instance, many defenses initially reported as effective were later found to exhibit limited robustness on diverse datasets and attacks. To bridge this gap, we introduce PIArena, a unified and extensible platform for prompt injection evaluation that enables users to easily integrate state-of-the-art attacks and defenses and evaluate them across a variety of existing and new benchmarks. We also design a dynamic strategy-based attack that adaptively optimizes injected prompts based on defense feedback. Through comprehensive evaluation using PIArena, we uncover critical limitations of state-of-the-art defenses: limited generalizability across tasks, vulnerability to adaptive attacks, and fundamental challenges when an injected task aligns with the target task. The code and datasets are available at https://github.com/sleeepeer/PIArena.

#25 IPEK: Intelligent Priority-Aware Event-Based Trust with Asymmetric Knowledge for Resilient Vehicular Ad-Hoc Networks

著者: \.Ipek Abas{\i}kele\c{s} Turgut

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07532

要約:
Vehicular Ad Hoc Networks (VANETs) are vulnerable to intelligent attackers who exploit the homogeneous treatment of traffic events in existing trust models. These attackers accumulate reputation by reporting correctly on low-priority events and then inject false data during safety-critical situations - a strategy that current approaches cannot detect because they ignore event severity and location criticality in trust calculations. This paper addresses this gap through three contributions. First, it introduces event-aware and location-aware intelligent attack models, which have not been formally defined or simulated in prior work. Second, it proposes an asymmetric local trust mechanism where penalties scale with event and location severity while rewards follow an asymptotic model, making trust difficult to regain after misuse. Third, it adapts Dempster-Shafer Theory for global trust fusion using Yager's combination rule - assigning conflicting evidence to uncertainty rather than forcing premature decisions - combined with sequential source-reliability ordering and an asymmetric risk accentuation mechanism. Simulations using OMNeT++, Veins, and SUMO compare the proposed system (IPEK) against MDT and TCEMD under attacker densities of 15-35 percent. IPEK maintained 0 percent False Positive Rate across all scenarios, meaning no honest vehicle was wrongly revoked, while sustaining Recall above 75 percent and F1-scores exceeding 0.86. These results demonstrate that integrating context-awareness into both attack modeling and trust evaluation significantly outperforms symmetric approaches against strategic adversaries.

#26 Vulnerability Abundance: A formal proof of infinite vulnerabilities in code

著者: Eireann Leverett, Jeroen van der Ham-de Vos

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07539

要約:
We present a constructive proof that a single C program, the \emph{Vulnerability Factory}, admits a countably infinite set of distinct, independently CVE-assignable software vulnerabilities. We formalise the argument using elementary set theory, verify it against MITRE's CVE Numbering Authority counting rules, sketch a model-checking analysis that corroborates unbounded vulnerability generation, and provide a Turing-machine characterisation that situates the result within classical computability theory. We then contextualise this result within the long-running debate on whether undiscovered vulnerabilities in software are \emph{dense} or \emph{sparse}, and introduce the concept of \emph{vulnerability abundance}: a quantitative analogy to chemical elemental abundance that describes the proportional distribution of vulnerability classes across the global software corpus. Because different programming languages render different vulnerability classes possible or impossible, and because language popularity shifts over time, vulnerability abundance is neither static nor uniform. Crucially, we distinguish between infinite \emph{vulnerabilities} and the far smaller set of \emph{exploits}: empirical evidence suggests that fewer than 6\% of published CVEs are ever exploited in the wild, and that exploitation frequency depends not only on vulnerability abundance but on the market share of the affected software. We argue that measuring vulnerability abundance, and its interaction with software deployment, has practical value for both vulnerability prevention and cyber-risk analysis. We conclude that if one programme can harbour infinitely many vulnerabilities, the set of all software vulnerabilities is necessarily infinite, and we suggest the Vulnerability Factory may serve as a reusable proof artifact, a foundational `test object',for future formal results in vulnerability theory.

#27 SAFE: Spatially-Aware Feedback Enhancement for Fault-Tolerant Trust Management in VANETs

著者: \.Ipek Abas{\i}kele\c{s} Turgut

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07552

要約:
Trust management in VANETs is critically important for secure communication between vehicles. In event-based trust systems, vehicles broadcast the events they witness to their surroundings and send feedback reports about other vehicles to a central authority. However, when the event status changes, vehicles that have left the witness area cannot see this change and produce erroneous feedback. This leads to unfair penalization of honest nodes. To solve this problem, the SAFE (Spatially-Aware Feedback Enhancement) approach is proposed. In SAFE, vehicles continue to record messages as long as they remain in the witness area and send updated feedback reports before leaving the area. Additionally, by keeping records between witness and decision distances, more accurate evaluation is ensured. SAFE and TCEMD were compared in single-event, multi-event, and different decision distance scenarios. The results clearly demonstrate SAFE's superiority. In single-event, feedback report count increased 2.5 times, and in multi-event, it increased over 6 times. Negative feedback rate dropped from 77 percent to below 1 percent. While TCEMD incorrectly blacklisted 34 nodes, this number remained at 1 in SAFE. Even when the decision distance was reduced to 200 m, SAFE showed high accuracy. The findings show that SAFE protects honest nodes in attack-free systems and increases network reliability.

#28 ACIArena: Toward Unified Evaluation for Agent Cascading Injection

agent

著者: Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu, Chunyi Zhou, Changjiang Li, Xiaogang Xu, Tianyu Du, Shouling Ji

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.07775

要約:
Collaboration and information sharing empower Multi-Agent Systems (MAS) but also introduce a critical security risk known as Agent Cascading Injection (ACI). In such attacks, a compromised agent exploits inter-agent trust to propagate malicious instructions, causing cascading failures across the system. However, existing studies consider only limited attack strategies and simplified MAS settings, limiting their generalizability and comprehensive evaluation. To bridge this gap, we introduce ACIArena, a unified framework for evaluating the robustness of MAS. ACIArena offers systematic evaluation suites spanning multiple attack surfaces (i.e., external inputs, agent profiles, inter-agent messages) and attack objectives (i.e., instruction hijacking, task disruption, information exfiltration). Specifically, ACIArena establishes a unified specification that jointly supports MAS construction and attack-defense modules. It covers six widely used MAS implementations and provides a benchmark of 1,356 test cases for systematically evaluating MAS robustness. Our benchmarking results show that evaluating MAS robustness solely through topology is insufficient; robust MAS require deliberate role design and controlled interaction patterns. Moreover, defenses developed in simplified environments often fail to transfer to real-world settings; narrowly scoped defenses may even introduce new vulnerabilities. ACIArena aims to provide a solid foundation for advancing deeper exploration of MAS design principles.

#29 Efficient Provably Secure Linguistic Steganography via Range Coding

著者: Ruiyi Yan, Yugo Murawaki

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08052

要約:
Linguistic steganography involves embedding secret messages within seemingly innocuous texts to enable covert communication. Provable security, which is a long-standing goal and key motivation, has been extended to language-model-based steganography. Previous provably secure approaches have achieved perfect imperceptibility, measured by zero Kullback-Leibler (KL) divergence, but at the expense of embedding capacity. In this paper, we attempt to directly use a classic entropy coding method (range coding) to achieve secure steganography, and then propose an efficient and provably secure linguistic steganographic method with a rotation mechanism. Experiments across various language models show that our method achieves around 100% entropy utilization (embedding efficiency) for embedding capacity, outperforming the existing baseline methods. Moreover, it achieves high embedding speeds (up to 1554.66 bits/s on GPT-2). The code is available at github.com/ryehr/RRC_steganography.

#30 ACF: A Collaborative Framework for Agent Covert Communication under Cognitive Asymmetry

agent

著者: Wansheng Wu, Kaibo Huang, Yukun Wei, Zhongliang Yang, Linna Zhou

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08276

要約:
As generative artificial intelligence evolves, autonomous agent networks present a powerful paradigm for interactive covert communication. However, because agents dynamically update internal memories via environmental interactions, existing methods face a critical structural vulnerability: cognitive asymmetry. Conventional approaches demand strict cognitive symmetry, requiring identical sequence prefixes between the encoder and decoder. In dynamic deployments, inevitable prefix discrepancies destroy synchronization, inducing severe channel degradation. To address this core challenge of cognitive asymmetry, we propose the Asymmetric Collaborative Framework (ACF), which structurally decouples covert communication from semantic reasoning via orthogonal statistical and cognitive layers. By deploying a prefix-independent decoding paradigm governed by a shared steganographic configuration, ACF eliminates the reliance on cognitive symmetry. Evaluations on realistic memory-augmented workflows demonstrate that under severe cognitive asymmetry, symmetric baselines suffer severe channel degradation, whereas ACF uniquely excels across both semantic fidelity and covert communication. It maintains computational indistinguishability, enabling reliable secret extraction with provable error bounds, and providing robust Effective Information Capacity guarantees for modern agent networks.

#31 VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery

agent

著者: Suyash Mishra

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08291

要約:
We formulate operating-system vulnerability discovery as a \emph{repeated Bayesian Stackelberg search game} in which a Large Reasoning Model (LRM) orchestrator allocates analysis budget across kernel files, functions, and attack paths while external verifiers -- static analyzers, fuzzers, and sanitizers -- provide evidence. At each round, the orchestrator selects a target component, an analysis method, and a time budget; observes tool outputs; updates Bayesian beliefs over latent vulnerability states; and re-solves the game to minimize the strategic attacker's expected payoff. We introduce \textsc{VCAO} (\textbf{V}erifier-\textbf{C}entered \textbf{A}gentic \textbf{O}rchestration), a six-layer architecture comprising surface mapping, intra-kernel attack-graph construction, game-theoretic file/function ranking, parallel executor agents, cascaded verification, and a safety governor. Our DOBSS-derived MILP allocates budget optimally across heterogeneous analysis tools under resource constraints, with formal $\tilde{O}(\sqrt{T})$ regret bounds from online Stackelberg learning. Experiments on five Linux kernel subsystems -- replaying 847 historical CVEs and running live discovery on upstream snapshots -- show that \textsc{VCAO} discovers $2.7\times$ more validated vulnerabilities per unit budget than coverage-only fuzzing, $1.9\times$ more than static-analysis-only baselines, and $1.4\times$ more than non-game-theoretic multi-agent pipelines, while reducing false-positive rates reaching human reviewers by 68\%. We release our simulation framework, synthetic attack-graph generator, and evaluation harness as open-source artifacts.

#32 Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

著者: Nicol\'as E. D\'iaz Ferreyra, Monika Swetha Gurupathi, Zadia Codabux, Nalin Arachchilage, Riccardo Scandariato

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08352

要約:
Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software practitioners across multiple programming tasks, including code completion, documentation, and bug detection. However, current research has identified significant limitations and open issues in GenAI, including reliability, non-determinism, bias, and copyright infringement. While prior work has primarily focused on assessing the technical performance of these technologies for code generation, less attention has been paid to emerging concerns of software developers, particularly in the security realm. OBJECTIVE: This work explores security concerns regarding the use of GenAI-based coding assistants by analyzing challenges voiced by developers and software enthusiasts in public online forums. METHOD: We retrieved posts, comments, and discussion threads addressing security issues in GitHub Copilot from three popular platforms, namely Stack Overflow, Reddit, and Hacker News. These discussions were clustered using BERTopic and then synthesized using thematic analysis to identify distinct categories of security concerns. RESULTS: Four major concern areas were identified, including potential data leakage, code licensing, adversarial attacks (e.g., prompt injection), and insecure code suggestions, underscoring critical reflections on the limitations and trade-offs of GenAI in software engineering. IMPLICATIONS: Our findings contribute to a broader understanding of how developers perceive and engage with GenAI-based coding assistants, while highlighting key areas for improving their built-in security features.

#33 Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs

著者: Kevin Lira, Baldoino Fonseca, Davy Ba\'ia, M\'arcio Ribeiro, Wesley K. G. Assun\c{c}\~ao

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.08417

要約:
Large Language Models (LLMs) have been a promising way for automated vulnerability detection. However, most prior studies have explored the use of LLMs to detect vulnerabilities only within single functions, disregarding those related to interprocedural dependencies. These studies overlook vulnerabilities that arise from data and control flows that span multiple functions. Thus, leveraging the context provided by callers and callees may help identify vulnerabilities. This study empirically investigates the effectiveness of detection, the inference cost, and the quality of explanations of four modern LLMs (Claude Haiku 4.5, GPT-4.1 Mini, GPT-5 Mini, and Gemini 3 Flash) in detecting vulnerabilities related to interprocedural dependencies. To do that, we conducted an empirical study on 509 vulnerabilities from the ReposVul dataset, systematically varying the level of interprocedural context (target function code-only, target function + callers, and target function + callees) and evaluating the four modern LLMs across C, C++, and Python. The results show that Gemini 3 Flash offers the best cost-effectiveness trade-off for C vulnerabilities, achieving F1 >= 0.978 at an estimated cost of $0.50-$0.58 per configuration, and Claude Haiku 4.5 correctly identified and explained the vulnerability in 93.6% of the evaluated cases. Overall, the findings have direct implications for the design of AI-assisted security analysis tools that can generalize across codebases in multiple programming languages.

#34 Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction

著者: Yulin Chen, Haoran Li, Yuan Sui, Yue Liu, Yufei He, Xiaoling Bai, Chi Fei, Yabo Li, Haozhe Ma, Yangqiu Song, Bryan Hooi

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2504.20472

要約:
Large language models (LLMs) have demonstrated impressive performance and have come to dominate the field of natural language processing (NLP) across various tasks. However, due to their strong instruction-following capabilities and inability to distinguish between instructions and data content, LLMs are vulnerable to prompt injection attacks. These attacks manipulate LLMs into deviating from the original input instructions and executing maliciously injected instructions within data content, such as web documents retrieved from search engines. Existing defense methods, including prompt-engineering and fine-tuning approaches, typically instruct models to follow the original input instructions while suppressing their tendencies to execute injected instructions. However, our experiments reveal that suppressing instruction-following tendencies is challenging. Through analyzing failure cases, we observe that although LLMs tend to respond to any recognized instructions, they are aware of which specific instructions they are executing and can correctly reference them within the original prompt. Motivated by these findings, we propose a novel defense method that leverages, rather than suppresses, the instruction-following abilities of LLMs. Our approach prompts LLMs to generate responses that include both answers and their corresponding instruction references. Based on these references, we filter out answers not associated with the original input instructions. Comprehensive experiments demonstrate that our method outperforms prompt-engineering baselines and achieves performance comparable to fine-tuning methods, reducing the attack success rate (ASR) to 0 percent in some scenarios. Moreover, our approach has minimal impact on overall utility.

#35 One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems

backdoor

著者: Zhiyuan Chang, Mingyang Li, Xiaojun Jia, Junjie Wang, Yuekai Huang, Ziyou Jiang, Yang Liu, Qing Wang

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.11548

要約:
Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) have shown improved performance in generating accurate responses. However, the dependence on external knowledge bases introduces potential security vulnerabilities, particularly when these knowledge bases are publicly accessible and modifiable. While previous studies have exposed knowledge poisoning risks in RAG systems, existing attack methods suffer from critical limitations: they either require injecting multiple poisoned documents (resulting in poor stealthiness) or can only function effectively on simplistic queries (limiting real-world applicability). This paper reveals a more realistic knowledge poisoning attack against RAG systems that achieves successful attacks by poisoning only a single document while remaining effective for complex multi-hop questions involving complex relationships between multiple elements. Our proposed AuthChain address three challenges to ensure the poisoned documents are reliably retrieved and trusted by the LLM, even against large knowledge bases and LLM's own knowledge. Extensive experiments across six popular LLMs demonstrate that AuthChain achieves significantly higher attack success rates while maintaining superior stealthiness against RAG defense mechanisms compared to state-of-the-art baselines.

#36 Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

著者: Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.06975

要約:
As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or maliciously alter model behaviors, API providers may discreetly serve quantized or fine-tuned variants, which can degrade performance and compromise safety. Detecting such substitutions is difficult, as users lack access to model weights and, in most cases, even output logits. To tackle this problem, we propose a rank-based uniformity test that can verify the behavioral equality of a black-box LLM to a locally deployed authentic model. Our method is accurate, query-efficient, and avoids detectable query patterns, making it robust to adversarial providers that reroute or mix responses upon the detection of testing attempts. We evaluate the approach across diverse threat scenarios, including quantization, harmful fine-tuning, jailbreak prompts, and full model substitution, showing that it consistently achieves superior statistical power over prior methods under constrained query budgets.

#37 Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark

agent

著者: Minghao Shao, Nanda Rani, Kimberly Milner, Haoran Xi, Meet Udeshi, Saksham Aggarwal, Venkata Sai Charan Putrevu, Sandeep Kumar Shukla, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.05674

要約:
Recent advances in LLM agentic systems have improved the automation of offensive security tasks, particularly for Capture the Flag (CTF) challenges. We systematically investigate the key factors that drive agent success and provide a detailed recipe for building effective LLM-based offensive security agents. First, we present CTFJudge, a framework leveraging LLM as a judge to analyze agent trajectories and provide granular evaluation across CTF solving steps. Second, we propose a novel metric, CTF Competency Index (CCI) for partial correctness, revealing how closely agent solutions align with human-crafted gold standards. Third, we examine how LLM hyperparameters, namely temperature, top-p, and maximum token length, influence agent performance and automated cybersecurity task planning. For rapid evaluation, we present CTFTiny, a curated benchmark of 50 representative CTF challenges across binary exploitation, web, reverse engineering, forensics, and cryptography. Our findings identify optimal multi-agent coordination settings and lay the groundwork for future LLM agent research in cybersecurity. We make CTFTiny open source to public https://github.com/NYU-LLM-CTF/CTFTiny along with CTFJudge on https://github.com/NYU-LLM-CTF/CTFJudge.

#38 Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents

agent

著者: Renhua Ding, Xiao Yang, Zhengwei Fang, Jun Luo, Kun He, Jun Zhu

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.07809

要約:
Large Vision-Language Models (LVLMs) empower autonomous mobile agents, yet their security under realistic mobile deployment constraints remains underexplored. While agents are vulnerable to visual prompt injections, stealthily executing such attacks without requiring system-level privileges remains challenging, as existing methods rely on persistent visual manipulations that are noticeable to users. We uncover a consistent discrepancy between human and agent interactions: automated agents generate near-zero contact touch signals. Building on this insight, we propose a new attack paradigm, agent-only perceptual injection, where malicious content is exposed only during agent interactions, while remaining not readily perceived by human users. To accommodate mobile UI constraints and one-shot interaction settings, we introduce HG-IDA*, an efficient one-shot optimization method for constructing jailbreak prompts that evade LVLM safety filters. Experiments demonstrate that our approach induces unauthorized cross-app actions, achieving 82.5% planning and 75.0% execution hijack rates on GPT-4o. Our findings highlight a previously underexplored attack surface in mobile agent systems and underscore the need for defenses that incorporate interaction-level signals.

#39 2G2T: Constant-Size, Statistically Sound MSM Outsourcing

著者: Majid Khabbazian

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.23464

要約:
Multi-scalar multiplication (MSM), MSM(vec{P},vec{x}) = sum_{i=1}^n x_i P_i, is a dominant computational kernel in discrete-logarithm-based cryptography and often becomes a bottleneck for verifiers and other resource-constrained clients. We present 2G2T, a simple protocol for verifiably outsourcing MSM to an untrusted server. 2G2T is efficient for both parties: the server performs only two MSM computations and returns only two group elements to the client, namely the claimed result A = MSM(vec{P},vec{x}) and an auxiliary group element B. Client-side verification consists of a single length-n field inner product and only three group operations (two scalar multiplications and one group addition). In our Ristretto255 implementation, verification is up to about 300x faster than computing the MSM locally using a highly optimized MSM routine (for n up to 2^18). Moreover, 2G2T enables latency-hiding verification: nearly all verifier work can be performed while waiting for the server's response, so once (A,B) arrives the verifier completes the check with only one scalar multiplication and one group addition (both independent of n). Finally, despite its simplicity and efficiency, we prove that 2G2T achieves statistical soundness: for any (even unbounded) adversarial server, the probability of accepting an incorrect result is at most 1/q per query, and at most e/q over e adaptive executions, in a prime-order group of size q.

#40 On secret sharing from extended norm-trace curves

著者: Olav Geil

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.14009

要約:
In [4] Camps-Moreno et al. treated (relative) generalized Hamming weights of codes from extended norm-trace curves and they gave examples of resulting good asymmetric quantum error-correcting codes employing information on the relative distances. In the present paper we study ramp secret sharing schemes which are objects that require an analysis of higher relative weights and we show that not only do schemes defined from one-point algebraic geometric codes from extended norm-trace curves have good parameters, they also posses a second layer of security along the lines of [11]. It is left undecided in [4, page 2889] if the ``footprint-like approach'' as employed by Camps-Moreno herein is strictly better for codes related to extended norm-trace codes than the general approach for treating one-point algebraic geometric codes and their likes as presented in [12]. We demonstrate that the method used in [4] to estimate (relative) generalized Hamming weights of codes from extended norm-trace curves can be viewed as a clever application of the enhanced Goppa bound in [12] rather than a competing approach.

#41 Security and Privacy in Virtual and Robotic Assistive Systems: A Comparative Framework

privacy

著者: Nelly Elsayed

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.29907

要約:
Assistive technologies increasingly support independence, accessibility, and safety for older adults, people with disabilities, and individuals requiring continuous care. Two major categories are virtual assistive systems and robotic assistive systems operating in physical environments. Although both offer significant benefits, they introduce important security and privacy risks due to their reliance on artificial intelligence, network connectivity, and sensor-based perception. Virtual systems are primarily exposed to threats involving data privacy, unauthorized access, and adversarial voice manipulation. In contrast, robotic systems introduce additional cyber-physical risks such as sensor spoofing, perception manipulation, command injection, and physical safety hazards. In this paper, we present a comparative analysis of security and privacy challenges across these systems. We develop a unified comparative threat-modeling framework that enables structured analysis of attack surfaces, risk profiles, and safety implications across both systems. Moreover, we provide design recommendations for developing secure, privacy-preserving, and trustworthy assistive technologies.

#42 Blockchain and AI: Securing Intelligent Networks for the Future

著者: Joy Dutta, Hossien B. Eldeeb, Tu Dac Ho

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.06323

要約:
Blockchain and artificial intelligence (AI) are increasingly proposed together for securing intelligent networks, but the literature remains fragmented across ledger design, AI-driven detection, cyber-physical applications, and emerging agentic workflows. This paper synthesizes the area through three reusable contributions: (i) a taxonomy of blockchain-AI security for intelligent networks, (ii) integration patterns for verifiable and adaptive security workflows, and (iii) the Blockchain-AI Security Evaluation Blueprint (BASE), a reporting checklist spanning AI quality, ledger behavior, end-to-end service levels, privacy, energy, and reproducibility. The paper also maps the evidence landscape across IoT, critical infrastructure, smart grids, transportation, and healthcare, showing that the conceptual fit is strong but real-world evidence remains uneven and often prototype-heavy. The synthesis clarifies where blockchain contributes provenance, trust, and auditability, where AI contributes detection, adaptation, and orchestration, and where future work should focus on interoperable interfaces, privacy-preserving analytics, bounded agentic automation, and open cross-domain benchmarks. The paper is intended as a reference for researchers and practitioners designing secure, transparent, and resilient intelligent networks.

#43 The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

著者: Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Joel Webb, Blake Gatto

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.06436

要約:
We prove that no continuous, utility-preserving wrapper defense-a function $D: X\to X$ that preprocesses inputs before the model sees them-can make all outputs strictly safe for a language model with connected prompt space, and we characterize exactly where every such defense must fail. We establish three results under successively stronger hypotheses: boundary fixation-the defense must leave some threshold-level inputs unchanged; an $\epsilon$-robust constraint-under Lipschitz regularity, a positive-measure band around fixed boundary points remains near-threshold; and a persistent unsafe region under a transversality condition, a positive-measure subset of inputs remains strictly unsafe. These constitute a defense trilemma: continuity, utility preservation, and completeness cannot coexist. We prove parallel discrete results requiring no topology, and extend to multi-turn interactions, stochastic defenses, and capacity-parity settings. The results do not preclude training-time alignment, architectural changes, or defenses that sacrifice utility. The full theory is mechanically verified in Lean 4 and validated empirically on three LLMs.

#44 PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

agent

著者: Phan The Duy, Khoa Ngo-Khanh, Nguyen Huu Quyen, Van-Hau Pham

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.06618

要約:
While recent approaches leverage large language models (LLMs) and multi-agent pipelines to automatically generate proof-of-concept (PoC) exploits from vulnerability reports, existing systems often suffer from two fundamental limitations: unreliable validation based on surface-level execution signals and high operational cost caused by extensive trial-and-error during exploit generation. In this paper, we present PoC-Adapt, an end-to-end framework for automated PoC generation and verification, architected upon a foundation semantic runtime validation and adaptive policy learning. At the core of PoC-Adapt is a Semantic Oracle that validates exploits by comparing structured pre- and post-execution system states, enabling reliable distinction between true vulnerability exploitation and incidental behavioral changes. To reduce exploration cost, we further introduce an Adaptive Policy Learning mechanism that learns an exploitation policy over semantic states and actions, guiding the exploit agent toward effective strategies with fewer failed attempts. PoC-Adapt is implemented as a multi-agent system comprising specialized agents for root cause analysis, environment building, exploit generation, and semantic validation, coordinated through structured feedback loops. Experimenting on the CWE-Bench-Java and PrimeVul benchmarks shows that PoC-Adapt significantly improves verification reliability by 25% and reduces exploit generation cost compared to prior LLM-based systems, highlighting the importance of semantic validation and learned action policies in automated vulnerability reproduction. Applied to the latest CVE corpus, PoC-Adapt confirmed 12 verified PoC out of 80 reproduce attempts at a cost of $0.42 per generated exploit

#45 On the Direct Construction of MDS and Near-MDS Matrices

著者: Kishan Chand Gupta, Sumit Kumar Pandey, Susanta Samanta

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2306.12848

要約:
The optimal branch number of MDS matrices makes them a preferred choice for designing diffusion layers in many block ciphers and hash functions. Consequently, various methods have been proposed for designing MDS matrices, including search and direct methods. While exhaustive search is suitable for small order MDS matrices, direct constructions are preferred for larger orders due to the vast search space involved. In the literature, there has been extensive research on the direct construction of MDS matrices using both recursive and nonrecursive methods. On the other hand, in lightweight cryptography, Near-MDS (NMDS) matrices with sub-optimal branch numbers offer a better balance between security and efficiency as a diffusion layer compared to MDS matrices. However, no direct construction method is available in the literature for constructing recursive NMDS matrices. This paper introduces some direct constructions of NMDS matrices in both nonrecursive and recursive settings. Additionally, it presents some direct constructions of nonrecursive MDS matrices from the generalized Vandermonde matrices. We propose a method for constructing involutory MDS and NMDS matrices using generalized Vandermonde matrices. Furthermore, we prove some folklore results that are used in the literature related to the NMDS code.

#46 BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning

backdoor

著者: Zhengyuan Jiang, Xingyu Lyu, Shanghao Shi, Yang Xiao, Yimin Chen, Y. Thomas Hou, Wenjing Lou, Ning Wanga

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2407.09658

要約:
Federated learning, while being a promising approach for collaborative model training, is susceptible to backdoor attacks due to its decentralized nature. Backdoor attacks have shown remarkable stealthiness, as they compromise model predictions only when inputs contain specific triggers. As a countermeasure, anomaly detection is widely used to filter out backdoor attacks in FL. However, the non-independent and identically distributed (non-IID) data distribution nature of FL clients presents substantial challenges in backdoor attack detection, as the data variety introduces variance among benign models, making them indistinguishable from malicious ones. In this work, we propose a novel distribution-aware backdoor detection mechanism, BoBa, to address this problem. To differentiate outliers arising from data variety versus backdoor attacks, we propose to break down the problem into two steps: clustering clients utilizing their data distribution, and followed by a voting-based detection. We propose a novel data distribution inference mechanism for accurate data distribution estimation. To improve detection robustness, we introduce an overlapping clustering method, where each client is associated with multiple clusters, ensuring that the trustworthiness of a model update is assessed collectively by multiple clusters rather than a single cluster. Through extensive evaluations, we demonstrate that BoBa can reduce the attack success rate to lower than 0.001 while maintaining high main task accuracy across various attack strategies and experimental settings.

#47 Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap

著者: Zhanyu Wang, Arin Chang, Jordan Awan

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.10746

要約:
We design a debiased parametric bootstrap framework for statistical inference from differentially private data. Existing usage of the parametric bootstrap on privatized data ignored or avoided handling possible biases introduced by the privacy mechanism, such as by clamping, a technique employed by the majority of privacy mechanisms. Ignoring these biases leads to under-coverage of confidence intervals and miscalibrated type I errors of hypothesis tests, due to the inconsistency of parameter estimates based on the privatized data. We propose using the indirect inference method to estimate the parameter values consistently, and we use the improved estimator in parametric bootstrap for inference. To implement the indirect estimator, we present a novel simulation-based, adaptive approach along with the theory that establishes the consistency of the corresponding parametric bootstrap estimates, confidence intervals, and hypothesis tests. In particular, we prove that our adaptive indirect estimator achieves the minimum asymptotic variance among all ``well-behaved'' consistent estimators based on the released summary statistic. Our simulation studies show that our framework produces confidence intervals with well-calibrated coverage and performs hypothesis testing with the correct type I error, giving state-of-the-art performance for inference in several settings.

#48 How Do Data Owners Say No? A Case Study of Data Consent Mechanisms in Web-Scraped Vision-Language AI Training Datasets

著者: Chung Peng Lee, Rachel Hong, Harry H. Jiang, Aster Plotnik, William Agnew, Jamie Morgenstern

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.08637

要約:
The internet has become the main source of data to train modern text-to-image or vision-language models, yet it is increasingly unclear whether web-scale data collection practices for training AI systems adequately respect data owners' wishes. Ignoring the owner's indication of consent around data usage not only raises ethical concerns but also has recently been elevated into lawsuits around copyright infringement cases. In this work, we aim to reveal information about data owners' consent to AI scraping and training, and study how it's expressed in DataComp, a popular dataset of 12.8 billion text-image pairs. We examine both the sample-level information, including the copyright notice, watermarking, and metadata, and the web-domain-level information, such as a site's Terms of Service (ToS) and Robots Exclusion Protocol. We estimate at least 122M of samples exhibit some indication of copyright notice in CommonPool, and find that 60\% of the samples in the top 50 domains come from websites with ToS that prohibit scraping. Furthermore, we estimate 9-13\% with 95\% confidence interval of samples from CommonPool to contain watermarks, where existing watermark detection methods fail to capture them in high fidelity. Our holistic methods and findings show that data owners rely on various channels to convey data consent, of which current AI data collection pipelines do not entirely respect. These findings highlight the limitations of the current dataset curation/release practice and the need for a unified data consent framework taking AI purposes into consideration.

#49 Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

synthetic data

著者: Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang

公開日: Fri, 10 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.04666

要約:
Large language model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks faces two major issues: malicious instructions can be injected through diverse vectors, and injected instructions often lack clear semantic boundaries from the surrounding context, making them difficult to identify. To address these issues, we propose InstruCoT, a model enhancement method for PI defense that synthesizes diverse training data and employs instruction-level chain-of-thought fine-tuning, enabling LLMs to effectively identify and reject malicious instructions regardless of their source or position in the context. We evaluate InstruCoT across three critical dimensions: Behavior Deviation, Privacy Leakage, and Harmful Output. Experimental results across four LLMs demonstrate that InstruCoT significantly outperforms baselines in all dimensions while maintaining utility performance without degradation

cs.CR updates on arXiv.org

📋 論文タイトル一覧