arXiv論文一覧 - cs.CR updates on arXiv.org

#1 AI Kill Switch for malicious web-based LLM agent

agent

著者: Sechan Lee, Sangdon Park

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13725

要約:
Recently, web-based Large Language Model (LLM) agents autonomously perform increasingly complex tasks, thereby bringing significant convenience. However, they also amplify the risks of malicious misuse cases such as unauthorized collection of personally identifiable information (PII), generation of socially divisive content, and even automated web hacking. To address these threats, we propose an AI Kill Switch technique that can immediately halt the operation of malicious web-based LLM agents. To achieve this, we introduce AutoGuard - the key idea is generating defensive prompts that trigger the safety mechanisms of malicious LLM agents. In particular, generated defense prompts are transparently embedded into the website's DOM so that they remain invisible to human users but can be detected by the crawling process of malicious agents, triggering its internal safety mechanisms to abort malicious actions once read. To evaluate our approach, we constructed a dedicated benchmark consisting of three representative malicious scenarios (PII collection, social rift content generation, and web hacking attempts). Experimental results show that the AutoGuard method achieves over 80% Defense Success Rate (DSR) on malicious agents, including GPT-4o, Claude-3, and Llama3.3-70B-Instruct. It also maintains strong performance, achieving around 90% DSR on GPT-5, GPT-4.1, and Gemini-2.5-Flash when used as the malicious agent, demonstrating robust generalization across models and scenarios. Through this research, we have demonstrated the controllability of web-based LLM agents across various scenarios and models, thereby contributing to the broader effort of AI control and safety.

#2 ExplainableGuard: Interpretable Adversarial Defense for Large Language Models Using Chain-of-Thought Reasoning

著者: Shaowei Guan, Yu Zhai, Zhengyu Zhang, Yanze Wang, Hin Chi Kwok

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13771

要約:
Large Language Models (LLMs) are increasingly vulnerable to adversarial attacks that can subtly manipulate their outputs. While various defense mechanisms have been proposed, many operate as black boxes, lacking transparency in their decision-making. This paper introduces ExplainableGuard, an interpretable adversarial defense framework leveraging the chain-of-thought (CoT) reasoning capabilities of DeepSeek-Reasoner. Our approach not only detects and neutralizes adversarial perturbations in text but also provides step-by-step explanations for each defense action. We demonstrate how tailored CoT prompts guide the LLM to perform a multi-faceted analysis (character, word, structural, and semantic) and generate a purified output along with a human-readable justification. Preliminary results on the GLUE Benchmark and IMDB Movie Reviews dataset show promising defense efficacy. Additionally, a human evaluation study reveals that ExplainableGuard's explanations outperform ablated variants in clarity, specificity, and actionability, with a 72.5% deployability-trust rating, underscoring its potential for more trustworthy LLM deployments.

#3 Hashpower allocation in Pay-per-Share blockchain mining pools

著者: Pierre-Olivier Goffard, Hansjoerg Albrecher, Jean-Pierre Fouque

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13777

要約:
Mining blocks in a blockchain using the \textit{Proof-of-Work} consensus protocol involves significant risk, as network participants face continuous operational costs while earning infrequent capital gains upon successfully mining a block. A common risk mitigation strategy is to join a mining pool, which combines the computing resources of multiple miners to provide a more stable income. This article examines a Pay-per-Share (PPS) reward system, where the pool manager can adjust both the share difficulty and the management fee. Using a simplified wealth model for miners, we explore how miners should allocate their computing resources among different mining pools, considering the trade-off between risk transfer to the manager and management fees.

#4 Human-Centered Threat Modeling in Practice: Lessons, Challenges, and Paths Forward

著者: Warda Usman, Yixin Zou, Daniel Zappala

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13781

要約:
Human-centered threat modeling (HCTM) is an emerging area within security and privacy research that focuses on how people define and navigate threats in various social, cultural, and technological contexts. While researchers increasingly approach threat modeling from a human-centered perspective, little is known about how they prepare for and engage with HCTM in practice. In this work, we conduct 23 semi-structured interviews with researchers to examine the state of HCTM, including how researchers design studies, elicit threats, and navigate values, constraints, and long-term goals. We find that HCTM is not a prescriptive process but a set of evolving practices shaped by relationships with participants, disciplinary backgrounds, and institutional structures. Researchers approach threat modeling through sustained groundwork and participant-centered inquiry, guided by values such as care, justice, and autonomy. They also face challenges including emotional strain, ethical dilemmas, and structural barriers that complicate efforts to translate findings into real-world impact. We conclude by identifying opportunities to advance HCTM through shared infrastructure, broader recognition of diverse contributions, and stronger mechanisms for translating findings into policy, design, and societal change.

#5 Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

backdoor

著者: Haotian Jin, Yang Li, Haihui Fan, Lin Shen, Xiangfang Li, Bo Li

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13789

要約:
Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor triggers has evolved from fixed triggers to dynamic or implicit triggers. This increased flexibility in trigger design makes it challenging for defenders to identify their specific forms accurately. Most existing backdoor defense methods are limited to specific types of triggers or rely on an additional clean model for support. To address this issue, we propose a backdoor detection method based on attention similarity, enabling backdoor detection without prior knowledge of the trigger. Our study reveals that models subjected to backdoor attacks exhibit unusually high similarity among attention heads when exposed to triggers. Based on this observation, we propose an attention safety alignment approach combined with head-wise fine-tuning to rectify potentially contaminated attention heads, thereby effectively mitigating the impact of backdoor attacks. Extensive experimental results demonstrate that our method significantly reduces the success rate of backdoor attacks while preserving the model's performance on downstream tasks.

#6 Zipf-Gramming: Scaling Byte N-Grams Up to Production Sized Malware Corpora

著者: Edward Raff, Ryan R. Curtin, Derek Everett, Robert J. Joyce, James Holt

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13808

要約:
A classifier using byte n-grams as features is the only approach we have found fast enough to meet requirements in size (sub 2 MB), speed (multiple GB/s), and latency (sub 10 ms) for deployment in numerous malware detection scenarios. However, we've consistently found that 6-8 grams achieve the best accuracy on our production deployments but have been unable to deploy regularly updated models due to the high cost of finding the top-k most frequent n-grams over terabytes of executable programs. Because the Zipfian distribution well models the distribution of n-grams, we exploit its properties to develop a new top-k n-gram extractor that is up to $35\times$ faster than the previous best alternative. Using our new Zipf-Gramming algorithm, we are able to scale up our production training set and obtain up to 30\% improvement in AUC at detecting new malware. We show theoretically and empirically that our approach will select the top-k items with little error and the interplay between theory and engineering required to achieve these results.

#7 The Battle of Metasurfaces: Understanding Security in Smart Radio Environments

著者: Paul Staat, Christof Paar, Swarun Kumar

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13939

要約:
Metasurfaces, or Reconfigurable Intelligent Surfaces (RISs), have emerged as a transformative technology for next-generation wireless systems, enabling digitally controlled manipulation of electromagnetic wave propagation. By turning the traditionally passive radio environment into a smart, programmable medium, metasurfaces promise advances in communication and sensing. However, metasurfaces also present a new security frontier: both attackers and defenders can exploit them to alter wireless propagation for their own advantage. While prior security research has primarily explored unilateral metasurface applications - empowering either attackers or defenders - this work investigates symmetric scenarios, where both sides possess comparable metasurface capabilities. Using both theoretical modeling and real-world experiments, we analyze how competing metasurfaces interact for diverse objectives, including signal power and sensing perception. Thereby, we present the first systematic study of context-agnostic metasurface-to-metasurface interactions and their implications for wireless security. Our results reveal that the outcome of metasurface "battles" depends on an interplay of timing, placement, algorithmic strategy, and hardware scale. Across multiple case studies in Wi-Fi environments, including wireless jamming, channel obfuscation for sensing and communication, and sensing spoofing, we demonstrate that opposing metasurfaces can substantially or fully negate each other's effects. By undermining previously proposed security and privacy schemes, our findings open new opportunities for designing resilient and high-assurance physical-layer systems in smart radio environments.

#8 Privis: Towards Content-Aware Secure Volumetric Video Delivery

著者: Kaiyuan Hu, Hong Kang, Yili Jin, Junhua Liu, Chengming Hu, Haolun Wu, Xue Liu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14005

要約:
Volumetric video has emerged as a key paradigm in eXtended Reality (XR) and immersive multimedia because it enables highly interactive, spatially consistent 3D experiences. However, the transport-layer security for such 3D content remains largely unaddressed. Existing volumetric streaming pipelines inherit uniform encryption schemes from 2D video, overlooking the heterogeneous privacy sensitivity of different geometry and the strict motion-to-photon latency constraints of real-time XR. We take an initial step toward content-aware secure volumetric video delivery by introducing Privis, a saliency-guided transport framework that (i) partitions volumetric assets into independent units, (ii) applies lightweight authenticated encryption with adaptive key rotation, and (iii) employs selective traffic shaping to balance confidentiality and low latency. Privis specifies a generalized transport-layer security architecture for volumetric media, defining core abstractions and adaptive protection mechanisms. We further explore a prototype implementation and present initial latency measurements to illustrate feasibility and design tradeoffs, providing early empirical guidance toward future work on real-time, saliency-conditioned secure delivery.

#9 Location-Dependent Cryptosystem

著者: Kunal Mukherjee

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14032

要約:
Digital content distribution and proprietary research-driven industries face persistent risks from intellectual property theft and unauthorized redistribution. Conventional encryption schemes such as AES, TDES, ECC, and ElGamal provide strong cryptographic guarantees, but they remain fundamentally agnostic to where decryption takes place.In practice, this means that once a decryption key is leaked or intercepted, any adversary can misuse the key to decrypt the protected content from any location. We present a location-dependent cryptosystem in which the decryption key is not transmitted as human- or machine-readable data, but implicitly encoded in precise time-of-flight differences of ultra-wideband (UWB) data transmission packets. The system leverages precise timing hardware and a custom JMTK protocol to map a SHA-256 hashed AES key onto scheduled transmission timestamps. Only receivers located within a predefined spatial region can observe the packet timings that align with the intended "time slot" pattern, enabling them to reconstruct the key and decrypt the secret. Receivers outside the authorized region observe incorrect keys. We implement a complete prototype that encrypts and transmits audio data using our cryptosystem, and only when the receiver is within the authorized data, they are able to decrypt the data. Our evaluation demonstrates that the system (i) removes the need to share decryption passwords electronically or physically, (ii) ensures the decryption key cannot be recovered by the eavesdropper, and (iii) provides a non-trivial spatial tolerance for legitimate users.

#10 GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards

privacy

著者: Yule Liu, Heyi Zhang, Jinyi Zheng, Zhen Sun, Zifan Peng, Tianshuo Cong, Yilong Yang, Xinlei He, Zhuo Ma

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14045

要約:
Membership inference attacks (MIAs) on large language models (LLMs) pose significant privacy risks across various stages of model training. Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have brought a profound paradigm shift in LLM training, particularly for complex reasoning tasks. However, the on-policy nature of RLVR introduces a unique privacy leakage pattern: since training relies on self-generated responses without fixed ground-truth outputs, membership inference must now determine whether a given prompt (independent of any specific response) is used during fine-tuning. This creates a threat where leakage arises not from answer memorization. To audit this novel privacy risk, we propose Divergence-in-Behavior Attack (DIBA), the first membership inference framework specifically designed for RLVR. DIBA shifts the focus from memorization to behavioral change, leveraging measurable shifts in model behavior across two axes: advantage-side improvement (e.g., correctness gain) and logit-side divergence (e.g., policy drift). Through comprehensive evaluations, we demonstrate that DIBA significantly outperforms existing baselines, achieving around 0.8 AUC and an order-of-magnitude higher TPR@0.1%FPR. We validate DIBA's superiority across multiple settings--including in-distribution, cross-dataset, cross-algorithm, black-box scenarios, and extensions to vision-language models. Furthermore, our attack remains robust under moderate defensive measures. To the best of our knowledge, this is the first work to systematically analyze privacy vulnerabilities in RLVR, revealing that even in the absence of explicit supervision, training data exposure can be reliably inferred through behavioral traces.

#11 Dynamic Black-box Backdoor Attacks on IoT Sensory Data

backdoor

著者: Ajesh Koyatan Chathoth, Stephen Lee

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14074

要約:
Sensor data-based recognition systems are widely used in various applications, such as gait-based authentication and human activity recognition (HAR). Modern wearable and smart devices feature various built-in Inertial Measurement Unit (IMU) sensors, and such sensor-based measurements can be fed to a machine learning-based model to train and classify human activities. While deep learning-based models have proven successful in classifying human activity and gestures, they pose various security risks. In our paper, we discuss a novel dynamic trigger-generation technique for performing black-box adversarial attacks on sensor data-based IoT systems. Our empirical analysis shows that the attack is successful on various datasets and classifier models with minimal perturbation on the input data. We also provide a detailed comparative analysis of performance and stealthiness to various other poisoning techniques found in backdoor attacks. We also discuss some adversarial defense mechanisms and their impact on the effectiveness of our trigger-generation technique.

#12 Resolving Availability and Run-time Integrity Conflicts in Real-Time Embedded Systems

著者: Adam Caulfield, Muhammad Wasif Kamran, N. Asokan

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14088

要約:
Run-time integrity enforcement in real-time systems presents a fundamental conflict with availability. Existing approaches in real- time systems primarily focus on minimizing the execution-time overhead of monitoring. After a violation is detected, prior works face a trade-off: (1) prioritize availability and allow a compromised system to continue to ensure applications meet their deadlines, or (2) prioritize security by generating a fault to abort all execution. In this work, we propose PAIR, an approach that offers a middle ground between the stark extremes of this trade-off. PAIR monitors real-time tasks for run-time integrity violations and maintains an Availability Region (AR) of all tasks that are safe to continue. When a task causes a violation, PAIR triggers a non-maskable interrupt to kill the task and continue executing a non-violating task within AR. Thus, PAIR ensures only violating tasks are prevented from execution, while granting availability to remaining tasks. With its hardware approach, PAIR does not cause any run-time overhead to the executing tasks, integrates with real-time operating systems (RTOSs), and is affordable to low-end microcontroller units (MCUs) by incurring +2.3% overhead in memory and hardware usage.

#13 MalRAG: A Retrieval-Augmented LLM Framework for Open-set Malicious Traffic Identification

著者: Xiang Luo, Chang Liu, Gang Xiong, Chen Yang, Gaopeng Gou, Yaochen Ren, Zhen Li

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14129

要約:
Fine-grained identification of IDS-flagged suspicious traffic is crucial in cybersecurity. In practice, cyber threats evolve continuously, making the discovery of novel malicious traffic a critical necessity as well as the identification of known classes. Recent studies have advanced this goal with deep models, but they often rely on task-specific architectures that limit transferability and require per-dataset tuning. In this paper we introduce MalRAG, the first LLM driven retrieval-augmented framework for open-set malicious traffic identification. MalRAG freezes the LLM and operates via comprehensive traffic knowledge construction, adaptive retrieval, and prompt engineering. Concretely, we construct a multi-view traffic database by mining prior malicious traffic from content, structural, and temporal perspectives. Furthermore, we introduce a Coverage-Enhanced Retrieval Algorithm that queries across these views to assemble the most probable candidates, thereby improving the inclusion of correct evidence. We then employ Traffic-Aware Adaptive Pruning to select a variable subset of these candidates based on traffic-aware similarity scores, suppressing incorrect matches and yielding reliable retrieved evidence. Moreover, we develop a suite of guidance prompts where task instruction, evidence referencing, and decision guidance are integrated with the retrieved evidence to improve LLM performance. Across diverse real-world datasets and settings, MalRAG delivers state-of-the-art results in both fine-grained identification of known classes and novel malicious traffic discovery. Ablation and deep-dive analyses further show that MalRAG effective leverages LLM capabilities yet achieves open-set malicious traffic identification without relying on a specific LLM.

#14 A Fuzzy Logic-Based Cryptographic Framework For Real-Time Dynamic Key Generation For Enhanced Data Encryption

著者: Kavya Bhand, Payal Khubchandani, Jyoti Khubchandani

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14132

要約:
With the ever-growing demand for cybersecurity, static key encryption mechanisms are increasingly vulnerable to adversarial attacks due to their deterministic and non-adaptive nature. Brute-force attacks, key compromise, and unauthorized access have become highly common cyber threats. This research presents a novel fuzzy logic-based cryptographic framework that dynamically generates encryption keys in real-time by accessing system-level entropy and hardware-bound trust. The proposed system leverages a Fuzzy Inference System (FIS) to evaluate system parameters that include CPU utilization, process count, and timestamp variation. It assigns entropy level based on linguistically defined fuzzy rules which are fused with hardware-generated randomness and then securely sealed using a Trusted Platform Module (TPM). The sealed key is incorporated in an AES-GCM encryption scheme to ensure both confidentiality and integrity of the data. This system introduces a scalable solution for adaptive encryption in high-assurance computing, zero-trust environments, and cloud-based infrastructure.

#15 Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security

著者: Hajun Kim, Hyunsik Na, Daeseon Choi

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14140

要約:
As the use of large language models (LLMs) continues to expand, ensuring their safety and robustness has become a critical challenge. In particular, jailbreak attacks that bypass built-in safety mechanisms are increasingly recognized as a tangible threat across industries, driving the need for diverse templates to support red-teaming efforts and strengthen defensive techniques. However, current approaches predominantly rely on two limited strategies: (i) substituting harmful queries into fixed templates, and (ii) having the LLM generate entire templates, which often compromises intent clarity and reproductibility. To address this gap, this paper introduces the Embedded Jailbreak Template, which preserves the structure of existing templates while naturally embedding harmful queries within their context. We further propose a progressive prompt-engineering methodology to ensure template quality and consistency, alongside standardized protocols for generation and evaluation. Together, these contributions provide a benchmark that more accurately reflects real-world usage scenarios and harmful intent, facilitating its application in red-teaming and policy regression testing.

#16 Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion

backdoor

著者: Eric Xue, Ruiyi Zhang, Zijun Zhang, Pengtao Xie

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14301

要約:
Transformer models are foundational to natural language processing (NLP) applications, yet remain vulnerable to backdoor attacks introduced through poisoned data, which implant hidden behaviors during training. To strengthen the ability to prevent such compromises, recent research has focused on designing increasingly stealthy attacks to stress-test existing defenses, pairing backdoor behaviors with stylized artifact or token-level perturbation triggers. However, this trend diverts attention from the harder and more realistic case: making the model respond to semantic triggers such as specific names or entities, where a successful backdoor could manipulate outputs tied to real people or events in deployed systems. Motivated by this growing disconnect, we introduce SteganoBackdoor, bringing stealth techniques back into line with practical threat models. Leveraging innocuous properties from natural-language steganography, SteganoBackdoor applies a gradient-guided data optimization process to transform semantic trigger seeds into steganographic carriers that embed a high backdoor payload, remain fluent, and exhibit no representational resemblance to the trigger. Across diverse experimental settings, SteganoBackdoor achieves over 99% attack success at an order-of-magnitude lower data-poisoning rate than prior approaches while maintaining unparalleled evasion against a comprehensive suite of data-level defenses. By revealing this practical and covert attack, SteganoBackdoor highlights an urgent blind spot in current defenses and demands immediate attention to adversarial data defenses and real-world threat modeling.

#17 Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection

intellectual property

著者: Zhengchunmin Dai, Jiaxiong Tang, Peng Sun, Honglong Chen, Liantao Wu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14422

要約:
In decentralized machine learning paradigms such as Split Federated Learning (SFL) and its variant U-shaped SFL, the server's capabilities are severely restricted. Although this enhances client-side privacy, it also leaves the server highly vulnerable to model theft by malicious clients. Ensuring intellectual property protection for such capability-limited servers presents a dual challenge: watermarking schemes that depend on client cooperation are unreliable in adversarial settings, whereas traditional server-side watermarking schemes are technically infeasible because the server lacks access to critical elements such as model parameters or labels. To address this challenge, this paper proposes Sigil, a mandatory watermarking framework designed specifically for capability-limited servers. Sigil defines the watermark as a statistical constraint on the server-visible activation space and embeds the watermark into the client model via gradient injection, without requiring any knowledge of the data. Besides, we design an adaptive gradient clipping mechanism to ensure that our watermarking process remains both mandatory and stealthy, effectively countering existing gradient anomaly detection methods and a specifically designed adaptive subspace removal attack. Extensive experiments on multiple datasets and models demonstrate Sigil's fidelity, robustness, and stealthiness.

#18 SecureSign: Bridging Security and UX in Mobile Web3 through Emulated EIP-6963 Sandboxing

著者: Charles Cheng Ji, Brandon Kong

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14611

要約:
Mobile Web3 faces catastrophic retention (< 5%) yielding effective acquisition costs of \$500 - \$1,000 per retained user. Existing solutions force an impossible tradeoff: embedded wallets achieve moderate usability but suffer inherent click-jacking vulnerabilities; app wallets maintain security at the cost of 2 - 3% retention due to download friction and context-switching penalties. We present SecureSign, a PWA-based architecture that adapts desktop browser extension security to mobile via EIP-6963 provider sandboxing. SecureSign isolates dApp execution in iframes within a trusted parent application, achieving click-jacking immunity and transaction integrity while enabling native mobile capabilities (push notifications, home screen installation, zero context-switching). Our drop-in SDK requires no codebase changes for existing Web3 applications. Threat model analysis demonstrates immunity to click-jacking, overlay, and skimming attacks while maintaining wallet interoperability across dApps.

#19 A Unified Compositional View of Attack Tree Metrics

著者: Benedikt Peterseim, Milan Lopuha\"a-Zwakenberg

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14717

要約:
Attack trees (ATs) are popular graphical models for reasoning about the security of complex systems, allowing for the quantification of risk through so-called AT metrics. A large variety of different such AT metrics have been proposed, and despite their wide-spread practical use, no systematic treatment of attack tree metrics so far is fully satisfactory. Existing approaches either fail to include important metrics, or they are too general to provide a useful systematic way for defining concrete AT metrics, giving only an abstract characterisation of their behaviour. We solve this problem by developing a compositional theory of ATs and their functorial semantics based on gs-monoidal categories. Viewing attack trees as string diagrams, we show that components of ATs form a channel category, a particular type of gs-monoidal category. AT metrics then correspond to functors of channel categories. This characterisation is both general enough to include all common AT metrics, and concrete enough to define AT metrics by their logical structure.

#20 Robustness of LLM-enabled vehicle trajectory prediction under data security threats

著者: Feilong Wang, Fuqiang Liu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13753

要約:
The integration of large language models (LLMs) into automated driving systems has opened new possibilities for reasoning and decision-making by transforming complex driving contexts into language-understandable representations. Recent studies demonstrate that fine-tuned LLMs can accurately predict vehicle trajectories and lane-change intentions by gathering and transforming data from surrounding vehicles. However, the robustness of such LLM-based prediction models for safety-critical driving systems remains unexplored, despite the increasing concerns about the trustworthiness of LLMs. This study addresses this gap by conducting a systematic vulnerability analysis of LLM-enabled vehicle trajectory prediction. We propose a one-feature differential evolution attack that perturbs a single kinematic feature of surrounding vehicles within the LLM's input prompts under a black-box setting. Experiments on the highD dataset reveal that even minor, physically plausible perturbations can significantly disrupt model outputs, underscoring the susceptibility of LLM-based predictors to adversarial manipulation. Further analyses reveal a trade-off between accuracy and robustness, examine the failure mechanism, and explore potential mitigation solutions. The findings provide the very first insights into adversarial vulnerabilities of LLM-driven automated vehicle models in the context of vehicular interactions and highlight the need for robustness-oriented design in future LLM-based intelligent transportation systems.

#21 Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments

著者: Samuel Nathanson, Rebecca Williams, Cynthia Matuszek

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13788

要約:
Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially. This study examines whether larger models can systematically jailbreak smaller ones - eliciting harmful or restricted behavior despite alignment safeguards. Using standardized adversarial tasks from JailbreakBench, we simulate over 6,000 multi-turn attacker-target exchanges across major LLM families and scales (0.6B-120B parameters), measuring both harm score and refusal behavior as indicators of adversarial potency and alignment integrity. Each interaction is evaluated through aggregated harm and refusal scores assigned by three independent LLM judges, providing a consistent, model-based measure of adversarial outcomes. Aggregating results across prompts, we find a strong and statistically significant correlation between mean harm and the logarithm of the attacker-to-target size ratio (Pearson r = 0.51, p < 0.001; Spearman rho = 0.52, p < 0.001), indicating that relative model size correlates with the likelihood and severity of harmful completions. Mean harm score variance is higher across attackers (0.18) than across targets (0.10), suggesting that attacker-side behavioral diversity contributes more to adversarial outcomes than target susceptibility. Attacker refusal frequency is strongly and negatively correlated with harm (rho = -0.93, p < 0.001), showing that attacker-side alignment mitigates harmful responses. These findings reveal that size asymmetry influences robustness and provide exploratory evidence for adversarial scaling patterns, motivating more controlled investigations into inter-model alignment and safety.

#22 Beyond One-Size-Fits-All: Neural Networks for Differentially Private Tabular Data Synthesis

privacysynthetic data

著者: Kai Chen, Chen Gong, Tianhao Wang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13893

要約:
In differentially private (DP) tabular data synthesis, the consensus is that statistical models are better than neural network (NN)-based methods. However, we argue that this conclusion is incomplete and overlooks the challenge of densely correlated datasets, where intricate dependencies can overwhelm statistical models. In such complex scenarios, neural networks are more suitable due to their capacity to fit complex distributions by learning directly from samples. Despite this potential, existing NN-based algorithms still suffer from significant limitations. We therefore propose MargNet, incorporating successful algorithmic designs of statistical models into neural networks. MargNet applies an adaptive marginal selection strategy and trains the neural networks to generate data that conforms to the selected marginals. On sparsely correlated datasets, our approach achieves utility close to the best statistical method while offering an average 7$\times$ speedup over it. More importantly, on densely correlated datasets, MargNet establishes a new state-of-the-art, reducing fidelity error by up to 26\% compared to the previous best. We release our code on GitHub.\footnote{https://github.com/KaiChen9909/margnet}

#23 On the Gradient Complexity of Private Optimization with Private Oracles

privacy

著者: Michael Menart, Aleksandar Nikolov

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.13999

要約:
We study the running time, in terms of first order oracle queries, of differentially private empirical/population risk minimization of Lipschitz convex losses. We first consider the setting where the loss is non-smooth and the optimizer interacts with a private proxy oracle, which sends only private messages about a minibatch of gradients. In this setting, we show that expected running time $\Omega(\min\{\frac{\sqrt{d}}{\alpha^2}, \frac{d}{\log(1/\alpha)}\})$ is necessary to achieve $\alpha$ excess risk on problems of dimension $d$ when $d \geq 1/\alpha^2$. Upper bounds via DP-SGD show these results are tight when $d>\tilde{\Omega}(1/\alpha^4)$. We further show our lower bound can be strengthened to $\Omega(\min\{\frac{d}{\bar{m}\alpha^2}, \frac{d}{\log(1/\alpha)} \})$ for algorithms which use minibatches of size at most $\bar{m} < \sqrt{d}$. We next consider smooth losses, where we relax the private oracle assumption and give lower bounds under only the condition that the optimizer is private. Here, we lower bound the expected number of first order oracle calls by $\tilde{\Omega}\big(\frac{\sqrt{d}}{\alpha} + \min\{\frac{1}{\alpha^2}, n\}\big)$, where $n$ is the size of the dataset. Modifications to existing algorithms show this bound is nearly tight. Compared to non-private lower bounds, our results show that differentially private optimizers pay a dimension dependent runtime penalty. Finally, as a natural extension of our proof technique, we show lower bounds in the non-smooth setting for optimizers interacting with information limited oracles. Specifically, if the proxy oracle transmits at most $\Gamma$-bits of information about the gradients in the minibatch, then $\Omega\big(\min\{\frac{d}{\alpha^2\Gamma}, \frac{d}{\log(1/\alpha)}\}\big)$ oracle calls are needed. This result shows fundamental limitations of gradient quantization techniques in optimization.

#24 Certified but Fooled! Breaking Certified Defences with Ghost Certificates

著者: Quoc Viet Vo, Tashreque M. Haq, Paul Montague, Tamas Abraham, Ehsan Abbasnejad, Damith C. Ranasinghe

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14003

要約:
Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better understand the limits of guarantee provisions. Now, the objective is to not only mislead a classifier, but also manipulate the certification process to generate a robustness guarantee for an adversarial input certificate spoofing. A recent study in ICLR demonstrated that crafting large perturbations can shift inputs far into regions capable of generating a certificate for an incorrect class. Our study investigates if perturbations needed to cause a misclassification and yet coax a certified model into issuing a deceptive, large robustness radius for a target class can still be made small and imperceptible. We explore the idea of region-focused adversarial examples to craft imperceptible perturbations, spoof certificates and achieve certification radii larger than the source class ghost certificates. Extensive evaluations with the ImageNet demonstrate the ability to effectively bypass state-of-the-art certified defenses such as Densepure. Our work underscores the need to better understand the limits of robustness certification methods.

#25 Hardness of Range Avoidance and Proof Complexity Generators from Demi-Bits

著者: Hanlin Ren, Yichuan Wang, Yan Zhong

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14061

要約:
Given a circuit $G: \{0, 1\}^n \to \{0, 1\}^m$ with $m > n$, the *range avoidance* problem ($\text{Avoid}$) asks to output a string $y\in \{0, 1\}^m$ that is not in the range of $G$. Besides its profound connection to circuit complexity and explicit construction problems, this problem is also related to the existence of *proof complexity generators* -- circuits $G: \{0, 1\}^n \to \{0, 1\}^m$ where $m > n$ but for every $y\in \{0, 1\}^m$, it is infeasible to prove the statement "$y\not\in\mathrm{Range}(G)$" in a given propositional proof system. This paper connects these two problems with the existence of *demi-bits generators*, a fundamental cryptographic primitive against nondeterministic adversaries introduced by Rudich (RANDOM '97). $\bullet$ We show that the existence of demi-bits generators implies $\text{Avoid}$ is hard for nondeterministic algorithms. This resolves an open problem raised by Chen and Li (STOC '24). Furthermore, assuming the demi-hardness of certain LPN-style generators or Goldreich' PRG, we prove the hardness of $\text{Avoid}$ even when the instances are constant-degree polynomials over $\mathbb{F}_2$. $\bullet$ We show that the dual weak pigeonhole principle is unprovable in Cook's theory $\mathsf{PV}_1$ under the existence of demi-bits generators secure against $\mathbf{AM}$, thereby separating Jerabek's theory $\mathsf{APC}_1$ from $\mathsf{PV}_1$. $\bullet$ We transform demi-bits generators to proof complexity generators that are *pseudo-surjective* with nearly optimal parameters. Our constructions build on the recent breakthroughs on the hardness of $\text{Avoid}$ by Ilango, Li, and Williams (STOC '23) and Chen and Li (STOC '24). We use *randomness extractors* to significantly simplify the construction and the proof.

#26 Observational Auditing of Label Privacy

privacy

著者: Iden Kalemaj, Luca Melis, Maxime Boucher, Ilya Mironov, Saeed Mahloujifar

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14084

要約:
Differential privacy (DP) auditing is essential for evaluating privacy guarantees in machine learning systems. Existing auditing methods, however, pose a significant challenge for large-scale systems since they require modifying the training dataset -- for instance, by injecting out-of-distribution canaries or removing samples from training. Such interventions on the training data pipeline are resource-intensive and involve considerable engineering overhead. We introduce a novel observational auditing framework that leverages the inherent randomness of data distributions, enabling privacy evaluation without altering the original dataset. Our approach extends privacy auditing beyond traditional membership inference to protected attributes, with labels as a special case, addressing a key gap in existing techniques. We provide theoretical foundations for our method and perform experiments on Criteo and CIFAR-10 datasets that demonstrate its effectiveness in auditing label privacy guarantees. This work opens new avenues for practical privacy auditing in large-scale production environments.

#27 N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator

著者: Zheyu Lin, Jirui Yang, Hengqi Guo, Yubing Bao, Yao Guan

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14195

要約:
Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and black-box output analysis. These approaches are not only costly but also suffer from feedback latency, making them unsuitable for agile diagnostics after training a new model. To address this, we propose N-GLARE (A Non-Generative, Latent Representation-Efficient LLM Safety Evaluator). N-GLARE operates entirely on the model's latent representations, bypassing the need for full text generation. It characterizes hidden layer dynamics by analyzing the APT (Angular-Probabilistic Trajectory) of latent representations and introducing the JSS (Jensen-Shannon Separability) metric. Experiments on over 40 models and 20 red teaming strategies demonstrate that the JSS metric exhibits high consistency with the safety rankings derived from Red Teaming. N-GLARE reproduces the discriminative trends of large-scale red-teaming tests at less than 1\% of the token cost and the runtime cost, providing an efficient output-free evaluation proxy for real-time diagnostics.

#28 Watch Out for the Lifespan: Evaluating Backdoor Attacks Against Federated Model Adaptation

backdoor

著者: Bastien Vuillod, Pierre-Alain Moellic, Jean-Max Dutertre

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14406

要約:
Large models adaptation through Federated Learning (FL) addresses a wide range of use cases and is enabled by Parameter-Efficient Fine-Tuning techniques such as Low-Rank Adaptation (LoRA). However, this distributed learning paradigm faces several security threats, particularly to its integrity, such as backdoor attacks that aim to inject malicious behavior during the local training steps of certain clients. We present the first analysis of the influence of LoRA on state-of-the-art backdoor attacks targeting model adaptation in FL. Specifically, we focus on backdoor lifespan, a critical characteristic in FL, that can vary depending on the attack scenario and the attacker's ability to effectively inject the backdoor. A key finding in our experiments is that for an optimally injected backdoor, the backdoor persistence after the attack is longer when the LoRA's rank is lower. Importantly, our work highlights evaluation issues of backdoor attacks against FL and contributes to the development of more robust and fair evaluations of backdoor attacks, enhancing the reliability of risk assessments for critical FL systems. Our code is publicly available.

#29 Compression with Privacy-Preserving Random Access

privacy

著者: Venkat Chandar, Aslan Tchamkerten, Shashank Vatedka

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14524

要約:
It is shown that an i.i.d. binary source sequence $X_1, \ldots, X_n$ can be losslessly compressed at any rate above entropy such that the individual decoding of any $X_i$ reveals \emph{no} information about the other bits $\{X_j : j \neq i\}$.

#30 ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection

著者: Mohammad Romani

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14554

要約:
Deepfakes generated by advanced GANs and autoencoders severely threaten information integrity and societal stability. Single-stream CNNs fail to capture multi-scale forgery artifacts across spatial, texture, and frequency domains, limiting robustness and generalization. We introduce the ForensicFlow, a tri-modal forensic framework that synergistically fuses RGB, texture, and frequency evidence for video Deepfake detection. The RGB branch (ConvNeXt-tiny) extracts global visual inconsistencies; the texture branch (Swin Transformer-tiny) detects fine-grained blending artifacts; the frequency branch (CNN + SE) identifies periodic spectral noise. Attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive attention fusion balances branch contributions.Trained on Celeb-DF (v2) with Focal Loss, ForensicFlow achieves AUC 0.9752, F1-Score 0.9408, and accuracy 0.9208, outperforming single-stream baselines. Ablation validates branch synergy; Grad-CAM confirms forensic focus. This comprehensive feature fusion provides superior resilience against subtle forgeries.

#31 \textit{FLARE}: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning

著者: Abolfazl Younesi, Leon Kiss, Zahra Najafabadi Samani, Juan Aznar Poveda, Thomas Fahringer

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.14715

要約:
Federated learning (FL) enables collaborative model training while preserving data privacy. However, it remains vulnerable to malicious clients who compromise model integrity through Byzantine attacks, data poisoning, or adaptive adversarial behaviors. Existing defense mechanisms rely on static thresholds and binary classification, failing to adapt to evolving client behaviors in real-world deployments. We propose FLARE, an adaptive reputation-based framework that transforms client reliability assessment from binary decisions to a continuous, multi-dimensional trust evaluation. FLARE integrates: (i) a multi-dimensional reputation score capturing performance consistency, statistical anomaly indicators, and temporal behavior, (ii) a self-calibrating adaptive threshold mechanism that adjusts security strictness based on model convergence and recent attack intensity, (iii) reputation-weighted aggregation with soft exclusion to proportionally limit suspicious contributions rather than eliminating clients outright, and (iv) a Local Differential Privacy (LDP) mechanism enabling reputation scoring on privatized client updates. We further introduce a highly evasive Statistical Mimicry (SM) attack, a benchmark adversary that blends honest gradients with synthetic perturbations and persistent drift to remain undetected by traditional filters. Extensive experiments with 100 clients on MNIST, CIFAR-10, and SVHN demonstrate that FLARE maintains high model accuracy and converges faster than state-of-the-art Byzantine-robust methods under diverse attack types, including label flipping, gradient scaling, adaptive attacks, ALIE, and SM. FLARE improves robustness by up to 16% and preserves model convergence within 30% of the non-attacked baseline, while achieving strong malicious-client detection performance with minimal computational overhead. https://github.com/Anonymous0-0paper/FLARE

#32 Optimal Communication Unbalanced Private Set Union

privacy

著者: Jean-Guillaume Dumas (CASC, UGA, LJK), Alexis Galan (CASC, UGA), Bruno Grenet (CASC), Aude Maignan (CASC), Daniel S. Roche

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2402.16393

要約:
We present new two-party protocols for the Unbalanced Private Set Union (UPSU) problem. Here, the Sender holds a set of data points, and the Receiver holds another (possibly much larger) set, and they would like for the Receiver to learn the union of the two sets and nothing else. Furthermore, the Sender's computational cost, along with the communication complexity, should be smaller when the Sender has a smaller set. While the UPSU problem has numerous applications and has seen considerable recent attention in the literature, our protocols are the first where the Sender's computational cost and communication volume are linear in the size of the Sender's set only, and do not depend on the size of the Receiver's set. Our constructions combine linearly homomorphic encryption (LHE) with fully homomorphic encryption (FHE). The first construction uses multi-point polynomial evaluation (MEv) on FHE, and achieves optimal linear cost for the Sender, but has higher quadratic computational cost for the Receiver. In the second construction we explore another trade-off: the Receiver computes fast polynomial Euclidean remainder in FHE while the Sender computes a fast MEv, in LHE only. This reduces the Receiver's cost to quasi-linear, with a modest increase in the computational cost for the Sender. Preliminary experimental results using HElib indicate that, for example, a Sender holding 1000 elements can complete our first protocol using less than 2s of computation time and less than 10MB of communication volume, independently of the Receiver's set size.

#33 Purifying Approximate Differential Privacy with Randomized Post-processing

privacy

著者: Yingyu Lin, Erchi Wang, Yi-An Ma, Yu-Xiang Wang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.21071

要約:
We propose a framework to convert $(\varepsilon, \delta)$-approximate Differential Privacy (DP) mechanisms into $(\varepsilon', 0)$-pure DP mechanisms under certain conditions, a process we call ``purification.'' This algorithmic technique leverages randomized post-processing with calibrated noise to eliminate the $\delta$ parameter while achieving near-optimal privacy-utility tradeoff for pure DP. It enables a new design strategy for pure DP algorithms: first run an approximate DP algorithm with certain conditions, and then purify. This approach allows one to leverage techniques such as strong composition and propose-test-release that require $\delta>0$ in designing pure-DP methods with $\delta=0$. We apply this framework in various settings, including Differentially Private Empirical Risk Minimization (DP-ERM), stability-based release, and query release tasks. To the best of our knowledge, this is the first work with a statistically and computationally efficient reduction from approximate DP to pure DP. Finally, we illustrate the use of this reduction for proving lower bounds under approximate DP constraints with explicit dependence in $\delta$, avoiding the sophisticated fingerprinting code construction.

#34 Quantifying Privacy Leakage in Split Inference via Fisher-Approximated Shannon Information Analysis

privacy

著者: Ruijun Deng, Zhihui Lu, Qiang Duan, Shijing Hu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.10016

要約:
Split inference (SI) partitions deep neural networks into distributed sub-models, enabling collaborative learning without directly sharing raw data. However, SI remains vulnerable to Data Reconstruction Attacks (DRAs), where adversaries exploit exposed smashed data to recover private inputs. Despite substantial progress in attack-defense methodologies, the fundamental quantification of privacy risks is still underdeveloped. This paper establishes an information-theoretic framework for privacy leakage in SI, defining leakage as the adversary's certainty and deriving both average-case and worst-case error lower bounds. We further introduce Fisher-approximated Shannon information (FSInfo), a new privacy metric based on Fisher Information (FI) that enables operational and tractable computation of privacy leakage. Building on this metric, we develop FSInfoGuard, a defense mechanism that achieves a strong privacy-utility tradeoff. Our empirical study shows that FSInfo is an effective privacy metric across datasets, models, and defense strengths, providing accurate privacy estimates that support the design of defense methods outperforming existing approaches in both privacy protection and utility preservation. The code is available at https://github.com/SASA-cloud/FSInfo.

#35 Benchmarking Differentially Private Tabular Data Synthesis

privacysynthetic data

著者: Kai Chen, Xiaochen Li, Chen Gong, Ryan McKenna, Tianhao Wang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.14061

要約:
Differentially private (DP) tabular data synthesis generates artificial data that preserves the statistical properties of private data while safeguarding individual privacy. The emergence of diverse algorithms in recent years has introduced challenges in practical applications, such as inconsistent data processing methods, the lack of in-depth algorithm analysis, and incomplete comparisons due to overlapping development timelines. These factors create significant obstacles to selecting appropriate algorithms. In this paper, we address these challenges by proposing a benchmark for evaluating tabular data synthesis methods. We present a unified evaluation framework that integrates data preprocessing, feature selection, and synthesis modules, facilitating fair and comprehensive comparisons. Our evaluation reveals that a significant utility-efficiency trade-off exists among current state-of-the-art methods. Some statistical methods are superior in synthesis utility, but their efficiency is not as good as most deep learning-based methods. Furthermore, we conduct an in-depth analysis of each module with experimental validation, offering theoretical insights into the strengths and limitations of different strategies. Our code is open-sourced via the link.\footnote{https://github.com/KaiChen9909/tab_bench}

#36 TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning

backdoor

著者: Mingxuan Zhang, Oubo Ma, Kang Wei, Songze Li, Shouling Ji

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.09562

要約:
Deep reinforcement learning (DRL) has achieved remarkable success in a wide range of sequential decision-making applications, including robotics, healthcare, smart grids, and finance. Recent studies reveal that adversaries can implant backdoors into DRL agents during the training phase. These backdoors can later be activated by specific triggers during deployment, compelling the agent to execute targeted actions and potentially leading to severe consequences, such as drone crashes or vehicle collisions. However, existing backdoor attacks utilize simplistic and heuristic trigger configurations, overlooking the critical impact of trigger design on attack effectiveness. To address this gap, we introduce TooBadRL, the first framework to systematically optimize DRL backdoor triggers across three critical aspects: injection timing, trigger dimension, and manipulation magnitude. Specifically, we first introduce a performance-aware adaptive freezing mechanism to determine the injection timing during training. Then, we formulate trigger selection as an influence attribution problem and apply Shapley value analysis to identify the most influential trigger dimension for injection. Furthermore, we propose an adversarial input synthesis method to optimize the manipulation magnitude under environmental constraints. Extensive evaluations on three DRL algorithms and nine benchmark tasks demonstrate that TooBadRL outperforms five baseline methods in terms of attack success rate while only slightly affecting normal task performance. We further evaluate potential defense strategies from detection and mitigation perspectives. We open-source our code to facilitate reproducibility and further research.

#37 AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection

agent

著者: Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, Ye Wu

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.01249

要約:
Large Language Model (LLM) agents offer a powerful new paradigm for solving various problems by combining natural language reasoning with the execution of external tools. However, their dynamic and non-transparent behavior introduces critical security risks, particularly in the presence of prompt injection attacks. In this work, we propose a novel insight that treats the agent runtime traces as structured programs with analyzable semantics. Thus, we present AgentArmor, a program analysis framework that converts agent traces into graph intermediate representation-based structured program dependency representations (e.g., CFG, DFG, and PDG) and enforces security policies via a type system. AgentArmor consists of three key components: (1) a graph constructor that reconstructs the agent's runtime traces as graph-based intermediate representations with control and data flow described within; (2) a property registry that attaches security-relevant metadata of interacted tools \& data, and (3) a type system that performs static inference and checking over the intermediate representation. By representing agent behavior as structured programs, AgentArmor enables program analysis for sensitive data flow, trust boundaries, and policy violations. We evaluate AgentArmor on the AgentDojo benchmark, the results show that AgentArmor can reduce the ASR to 3\%, with the utility drop only 1\%.

#38 BadTime: An Effective Backdoor Attack on Multivariate Long-Term Time Series Forecasting

backdoor

著者: Kunlan Xiang, Haomiao Yang, Meng Hao, Wenbo Jiang, Haoxin Wang, Shiyue Huang, Shaofeng Li, Yijing Liu, Ji Guo, Dusit Niyato

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.04189

要約:
Multivariate long-term time series forecasting (MLTSF) models are increasingly deployed in critical domains such as climate, finance, and transportation. Despite their growing importance, the security of MLTSF models against backdoor attacks remains entirely unexplored. To bridge this gap, we propose BadTime, the first effective backdoor attack tailored for MLTSF. BadTime can manipulate hundreds of future predictions toward a target pattern by injecting a subtle trigger. BadTime addresses two key challenges that arise uniquely in MLTSF: (i) the rapid dilution of local triggers over long horizons, and (ii) the extreme sparsity of backdoor signals under stealth constraints. To counter dilution, BadTime leverages inter-variable correlations, temporal lags, and data-driven initialization to design a distributed, lag-aware trigger that ensures effective influence over long-range forecasts. To overcome sparsity, it introduces a hybrid strategy to select valuable poisoned samples and a decoupled backdoor training objective that adaptively adjusts the model's focus on the sparse backdoor signal, ensuring reliable learning at a poisoning rate as low as 1%. Extensive experiments show that BadTime significantly outperforms state-of-the-art (SOTA) backdoor attacks on time series forecasting by extending the attackable horizon from at most 12 timesteps to 720 timesteps (a 60-fold improvement), reducing MAE by over 50% on target variables, and boosting stealthiness by more than 3-fold under anomaly detection.

#39 Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

著者: Katsuaki Nakano, Reza Fayyazi, Shanchieh Jay Yang, Michael Zuzak

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.07939

要約:
Recent advances in Large Language Models (LLMs) have driven interest in automating cybersecurity penetration testing workflows, offering the promise of faster and more consistent vulnerability assessment for enterprise systems. Existing LLM agents for penetration testing primarily rely on self-guided reasoning, which can produce inaccurate or hallucinated procedural steps. As a result, the LLM agent may undertake unproductive actions, such as exploiting unused software libraries or generating cyclical responses that repeat prior tactics. In this work, we propose a guided reasoning pipeline for penetration testing LLM agents that incorporates a deterministic task tree built from the MITRE ATT&CK Matrix, a proven penetration testing kll chain, to constrain the LLM's reaoning process to explicitly defined tactics, techniques, and procedures. This anchors reasoning in proven penetration testing methodologies and filters out ineffective actions by guiding the agent towards more productive attack procedures. To evaluate our approach, we built an automated penetration testing LLM agent using three LLMs (Llama-3-8B, Gemini-1.5, and GPT-4) and applied it to navigate 10 HackTheBox cybersecurity exercises with 103 discrete subtasks representing real-world cyberattack scenarios. Our proposed reasoning pipeline guided the LLM agent through 71.8\%, 72.8\%, and 78.6\% of subtasks using Llama-3-8B, Gemini-1.5, and GPT-4, respectively. Comparatively, the state-of-the-art LLM penetration testing tool using self-guided reasoning completed only 13.5\%, 16.5\%, and 75.7\% of subtasks and required 86.2\%, 118.7\%, and 205.9\% more model queries. This suggests that incorporating a deterministic task tree into LLM reasoning pipelines can enhance the accuracy and efficiency of automated cybersecurity assessments

#40 Clustering Deposit and Withdrawal Activity in Tornado Cash: A Cross-Chain Analysis

著者: Raffaele Cristodaro, Benjamin Kraner, Claudio J. Tessone

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.09433

要約:
Tornado Cash is a decentralised mixer that uses cryptographic techniques to sever the on-chain trail between depositors and withdrawers. In practice, however, its anonymity can be undermined by user behaviour and operational quirks. We conduct the first cross-chain empirical study of Tornado Cash activity on Ethereum, BNB Smart Chain, and Polygon, introducing three clustering heuristics-(i) address-reuse, (ii) transactional-linkage, and (iii) a novel first-in-first-out (FIFO) temporal-matching rule. Together, these heuristics reconnect deposits to withdrawals and deanonymise a substantial share of recipients. Our analysis shows that 5.1 - 12.6% of withdrawals can already be traced to their originating deposits through address reuse and transactional linkage heuristics. Adding our novel First-In-First-Out (FIFO) temporal-matching heuristic lifts the linkage rate by a further 15 - 22 percentage points. Statistical tests confirm that these FIFO matches are highly unlikely to occur by chance. Comparable leakage across Ethereum, BNB Smart Chain, and Polygon indicates chain-agnostic user misbehaviour, rather than chain-specific protocol flaws. These results expose how quickly cryptographic guarantees can unravel in everyday use, underscoring the need for both disciplined user behaviour and privacy-aware protocol design. In total, our heuristics link over $2.3 billion in Tornado Cash withdrawals to identifiable deposits, exposing significant cracks in practical anonymity.

#41 The Impact of Sanctions on decentralised Privacy Tools: A Case Study of Tornado Cash

privacy

著者: Raffaele Cristodaro, Benjamin Kraner, Claudio J. Tessone

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.09443

要約:
This paper investigates the impact of sanctions on Tornado Cash, a smart contract protocol designed to enhance transaction privacy. Following the U.S. Department of the Treasury's sanctions against Tornado Cash in August 2022, platform activity declined sharply. We document a significant and sustained reduction in transaction volume, user diversity, and overall protocol utilization after the sanctions were imposed. Our analysis draws on transaction data from three major blockchains: Ethereum, BNB Smart Chain, and Polygon. We further examine developments following the partial lifting and eventual removal of sanctions by the U.S. Office of Foreign Assets Control (OFAC) in March 2025. Although activity partially recovered, the rebound remained limited. The Tornado Cash case illustrates how regulatory interventions can affect decentralized protocols, while also highlighting the challenges of fully enforcing such measures in decentralized environments.

#42 DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion

著者: Ruofan Liu, Yun Lin, Zhiyong Huang, Jin Song Dong

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.00447

要約:
Large language models (LLMs) are increasingly integrated into IT infrastructures, where they process user data according to predefined instructions. However, conventional LLMs remain vulnerable to prompt injection, where malicious users inject directive tokens into the data to subvert model behavior. Existing defenses train LLMs to semantically separate data and instruction tokens, but still struggle to (1) balance utility and security and (2) prevent instruction-like semantics in the data from overriding the intended instructions. We propose DRIP, which (1) precisely removes instruction semantics from tokens in the data section while preserving their data semantics, and (2) robustly preserves the effect of the intended instruction even under strong adversarial content. To "de-instructionalize" data tokens, DRIP introduces a data curation and training paradigm with a lightweight representation-editing module that edits embeddings of instruction-like tokens in the data section, enhancing security without harming utility. To ensure non-overwritability of instructions, DRIP adds a minimal residual module that reduces the ability of adversarial data to overwrite the original instruction. We evaluate DRIP on LLaMA 8B and Mistral 7B against StruQ, SecAlign, ISE, and PFT on three prompt-injection benchmarks (SEP, AlpacaFarm, and InjecAgent). DRIP improves role-separation score by 12-49\%, reduces attack success rate by over 66\% under adaptive attacks, and matches the utility of the undefended model, establishing a new state of the art for prompt-injection robustness.

#43 Proportionate Cybersecurity for Micro-SMEs: A Governance Design Model under NIS2

著者: Roberto Garrone

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.02898

要約:
Micro and small enterprises (SMEs) remain structurally vulnerable to cyber threats while facing capacity constraints that make formal compliance burdensome. This article develops a governance design model for proportionate SME cybersecurity, grounded in an awareness-first logic and informed by the EU Squad 2025 experience. Using a qualitative policy-analysis and conceptual policy-design approach, we reconstruct a seven-dimension preventive architecture: awareness and visibility, human behaviour, access control, system hygiene, data protection, detection and response, and continuous review, and justify each dimension's contribution to proportionality and risk reduction. We then map the model's regulatory scope and limits against the NIS2 Directive, Commission Implementing Regulation (EU) 2024/2690, the Digital Operational Resilience Act (DORA), the Cyber Resilience Act (CRA), and the EU Action Plan on Cybersecurity for Hospitals, clarifying which obligations are supported and which require complementary governance (e.g. role accountability, incident timelines, statements of applicability, sector-specific testing and procurement). The analysis argues that raising awareness is the fastest, scalable lever to increase cyber-risk sensitivity in micro-SMEs and complements, rather than replaces, formal compliance. We conclude with policy implications for EU and national programmes seeking practical, proportionate pathways to SME cyber resilience under NIS2.

#44 Privacy on the Fly: A Predictive Adversarial Transformation Network for Mobile Sensor Data

privacy

著者: Tianle Song, Chenhao Lin, Yang Cao, Zhengyu Zhao, Jiahao Sun, Chong Zhang, Le Yang, Chao Shen

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.07242

要約:
Mobile motion sensors such as accelerometers and gyroscopes are now ubiquitously accessible by third-party apps via standard APIs. While enabling rich functionalities like activity recognition and step counting, this openness has also enabled unregulated inference of sensitive user traits, such as gender, age, and even identity, without user consent. Existing privacy-preserving techniques, such as GAN-based obfuscation or differential privacy, typically require access to the full input sequence, introducing latency that is incompatible with real-time scenarios. Worse, they tend to distort temporal and semantic patterns, degrading the utility of the data for benign tasks like activity recognition. To address these limitations, we propose the Predictive Adversarial Transformation Network (PATN), a real-time privacy-preserving framework that leverages historical signals to generate adversarial perturbations proactively. The perturbations are applied immediately upon data acquisition, enabling continuous protection without disrupting application functionality. Experiments on two datasets demonstrate that PATN substantially degrades the performance of privacy inference models, achieving Attack Success Rate (ASR) of 40.11% and 44.65% (reducing inference accuracy to near-random) and increasing the Equal Error Rate (EER) from 8.30% and 7.56% to 41.65% and 46.22%. On ASR, PATN outperforms baseline methods by 16.16% and 31.96%, respectively.

#45 VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

著者: Youpeng Li, Fuxun Yu, Xinda Wang

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2511.11896

要約:
The widespread reliance on open-source software dramatically increases the risk of vulnerability exploitation, underscoring the need for effective and scalable vulnerability detection (VD). Existing VD techniques, whether traditional machine learning-based or LLM-based approaches like prompt engineering, supervised fine-tuning, or off-policy preference optimization, remain fundamentally limited in their ability to perform context-aware analysis: They depend on fixed inputs or static preference datasets, cannot adaptively explore repository-level dependencies, and are constrained by function-level benchmarks that overlook critical vulnerability context. This paper introduces Vulnerability-Adaptive Policy Optimization (VULPO), an on-policy LLM reinforcement learning framework for context-aware VD. To support training and evaluation, we first construct ContextVul, a new dataset that augments high-quality function-level samples with lightweight method to extract repository-level context information. We then design multi-dimensional reward structuring that jointly captures prediction correctness, vulnerability localization accuracy, and the semantic relevance of vulnerability analysis, thereby guiding the model toward comprehensive contextual reasoning. To address the asymmetric difficulty of different vulnerability cases and mitigate reward hacking, VULPO incorporates label-level and sample-level difficulty-adaptive reward scaling, encouraging the model to explore challenging cases while maintaining balanced reward distribution. Extensive experiments demonstrate the superiority of our VULPO framework in context-aware VD: Our VULPO-4B substantially outperforms existing VD baselines based on prompt engineering and off-policy optimization, improving F1 by 85% over Qwen3-4B and achieving performance comparable to a 150x larger-scale model, DeepSeek-R1-0528.

#46 High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

著者: Wenyu Liu, Tianqiang Huang, Pengfei Zhang, Zong Ke, Minghui Min, Puning Zhao

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2307.13352

要約:
Adversarial attacks pose a major challenge to distributed learning systems, prompting the development of numerous robust learning methods. However, most existing approaches suffer from the curse of dimensionality, i.e. the error increases with the number of model parameters. In this paper, we make a progress towards high dimensional problems, under arbitrary number of Byzantine attackers. The cornerstone of our design is a direct high dimensional semi-verified mean estimation method. The idea is to identify a subspace with large variance. The components of the mean value perpendicular to this subspace are estimated using corrupted gradient vectors uploaded from worker machines, while the components within this subspace are estimated using auxiliary dataset. As a result, a combination of large corrupted dataset and small clean dataset yields significantly better performance than using them separately. We then apply this method as the aggregator for distributed learning problems. The theoretical analysis shows that compared with existing solutions, our method gets rid of $\sqrt{d}$ dependence on the dimensionality, and achieves minimax optimal statistical rates. Numerical results validate our theory as well as the effectiveness of the proposed method.

#47 Sei Giga

著者: Benjamin Marsh, Steven Landers, Jayendra Jog

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.14914

要約:
We introduce the Sei Giga, a multi-concurrent producer parallelized execution EVM layer one blockchain. In an internal testnet Giga has achieved >5 gigagas/sec throughput and sub 400ms finality. Giga uses Autobahn for consensus with separate DA and consensus layers requiring f+1 votes for a PoA on the DA layer before consensus. Giga reaches consensus over ordering and uses async block execution and state agreement to remove execution from the consensus bottleneck.

#48 Efficient Decoding Methods for Language Models on Encrypted Data

著者: Matan Avitan, Moran Baruch, Nir Drucker, Itamar Zimerman, Yoav Goldberg

公開日: Wed, 19 Nov 2025 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.08383

要約:
Large language models (LLMs) power modern AI applications, but processing sensitive data on untrusted servers raises privacy concerns. Homomorphic encryption (HE) enables computation on encrypted data for secure inference. However, neural text generation requires decoding methods like argmax and sampling, which are non-polynomial and thus computationally expensive under encryption, creating a significant performance bottleneck. We introduce cutmax, an HE-friendly argmax algorithm that reduces ciphertext operations compared to prior methods, enabling practical greedy decoding under encryption. We also propose the first HE-compatible nucleus (top-p) sampling method, leveraging cutmax for efficient stochastic decoding with provable privacy guarantees. Both techniques are polynomial, supporting efficient inference in privacy-preserving settings. Moreover, their differentiability facilitates gradient-based sequence-level optimization as a polynomial alternative to straight-through estimators. We further provide strong theoretical guarantees for cutmax, proving its convergence via exponential amplification of the gap ratio between the maximum and runner-up elements. Evaluations on realistic LLM outputs show latency reductions of 24x-35x over baselines, advancing secure text generation.

cs.CR updates on arXiv.org

📋 論文タイトル一覧