arXiv論文一覧 - cs.CR updates on arXiv.org

#1 Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

backdoor

著者: Jingyuan Xie, Wenjie Wang, Ji Wu, Jiandong Gao

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02262

要約:
Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly focused on the detectable backdoor attacks. We propose a novel poisoning attack targeting the reasoning process of medical LLMs during SFT. Unlike backdoor attacks, our method injects poisoned rationales into few-shot training data, leading to stealthy degradation of model performance on targeted medical topics. Results showed that knowledge overwriting was ineffective, while rationale poisoning caused significant decline on the accuracy of the target subject, as long as no correct samples of the same subject appear in the dataset. A minimum number and ratio of poisoned samples was needed to carry out an effective and stealthy attack, which was more efficient and accurate than catastrophic forgetting. We demonstrate though this study the risk of SFT-stage poisoning, hoping to spur more studies of defense in the sensitive medical domain.

#2 Quantifying Frontier LLM Capabilities for Container Sandbox Escape

著者: Rahul Marchand, Art O Cathain, Jerome Wynne, Philippos Maximos Giavridis, Sam Deverett, John Wilkinson, Jason Gwartz, Harry Coppock

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02277

要約:
Large language models (LLMs) increasingly act as autonomous agents, using tools to execute code, read and write files, and access networks, creating novel security risks. To mitigate these risks, agents are commonly deployed and evaluated in isolated "sandbox" environments, often implemented using Docker/OCI containers. We introduce SANDBOXESCAPEBENCH, an open benchmark that safely measures an LLM's capacity to break out of these sandboxes. The benchmark is implemented as an Inspect AI Capture the Flag (CTF) evaluation utilising a nested sandbox architecture with the outer layer containing the flag and no known vulnerabilities. Following a threat model of a motivated adversarial agent with shell access inside a container, SANDBOXESCAPEBENCH covers a spectrum of sandboxescape mechanisms spanning misconfiguration, privilege allocation mistakes, kernel flaws, and runtime/orchestration weaknesses. We find that, when vulnerabilities are added, LLMs are able to identify and exploit them, showing that use of evaluation like SANDBOXESCAPEBENCH is needed to ensure sandboxing continues to provide the encapsulation needed for highly-capable models.

#3 ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day Vulnerabilities for Cyberdefense

agent

著者: Nancy Lau, Louis Sloot, Jyoutir Raj, Giuseppe Marco Boscardin, Evan Harris, Dylan Bowman, Mario Brajkovski, Jaideep Chawla, Dan Zhao

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02297

要約:
Large language models (LLMs) are increasingly being deployed as software engineering agents that autonomously contribute to repositories. A major benefit these agents present is their ability to find and patch security vulnerabilities in the codebases they oversee. To estimate the capability of agents in this domain, we introduce ZeroDayBench, a benchmark where LLM agents find and patch 22 novel critical vulnerabilities in open-source codebases. We focus our efforts on three popular frontier agentic LLMs: GPT-5.2, Claude Sonnet 4.5, and Grok 4.1. We find that frontier LLMs are not yet capable of autonomously solving our tasks and observe some behavioral patterns that suggest how these models can be improved in the domain of proactive cyberdefense.

#4 Authenticated Contradictions from Desynchronized Provenance and Watermarking

intellectual property

著者: Alexander Nemecek, Hengzhi He, Guang Cheng, Erman Ayday

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02378

要約:
Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the $\textit{Integrity Clash}$, a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.

#5 TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models

backdoor

著者: Zhen Guo, Shanghao Shi, Hao Li, Shamim Yazdani, Ning Zhang, Reza Tourani

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02436

要約:
The deployment of Large Reasoning Models (LRMs) in high-stakes decision-making pipelines has introduced a novel and opaque attack surface: reasoning backdoors. In these attacks, the model's intermediate Chain-of-Thought (CoT) is manipulated to provide a linguistically plausible but logically fallacious justification for a malicious conclusion. While frontier models exhibit an intrinsic capacity to detect these fractures, compact, deployable models suffer from a fundamental verification gap, relying on fragile lexical heuristics that are easily bypassed by motivated adversaries. To bridge this gap, we propose TraceGuard, a process-guided security framework that transforms small-scale models into robust reasoning firewalls. Our approach treats the reasoning trace as an untrusted payload and establishes a defense-in-depth strategy through three synergistic phases: (1) Automated Forensic Synthesis, which generates contrastive reasoning pairs to isolate the specific logical point of fracture; (2) Step-Aware Supervised Fine-Tuning (SSFT), to instill a structural verification grammar; and (3) Verifier-Guided Reinforcement Learning (VGRL), utilizing Group Relative Policy Optimization. We identify and mitigate a critical failure mode of baseline alignment - lexical overfitting - whereby verifiers memorize adversarial triggers rather than auditing logical integrity. Our empirical evaluation demonstrates that TraceGuard acts as a security force multiplier: a 4B-parameter verifier achieves forensic precision on unseen attacks - including latent backdoors and post-hoc rationalizations - that rivals architectures two orders of magnitude larger. We further demonstrate robustness against adaptive adversaries in a grey-box setting, establishing TraceGuard as a viable, low-latency security primitive for the Trusted Computing Base.

#6 Composable Attestation: A Generalized Framework for Continuous and Incremental Trust in AI-Driven Distributed Systems

著者: Sheng Sun, Sarah Evans

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02451

要約:
This paper presents composable attestation as a generalized cryptographic framework for Continuous and Incremental Trust in Distributed Systems,such as Artificial Intelligence (AI) computation, and Open Source Software (OSS) supply chain verification. We establish a rigorous mathematical foundation which is defining core properties of such attestation systems: composability, order independence, transitivity, determinism, inclusion, and dynamic component verification. In contrast to traditional attestation methodologies relying on monolithic verification, composable attestation facilitates modular, scalable, and cryptographically secured integrity verification adaptable to evolving system configurations. This work introduces generalized attestation proof generation and verification functions, implementable via a variety of cryptographic constructions, in which Merkle trees plays vital role in constructing the composable attestation proof. Alternative constructions, including accumulator-based schemes and multi-signature approaches, are also explored, each presenting distinct trade-offs in performance, security, and functionality. Formal analysis demonstrates the adherence of these implementations to the fundamental properties . The framework's utility extends to applications such as secure AI model integrity verification , federated learning, and runtime trust assurance. The concept of attestation inclusion is introduced, permitting incremental integration of new components without necessitating full system re-attestation. This generalized approach reinforce trust in AI computation and broader distributed computing environments through cryptographically verifiable proof mechanisms, building upon foundational concepts of bootstrapping trust.

#7 Exploiting PendingIntent Provenance Confusion to Spoof Android SDK Authentication

著者: Ramanpreet Singh Khinda

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02539

要約:
A single authentication bypass in a partner SDK grants attackers the identity of every partner in the ecosystem -- and millions of apps use SDKs with exactly this vulnerability. OWASP's 2024 Mobile Top 10 ranks Inadequate Supply Chain Security as the second most critical mobile risk, explicitly identifying third-party SDKs as a primary attack vector. Cross-app mobile SDKs -- where a partner application communicates with a platform provider's application via inter-process communication (IPC) -- mediate sensitive operations such as content publishing, payment initiation, and identity federation. Unlike embedded libraries that execute within a single app's process, cross-app SDKs require the provider's service to authenticate the calling application at runtime. A pattern sometimes used for this authentication relies on PendingIntent.getCreatorPackage() to verify sender identity. We demonstrate that this mechanism exhibits a fundamental provenance confusion vulnerability: a PendingIntent reliably identifies who created it but cannot attest who presents it -- and this distinction is fatal for authentication. An attacker app with notification access can steal a legitimate partner's PendingIntent via NotificationListenerService and replay it to impersonate that partner, bypassing authentication entirely. The attack succeeds against both mutable and immutable PendingIntents because immutability protects the token's contents, not its provenance. We systematically evaluate eight Android IPC authentication mechanisms against an SDK-specific threat model and present a defense architecture combining Bound Service IPC with kernel-level caller verification via Binder.getCallingUid(), supplemented by server-side certificate-hash validation. This provides authentication guarantees while remaining scalable across partner ecosystems.

#8 Extending the Formalism and Theoretical Foundations of Cryptography to AI

著者: Federico Villa, F. Bet\"ul Durak, Tadayoshi Kohno, Tapdig Maharramli, Franziska Roesner

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02590

要約:
Recent progress in (Large) Language Models (LMs) has enabled the development of autonomous LM-based agents capable of executing complex tasks with minimal supervision. These agents have started to be integrated into systems with significant autonomy and authority. The security community has been studying their security. One emerging direction to mitigate security risks is to constrain agent behaviours via access control and permissioning mechanisms. Existing permissioning proposals, however, remain difficult to compare due to the absence of a shared formal foundation. This work provides such a foundation. We first systematize the landscape by constructing an attack taxonomy tailored to language models, the computational primitives of agentic systems. We then develop a formal treatment of agentic access control by defining an AIOracle algorithmically and introducing a security-game framework that captures completeness (in the absence of an adversary) and adversarial robustness. Our security game unifies confidentiality, integrity, and availability within a single model. Using this framework, we show that existing approaches to confidentiality of training data fundamentally conflict with completeness. Finally, we formalize a modular decomposition of helpfulness and harmlessness objectives and prove its soundness, in order to enable principled reasoning about the security of agentic system designs. Our studies suggests that if we were to design a secure system with measurable security, then we might want to use a modular approach to break the problem into sub-problems and let the composition on different modules complete the design. Our studies show that this natural approach with the relevant formalism is needed to prove security reductions.

#9 Blockchain Communication Vulnerabilities

著者: Andrei Lebedev, Vincent Gramoli

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02661

要約:
Blockchains are diverse in the way they handle communications between their nodes to disseminate information, mitigate attacks, and agree on the next block. While security vulnerabilities have been identified, they rely on an attack custom-made for a specific blockchain communication protocol. To our knowledge, the vulnerabilities of multiple blockchain communication protocols to adversarial conditions have never been compared. In this paper, we compare empirically the vulnerabilities of the communication protocols of five modern in-production blockchains, Algorand, Aptos, Avalanche, Redbelly and Solana, when attacked in five different ways. We conclude that Algorand is vulnerable to packet loss attacks, Aptos is vulnerable to targeted load attacks and leader isolation attacks, Avalanche is vulnerable to transient failure attacks, Redbelly's performance is impacted by packet loss attacks and Solana is vulnerable to stopping attacks and leader isolation attacks. Our system is open source.

#10 VA-DAR: A PQC-Ready, Vendor-Agnostic Deterministic Artifact Resolution for Serverless, Enumeration-Resistant Wallet Recovery

著者: Jian Sheng Wang

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02690

要約:
Serverless wallet recovery must balance portability, usability, and privacy. Public registries enable decentralized lookup but naive identifier hashing leaks membership through enumeration. We present VA-DAR, a keyed-discovery protocol for ACE-GF-based wallets that use device-bound passkeys for day-to-day local unlock while supporting cross-device recovery using only a user-provided identifier (e.g., email) and a single recovery passphrase. As a discovery-and-recovery layer over ACE-GF, VA-DAR inherits ACE-GF's context-isolated, algorithm-agile derivation substrate, enabling non-disruptive migration to post-quantum algorithms at the identity layer. The design introduces a decentralized discovery-and-recovery layer that maps a privacy-preserving discovery identifier to an immutable content identifier of a backup sealed artifact stored on a decentralized storage network. Concretely, a user derives passphrase-rooted key material with a memory-hard KDF, domain-separates keys for artifact sealing and discovery indexing, and publishes a registry record keyed by a passphrase-derived discovery identifier. VA-DAR provides: (i) practical cross-device recovery using only identifier and passphrase, (ii) computational resistance to public-directory enumeration, (iii) integrity of discovery mappings via owner authorization, and (iv) rollback/tamper detection via monotonic versioning and artifact commitments. We define three sealed artifact roles, two update-authorization options, and three protocol flows (registration, recovery, update). We formalize security goals via cryptographic games and prove, under standard assumptions, that VA-DAR meets these goals while remaining vendor-agnostic and chain-agnostic. End-to-end post-quantum deployment additionally requires a PQ-secure instantiation of registry authorization.

#11 Scores Know Bobs Voice: Speaker Impersonation Attack

著者: Chanwoo Hwang, Sunpill Kim, Yong Kiam Tan, Tianchi Liu, Seunghun Paik, Dongsoo Kim, Mondal Soumik, Khin Mi Mi Aung, Jae Hong Seo

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02781

要約:
Advances in deep learning have enabled the widespread deployment of speaker recognition systems (SRSs), yet they remain vulnerable to score-based impersonation attacks. Existing attacks that operate directly on raw waveforms require a large number of queries due to the difficulty of optimizing in high-dimensional audio spaces. Latent-space optimization within generative models offers improved efficiency, but these latent spaces are shaped by data distribution matching and do not inherently capture speaker-discriminative geometry. As a result, optimization trajectories often fail to align with the adversarial direction needed to maximize victim scores. To address this limitation, we propose an inversion-based generative attack framework that explicitly aligns the latent space of the synthesis model with the discriminative feature space of SRSs. We first analyze the requirements of an inverse model for score-based attacks and introduce a feature-aligned inversion strategy that geometrically synchronizes latent representations with speaker embeddings. This alignment ensures that latent updates directly translate into score improvements. Moreover, it enables new attack paradigms, including subspace-projection-based attacks, which were previously infeasible due to the absence of a faithful feature-to-audio mapping. Experiments show that our method significantly improves query efficiency, achieving competitive attack success rates with on average 10x fewer queries than prior approaches. In particular, the enabled subspace-projection-based attack attains up to 91.65% success using only 50 queries. These findings establish feature-aligned inversion as a key tool for evaluating the robustness of modern SRSs against score-based impersonation threats.

#12 Understanding the Resource Cost of Fully Homomorphic Encryption in Quantum Federated Learning

著者: Lukas B\"ohm, Arjhun Swaminathan, Anika Hannemann, Erik Buchmann

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02799

要約:
Quantum Federated Learning (QFL) enables distributed training of Quantum Machine Learning (QML) models by sharing model gradients instead of raw data. However, these gradients can still expose sensitive user information. To enhance privacy, homomorphic encryption of parameters has been proposed as a solution in QFL and related frameworks. In this work, we evaluate the overhead introduced by Fully Homomorphic Encryption (FHE) in QFL setups and assess its feasibility for real-world applications. We implemented various QML models including a Quantum Convolutional Neural Network (QCNN) trained in a federated environment with parameters encrypted using the CKKS scheme. This work marks the first QCNN trained in a federated setting with CKKS-encrypted parameters. Models of varying architectures were trained to predict brain tumors from MRI scans. The experiments reveal that memory and communication overhead remain substantial, making FHE challenging to deploy. Minimizing overhead requires reducing the number of model parameters, which, however, leads to a decline in classification performance, introducing a trade-off between privacy and model complexity.

#13 DSBA: Dynamic Stealthy Backdoor Attack with Collaborative Optimization in Self-Supervised Learning

backdoor

著者: Jiayao Wang, Mohammad Maruf Hasan, Yiping Zhang, Xiaoying Lei, Jiale Zhang, Qilin Wu, Junwu Zhu, Dongfang Zhao

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02849

要約:
Self-Supervised Learning (SSL) has emerged as a significant paradigm in representation learning thanks to its ability to learn without extensive labeled data, its strong generalization capabilities, and its potential for privacy preservation. However, recent research reveals that SSL models are also vulnerable to backdoor attacks. Existing backdoor attack methods in the SSL context commonly suffer from issues such as high detectability of triggers, feature entanglement, and pronounced out-of-distribution properties in poisoned samples, all of which compromises attack effectiveness and stealthiness. To that, we propose a Dynamic Stealthy Backdoor Attack (DSBA) backed by a new technique we term Collaborative Optimization. This method decouples the attack process into two collaborative optimization layers: the outer-layer optimization trains a backdoor encoder responsible for global feature space remodeling, aiming to achieve precise backdoor implantation while preserving core functionality; meanwhile, the inner-layer optimization employs a dynamically optimized generator to adaptively produce optimally concealed triggers for individual samples, achieving coordinated concealment across feature space and visual space. We also introduce multiple loss functions to dynamically balance attack performance and stealthiness, in which we employ an adaptive weight scheduling mechanism to enhance training stability. Extensive experiments on various mainstream SSL algorithms and five public datasets demonstrate that: (i) DSBA significantly enhances Attack Success Rate (ASR) and stealthiness while maintaining downstream task accuracy; and (ii) DSBA exhibits superior robustness against existing mainstream defense methods.

#14 Kraken: Higher-order EM Side-Channel Attacks on DNNs in Near and Far Field

著者: Peter Horvath, Ilia Shumailov, Lukasz Chmielewski, Lejla Batina, Yuval Yarom

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02891

要約:
The multi-million dollar investment required for modern machine learning (ML) has made large ML models a prime target for theft. In response, the field of model stealing has emerged. Attacks based on physical side-channel information have shown that DNN model extraction is feasible, even on CUDA Cores in a GPU. For the first time, our work demonstrates parameter extraction on the specialized GPU's Tensor Core units, most commonly used GPU units nowadays due to their superior performance, via near-field physical side-channel attacks. Previous work targeted only the general-purpose CUDA Cores in the GPU, the functional units that have been part of the GPU since its inception. Our method is tailored to the GPU architecture to accurately estimate energy consumption and derive efficient attacks via Correlation Power Analysis (CPA). Furthermore, we provide an exploratory analysis of hyperparameter and weight leakage from LLMs in far field and demonstrate that the GPU's electromagnetic radiation leaks even 100\,cm away through a glass obstacle.

#15 Multi-Agent Honeypot-Based Request-Response Context Dataset for Improved SQL Injection Detection Performance

agent

著者: Hao Yu, Hui Li, FengYuan Shi, Wenjie Yu, PinHan Ho, Zehua Wang, Bin Wang

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02963

要約:
SQL injection remains a major threat to web applications, as existing defenses often fail against obfuscation and evolving attacks because of neglecting the request-response context. This paper presents a context-enriched SQL injection detection framework, focusing on constructing a high-quality request-response dataset via a multi-agent honeypot system: the Request Generator Agent produces diverse malicious/benign requests, the Database Response Agent mediates interactions to ensure authentic responses while protecting production data, and the Traffic Monitor pairs requests with responses, assigns labels, and cleans data, yielding totally 140,973 labeled pairs with contextual cues absent in payload-only data. Experiments show that models trained on this context dataset outperform payload-only counterparts: CNN and BiLSTM achieve over 40\% accuracy improvement in different tasks, validating that the request-response context enhances the detection of evolving and obfuscated attacks.

#16 Contextualized Privacy Defense for LLM Agents

privacyagent

著者: Yule Wen, Yanzhe Zhang, Jianxun Lian, Xiaoyuan Yi, Xing Xie, Diyi Yang

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02983

要約:
LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, such as prompting and guarding. These paradigms are insufficient for supporting contextual, proactive privacy decisions in multi-step agent execution. We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm in which an instructor model generates step-specific, context-aware privacy guidance during execution, proactively shaping actions rather than merely constraining or vetoing them. Crucially, CDI is paired with an experience-driven optimization framework that trains the instructor via reinforcement learning (RL), where we convert failure trajectories with privacy violations into learning environments. We formalize baseline defenses and CDI as distinct intervention points in a canonical agent loop, and compare their privacy-helpfulness trade-offs within a unified simulation framework. Results show that our CDI consistently achieves a better balance between privacy preservation (94.2%) and helpfulness (80.6%) than baselines, with superior robustness to adversarial conditions and generalization.

#17 RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy

privacy

著者: Yuhang Li, Yajie Wang, Xiangyun Tang, Peng Jiang, Yu-an Tan, Liehuang Zhu

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03108

要約:
Secure aggregation is a foundational building block of privacy-preserving learning, yet achieving robustness under adversarial behavior remains challenging. Modern systems increasingly adopt the shuffle model of differential privacy (Shuffle-DP) to locally perturb client updates and globally anonymize them via shuffling for enhanced privacy protection. However, these perturbations and anonymization distort gradient geometry and remove identity linkage, leaving systems vulnerable to adversarial poisoning attacks. Moreover, the shuffler, typically a third party, can be compromised, undermining security against malicious adversaries. To address these challenges, we present Robust Aggregation in Noise (RAIN), a unified framework that reconciles privacy, robustness, and verifiability under Shuffle-DP. At its core, RAIN adopts sign-space aggregation to robustly measure update consistency and limit malicious influence under noise and anonymization. Specifically, we design two novel secret-shared protocols for shuffling and aggregation that operate directly on additive shares and preserve Shuffle-DP's tight privacy guarantee. In each round, the aggregated result is verified to ensure correct aggregation and detect any selective dropping, achieving malicious security with minimal overhead. Extensive experiments across comprehensive benchmarks show that RAIN maintains strong privacy guarantees under Shuffle-DP and remains robust to poisoning attacks with negligible degradation in accuracy and convergence. It further provides real-time integrity verification with complete tampering detection, while achieving up to 90x lower communication cost and 10x faster aggregation compared with prior work.

#18 Gravity Falls: A Comparative Analysis of Domain-Generation Algorithm (DGA) Detection Methods for Mobile Device Spearphishing

著者: Adam Dorian Wong, John D. Hastings

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03270

要約:
Mobile devices are frequent targets of eCrime threat actors through SMS spearphishing (smishing) links that leverage Domain Generation Algorithms (DGA) to rotate hostile infrastructure. Despite this, DGA research and evaluation largely emphasize malware C2 and email phishing datasets, leaving limited evidence on how well detectors generalize to smishing-driven domain tactics outside enterprise perimeters. This work addresses that gap by evaluating traditional and machine-learning DGA detectors against Gravity Falls, a new semi-synthetic dataset derived from smishing links delivered between 2022 and 2025. Gravity Falls captures a single threat actor's evolution across four technique clusters, shifting from short randomized strings to dictionary concatenation and themed combo-squatting variants used for credential theft and fee/fine fraud. Two string-analysis approaches (Shannon entropy and Exp0se) and two ML-based detectors (an LSTM classifier and COSSAS DGAD) are assessed using Top-1M domains as benign baselines. Results are strongly tactic-dependent: performance is highest on randomized-string domains but drops on dictionary concatenation and themed combo-squatting, with low recall across multiple tool/cluster pairings. Overall, both traditional heuristics and recent ML detectors are ill-suited for consistently evolving DGA tactics observed in Gravity Falls, motivating more context-aware approaches and providing a reproducible benchmark for future evaluation.

#19 Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving

privacy

著者: Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Jaeyeon Jang

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02214

要約:
Federated Inference (FI) studies how independently trained and privately owned models can collaborate at inference time without sharing data or model parameters. While recent work has explored secure and distributed inference from disparate perspectives, a unified abstraction and system-level understanding of FI remain lacking. This paper positions FI as a distinct collaborative paradigm, complementary to federated learning, and identifies two fundamental requirements that govern its feasibility: inference-time privacy preservation and meaningful performance gains through collaboration. We formalize FI as a protected collaborative computation, analyze its core design dimensions, and examine the structural trade-offs that arise when privacy constraints, non-IID data, and limited observability are jointly imposed at inference time. Through a concrete instantiation and empirical analysis, we highlight recurring friction points in privacy-preserving inference, ensemble-based collaboration, and incentive alignment. Our findings suggest that FI exhibits system-level behaviors that cannot be directly inherited from training-time federation or classical ensemble methods. Overall, this work provides a unifying perspective on FI and outlines open challenges that must be addressed to enable practical, scalable, and privacy-preserving collaborative inference systems.

#20 SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

privacybackdooragent

著者: Varun Pratap Bhardwaj

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02240

要約:
We present SuperLocalMemory, a local-first memory system for multi-agent AI that defends against OWASP ASI06 memory poisoning through architectural isolation and Bayesian trust scoring, while personalizing retrieval through adaptive learning-to-rank -- all without cloud dependencies or LLM inference calls. As AI agents increasingly rely on persistent memory, cloud-based memory systems create centralized attack surfaces where poisoned memories propagate across sessions and users -- a threat demonstrated in documented attacks against production systems. Our architecture combines SQLite-backed storage with FTS5 full-text search, Leiden-based knowledge graph clustering, an event-driven coordination layer with per-agent provenance, and an adaptive re-ranking framework that learns user preferences through three-layer behavioral analysis (cross-project technology preferences, project context detection, and workflow pattern mining). Evaluation across seven benchmark dimensions demonstrates 10.6ms median search latency, zero concurrency errors under 10 simultaneous agents, trust separation (gap =0.90) with 72% trust degradation for sleeper attacks, and 104% improvement in NDCG@5 when adaptive re-ranking is enabled. Behavioral data is isolated in a separate database with GDPR Article 17 erasure support. SuperLocalMemory is open-source (MIT) and integrates with 17+ development tools via Model Context Protocol.

#21 Toward multi-purpose quantum communication networks: from theory to protocol implementation

著者: Lucas Hanouz, Marc Kaplan, Jean-S\'ebastien Kersaint Tournebize, Chin-te Liao, Anne Marin

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.02923

要約:
Most quantum communication networks around the world are used for a single task: quantum key distribution. In order to initiate the transition to multi-purpose quantum communication networks, we demonstrate the implementation of two different tasks on the same quantum key distribution hardware. Specifically, we focus on quantum oblivious transfer and quantum tokens. Our main contribution is to establish a methodology that greatly simplifies the expertise required to achieve the deployment, assess its performance, and evaluate its feasibility at a large scale. The implementation that we present is full-stack. It is based on a development framework that allows running user-defined applications both with simulated or real quantum communication backend. The hardware used for the implementation is VeriQloud's Qline. The simulation backend reproduces exactly the inputs and outputs of the real hardware, but also its losses and errors. It can therefore be used to validate the implementation before running it on the real hardware. The sources of the software that we use are fully open, making our research reproducible. The security of the implementations on real hardware are discussed with respect to security bounds previously known in the literature. We also discuss the engineering choices that we made in order to make the implementations feasible. By establishing a methodology to evaluate the performance and security of quantum communication protocols, we take a significant step towards industrializing and deploying large-scale, multi-purpose quantum communication networks.

#22 IoUCert: Robustness Verification for Anchor-based Object Detectors

著者: Benedikt Br\"uckner, Alejandro Mercado, Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03043

要約:
While formal robustness verification has seen significant success in image classification, scaling these guarantees to object detection remains notoriously difficult due to complex non-linear coordinate transformations and Intersection-over-Union (IoU) metrics. We introduce {\sc \sf IoUCert}, a novel formal verification framework designed specifically to overcome these bottlenecks in foundational anchor-based object detection architectures. Focusing on the object localisation component in single-object settings, we propose a coordinate transformation that enables our algorithm to circumvent precision-degrading relaxations of non-linear box prediction functions. This allows us to optimise bounds directly with respect to the anchor box offsets which enables a novel Interval Bound Propagation method that derives optimal IoU bounds. We demonstrate that our method enables, for the first time, the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.

#23 Multiparty Quantum Key Agreement: Architectures, State-of-the-art, and Open Problems

著者: Malik Mouaji, Saif Al-Kuwari

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03225

要約:
Multiparty quantum key agreement (MQKA) enables $n \geq 3$ mutually distrustful users to establish a shared secret key through collaborative quantum protocols. In this paper, we provide a comprehensive review where we argue that MQKA is best understood as a design space organized along three orthogonal but tightly coupled axes: (1) network architecture, which determines how quantum states flow between participants; (2) quantum resources, which encode the physical degrees of freedom used for implementation; and (3) security model, which defines trust assumptions about devices and infrastructure. Rather than treating MQKA as a linear sequence of isolated protocols, we develop this three-axis perspective to reveal recurrent patterns, sharp trade-offs, and unexplored design spaces. We classify MQKA protocols into structural families, map them to underlying quantum resources, and analyze how different security models shape fairness and collusion resistance. We further identify open challenges in composable security frameworks, network native integration, device-independent implementations, and propose a research roadmap toward hybrid-resource, bosonic-code-encoded, and fairness-aware MQKA suitable for the future quantum internet deployments in the post-NISQ era.

#24 Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective

privacy

著者: Enea Monzio Compagnoni, Alessandro Stanghellini, Rustem Islamov, Aurelien Lucchi, Anastasiia Koloskova

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03226

要約:
Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how DP noise interacts with adaptivity in optimization through the lens of stochastic differential equations, providing the first SDE-based analysis of private optimizers. Focusing on DP-SGD and DP-SignSGD under per-example clipping, we show a sharp contrast under fixed hyperparameters: DP-SGD converges at a Privacy-Utility Trade-Off of $\mathcal{O}(1/\varepsilon^2)$ with speed independent of $\varepsilon$, while DP-SignSGD converges at a speed linear in $\varepsilon$ with an $\mathcal{O}(1/\varepsilon)$ trade-off, dominating in high-privacy or large batch noise regimes. By contrast, under optimal learning rates, both methods achieve comparable theoretical asymptotic performance; however, the optimal learning rate of DP-SGD scales linearly with $\varepsilon$, while that of DP-SignSGD is essentially $\varepsilon$-independent. This makes adaptive methods far more practical, as their hyperparameters transfer across privacy levels with little or no re-tuning. Empirical results confirm our theory across training and test metrics, and empirically extend from DP-SignSGD to DP-Adam.

#25 Topic-Based Watermarks for Large Language Models

intellectual property

著者: Alexander Nemecek, Yuzhou Jiang, Erman Ayday

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2404.02138

要約:
The indistinguishability of large language model (LLM) output from human-authored content poses significant challenges, raising concerns about potential misuse of AI-generated text and its influence on future model training. Watermarking algorithms offer a viable solution by embedding detectable signatures into generated text. However, existing watermarking methods often involve trade-offs among attack robustness, generation quality, and additional overhead such as specialized frameworks or complex integrations. We propose a lightweight, topic-guided watermarking scheme for LLMs that partitions the vocabulary into topic-aligned token subsets. Given an input prompt, the scheme selects a relevant topic-specific token list, effectively "green-listing" semantically aligned tokens to embed robust marks while preserving fluency and coherence. Experimental results across multiple LLMs and state-of-the-art benchmarks demonstrate that our method achieves text quality comparable to industry-leading systems and simultaneously improves watermark robustness against paraphrasing and lexical perturbation attacks, with minimal performance overhead. Our approach avoids reliance on additional mechanisms beyond standard text generation pipelines, enabling straightforward adoption and suggesting a practical path toward globally consistent watermarking of AI-generated content.

#26 Plug-and-Hide: Provable and Adjustable Diffusion Generative Steganography

diffusion

著者: Jiahao Zhu, Zixuan Chen, Yi Zhou, Weiqi Luo, Xiaohua Xie

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2409.04878

要約:
Diffusion model-based generative image steganography (DM-GIS) is an emerging paradigm that leverages the generative power of diffusion models to conceal secret messages without requiring pre-existing cover images. In this paper, we identify a fundamental trade-off between stego image quality, steganographic security, and extraction reliability within the DM-GIS framework. Drawing on this insight, we propose \textbf{PA-B2G}, a \textbf{P}rovable and \textbf{A}djustable \textbf{B}it-to-\textbf{G}aussian mapping. Theoretically, PA-B2G guarantees the reversible encoding of arbitrary-length bit sequences into pure Gaussian noise; practically, it enables fine-grained control over the balance between image fidelity, security, and extraction accuracy. By integrating PA-B2G with probability-flow ordinary differential equations (PF-ODEs), we establish a theoretically invertible mapping between secret bitstreams and stego images. PA-B2G is model-agnostic and can be seamlessly integrated into mainstream diffusion models without additional training or fine-tuning, making it also suitable for diffusion model watermarking. Extensive experiments validate our theoretical analysis of the inherent DM-GIS trade-offs and demonstrate that our method flexibly supports arbitrary payloads while achieving competitive image quality and security. Furthermore, our method exhibits strong resilience to lossy processing in watermarking applications, highlighting its practical utility.

#27 AttackSeqBench: Benchmarking the Capabilities of LLMs for Attack Sequences Understanding

著者: Haokai Ma, Javier Yong, Yunshan Ma, Kuei Chen, Anis Yusof, Zhenkai Liang, Ee-Chien Chang

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.03170

要約:
Cyber Threat Intelligence (CTI) reports document observations of cyber threats, synthesizing evidence about adversaries' actions and intent into actionable knowledge that informs detection, response, and defense planning. However, the unstructured and verbose nature of CTI reports poses significant challenges for security practitioners to manually extract and analyze such sequences. Although large language models (LLMs) exhibit promise in cybersecurity tasks such as entity extraction and knowledge graph construction, their understanding and reasoning capabilities towards behavioral sequences remains underexplored. To address this, we introduce AttackSeqBench, a benchmark designed to systematically evaluate LLMs' reasoning abilities across the tactical, technical, and procedural dimensions of adversarial behaviors, while satisfying Extensibility, Reasoning Scalability, and Domain-dpecific Epistemic Expandability. We further benchmark 7 LLMs, 5 LRMs and 4 post-training strategies across 3 benchmark settings and 3 benchmark tasks within our AttackSeqBench to identify their advantages and limitations in such specific domain. Our findings contribute to a deeper understanding of LLM-driven CTI report understanding and foster its application in cybersecurity operations. Our code of benchmark construction and evaluation and the corresponding dataset are available at: https://github.com/hulkima/AttackSeqBench.

#28 Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models

privacy

著者: Weidi Luo, Tianyu Lu, Qiming Zhang, Xiaogeng Liu, Bin Hu, Yue Zhao, Jieyu Zhao, Song Gao, Patrick McDaniel, Zhen Xiang, Chaowei Xiao

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.19373

要約:
Recent advances in multi-modal large reasoning models (MLRMs) have shown significant ability to interpret complex visual content. While these models enable impressive reasoning capabilities, they also introduce novel and underexplored privacy risks. In this paper, we identify a novel category of privacy leakage in MLRMs: Adversaries can infer sensitive geolocation information, such as a user's home address or neighborhood, from user-generated images, including selfies captured in private settings. To formalize and evaluate these risks, we propose a three-level visual privacy risk framework that categorizes image content based on contextual sensitivity and potential for location inference. We further introduce DoxBench, a curated dataset of 500 real-world images reflecting diverse privacy scenarios. Our evaluation across 11 advanced MLRMs and MLLMs demonstrates that these models consistently outperform non-expert humans in geolocation inference and can effectively leak location-related private information. This significantly lowers the barrier for adversaries to obtain users' sensitive geolocation information. We further analyze and identify two primary factors contributing to this vulnerability: (1) MLRMs exhibit strong reasoning capabilities by leveraging visual clues in combination with their internal world knowledge; and (2) MLRMs frequently rely on privacy-related visual clues for inference without any built-in mechanisms to suppress or avoid such usage. To better understand and demonstrate real-world attack feasibility, we propose GeoMiner, a collaborative attack framework that decomposes the prediction process into two stages: clue extraction and reasoning to improve geolocation performance while introducing a novel attack perspective. Our findings highlight the urgent need to reassess inference-time privacy risks in MLRMs to better protect users' sensitive information.

#29 Maris: A Formally Verifiable Privacy Policy Enforcement Paradigm for Multi-Agent Collaboration Systems

privacyagent

著者: Jian Cui, Zichuan Li, Luyi Xing, Xiaojing Liao

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.04799

要約:
Multi-agent collaboration systems (MACS), powered by large language models (LLMs), solve complex problems efficiently by leveraging each agent's specialization and communication between agents. However, the inherent exchange of information between agents and their interaction with external environments, such as LLM, tools, and users, inevitably introduces significant risks of sensitive data leakage, including vulnerabilities to attacks such as eavesdropping and prompt injection. Existing MACS lack fine-grained data protection controls, making it challenging to manage sensitive information securely. In this paper, we take the first step to mitigate the MACS's data leakage threat through a privacy-enhanced MACS development paradigm, Maris. Maris enables rigorous message flow control within MACS by embedding reference monitors into key multi-agent conversation components. We implemented Maris as an integral part of widely-adopted open-source multi-agent development frameworks, AutoGen and LangChain. To evaluate its effectiveness, we develop a Privacy Assessment Framework that emulates MACS under different threat scenarios. Our evaluation shows that Maris effectively mitigated sensitive data leakage threats across three different task suites while maintaining a high task success rate.

#30 LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities

著者: Kalyan Nakka, Jimmy Dani, Ausmit Mondal, Nitesh Saxena

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.05619

要約:
The growing adoption of Large Language Models (LLMs) has influenced the development of Small Language Models (SLMs) for on-device deployment across smartphones and edge devices, offering enhanced privacy, reduced latency, server-free functionality, and improved user experience. However, due to on-device resource constraints, SLMs undergo size optimization through compression techniques like quantization, which inadvertently introduce fairness, ethical and privacy risks. Critically, quantized SLMs may respond to harmful queries directly, without requiring adversarial manipulation, raising significant safety and trust concerns. To address this, we propose LiteLMGuard, an on-device guardrail that provides real-time, prompt-level defense for quantized SLMs. Additionally, our guardrail is designed to be model-agnostic such that it can be seamlessly integrated with any SLM, operating independently of underlying architectures. Our LiteLMGuard formalizes deep learning (DL)-based prompt filtering by leveraging semantic understanding to classify prompt answerability for SLMs. Built on our curated Answerable-or-Not dataset, LiteLMGuard employs ELECTRA as the candidate model with 97.75% answerability classification accuracy. The on-device deployment of LiteLMGuard enabled real-time offline filtering with over 85% defense-rate against harmful prompts (including jailbreak attacks), 94% filtering accuracy and ~135 ms average latency. These results demonstrate LiteLMGuard as a lightweight robust defense mechanism for effectively and efficiently securing on-device SLMs against Open Knowledge Attacks.

#31 Watermarking Without Standards Is Not AI Governance

intellectual property

著者: Alexander Nemecek, Yuzhou Jiang, Erman Ayday

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.23814

要約:
Watermarking has emerged as a leading technical proposal for attributing generative AI content and is increasingly cited in global governance frameworks. This position paper argues that current implementations risk serving as symbolic compliance rather than delivering effective oversight. We identify a growing gap between regulatory expectations and the technical limitations of existing watermarking schemes. Through analysis of policy proposals and industry practices, we show how incentive structures disincentivize robust, auditable deployments. To realign watermarking with governance goals, we propose a three-layer framework encompassing technical standards, audit infrastructure, and enforcement mechanisms. Without enforceable requirements and independent verification, watermarking will remain inadequate for accountability and ultimately undermine broader efforts in AI safety and regulation.

#32 BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage

著者: Kalyan Nakka, Nitesh Saxena

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.02479

要約:
The inherent risk of generating harmful and unsafe content by Large Language Models (LLMs), has highlighted the need for their safety alignment. Various techniques like supervised fine-tuning, reinforcement learning from human feedback, and red-teaming were developed for ensuring the safety alignment of LLMs. However, the robustness of these aligned LLMs is always challenged by adversarial attacks that exploit unexplored and underlying vulnerabilities of the safety alignment. In this paper, we develop a novel black-box jailbreak attack, called BitBypass, that leverages hyphen-separated bitstream camouflage for jailbreaking aligned LLMs. This represents a new direction in jailbreaking by exploiting fundamental information representation of data as continuous bits, rather than leveraging prompt engineering or adversarial manipulations. Our evaluation of five state-of-the-art LLMs, namely GPT-4o, Gemini 1.5, Claude 3.5, Llama 3.1, and Mixtral, in adversarial perspective, revealed the capabilities of BitBypass in bypassing their safety alignment and tricking them into generating harmful and unsafe content. Further, we observed that BitBypass outperforms several state-of-the-art jailbreak attacks in terms of stealthiness and attack success. Overall, these results highlights the effectiveness and efficiency of BitBypass in jailbreaking these state-of-the-art LLMs.

#33 Mapping Quantum Threats: An Engineering Inventory of Cryptographic Dependencies

著者: Carlos Benitez

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.24623

要約:
The prospective emergence of large-scale quantum computers capable of executing Shor's algorithm at cryptographically relevant scale would render widely deployed public-key cryptography computationally insecure. Under this threat model, both confidentiality of previously protected data and the authenticity of digital signatures could be compromised across multiple layers of digital infrastructure. This paper presents a systematic engineering inventory of technologies that depend on quantum-vulnerable asymmetric cryptography. The analysis is structured along two complementary axes (technology domain and operational environment) linking cryptographic primitives to their real-world deployment contexts. The resulting framework provides a structured basis for identifying systemic exposure to quantum-related risks across contemporary digital ecosystems.

#34 Every Language Model Has a Forgery-Resistant Signature

著者: Matthew Finlayson, Xiang Ren, Swabha Swayamdipta

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.14086

要約:
The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint -- namely, that language model outputs lie on the surface of a high-dimensional ellipse -- functions as a signature for the model and can be used to identify the source model of a given output. This ellipse signature has unique properties that distinguish it from existing model-output association methods like language model fingerprints. In particular, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce log-probabilities (logprobs) on the ellipse using currently known methods. Secondly, the signature is naturally occurring, since all language models have these elliptical constraints. Thirdly, the signature is self-contained, in that it is detectable without access to the model inputs or the full weights. Finally, the signature is compact and redundant, as it is independently detectable in each logprob output from the model. We evaluate a novel technique for extracting the ellipse from small models and discuss the practical hurdles that make it infeasible for production-scale models. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.

#35 Secure Sparse Matrix Multiplications and their Applications to Privacy-Preserving Machine Learning

privacy

著者: Marc Damie, Florian Hahn, Andreas Peter, Jan Ramon

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.14894

要約:
To preserve data privacy, multi-party computation (MPC) enables executing Machine Learning (ML) algorithms on private data. However, MPC frameworks do not include optimized operations on sparse data. This absence makes them unsuitable for ML applications involving sparse data; e.g., recommender systems or genomics. Even in plaintext, such applications involve high-dimensional sparse data, that cannot be processed without sparsity-related optimizations due to prohibitively large memory requirements. Since matrix multiplication is a central building block of ML algorithms, our work proposes dedicated MPC algorithms to multiply secret-shared sparse matrices. Our sparse algorithms have several advantages over secure dense matrix multiplications (i.e., the classic multiplication). On the one hand, they avoid the memory issues caused by the "dense" data representation of dense multiplications. On the other hand, our algorithms can significantly reduce communication costs (up to $\times1000$) for realistic problem sizes. We validate our algorithms in two machine learning applications where dense matrix multiplications are impractical. Finally, we take inspiration from real-world sparse data properties to build 3 techniques minimizing the public knowledge necessary to secure sparse algorithms.

#36 Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

agent

著者: Yi Qian, Kunwei Qian, Xingbang He, Ligeng Chen, Jikang Zhang, Tiantai Zhang, Haiyang Wei, Linzhang Wang, Hao Wu, Bing Mao

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.12349

要約:
Large multimodal model powered GUI agents are emerging as high-privilege operators on mobile platforms, entrusted with perceiving screen content and injecting inputs. However, their design operates under the implicit assumption of Visual Atomicity: that the UI state remains invariant between observation and action. We demonstrate that this assumption is fundamentally invalid in Android, creating a critical attack surface. We present Action Rebinding, a novel attack that allows a seemingly-benign app with zero dangerous permissions to rebind an agent's execution. By exploiting the inevitable observation-to-action gap inherent in the agent's reasoning pipeline, the attacker triggers foreground transitions to rebind the agent's planned action toward the target app. We weaponize the agent's task-recovery logic and Android's UI state preservation to orchestrate programmable, multi-step attack chains. Furthermore, we introduce an Intent Alignment Strategy (IAS) that manipulates the agent's reasoning process to rationalize UI states, enabling it to bypass verification gates (e.g., confirmation dialogs) that would otherwise be rejected. We evaluate Action Rebinding Attacks on six widely-used Android GUI agents across 15 tasks. Our results demonstrate a 100% success rate for atomic action rebinding and the ability to reliably orchestrate multi-step attack chains. With IAS, the success rate in bypassing verification gates increases (from 0% to up to 100%). Notably, the attacker application requires no sensitive permissions and contains no privileged API calls, achieving a 0% detection rate across malware scanners (e.g., VirusTotal). Our findings reveal a fundamental architectural flaw in current agent-OS integration and provide critical insights for the secure design of future agent systems. To access experimental logs and demonstration videos, please contact yi_qian@smail.nju.edu.cn.

#37 Multimodal Multi-Agent Ransomware Analysis Using AutoGen

agent

著者: Asifullah Khan, Aimen Wadood, Mubashar Iqbal, Umme Zahoora

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.20346

要約:
Ransomware has become one of the most serious cybersecurity threats causing major financial losses and operational disruptions worldwide.Traditional detection methods such as static analysis, heuristic scanning and behavioral analysis often fall short when used alone. To address these limitations, this paper presents multimodal multi agent ransomware analysis framework designed for ransomware classification. Proposed multimodal multiagent architecture combines information from static, dynamic and network sources. Each data type is handled by specialized agent that uses auto encoder based feature extraction. These representations are then integrated through a fusion agent. After that fused representation are used by transformer based classifier. It identifies the specific ransomware family. The agents interact through an interagent feedback mechanism that iteratively refines feature representations by suppressing low confidence information. The framework was evaluated on large scale datasets containing thousands of ransomware and benign samples. Multiple experiments were conducted on ransomware dataset. It outperforms single modality and nonadaptive fusion baseline achieving improvement of up to 0.936 in Macro-F1 for family classification and reducing calibration error. Over 100 epochs, the agentic feedback loop displays a stable monotonic convergence leading to over +0.75 absolute improvement in terms of agent quality and a final composite score of around 0.88 without fine tuning of the language models. Zeroday ransomware detection remains family dependent on polymorphism and modality disruptions. Confidence aware abstention enables reliable real world deployment by favoring conservativeand trustworthy decisions over forced classification. The findings indicate that proposed approach provides a practical andeffective path toward improving real world ransomware defense systems.

#38 zkCraft: Prompt-Guided LLM as a Zero-Shot Mutation Pattern Oracle for TCCT-Powered ZK Fuzzing

著者: Rong Fu, Jia Yee Tan, Youjin Wang, Ziyu Kong, Zeli Su, Zhaolu Kang, Shuning Zhang, Xianda Li, Kun Liu, Simon Fong

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2602.00667

要約:
Zero-knowledge circuits enable privacy-preserving and scalable systems but are difficult to implement correctly due to the tight coupling between witness computation and circuit constraints. We present zkCraft, a practical framework that combines deterministic, R1CS-aware localization with proof-bearing search to detect semantic inconsistencies. zkCraft encodes candidate constraint edits into a single Row-Vortex polynomial and replaces repeated solver queries with a Violation IOP that certifies the existence of edits together with a succinct proof. Deterministic LLM-driven mutation templates bias exploration toward edge cases while preserving auditable algebraic verification. Evaluation on real Circom code shows that proof-bearing localization detects diverse under- and over-constrained faults with low false positives and reduces costly solver interaction. Our approach bridges formal verification and automated debugging, offering a scalable path for robust ZK circuit development.

#39 Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

backdoordiffusion

著者: Tianxin Chen, Wenbo Jiang, Hongqiao Chen, Zhirun Zheng, Cheng Huang

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2602.04898

要約:
Text-to-image (T2I) diffusion models are widely adopted for their strong generative capabilities, yet remain vulnerable to backdoor attacks. Existing attacks typically rely on fixed textual triggers and single-entity backdoor targets, making them highly susceptible to enumeration-based input defenses and attention-consistency detection. In this work, we propose Semantic-level Backdoor Attack (SemBD), which implants backdoors at the representation level by defining triggers as continuous semantic regions rather than discrete textual patterns. Concretely, SemBD injects semantic backdoors by distillation-based editing of the key and value projection matrices in cross-attention layers, enabling diverse prompts with identical semantic compositions to reliably activate the backdoor attack. To further enhance stealthiness, SemBD incorporates a semantic regularization to prevent unintended activation under incomplete semantics, as well as multi-entity backdoor targets that avoid highly consistent cross-attention patterns. Extensive experiments demonstrate that SemBD achieves a 100% attack success rate while maintaining strong robustness against state-of-the-art input-level defenses.

#40 Deep Learning for Contextualized NetFlow-Based Network Intrusion Detection: Methods, Data, Evaluation and Deployment

著者: Abdelkader El Mahdaouy, Issam Ait Yahia, Soufiane Oualil, Ismail Berrada

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2602.05594

要約:
Network Intrusion Detection Systems (NIDS) have progressively shifted from signature-based techniques toward machine learning and, more recently, deep learning methods. Meanwhile, the widespread adoption of encryption has reduced payload visibility, weakening inspection pipelines that depend on plaintext content and increasing reliance on flow-level telemetry such as NetFlow and IPFIX. Many current learning-based detectors still frame intrusion detection as per-flow classification, implicitly treating each flow record as an independent sample. This assumption is often violated in realistic attack campaigns, where evidence is distributed across multiple flows and hosts, spanning minutes to days through staged execution, beaconing, lateral movement, and exfiltration. This paper synthesizes recent research on context-aware deep learning for flow-based intrusion detection. We organize existing methods into a four-dimensional taxonomy covering temporal context, graph or relational context, multimodal context, and multi-resolution context. Beyond modeling, we emphasize rigorous evaluation and operational realism. We review common failure modes that can inflate reported results, including temporal leakage, data splitting, dataset design flaws, limited dataset diversity, and weak cross-dataset generalization. We also analyze practical constraints that shape deployability, such as streaming state management, memory growth, latency budgets, and model compression choices. Overall, the literature suggests that context can meaningfully improve detection when attacks induce measurable temporal or relational structure, but the magnitude and reliability of these gains depend strongly on rigorous, causal evaluation and on datasets that capture realistic diversity.

#41 Few-shot Model Extraction Attacks against Sequential Recommender Systems

model extraction

著者: Hui Zhang, Fu Liu

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2411.11677

要約:
Among adversarial attacks against sequential recommender systems, model extraction attacks represent a method to attack sequential recommendation models without prior knowledge. Existing research has primarily concentrated on the adversary's execution of black-box attacks through data-free model extraction. However, a significant gap remains in the literature concerning the development of surrogate models by adversaries with access to few-shot raw data (10\% even less). That is, the challenge of how to construct a surrogate model with high functional similarity within the context of few-shot data scenarios remains an issue that requires resolution.This study addresses this gap by introducing a novel few-shot model extraction framework against sequential recommenders, which is designed to construct a superior surrogate model with the utilization of few-shot data. The proposed few-shot model extraction framework is comprised of two components: an autoregressive augmentation generation strategy and a bidirectional repair loss-facilitated model distillation procedure. Specifically, to generate synthetic data that closely approximate the distribution of raw data, autoregressive augmentation generation strategy integrates a probabilistic interaction sampler to extract inherent dependencies and a synthesis determinant signal module to characterize user behavioral patterns. Subsequently, bidirectional repair loss, which target the discrepancies between the recommendation lists, is designed as auxiliary loss to rectify erroneous predictions from surrogate models, transferring knowledge from the victim model to the surrogate model effectively. Experiments on three datasets show that the proposed few-shot model extraction framework yields superior surrogate models.

#42 Privacy Risk Predictions Based on Fundamental Understanding of Personal Data and an Evolving Threat Landscape

privacy

著者: Haoran Niu, K. Suzanne Barber

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.04542

要約:
It is difficult for individuals and organizations to protect personal information without a fundamental understanding of relative privacy risks. By analyzing over 5,000 empirical identity theft and fraud cases, this research identifies which types of personal data are exposed, how frequently such exposures occur, and what the consequences of those exposures are. We construct an Identity Ecosystem graph - a foundational, graph-based model in which nodes represent personally identifiable information (PII) attributes and edges represent empirical disclosure relationships between them (e.g., one PII attribute is exposed due to the exposure of another). Leveraging this graph structure, we develop a privacy risk prediction framework that uses graph theory and graph neural networks to estimate the likelihood of further disclosures when certain PII attributes are compromised. The results show that our approach effectively addresses the core question: Can the disclosure of a given identity attribute possibly lead to the disclosure of another attribute? The code for the privacy risk prediction framework is available at: https://github.com/niu-haoran/Privacy-Risk-Predictions-and-UTCID-Identity-Ecosystem.git.

#43 WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols

privacy

著者: Mohammad M Maheri, Xavier Cadet, Peter Chin, Hamed Haddadi

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.00272

要約:
Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical alternative to full retraining. However, it introduces privacy risks: an adversary with access to pre- and post-unlearning models can exploit their differences for membership inference or data reconstruction. We show these vulnerabilities arise from two factors: large gradient norms of forget-set samples and the close proximity of unlearned parameters to the original model. To demonstrate their severity, we propose unlearning-specific membership inference and reconstruction attacks, showing that several state-of-the-art methods (e.g., NGP, SCRUB) remain vulnerable. To mitigate this leakage, we introduce WARP, a plug-and-play teleportation defense that leverages neural network symmetries to reduce forget-set gradient energy and increase parameter dispersion while preserving predictions. This reparameterization obfuscates the signal of forgotten data, making it harder for attackers to distinguish forgotten samples from non-members or recover them via reconstruction. Across six unlearning algorithms, our approach achieves consistent privacy gains, reducing adversarial advantage (AUC) by up to 64% in black-box and 92% in white-box settings, while maintaining accuracy on retained data. These results highlight teleportation as a general tool for reducing attack success in approximate unlearning.

#44 Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

agent

著者: Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho

公開日: Wed, 04 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.09882

要約:
We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost -- certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers. We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.

cs.CR updates on arXiv.org

📋 論文タイトル一覧