arXiv論文一覧 - cs.CR updates on arXiv.org

#1 Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents

agent

著者: Bronislav Sidik, Lior Rokach

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.11839

要約:
Autonomous AI agents built on open-source runtimes such as OpenClaw expose every available tool to every session by default, regardless of the task. A summarization task receives the same shell execution, subagent spawning, and credential access capabilities as a code deployment task, a 15x overprovision ratio that we call the capability overprovisioning problem. Existing defenses, including the NemoClaw container sandbox and the Cisco DefenseClaw skill scanner, address containment and threat detection but do not learn the minimum viable capability set for each task type. We present Aethelgard, a four layer adaptive governance framework that enforces least privilege for AI agents through a learned policy. Layer 1, the Capability Governor, dynamically scopes which tools the agent is aware of in each session. Layer 3, the Safety Router, intercepts tool calls before execution using a hybrid rule based and fine tuned classifier. Layer 2, the RL Learning Policy, trains a PPO policy on the accumulated audit log to learn the minimum viable skill set for each task type.

#2 Evaluating Lightweight Block Cipher Payload Encryption for Real-Time CAN Traffic

著者: Kevin Setterstrom, Jeremy Straub

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.11853

要約:
This study evaluates the feasibility of integrating lightweight block cipher payload encryption into a real-time embedded controller area network (CAN) node using a QT PY ESP32-S2 microcontroller. This work seeks to determine whether the use of a block cipher can prevent semantic taxonomy-based reverse engineering, which infers signal meaning from unencrypted CAN traffic using observation and statistical analysis. CAN payloads are encrypted using a lightweight block cipher and evaluated through experiments that measure timing impact, payload pattern observability, and correlation-based inference. Results indicate that encryption masks constant values and predictable signal patterns while preserving a 100 Hz transmission schedule. These findings suggest that lightweight payload encryption can reduce passive, observation based inference of CAN signal semantics on resource-constrained hardware with limited timing overhead impact.

#3 SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents

agent

著者: Daniel Begimher, Cristian Leo, Jack Huang, Pat Gaw, Bonan Zheng

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12040

要約:
We present SIR-Bench, a benchmark of 794 test cases for evaluating autonomous security incident response agents that distinguishes genuine forensic investigation from alert parroting. Derived from 129 anonymized incident patterns with expert-validated ground truth, SIR-Bench measures not only whether agents reach correct triage decisions, but whether they discover novel evidence through active investigation. To construct SIR-Bench, we develop Once Upon A Threat (OUAT), a framework that replays real incident patterns in controlled cloud environments, producing authentic telemetry with measurable investigation outcomes. Our evaluation methodology introduces three complementary metrics: triage accuracy (M1), novel finding discovery (M2), and tool usage appropriateness (M3), assessed through an adversarial LLM-as-Judge that inverts the burden of proof -- requiring concrete forensic evidence to credit investigations. Evaluating our SIR agent on the benchmark demonstrates 97.1% true positive (TP) detection, 73.4% false positive (FP) rejection, and 5.67 novel key findings per case, establishing a baseline against which future investigation agents can be measured.

#4 Can we Watermark Low-Entropy LLM Outputs?

intellectual property

著者: Noam Mazor, Andrew Morgan, Rafael Pass

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12051

要約:
A recent and exciting thread of work focuses on developing methods for watermarking the output of large language models (LLMs). We focus on provably undetectable watermarking-that is, schemes that do not alter the output distribution of the LLM, yet enable embedding a watermark in the output that identifies the output as having been generated by the particular LLM. Furthermore, the watermark should be hard to remove by an adversary that may potentially edit, insert, or delete tokens from the watermarked output. Indeed, recent work (Christ et al. [COLT'24], Christ et al. [CRYPTO'24], Golowich et al. [NeuroIPS'24]) shows how to develop such schemes that are robust against a constant fraction of substitutions, or even against a constant fraction of arbitrary edits. These works, however, make strong assumptions on the entropy present in the output of the LLM. Most notably, they all require constant entropy rate-that is, a constant fraction of the tokens in a sufficiently long substring of the output need to have empirical entropy at least O(log |T|), where T is the alphabet of tokens, and Golowich et al. additionally require T to be larger than the security parameter. In this work, we consider whether we can also watermark the outputs of LLMs when the per-token entropy is just a constant, discarding the dependence on the alphabet size or security parameter. In this regime, we construct: - A watermarking scheme robust against random substitutions (assuming subexponential LPN, as in Christ et al. [CRYPTO'24]) - A watermarking scheme robust against random substitutions and random deletions, given either the additional heuristic assumption that the output of the LLM only introduces random errors (analogous to the assumption made by Christ et al. [CRYPTO'24]) or a construction of a pseudorandom error-correcting code robust to adversarial substitutions and random deletions.

#5 LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests

privacy

著者: Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12064

要約:
Coding agents and LLM-powered applications routinely send potentially sensitive content to cloud LLM APIs where it may be logged, retained, used for training, or subpoenaed. Existing privacy tooling focuses on network-level encryption and organization-level DLP, neither of which addresses the content of prompts themselves. We present a systematic empirical evaluation of eight techniques for privacy-preserving LLM requests: (A) local-only inference, (B) redaction with placeholder restoration, (C) semantic rephrasing, (D) Trusted Execution Environment hosted inference, (E) split inference, (F) fully homomorphic encryption, (G) secret sharing via multi-party computation, and (H) differential-privacy noise. We implement all eight (or a tractable research-stage subset where deployment is not yet feasible) in an open-source shim compatible with MCP and any OpenAI-compatible API. We evaluate the four practical options (A, B, C, H) and their combinations across four workload classes using a ground-truth-labelled leak benchmark of 1,300 samples with 4,014 annotations. Our headline finding is that no single technique dominates: the combination A+B+C (route locally when possible, redact and rephrase the rest) achieves 0.6% combined leak on PII and 31.3% on proprietary code, with zero exact leaks on PII across 500 samples. We present a decision rule that selects the appropriate option(s) from a threat-model budget and workload characterisation. Code, benchmarks, and evaluation harness are released at https://github.com/jayluxferro/llm-redactor.

#6 Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference

privacy

著者: Anes Abdennebi, Nadjia Kara, Laaziz Lahlou

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12168

要約:
The applications of Generative Artificial Intelligence (GenAI) and their intersections with data-driven fields, such as healthcare, finance, transportation, and information security, have led to significant improvements in service efficiency and low latency. However, this synergy raises serious concerns regarding the security of large language models (LLMs) and their potential impact on the privacy of companies and users' data. Many technology companies that incorporate LLMs in their services with a certain level of command and control bear a risk of data exposure and secret divulgence caused by insecure LLM pipelines, making them vulnerable to multiple attacks such as data poisoning, prompt injection, and model theft. Although several security techniques (input/output sanitization, decentralized learning, access control management, and encryption) were implemented to reduce this risk, there is still an imminent risk of quantum computing attacks, which are expected to break existing encryption algorithms, hence, retrieving secret keys, encrypted sensitive data, and decrypting encrypted models. In this extensive work, we integrate the Post-Quantum Cryptography (PQC) based Lattice-based Homomorphic Encryption (HE) main functions in the LLM's inference pipeline to secure some of its layers against data privacy attacks. We modify the inference pipeline of the transformer architecture for the LLAMA-3 model while injecting the main homomorphic encryption operations provided by the concrete-ml library. We demonstrate high text generation accuracies (up to 98%) with reasonable latencies (237 ms) on an i9 CPU, reaching up to 80 tokens per second, which proves the feasibility and validity of our work while running a FHE-secured LLAMA-3 inference model. Further experiments and analysis are discussed to justify models' text generation latencies and behaviours.

#7 COBALT-TLA: A Neuro-Symbolic Verification Loop for Cross-Chain Bridge Vulnerability Discovery

著者: Dominik Blain

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12172

要約:
We present COBALT-TLA, a neuro-symbolic verification loop that pairs an LLM with TLC, the TLA+ model checker, in an automated REPL. The LLM generates bounded TLA+ specifications; TLC acts as a semantic oracle; structured error traces are parsed and injected back into the model's context to drive convergence. We evaluate the system against three cross-chain bridge targets, including a faithful model of the Nomad $190M exploit. COBALT-TLA reaches a verified BUG_FOUND state in at most 2 iterations on all targets, with TLC execution consistently below 0.30 seconds. Notably, the system autonomously discovers an unprompted vulnerability class -- the Optimistic Relay Attack -- not present in the human-written baseline specification. We argue that deterministic prover feedback is sufficient to neutralize LLM hallucination in formal methods, transforming zero-shot code generation into a convergent proof-finding strategy.

#8 Mitigating S-RAHA: An On-device Framework to Prevent Forwarding of Re-Captured Images

著者: Keshav Sood, Iynkaran Natgunanathan, Purathani Praitheeshan, Praitheeshan Kirupananthan

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12178

要約:
Protecting sensitive visual content from unauthorized redistribution is a growing challenge for privacy focused mobile applications, including dating platforms. Screenshot prevention mechanisms, rely on server side monitoring or are limited to digital screenshot detection, are commonly deployed to stop forwarding sensitive images. However, an adversary uses another smartphone to take a photo of the mobile screen, in this scenario the existing solutions offer no protection against psychically screen recapture attacks. Since the attack happens in the physical plane rather than on a digital plane and shows a void or hole in the existing solutions, we name this the Screen Recaptured Analog Hole Attack (S RAHA). Such physically recaptured images bypass digital safeguards and can be freely forwarded, creating substantial privacy, personal safety, and forensic risks. We present a low computational secure by design on device framework that aims to detect and prevent the forwarding of recaptured images directly to the users device. The proposed system integrates a deep learning assisted recapture detection model capable of distinguishing original digital content from camera to screen captures under diverse environmental conditions, together with an on device enforcement mechanism that automatically blocks the sharing of suspected recaptured images between applications. We also introduce the concept of an invisible metadata identifier (IMI) that can be embedded into protected images to enable forensic traceability of potential leakage paths. Although the IMI component is explored at a conceptual and feasibility level rather than fully implemented, it demonstrates a promising direction for integrating lightweight, invisible identifiers into client side security architectures.

#9 TimeMark: A Trustworthy Time Watermarking Framework for Exact Generation-Time Recovery from AIGC

intellectual property

著者: Shangkun Che, Silin Du, Ge Gao

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12216

要約:
The widespread use of Large Language Models (LLMs) in text generation has raised increasing concerns about intellectual property disputes. Watermarking techniques, which embed meta information into AI-generated content (AIGC), have the potential to serve as judicial evidence. However, existing methods rely on statistical signals in token distributions, leading to inherently probabilistic detection and reduced reliability, especially in multi-bit encoding (e.g., timestamps). Moreover, such methods introduce detectable statistical patterns, making them vulnerable to forgery attacks and enabling model providers to fabricate arbitrary watermarks. To address these issues, we propose the concept of trustworthy watermark, which achieves reliable recovery with 100% identification accuracy while resisting both user-side statistical attacks and provider-side forgery. We focus on trustworthy time watermarking for use as judicial evidence. Our framework integrates cryptographic techniques and encodes time information into time-dependent secret keys under regulatory supervision, preventing arbitrary timestamp fabrication. The watermark payload is decoupled from time and generated as a random, non-stored bit sequence for each instance, eliminating statistical patterns. To ensure verifiability, we design a two-stage encoding mechanism, which, combined with error-correcting codes, enables reliable recovery of generation time with theoretically perfect accuracy. Both theoretical analysis and experiments demonstrate that our framework satisfies the reliability requirements for judicial evidence and offers a practical solution for future AIGC-related intellectual property disputes.

#10 From IOCs to Regex: Automating CTI Operationalization for SOC with LLMs

著者: Pei-Yu Tseng (The Pennsylvania State University, USA), Lan Zhang (Northern Arizona University, USA), ZihDwo Yeh (The Pennsylvania State University, USA), Xiaoyan Sun (Worcester Polytechnic Institute, USA), Xushu Dai (The Pennsylvania State University, USA), Peng Liu (The Pennsylvania State University, USA)

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12228

要約:
Cyber Threat Intelligence (CTI) reports contain Indicators of Compromise (IOCs) that are critical for security operations. To operationalize these IOCs across heterogeneous logs, analysts often convert them into regular expressions (regexes) for tasks such as digital forensics, log parsing, and SIEM rule creation. However, regex construction is still largely manual, requiring analysts to extract IOCs from CTI reports and transform them into syntactically valid and semantically precise patterns. This process is slow, error-prone, and increasingly impractical as CTI volumes grow. Although recent studies have applied Large Language Models (LLMs) to IOC extraction, they typically output plain strings rather than regexes, limiting practical deployment. Plain IOCs cannot effectively capture variations in system context, log format, or attacker behavior. To address this gap, we propose IOCRegex-gen, a fully automated LLM-based regex generation system that converts IOCs into regexes. The system introduces two key innovations: (i) a group-aware mechanism that identifies which IOC segments should be represented as capture or non-capture groups, and (ii) an iterative reasoning and multi-stage validation pipeline to ensure syntactic validity and semantic correctness. Experiments on over 3,000 real CTI reports and 2,400 ground-truth strings from the MITRE ATT&CK Evaluation framework show that IOCRegex-gen achieves an average hit rate of 99.1% and a false-positive rate of only 0.8%, demonstrating its effectiveness for large-scale CTI processing and automated regex generation.

#11 TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs

著者: Qingchao Shen, Zibo Xiao, Lili Huang, Enwei Hu, Yongqiang Tian, Junjie Chen

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12232

要約:
Large Language Models (LLMs) are increasingly deployed across diverse domains, yet their vulnerability to jailbreak attacks, where adversarial inputs bypass safety mechanisms to elicit harmful outputs, poses significant security risks. While prior work has primarily focused on prompt injection attacks, these approaches often require resource-intensive prompt engineering and overlook other critical components, such as chat templates. This paper introduces TEMPLATEFUZZ, a fine-grained fuzzing framework that systematically exposes vulnerabilities in chat templates, a critical yet underexplored attack surface in LLMs. Specifically, TEMPLATEFUZZ (1) designs a series of element-level mutation rules to generate diverse chat template variants, (2) proposes a heuristic search strategy to guide the chat template generation toward the direction of amplifying the attack success rate (ASR) while preserving model accuracy, and (3) integrates an active learning-based strategy to derive a lightweight rule-based oracle for accurate and efficient jailbreak evaluation. Evaluated on twelve open-source LLMs across multiple attack scenarios, TEMPLATEFUZZ achieves an average ASR of 98.2% with only 1.1% accuracy degradation, outperforming state-of-the-art methods by 9.1%-47.9% in ASR and 8.4% in accuracy degradation. Moreover, even on five industry-leading commercial LLMs where chat templates cannot be specified, TEMPLATEFUZZ attains a 90% average ASR via chat template-based prompt injection attacks.

#12 SpanKey: Dynamic Key Space Conditioning for Neural Network Access Control

著者: WenBin Yan

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12254

要約:
SpanKey is a lightweight way to gate inference without encrypting weights or chasing leaderboard accuracy on gated inference. The idea is to condition activations on secret keys. A basis matrix $B$ defines a low-dimensional key subspace $Span(B)$; during training we sample coefficients $\alpha$ and form keys $k=\alpha^\top B$, then inject them into intermediate activations with additive or multiplicative maps and strength $\gamma$. Valid keys lie in $Span(B)$; invalid keys are sampled outside that subspace. We make three points. (i) Mechanism: subspace key injection and a multi-layer design space. (ii) Failure mode: key absorption, together with two analytical results (a Beta-energy split and margin-tail diagnostics), explains weak baseline separation in energy and margin terms -- these are not a security theorem. iii) Deny losses and experiments: Modes A--C and extensions, with CIFAR-10 ResNet-18 runs and MNIST ablations for Mode B. We summarize setup and first-order analysis, injectors, absorption, deny losses and ablations, a threat discussion that does not promise cryptography, and closing remarks on scale. Code: \texttt{https://github.com/mindmemory-ai/dksc}

#13 WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

agent

著者: Yulin Chen, Tri Cao, Haoran Li, Yue Liu, Yibo Li, Yufei He, Le Minh Khoi, Yangqiu Song, Shuicheng Yan, Bryan Hooi

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12284

要約:
Web agents powered by vision-language models (VLMs) enable autonomous interaction with web environments by perceiving and acting on both visual and textual webpage content to accomplish user-specified tasks. However, they are highly vulnerable to prompt injection attacks, where adversarial instructions embedded in HTML or rendered screenshots can manipulate agent behavior and lead to harmful outcomes such as information leakage. Existing defenses, including system prompt defenses and direct fine-tuning of agents, have shown limited effectiveness. To address this issue, we propose a defense framework in which a web agent operates in parallel with a dedicated guard agent, decoupling prompt injection detection from the agent's own reasoning. Building on this framework, we introduce WebAgentGuard, a reasoning-driven, multimodal guard model for prompt injection detection. We construct a synthetic multimodal dataset using GPT-5 spanning 164 topics and 230 visual and UI design styles, and train the model via reasoning-intensive supervised fine-tuning followed by reinforcement learning. Experiments across multiple benchmarks show that WebAgentGuard consistently outperforms strong baselines while preserving agent utility, without introducing additional latency.

#14 UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains

著者: Shuyi Miao, Wangjie Qiu, Shengda Zhuo, Fei Shen, Dan Lin, Xingtong Yu, Chua Tat-Seng, Zhiming Zheng

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12329

要約:
As cross-chain interoperability advances, decentralized finance (DeFi) protocols enable illicit funds to be reorganized into uniform liquid assets that flow throughout the cryptocurrency market. Such operations can bypass monitoring targeted at individual blockchains and thereby weaken current regulatory frameworks. Motivated by these, we introduce UniDetect, a multi-chain cryptocurrency fraud account detection method based on large language models (LLMs). Specifically, we use domain knowledge to guide the LLM to generate general transaction summary texts applicable to heterogeneous blockchain accounts, which serve as evidence for fraud account detection. Furthermore, we introduce a two-stage alternating training strategy to continuously and dynamically enhance the multimodal joint reasoning for detecting fraudulent accounts based on both the textual evidence and the transaction graph patterns. Experiments on multiple blockchains show that UniDetect outperforms existing methods 5.57% to 7.58% in Kolmogorov-Smirnov (KS). For cross-chain zero-shot detection, UniDetect identifies over 94.58% of fraudulent accounts. It also generalizes well to non-blockchain data, delivering a 6.06% improvement in F1 over existing methods. The dataset and source code are available at https://github.com/msy0513/UniDetect.

#15 CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training

privacy

著者: Qi Li, Cheng-Long Wang, Yinzhi Cao, Di Wang

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12342

要約:
Training models on a carefully chosen portion of data rather than the full dataset is now a standard preprocess for modern ML. From vision coreset selection to large-scale filtering in language models, it enables scalability with minimal utility loss. A common intuition is that training on fewer samples should also reduce privacy risks. In this paper, we challenge this assumption. We show that subset training is not privacy free: the very choices of which data are included or excluded can introduce new privacy surface and leak more sensitive information. Such information can be captured by adversaries either through side-channel metadata from the subset selection process or via the outputs of the target model. To systematically study this phenomenon, we propose CoLA (Choice Leakage Attack), a unified framework for analyzing privacy leakage in subset selection. In CoLA, depending on the adversary's knowledge of the side-channel information, we define two practical attack scenarios: Subset-aware Side-channel Attacks and Black-box Attacks. Under both scenarios, we investigate two privacy surfaces unique to subset training: (1) Training-membership MIA (TM-MIA), which concerns only the privacy of training data membership, and (2) Selection-participation MIA (SP-MIA), which concerns the privacy of all samples that participated in the subset selection process. Notably, SP-MIA enlarges the notion of membership from model training to the entire data-model supply chain. Experiments on vision and language models show that existing threat models underestimate subset-training privacy risks: the expanded privacy surface leaks both training and selection membership, extending risks from individual models to the broader ML ecosystem.

#16 Compiling Activation Steering into Weights via Null-Space Constraints for Stealthy Backdoors

backdoor

著者: Rui Yin, Tianxu Han, Naen Xu, Changjiang Li, Ping He, Chunyi Zhou, Jun Wang, Zhihui Fu, Tianyu Du, Jinbao Li, Shouling Ji

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12359

要約:
Safety-aligned large language models (LLMs) are increasingly deployed in real-world pipelines, yet this deployment also enlarges the supply-chain attack surface: adversaries can distribute backdoored checkpoints that behave normally under standard evaluation but jailbreak when a hidden trigger is present. Recent post-hoc weight-editing methods offer an efficient approach to injecting such backdoors by directly modifying model weights to map a trigger to an attacker-specified response. However, existing methods typically optimize a token-level mapping that forces an affirmative prefix (e.g., ``Sure''), which does not guarantee sustained harmful output -- the model may begin with apparent agreement yet revert to safety-aligned refusal within a few decoding steps. We address this reliability gap by shifting the backdoor objective from surface tokens to internal representations. We extract a steering vector that captures the difference between compliant and refusal behaviors, and compile it into a persistent weight modification that activates only when the trigger is present. To preserve stealthiness and benign utility, we impose a null-space constraint so that the injected edit remains dormant on clean inputs. The method is efficient, requiring only a small set of examples and admitting a closed-form solution. Across multiple safety-aligned LLMs and jailbreak benchmarks, our method achieves high triggered attack success while maintaining non-triggered safety and general utility.

#17 Tamper-Proofing with Self-Modifying Code

著者: Gregory Morse, Tam\'as Kozsik

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12407

要約:
Classical computability theory tells us that self-modifying code (SMC) on a deterministic universal Turing machine can be simulated by non-SMC code on the same model. That abstraction, however, omits the external timing inputs, concurrency, and microarchitectural state that dominate practical execution on modern processors. We argue that once timing, ordering, and self-introspective effects are treated as observables, a practically faithful non-SMC reproduction of timed SMC becomes detectably expensive on commodity systems. We present a tamper-proofing model that combines introspective and polymorphic SMC, reliable clocks, and runtime timing predicates to bind integrity checks to execution behavior. We distinguish static and dynamic SMC generation, characterize the timing semantics needed to avoid catastrophic pipeline clears, and give x86-64 design primitives for checksum-driven self-patching. We also report timer measurements, performance comparisons, and performance-monitoring counter evidence showing that careful engineering -- especially loop unrolling and cross-page modification -- substantially reduces the overhead of SMC while preserving its tamper-detection value. The paper concludes with an efficiency analysis, a threat model, and deployment guidance for trusted code executing in untrusted environments.

#18 Security and Resilience in Autonomous Vehicles: A Proactive Design Approach

著者: Chieh Tsai, Murad Mehrab Abrar, Salim Hariri

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12408

要約:
Autonomous vehicles (AVs) promise efficient, clean and cost-effective transportation systems, but their reliance on sensors, wireless communications, and decision-making systems makes them vulnerable to cyberattacks and physical threats. This chapter presents novel design techniques to strengthen the security and resilience of AVs. We first provide a taxonomy of potential attacks across different architectural layers, from perception and control manipulation to Vehicle-to-Any (V2X) communication exploits and software supply chain compromises. Building on this analysis, we present an AV Resilient architecture that integrates redundancy, diversity, and adaptive reconfiguration strategies, supported by anomaly- and hash-based intrusion detection techniques. Experimental validation on the Quanser QCar platform demonstrates the effectiveness of these methods in detecting depth camera blinding attacks and software tampering of perception modules. The results highlight how fast anomaly detection combined with fallback and backup mechanisms ensures operational continuity, even under adversarial conditions. By linking layered threat modeling with practical defense implementations, this work advances AV resilience strategies for safer and more trustworthy autonomous vehicles.

#19 Practical Evaluation of the Crypto-Agility Maturity Model

著者: Leonie Wolf, Samson Umezulike, Gurur \"Ondar\"o, Sebastian Schinzel, Fabian Ising

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12428

要約:
Cryptographic agility is a key prerequisite for maintaining the long-term security of digital communication, particularly in light of the transition to post-quantum cryptography. To systematically assess this capability, Hohm et al. proposed the Crypto Agility Maturity Model (CAMM). In this work, we present the first evaluation of the CAMM against established design principles for maturity models. Our analysis reveals that the CAMM only partially satisfies these principles: its scope and target groups remain ambiguous; acceptance criteria are insufficiently operationalized, limiting verifiability and replicability; and dependency relations exhibit redundancies, cycles, and omissions. Applying the CAMM to a simple real-world scenario further confirmed these issues, as several requirements at higher maturity levels proved inapplicable or unclear. Based on these findings, we propose concrete improvements to the CAMM to enable more consistent and reliable assessments of cryptographic agility.

#20 VeriX-Anon: A Multi-Layered Framework for Mathematically Verifiable Outsourced Target-Driven Data Anonymization

著者: Miit Daga, Swarna Priya Ramu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12431

要約:
Organisations increasingly outsource privacy-sensitive data transformations to cloud providers, yet no practical mechanism lets the data owner verify that the contracted algorithm was faithfully executed. VeriX-Anon is a multi-layered verification framework for outsourced Target-Driven k-anonymization combining three orthogonal mechanisms: deterministic verification via Merkle-style hashing of an Authenticated Decision Tree, probabilistic verification via Boundary Sentinels near the Random Forest decision boundary and exact-duplicate Twins with cryptographic identifiers, and utility-based verification via Explainable AI fingerprinting that compares SHAP value distributions before and after anonymization using the Wasserstein distance. Evaluated on three cross-domain datasets against Lazy (drops 5 percent of records), Dumb (random splitting, fake hash), and Approximate (random splitting, valid hash) adversaries, VeriX-Anon correctly detected deviations in 11 of 12 scenarios. No single layer achieved this alone. The XAI layer was the only mechanism that caught the Approximate adversary, succeeding on Adult and Bank but failing on the severely imbalanced Diabetes dataset where class imbalance suppresses the SHAP signal, confirming the need for adaptive thresholding. An 11-point k-sweep showed Target-Driven anonymization preserves significantly more utility than Blind anonymization (Wilcoxon $p = 0.000977$, Cohen's $d = 1.96$, mean F1 gap $+0.1574$). Client-side verification completes under one second at one million rows. The threat model covers three empirically evaluated profiles and one theoretical profile (Informed Attacker) aware of trap embedding but unable to defeat the cryptographic salt. Sentinel evasion probability ranges from near-zero for balanced datasets to 0.52 for imbalanced ones, a limitation the twin layer compensates for in every tested scenario.

#21 Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

backdoordiffusion

著者: Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li, Lizhi Xiong, Zhangjie Fu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12446

要約:
Text-to-image (T2I) diffusion models have achieved remarkable success in image synthesis, but their reliance on large-scale data and open ecosystems introduces serious backdoor security risks. Existing defenses, particularly input-level methods, are more practical for deployment but often rely on observable anomalies that become unreliable under stealthy, semantics-preserving trigger designs. As modern backdoor attacks increasingly embed triggers into natural inputs, these methods degrade substantially, raising a critical question: can more stable, implicit, and trigger-agnostic differences between benign and backdoor inputs be exploited for detection? In this work, we address this challenge from an active probing perspective. We introduce controlled scaling perturbations on cross-attention and uncover a novel phenomenon termed Cross-Attention Scaling Response Divergence (CSRD), where benign and backdoor inputs exhibit systematically different response evolution patterns across denoising steps. Building on this insight, we propose SET, an input-level backdoor detection framework that constructs response-offset features under multi-scale perturbations and learns a compact benign response space from a small set of clean samples. Detection is then performed by measuring deviations from this learned space, without requiring prior knowledge of the attack or access to model training. Extensive experiments demonstrate that SET consistently outperforms existing baselines across diverse attack methods, trigger types, and model settings, with particularly strong gains under stealthy implicit-trigger scenarios. Overall, SET improves AUROC by 9.1% and ACC by 6.5% over the best baseline, highlighting its effectiveness and robustness for practical deployment.

#22 DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection

著者: Junyu Ren, Xingjian Pan, Wensheng Gan, Philip S. Yu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12548

要約:
Prompt injection has emerged as a critical security threat to large language models (LLMs), yet existing studies predominantly focus on single-dimensional attack strategies, such as semantic rewriting or character-level obfuscation, which fail to capture the combined effects of multi-space perturbations in realistic scenarios. In addition, systematic black-box robustness evaluations of recent Chinese LLMs, such as DeepSeek, remain limited. To address these gaps, we propose PromptFuzz-SC, a semantic-character dual-space mutation framework for evaluating LLM robustness against prompt injection. The framework integrates semantic transformations (e.g., paraphrasing and word-order perturbation) with character-level obfuscation (e.g., zero-width insertion and encoding-based mutation), forming a unified and extensible mutation operator library. A hybrid search strategy combining epsilon-greedy exploration and hill-climbing refinement is adopted to efficiently discover high-quality adversarial prompts. We further introduce a unified evaluation protocol based on three metrics: misuse success rate (MSR), Average Queries to Success (AQS), and Stealth. Experimental results on DeepSeek demonstrate that dual-space mutation achieves the strongest overall attack performance among the evaluated strategies, attaining the highest mean MSR (0.189), peak MSR (0.375), and mean Stealth. Compared with semantic-only and character-only mutation, it improves mean MSR by 12.5% and 5.6%, respectively. While not consistently minimizing query cost, the proposed method achieves competitive best-case efficiency and maintains strong imperceptibility, indicating a more favorable balance between attack effectiveness and concealment. These findings highlight the importance of composite mutation strategies for robust red-teaming of LLMs and provide practical insights for the design of multi-layer defense mechanisms.

#23 LLM-Guided Prompt Evolution for Password Guessing

著者: Vladimir A. Mazin, Mikhail A. Zorin, Dmitrii S. Korzh, Elvir Z. Karimov, Dmitrii A. Bolokhov, Oleg Y. Rogov

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12601

要約:
Passwords still remain a dominant authentication method, yet their security is routinely subverted by predictable user choices and large-scale credential leaks. Automated password guessing is a key tool for stress-testing password policies and modeling attacker behavior. This paper applies LLM-driven evolutionary computation to automatically optimize prompts for the LLM password guessing framework. Using OpenEvolve, an open-source system combining MAP-Elites quality-diversity search with an island population model we evolve prompts that maximize cracking rate on a RockYou-derived test set. We evaluate three configurations: a local setup with Qwen3 8B, a single compact cloud model Gemini-2.5 Flash, and a two-model ensemble of frontier LLMs. The approach raises the cracking rates from 2.02\% to 8.48\%. Character distribution analysis further confirms how evolved prompts produce statistically more realistic passwords. Automated prompt evolution is a low-barrier yet effective way to strengthen LLM-based password auditing and underlining how attack pipelines show tendency via automated improvements.

#24 Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge

privacy

著者: Gustavo de Carvalho Bertoli

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12737

要約:
While Federated Learning (FL) mitigates direct data exposure, the resulting trained models remain susceptible to membership inference attacks (MIAs). This paper presents an empirical evaluation of Differential Privacy (DP) as a defense mechanism against MIAs in FL, leveraging the environment of the 2025 NIST Genomics Privacy-Preserving Federated Learning (PPFL) Red Teaming Event. To improve inference accuracy, we propose a stacking attack strategy that ensembles seven black-box estimators to train a meta-classifier on prediction probabilities and cross-entropy losses. We evaluate this methodology against target models under three privacy configurations: an unprotected convolutional neural network (CNN, $\epsilon=\infty$), a low-privacy DP model ($\epsilon=200$), and a high-privacy DP model ($\epsilon=10$). The attack outperforms all baselines in the No DP and Low Privacy settings and, critically, maintains measurable membership leakage at $\epsilon=200$ where a single-signal LiRA baseline collapses. Evaluated on an independent third-party benchmark, these results provide an empirical characterisation of how stacking-based inference degrades across calibrated DP tiers in FL.

#25 EXTree: Towards Supporting Explainability in Attribute-based Access Control

著者: Shanampudi Pranaya Chowdary (Indian Institute of Technology Kharagpur, India), Shamik Sural (Indian Institute of Technology Kharagpur, India)

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12850

要約:
With increasing emphasis on transparency in digital governance, users expect more than silence when their access requests are denied by a system. However, authorization methods are notorious for their inability to provide any form of meaningful feedback under such situations. This paper shows a direction towards how the problem of explainability can be mitigated in the context of Attribute-based Access Control (ABAC), arguably the most researched topic in access control in recent years. We introduce EXTree, which represents ABAC policies optimized for both fast evaluation (Efficiency) and human-centric feedback (Explainability) in the form of a tree. Two strategic dimensions are investigated, namely, Feedback Evaluation Strategies - how to craft actionable explanations when access is denied, and Tree Construction Strategies - how the policy trees should be structured for efficient yet interpretable decisions. Through extensive experiments, we compare entropy-based, changeability-based, and randomly generated trees across multiple configurations. Our results demonstrate that EXTree, built for efficiency and interpretability, can bridge the gap between complex authorization logic and human understanding.

#26 Distinguishers for Skew and Linearized Reed-Solomon Codes

著者: Felicitas H\"ormann, Anna-Lena Horlemann

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12954

要約:
Generalized Reed-Solomon (GRS) and Gabidulin codes have been proposed for various code-based cryptosystems, though most such schemes without elaborate disguising techniques have been successfully attacked. Both code classes are prominent examples of the isometric families of (generalized) skew and linearized Reed-Solomon ((G)SRS and (G)LRS) codes which are obtained as evaluation codes from skew polynomials. Both GSRS and GLRS codes share the advantage of achieving the maximum possible error-decoding radius and thus promise smaller key sizes than e.g. Classic McEliece. We investigate whether these generalizations can avoid the known structural attacks on GRS and Gabidulin codes. In particular, we prove that both GSRS and GLRS codes decompose into GRS subcodes and are thus efficiently distinguishable from random codes with a square code method. This applies to all parameters for which the code length $n$ and its dimension $k$ over the field $\mathbb{F}_{q^m}$ satisfy $m + 1 < k < n - \tfrac{1}{2} (m^2 + 3m)$. The distinguishability extends to GSRS and GLRS codes with Hamming-isometric disguising. We further relate these findings to existing distinguishers for GRS, Gabidulin, and LRS codes, and extend known results on duals of SRS and LRS codes to the generalized setting allowing nonzero column multipliers. Finally, we provide explicit transformations between GSRS and GLRS codes, clarifying the algebraic relationship between the skew and linearized frameworks.

#27 Parallax: Why AI Agents That Think Must Never Act

agent

著者: Joel Fokou

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12986

要約:
Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modifying databases), a fundamental security gap has emerged. The dominant approach to agent safety relies on prompt-level guardrails: natural language instructions that operate at the same abstraction level as the threats they attempt to mitigate. This paper argues that prompt-based safety is architecturally insufficient for agents with execution capability and introduces Parallax, a paradigm for safe autonomous AI execution grounded in four principles: Cognitive-Executive Separation, which structurally prevents the reasoning system from executing actions; Adversarial Validation with Graduated Determinism, which interposes an independent, multi-tiered validator between reasoning and execution; Information Flow Control, which propagates data sensitivity labels through agent workflows to detect context-dependent threats; and Reversible Execution, which captures pre-destructive state to enable rollback when validation fails. We present OpenParallax, an open-source reference implementation in Go, and evaluate it using Assume-Compromise Evaluation, a methodology that bypasses the reasoning system entirely to test the architectural boundary under full agent compromise. Across 280 adversarial test cases in nine attack categories, Parallax blocks 98.9% of attacks with zero false positives under its default configuration, and 100% of attacks under its maximum-security configuration. When the reasoning system is compromised, prompt-level guardrails provide zero protection because they exist only within the compromised system; Parallax's architectural boundary holds regardless.

#28 LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

著者: Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12994

要約:
Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical vulnerabilities because of their limited semantic understanding of the vulnerable code and its expected behavior. On the other hand, recent successes of large language models (LLMs) in understanding and repairing code are promising. However, no framework currently exists to analyze the capabilities and limitations of such techniques for logical vulnerabilities. This paper aims to systematically evaluate both traditional and LLM-based repair approaches for addressing real-world logical vulnerabilities. To facilitate our assessment, we created the first ever dataset, LogicDS, of 86 logical vulnerabilities with assigned CVEs reflecting tangible security impact. We also developed a systematic framework, LogicEval, to evaluate patches for logical vulnerabilities. Evaluations suggest that compilation and testing failures are primarily driven by prompt sensitivity, loss of code context, and difficulty in patch localization.

#29 INTARG: Informed Real-Time Adversarial Attack Generation for Time-Series Regression

著者: Gamze Kirman Tokgoz, Onat Gungor, Tajana Rosing, Baris Aksanli

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.11928

要約:
Time-series forecasting aims to predict future values by modeling temporal dependencies in historical observations. It is a critical component of many real-world systems, where accurate forecasts improve operational efficiency and help mitigate uncertainty and risk. More recently, machine learning (ML), and especially deep learning (DL)-based models, have gained widespread adoption for time-series forecasting, but they remain vulnerable to adversarial attacks. However, many state-of-the-art attack methods are not directly applicable in time-series settings, where storing complete historical data or performing attacks at every time step is often impractical. This paper proposes an adversarial attack framework for time-series forecasting under an online bounded-buffer setting, leveraging an informed and selective attack strategy. By selectively targeting time steps where the model exhibits high confidence and the expected prediction error is maximal, our framework produces fewer but substantially more effective attacks. Experiments show that our framework can increase the prediction error up to 2.42x, while performing attacks in fewer than 10% of time steps.

#30 AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection

著者: Zijie Zhao, Chenyuan Yang, Weidong Wang, Yihan Yang, Ziqi Zhang, Lingming Zhang

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.11950

要約:
While recent LLM-based agents can identify many candidate bugs in source code, their reports remain static hypotheses that require manual validation, limiting the practicality of automated bug detection. We frame this challenge as a test generation task: given a candidate report, synthesizing an executable proof-of-concept test, or simply a PoC - such as a script, command sequence, or crafted input - to trigger the suspected defect. Automated PoC generation can act as a scalable validation oracle, enabling end-to-end autonomous bug detection by providing concrete execution evidence. However, naive LLM agents are unreliable validators: they are biased toward "success" and may reward-hack by producing plausible but non-functional PoCs or even hallucinated traces. To address this, we present AnyPoC, a general multi-agent framework that (1) analyzes and fact-checks a candidate bug report, (2) iteratively synthesizes and executes a PoC while collecting execution traces, and (3) independently re-executes and scrutinizes the PoC to mitigate hallucination and reward hacking. In addition, AnyPoC also continuously extracts and evolves a PoC knowledge base to handle heterogeneous tasks. AnyPoC operates on candidate bug reports regardless of their source and can be paired with different bug reporters. To demonstrate practicality and generality, we apply AnyPoC, with a simple agentic bug reporter, on 12 critical software systems across diverse languages/domains (many with millions of lines of code) including Firefox, Chromium, LLVM, OpenSSL, SQLite, FFmpeg, and Redis. Compared to the state-of-the-art coding agents, e.g., Claude Code and Codex, AnyPoC produces 1.3x more valid PoCs for true-positive bug reports and rejects 9.8x more false-positive bug reports. To date, AnyPoC has discovered 122 new bugs (105 confirmed, 86 already fixed), with 45 generated PoCs adopted as official regression tests.

#31 Policy-Invisible Violations in LLM-Based Agents

agent

著者: Jie Wu, Ming Gong

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12177

要約:
LLM-based agents can execute actions that are syntactically valid, user-sanctioned, and semantically appropriate, yet still violate organizational policy because the facts needed for correct policy judgment are hidden at decision time. We call this failure mode policy-invisible violations: cases in which compliance depends on entity attributes, contextual state, or session history absent from the agent's visible context. We present PhantomPolicy, a benchmark spanning eight violation categories with balanced violation and safe-control cases, in which all tool responses contain clean business data without policy metadata. We manually review all 600 model traces produced by five frontier models and evaluate them using human-reviewed trace labels. Manual review changes 32 labels (5.3%) relative to the original case-level annotations, confirming the need for trace-level human review. To demonstrate what world-state-grounded enforcement can achieve under favorable conditions, we introduce Sentinel, an enforcement framework based on counterfactual graph simulation. Sentinel treats every agent action as a proposed mutation to an organizational knowledge graph, performs speculative execution to materialize the post-action world state, and verifies graph-structural invariants to decide Allow/Block/Clarify. Against human-reviewed trace labels, Sentinel substantially outperforms a content-only DLP baseline (68.8% vs. 93.0% accuracy) while maintaining high precision, though it still leaves room for improvement on certain violation categories. These results demonstrate what becomes achievable once policy-relevant world state is made available to the enforcement layer.

#32 Clustering-Enhanced Domain Adaptation for Cross-Domain Intrusion Detection in Industrial Control Systems

著者: Luyao Wang

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12183

要約:
Industrial control systems operate in dynamic environments where traffic distributions vary across scenarios, labeled samples are limited, and unknown attacks frequently emerge, posing significant challenges to cross-domain intrusion detection. To address this issue, this paper proposes a clustering-enhanced domain adaptation method for industrial control traffic. The framework contains two key components. First, a feature-based transfer learning module projects source and target domains into a shared latent subspace through spectral-transform-based feature alignment and iteratively reduces distribution discrepancies, enabling accurate cross-domain detection. Second, a clustering enhancement strategy combines K-Medoids clustering with PCA-based dimensionality reduction to improve cross-domain correlation estimation and reduce performance degradation caused by manual parameter tuning. Experimental results show that the proposed method significantly improves unknown attack detection. Compared with five baseline models, it increases detection accuracy by up to 49%, achieves larger gains in F-score, and demonstrates stronger stability. Moreover, the clustering enhancement strategy further boosts detection accuracy by up to 26% on representative tasks. These results suggest that the proposed method effectively alleviates data scarcity and domain shift, providing a practical solution for robust cross-domain intrusion detection in dynamic industrial environments.

#33 Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design

著者: Leon Eshuijs, Shihan Wang, Antske Fokkens

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12500

要約:
Specification gaming under Reinforcement Learning (RL) is known to cause LLMs to develop sycophantic, manipulative, or deceptive behavior, yet the conditions under which this occurs remain unclear. We train 11 instruction-tuned LLMs (0.5B--14B) with on-policy RL across 3 environments and find that model size acts as a safety buffer in some environments but enables greater harmful exploitation in others. Controlled ablations trace this reversal to environment-specific features such as role framing and implicit gameability cues. We further show that most safety benchmarks do not predict RL-induced misalignment, except in the case of Sycophancy scores when the exploit relies on inferring the user's preference. Finally, we find that on-policy RL preserves a safety buffer inherent in the model's own generation distribution, one that is bypassed during off-policy settings.

#34 Robust Semi-Supervised Temporal Intrusion Detection for Adversarial Cloud Networks

著者: Anasuya Chattopadhyay, Daniel Reti, Hans D. Schotten

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12655

要約:
Cloud networks increasingly rely on machine learning based Network Intrusion Detection Systems to defend against evolving cyber threats. However, real-world deployments are challenged by limited labeled data, non-stationary traffic, and adaptive adversaries. While semi-supervised learning can alleviate label scarcity, most existing approaches implicitly assume benign and stationary unlabeled traffic, leading to degraded performance in adversarial cloud environments. This paper proposes a robust semi-supervised temporal learning framework for cloud intrusion detection that explicitly addresses adversarial contamination and temporal drift in unlabeled network traffic. Operating on flow-level data, this framework combines supervised learning with consistency regularization, confidence-aware pseudo-labeling, and selective temporal invariance to conservatively exploit unlabeled traffic while suppressing unreliable samples. By leveraging the temporal structure of network flows, the proposed method improves robustness and generalization across heterogeneous cloud environments. Extensive evaluations on publicly available datasets (CIC-IDS2017, CSE-CIC-IDS2018, and UNSW-NB15) under limited-label conditions demonstrate that the proposed framework consistently outperforms state-of-the-art supervised and semi-supervised network intrusion detection systems in detection performance, label efficiency, and resilience to adversarial and non-stationary traffic.

#35 Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory

著者: Shaopeng Fu, Di Wang

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12817

要約:
Adversarial training (AT) is an effective defense for large language models (LLMs) against jailbreak attacks, but performing AT on LLMs is costly. To improve the efficiency of AT for LLMs, recent studies propose continuous AT (CAT) that searches for adversarial inputs within the continuous embedding space of LLMs during AT. While CAT has achieved empirical success, its underlying mechanism, i.e., why adversarial perturbations in the embedding space can help LLMs defend against jailbreak prompts synthesized in the input token space, remains unknown. This paper presents the first theoretical analysis of CAT on LLMs based on in-context learning (ICL) theory. For linear transformers trained with adversarial examples from the embedding space on in-context linear regression tasks, we prove a robust generalization bound that has a negative correlation with the perturbation radius in the embedding space. This clearly explains why CAT can defend against jailbreak prompts from the LLM's token space. Further, the robust bound shows that the robustness of an adversarially trained LLM is closely related to the singular values of its embedding matrix. Based on this, we propose to improve LLM CAT by introducing an additional regularization term, which depends on singular values of the LLM's embedding matrix, into the objective function of CAT. Experiments on real-world LLMs demonstrate that our method can help LLMs achieve a better jailbreak robustness-utility tradeoff. The code is available at https://github.com/fshp971/continuous-adv-icl.

#36 Rapid LoRA Aggregation for Wireless Channel Adaptation in Open-Set Radio Frequency Fingerprinting

著者: Mingxi Zhang, Renjie Xie, Jincheng Wang, Guyue Li, Wei Xu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12834

要約:
Radio frequency fingerprints (RFFs) enable secure wireless authentication but struggle in open-set scenarios with unknown devices and varying channels. Existing methods face challenges in generalization and incur high computational costs. We propose a lightweight, self-adaptive RFF extraction framework using Low-Rank Adaptation (LoRA). By pretraining LoRA modules per environment, our method enables fast adaptation to unseen channel conditions without full retraining. During inference, a weighted combination of LoRAs dynamically enhances feature extraction. Experimental results demonstrate a 15% reduction in equal error rate (EER) compared to non-finetuned baselines and an 83% decrease in training time relative to full fine-tuning, using the same training dataset. This approach provides a scalable and efficient solution for open-set RFF authentication in dynamic wireless vehicular networks.

#37 CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

著者: Qiang Zhang, Zhongnian Li

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12913

要約:
Binary decompilation is a critical reverse engineering task aimed at reconstructing high-level source code from stripped executables. Although Large Language Models (LLMs) have recently shown promise, they often suffer from "logical hallucinations" and "semantic misalignment" due to the irreversible semantic loss during compilation, resulting in generated code that fails to re-execute. In this study, we propose Cognitive Decompiler Refinement with Robustness (CoDe-R), a lightweight two-stage code refinement framework. The first stage introduces Semantic Cognitive Enhancement (SCE), a Rationale-Guided Semantic Injection strategy that trains the model to recover high-level algorithmic intent alongside code. The second stage introduces a Dynamic Dual-Path Fallback (DDPF) mechanism during inference, which adaptively balances semantic recovery and syntactic stability via a hybrid verification strategy. Evaluation on the HumanEval-Decompile benchmark demonstrates that CoDe-R (using a 1.3B backbone) establishes a new State-of-the-Art (SOTA) in the lightweight regime. Notably, it is the first 1.3B model to exceed an Average Re-executability Rate of 50.00%, significantly outperforming the baseline and effectively bridging the gap between efficient models and expert-level performance. Our code is available at https://github.com/Theaoi/CoDe-R.

#38 A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model

privacy

著者: Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2405.04108

要約:
Recent booming development of Generative Artificial Intelligence (GenAI) has facilitated model commercialization to reinforce the model performance, including licensing or trading Deep Neural Network (DNN) models. However, DNN model trading may violate the benefit of the model owner due to unauthorized replications or misuse of the model. Model identity auditing is a challenging issue in protecting DNN model ownership, and verifying the integrity and ownership of models is one of the critical obstacles. In this paper, we focus on the above issue and propose an \underline{A}ccumulator-enabled \underline{A}uditing for \underline{D}ecentralized \underline{Id}entity of DNN \underline{M}odel (A2-DIDM) that utilizes blockchain and zero-knowledge techniques to protect data and function privacy while ensuring the lightweight on-chain ownership verification. The proposed model presents a scheme of identity records via configuring model weight checkpoints with zero-knowledge proofs, which incorporates predicates to capture incremental state changes in model weight checkpoints. Our scheme ensures both computational integrity and programmability in DNN training process so that the uniqueness of the weight checkpoint sequence in a DNN model is preserved. %to ensure the correctness of model identity auditing, so that the uniqueness of the weight checkpoint sequence in a DNN model is preserved. A2-DIDM also addresses privacy protections in decentralized identity. We systematically analyze the security and robustness of our proposed model and further evaluate the effectiveness and usability of auditing DNN model identities. The code is available at https://github.com/xtx123456/A2-DIDM.git.

#39 Neuro-symbolic Static Analysis with LLM-generated Vulnerability Patterns

著者: Penghui Li, Songchen Yao, Josef Sarfati Korich, Changhua Luo, Jianjia Yu, Yinzhi Cao, Junfeng Yang

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2504.16057

要約:
In this work, we present MoCQ, a neuro-symbolic static analysis framework that leverages large language models (LLMs) to automatically generate vulnerability detection patterns. This approach combines the precision and scalability of pattern-based static analysis with the semantic understanding and automation capabilities of LLMs. MoCQ extracts the domain-specific languages for expressing vulnerability patterns and employs an iterative refinement loop with trace-driven symbolic validation that provides precise feedback for pattern correction. We evaluated MoCQ on 12 vulnerability types across four languages (C/C++, Java, PHP, JavaScript). MoCQ achieves detection performance comparable to expert-developed patterns while requiring only hours of generation versus weeks of manual effort. Notably, MoCQ uncovered 46 new vulnerability patterns that security experts had missed and discovered 25 previously unknown vulnerabilities in real-world applications. MoCQ also outperforms prior approaches with stronger analysis capabilities and broader applicability.

#40 To trust or not to trust: Attention-based Trust Management for LLM Multi-Agent Systems

agent

著者: Pengfei He, Zhenwei Dai, Xianfeng Tang, Yue Xing, Hui Liu, Jingying Zeng, Qiankun Peng, Shrivats Agrawal, Samarth Varshney, Suhang Wang, Jiliang Tang, Qi He

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.02546

要約:
Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. While some existing studies approach trustworthiness, they focus on a single type of harmfulness rather than analyze it in a holistic approach from multiple trustworthiness perspectives. We address this gap by proposing a comprehensive definition of trustworthiness inspired by human communication theory (Grice, 1975). Our definition identifies six orthogonal trust dimensions that provide interpretable measures of trustworthiness. Building on this definition, we introduce the Attention Trust Score (A -Trust), a lightweight, attention-based method for evaluating the trustworthiness of messages. We then develop a principled trust management system (TMS) for LLM -MAS that supports both message-level and agent-level trust assessments. Experiments across diverse multi-agent settings and tasks demonstrate that our TMS significantly improves robustness against malicious inputs.

#41 Hermes: Efficient Global Homomorphic Aggregation over Mutable Packed Ciphertexts

著者: Dongfang Zhao

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.03308

要約:
Fully Homomorphic Encryption (FHE) promises the ability to compute over encrypted data without revealing sensitive contents. However, enabling this feature for high-frequency updates and statistical analysis in outsourced databases remains elusive due to the structural mismatch between mutable database records and the cryptographically expensive mutability of FHE ciphertexts. This paper presents Hermes, a prototype system tailored for efficient global aggregation queries and dynamic tuple updates on homomorphically encrypted databases. The core design of Hermes is twofold. First, to amortize FHE costs and accelerate unconditional aggregations, Hermes introduces a data model aware of SIMD structures. Precomputed aggregate statistics become a primary element, dynamically maintained within the ciphertext to support constant time global aggregations without expensive Galois automorphisms. Second, to support mutable ciphertexts in-place, we develop data oblivious homomorphic algorithms built upon polynomial slot masking and shifting, provably secure under standard security models. Hermes is implemented as a suite of C++ loadable functions in MySQL. Extensive evaluations on the TPC-H benchmark and three real-world datasets demonstrate significant performance improvements in global query throughput, tuple insertions, and tuple deletions compared to conventional FHE implementations, validating its efficacy for highly dynamic and analytical workloads.

#42 Mobile GUI Agents under Real-world Threats: Are We There Yet?

agent

著者: Guohong Liu, Jialei Ye, Jiacheng Liu, Yuanchun Li, Wei Liu, Pengzhi Gao, Jian Luan, Yunxin Liu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.04227

要約:
Recent years have witnessed a rapid development of mobile GUI agents powered by large language models (LLMs), which can autonomously execute diverse device-control tasks based on natural language instructions. The increasing accuracy of these agents on standard benchmarks has raised expectations for large-scale real-world deployment, and there are already several commercial agents released and used by early adopters. However, are we really ready for GUI agents integrated into our daily devices as system building blocks? We argue that an important pre-deployment validation is missing to examine whether the agents can maintain their performance under real-world threats. Specifically, unlike existing common benchmarks that are based on simple static app contents (they have to do so to ensure environment consistency between different tests), real-world apps are filled with contents from untrustworthy third parties, such as advertisement emails, user-generated posts and medias, etc. ... To this end, we introduce a scalable app content instrumentation framework to enable flexible and targeted content modifications within existing applications. Leveraging this framework, we create a test suite comprising both a dynamic task execution environment and a static dataset of challenging GUI states. The dynamic environment encompasses 122 reproducible tasks, and the static dataset consists of over 3,000 scenarios constructed from commercial apps. We perform experiments on both open-source and commercial GUI agents. Our findings reveal that all examined agents can be significantly degraded due to third-party contents, with an average misleading rate of 42.0% and 36.1% in dynamic and static environments respectively. The framework and benchmark has been released at https://agenthazard.github.io.

#43 SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From

著者: Yao Tong, Haonan Wang, Siquan Li, Kenji Kawaguchi, Tianyang Hu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.26404

要約:
Fingerprinting Large Language Models (LLMs)is essential for provenance verification and model attribution. Existing fingerprinting methods are primarily evaluated after fine-tuning, where models have already acquired stable signatures from training data, optimization dynamics, or hyperparameters. However, most of a model's capacity and knowledge are acquired during pretraining rather than downstream fine-tuning, making large-scale pretraining a more fundamental regime for lineage verification. We show that existing fingerprinting methods become unreliable in this regime, as they rely on post-hoc signatures that only emerge after substantial training. This limitation contradicts the classical Galton notion of a fingerprint as an intrinsic and persistent identity. In contrast, we propose a stronger and more intrinsic notion of LLM fingerprinting: SeedPrints, a method that leverages random initialization biases as persistent, seed-dependent identifiers present even before training begins. We show that untrained models exhibit reproducible prediction biases induced by their initialization seed, and that these weak signals remain statistically detectable throughout training, enabling high-confidence lineage verification. Unlike prior techniques that fail during early pretraining or degrade under distribution shifts, SeedPrints remains effective across all training stages, from initialization to large-scale pretraining and downstream adaptation. Experiments on LLaMA-style and Qwen-style models demonstrate seed-level distinguishability and enable birth-to-lifecycle identity verification. Evaluations on large-scale pretraining trajectories and real-world fingerprinting benchmarks further confirm its robustness under prolonged training, domain shifts, and parameter modifications.

#44 Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

backdooragent

著者: L\'eo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Nazanin Sepahvand, Nicolas Chapados, Quentin Cappart, Alexandre Lacoste, Krishnamurthy Dj Dvijotham, Alexandre Drouin

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.05159

要約:
While finetuning AI agents on interaction data -- such as web browsing or tool use -- improves their capabilities, it also introduces critical security vulnerabilities within the agentic AI supply chain. We show that adversaries can effectively poison the data collection pipeline at multiple stages to embed hard-to-detect backdoors that, when triggered, cause unsafe or malicious behavior. We formalize three realistic threat models across distinct layers of the supply chain: direct poisoning of finetuning data, pre-backdoored base models, and environment poisoning, a novel attack vector that exploits vulnerabilities specific to agentic training pipelines. Evaluated on two widely adopted agentic benchmarks, all three threat models prove effective: poisoning only a small number of demonstrations is sufficient to embed a backdoor that causes an agent to leak confidential user information with over 80\% success.

#45 SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

agent

著者: Zonghao Ying, Yangguang Shao, Jianle Gan, Gan Xu, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.10073

要約:
Large vision-language model (LVLM)-based web agents are emerging as powerful tools for automating complex online tasks. However, when deployed in real-world environments, they face serious security risks, motivating the design of security evaluation benchmarks. Existing benchmarks provide only partial coverage, typically restricted to narrow scenarios such as user-level prompt manipulation, and thus fail to capture the broad range of agent vulnerabilities. To address this gap, we present \tool{}, the first holistic benchmark for evaluating the security of LVLM-based web agents. \tool{} first introduces a unified evaluation suite comprising six simulated but realistic web environments (\eg, e-commerce platforms, community forums) and includes 2,970 high-quality trajectories spanning diverse tasks and attack settings. The suite defines a structured taxonomy of six attack vectors spanning both user-level and environment-level manipulations. In addition, we introduce a multi-layered evaluation protocol that analyzes agent failures across three critical dimensions: internal reasoning, behavioral trajectory, and task outcome, facilitating a fine-grained risk analysis that goes far beyond simple success metrics. Using this benchmark, we conduct large-scale experiments on 9 representative LVLMs, which fall into three categories: general-purpose, agent-specialized, and GUI-grounded. Our results show that all tested agents are consistently vulnerable to subtle adversarial manipulations and reveal critical trade-offs between model specialization and security. By providing (1) a comprehensive benchmark suite with diverse environments and a multi-layered evaluation pipeline, and (2) empirical insights into the security challenges of modern LVLM-based web agents, \tool{} establishes a foundation for advancing trustworthy web agent deployment.

#46 From Coordinates to Context: An LLM-Bootstrapped Semantic Encoding Framework for Privacy-Preserving Mobile Sensing Stress Recognition

privacy

著者: Hoang Khang Phan, Nhat Tan Le

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.23200

要約:
Psychological stress is a widespread issue that significantly impacts student well-being and academic performance. Effective remote stress recognition is crucial, yet existing methods often rely on wearable devices or GPS-based clustering techniques that pose privacy risks and lack of human understandable explanations. In this study, we introduce a novel, end-to-end privacy-enhanced framework for semantic location encoding using a self-hosted OSM engine and an LLM-bootstrapped static map for human-friendly feature extraction, and pave a pathway for privacy-aware location data transformation for dataset sharing. We rigorously quantify the privacy-utility-explainability trilemma and demonstrate (via LOSO validation) that our Privacy-Aware (PA) model achieves robust privacy protection without being statistically distinguishable in stress recognition performance from a non-private model. Model explanation analysis highlights that our extracted features, which are user-friendly features, match with psychological literature about stress. In addition, an ablation study on the GeoLife dataset also demonstrates that our privacy framework improves privacy by 2-3 times compared to a non-privacy-aware approach. This suggests that our system can be utilized for the next generation of GPS transformations in open-source datasets for future researchers.

#47 Red Teaming Large Reasoning Models

著者: Jiawei Chen, Yang Yang, Chao Yu, Yu Tian, Zhi Cao, Xue Yang, Linghao Li, Hang Su, Zhaoxia Yin

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.00412

要約:
Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical consistency through explicit chains of thought (CoT). However, these models introduce novel safety and reliability risks, such as CoT-hijacking and prompt-induced inefficiencies, which are not fully captured by existing evaluation methods. To address this gap, we propose RT-LRM, a unified benchmark designed to assess the trustworthiness of LRMs. RT-LRM evaluates three core dimensions: truthfulness, safety and efficiency. Beyond metric-based evaluation, we further introduce the training paradigm as a key analytical perspective to investigate the systematic impact of different training strategies on model trustworthiness. We achieve this by designing a curated suite of 30 reasoning tasks from an observational standpoint. We conduct extensive experiments on 26 models and identify several valuable insights into the trustworthiness of LRMs. For example, LRMs generally face trustworthiness challenges and tend to be more fragile than Large Language Models (LLMs) when encountering reasoning-induced risks. These findings uncover previously underexplored vulnerabilities and highlight the need for more targeted evaluations. In addition, we release a scalable toolbox for standardized trustworthiness research to support future advancements in this important field. Our code and datasets will be open-sourced.

#48 CKG-LLM: LLM-Assisted Detection of Smart Contract Access Control Vulnerabilities Based on Knowledge Graphs

著者: Xiaoqi Li, Hailu Kuang, Wenkai Li, Zongwei Li, Shipeng Ye

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.06846

要約:
Traditional approaches for smart contract analysis often rely on intermediate representations such as abstract syntax trees, control-flow graphs, or static single assignment form. However, these methods face limitations in capturing both semantic structures and control logic. Knowledge graphs, by contrast, offer a structured representation of entities and relations, enabling richer intermediate abstractions of contract code and supporting the use of graph query languages to identify rule-violating elements. This paper presents CKG-LLM, a framework for detecting access-control vulnerabilities in smart contracts. Leveraging the reasoning and code generation capabilities of large language models, CKG-LLM translates natural-language vulnerability patterns into executable queries over contract knowledge graphs to automatically locate vulnerable code elements. Experimental evaluation demonstrates that CKG-LLM achieves superior performance in detecting access-control vulnerabilities compared to existing tools. Finally, we discuss potential extensions of CKG-LLM as part of future research directions.

#49 Safe-FedLLM: Delving into the Safety of Federated Large Language Models

著者: Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.07177

要約:
Federated learning (FL) addresses privacy and data-silo issues in the training of large language models (LLMs). Most prior work focuses on improving the efficiency of federated learning for LLMs (FedLLM). However, security in open federated environments, particularly defenses against malicious clients, remains underexplored. To investigate the security of FedLLM, we conduct a preliminary study to analyze potential attack surfaces and defensive characteristics from the perspective of LoRA updates. We find two key properties of FedLLM: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA updates exhibit distinct behavioral patterns that can be effectively distinguished by lightweight classifiers. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for FedLLM, which constructs defenses across three levels: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on each client's local LoRA updates, treating them as high-dimensional behavioral features and using a lightweight classifier to determine whether they are malicious. Extensive experiments demonstrate that Safe-FedLLM effectively improves FedLLM's robustness against malicious clients while maintaining competitive performance on benign data. Notably, our method effectively suppresses the impact of malicious data without significantly affecting training speed, and remains effective even under high malicious client ratios.

#50 Self-Sovereign Identity and eIDAS 2.0: An Analysis of Control, Privacy, and Legal Implications

privacy

著者: Nacereddine Sitouah, Marco Esposito, Francesco Bruschi

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.19837

要約:
European digital identity initiatives are grounded in regulatory frameworks designed to ensure interoperability and robust, harmonized security standards. The evolution of these frameworks culminates in eIDAS 2.0, whose origins trace back to the Electronic Signatures Directive 1999/93/EC, the first EU-wide legal foundation for the use of electronic signatures in cross-border electronic transactions. As technological capabilities advanced, the initial eIDAS 1.0 framework was increasingly criticized for its limitations and lack of comprehensiveness. Emerging decentralized approaches further exposed these shortcomings and introduced the possibility of integrating innovative identity paradigms, such as Self-Sovereign Identity (SSI) models. In this article, we contribute to the ongoing legal and policy debate on the European Digital Identity Framework by analyzing key provisions of eIDAS 2.0 and its accompanying recitals, drawing on a systematic literature review guided by defined Research Questions (RQ). This work employs a structured methodological approach that combines descriptive and comparative analysis, systematic gap analysis supported by a defined scoring matrix, and normative analysis to evaluate the compatibility of SSI properties with eIDAS 2.0 regulation, as operationalized via its Architecture and Reference Framework (ARF). Furthermore, we assess the ARF's guidelines and examine the extent to which it aligns with SSI. The analysis adopts a complementary perspective demonstrating how the regulation can be further developed to better support SSI in the future by identifying existing limitations and potential adoption opportunities within the current legal foundations of the framework.

#51 Hidden Elo: Private Matchmaking through Encrypted Rating Systems

privacy

著者: Mindaugas Budzys, Bin Liu, Antonis Michalas

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.26407

要約:
Matchmaking has become a prevalent part in contemporary applications, being used in dating apps, social media, online games, contact tracing and in various other use-cases. However, most implementations of matchmaking require the collection of sensitive/personal data for proper functionality. As such, with this work we aim to reduce the privacy leakage inherent in matchmaking applications. We propose H-Elo, a Fully Homomorphic Encryption (FHE)-based, private rating system, which allows for secure matchmaking through the use of traditional rating systems. In this work, we provide the construction of H-Elo, analyse the security of it against a capable adversary as well as benchmark our construction in a chess-based rating update scenario. Through our experiments we show that H-Elo can achieve similar accuracy to a plaintext implementation, while keeping rating values private and secure. Additionally, we compare our work to other private matchmaking solutions as well as cover some future directions in the field of private matchmaking. To the best of our knowledge we provide one of the first private and secure rating system-based matchmaking protocols.

#52 Broken Quantum: A Systematic Formal Verification Study of Security Vulnerabilities Across the Open-Source Quantum Computing Simulator Ecosystem

著者: Dominik Blain

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.06712

要約:
Quantum computing simulators form the classical software foundation on which virtually all quantum algorithm research depends. We present Broken Quantum, the first comprehensive formal security audit of the open-source quantum computing simulator ecosystem. Applying COBALT QAI -- a four-module static analysis engine backed by the Z3 SMT solver -- we analyze 45 open-source quantum simulation frameworks from 22 organizations spanning 12 countries. We identify 547 security findings (40 CRITICAL, 492 HIGH, 15 MEDIUM) across four vulnerability classes: CWE-125/190 (C++ memory corruption), CWE-400 (Python resource exhaustion), CWE-502/94 (unsafe deserialization and code injection), and CWE-77/22 (QASM injection -- a novel, quantum-specific attack vector with no classical analog). All 13 vulnerability patterns are formally verified via Z3 satisfiability proofs (13/13 SAT). The 32-qubit boundary emerges as a consistent formal threshold in both C++ and Python vulnerability chains. Supply chain analysis identifies the first documented case of vulnerability transfer from a commercial quantum framework into US national laboratory infrastructure (IBM Qiskit Aer to XACC/Oak Ridge National Laboratory). Nine frameworks score 100/100 under all four scanners; Qiskit Aer,Cirq, tequila, PennyLane, and 5 others score 0/100.

#53 A Relay a Day Keeps the AirTag Away: Practical Relay Attacks on Apple's AirTags

著者: Gabriel K. Gegenhuber, Leonid Liadveikin, Florian Holzbauer, Sebastian Strobl

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.10138

要約:
Apple AirTags use Apple's Find My network: when nearby iDevices detect a lost tag, they anonymously forward an encrypted location report to Apple, which the tag's owner can then fetch to locate the item. That encryption protects privacy -- neither the finder nor Apple learns the owner's identity -- but it also prevents Apple from validating the correctness of received reports. We show that this design weakness can be exploited: using a relay attack, we can inject manipulated location reports so the Find My service reports a false position for a lost AirTag. The same technique can be used to deny recovery of a targeted tag (a focused DoS), since the owner is misled about its whereabouts.

#54 Towards Generalized Certified Robustness with Multi-Norm Training

著者: Enyi Jiang, David S. Cheung, Gagandeep Singh

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2410.03000

要約:
Existing certified training methods can only train models to be robust against a certain perturbation type (e.g. $l_\infty$ or $l_2$). However, an $l_\infty$ certifiably robust model may not be certifiably robust against $l_2$ perturbation (and vice versa) and also has low robustness against other perturbations (e.g. geometric and patch transformation). By constructing a theoretical framework to analyze and mitigate the tradeoff, we propose the first multi-norm certified training framework \textbf{CURE}, consisting of several multi-norm certified training methods, to attain better \emph{union robustness} when training from scratch or fine-tuning a pre-trained certified model. Inspired by our theoretical findings, we devise bound alignment and connect natural training with certified training for better union robustness. Compared with SOTA-certified training, \textbf{CURE} improves union robustness to $32.0\%$ on MNIST, $25.8\%$ on CIFAR-10, and $10.6\%$ on TinyImagenet across different epsilon values. It leads to better generalization on a diverse set of challenging unseen geometric and patch perturbations to $6.8\%$ and $16.0\%$ on CIFAR-10. Overall, our contributions pave a path towards \textit{generalized certified robustness}.

#55 IMU: Influence-guided Machine Unlearning

privacy

著者: Xindi Fan, Jing Wu, Mingyi Zhou, Pengwei Liang, Mehrtash Harandi, Dinh Phung

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.01620

要約:
Machine Unlearning (MU) aims to selectively erase the influence of specific data points from pretrained models. However, most existing MU methods rely on the retain set to preserve model utility, which is often impractical due to privacy restrictions and storage constraints. While several retain-data-free methods attempt to bypass this using geometric feature shifts or auxiliary statistics, they typically treat forgetting samples uniformly, overlooking their heterogeneous contributions. To address this, we propose \ul{I}nfluence-guided \ul{M}achine \ul{U}nlearning (IMU), a principled method that conducts MU using only the forget set. Departing from uniform Gradient Ascent (GA) or implicit weighting mechanisms, IMU leverages influence functions as an explicit priority signal to allocate unlearning strength. To circumvent the prohibitive cost of full-model Hessian inversion, we introduce a theoretically grounded classifier-level influence approximation. This efficient design allows IMU to dynamically reweight unlearning updates, aggressively targeting samples that most strongly support the forgetting objective while minimizing unnecessary perturbation to retained knowledge. Extensive experiments across vision and language tasks show that IMU achieves highly competitive results. Compared to standard uniform GA, IMU maintains identical unlearning depth while enhancing model utility by an average of 30%, effectively overcoming the inherent utility-forgetting trade-off.

#56 NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

privacyintellectual propertydiffusion

著者: Nir Goren, Oren Katzir, Abhinav Nakarmi, Eyal Ronen, Mahmood Sharif, Or Patashnik

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.13793

要約:
With the rapid adoption of diffusion models for visual content generation, proving authorship and protecting copyright have become critical. This challenge is particularly important when model owners keep their models private and may be unwilling or unable to handle authorship issues, making third-party verification essential. A natural solution is to embed watermarks for later verification. However, existing methods require access to model weights and rely on computationally heavy procedures, rendering them impractical and non-scalable. To address these challenges, we propose NoisePrints, a lightweight watermarking scheme that utilizes the random seed used to initialize the diffusion process as a proof of authorship without modifying the generation process. Our key observation is that the initial noise derived from a seed is highly correlated with the generated visual content. By incorporating a hash function into the noise sampling process, we further ensure that recovering a valid seed from the content is infeasible. We also show that sampling an alternative seed that passes verification is infeasible, and demonstrate the robustness of our method under various manipulations. Finally, we show how to use cryptographic zero-knowledge proofs to prove ownership without revealing the seed. By keeping the seed secret, we increase the difficulty of watermark removal. In our experiments, we validate NoisePrints on multiple state-of-the-art diffusion models for images and videos, demonstrating efficient verification using only the seed and output, without requiring access to model weights.

#57 Classport: Designing Runtime Dependency Introspection for Java

著者: Serena Cofano, Daniel Williams, Aman Sharma, Martin Monperrus

公開日: Wed, 15 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.20340

要約:
Runtime introspection of dependencies, i.e., the ability to observe which dependencies are currently used during program execution, is fundamental for Software Supply Chain security. Yet, Java has no support for it. We solve this problem with Classport, a blueprint and system that embeds dependency information into Java class files, enabling the retrieval of dependency information at runtime. We evaluate Classport on six real-world projects, demonstrating the feasibility in identifying dependencies at runtime.

cs.CR updates on arXiv.org

📋 論文タイトル一覧