arXiv paper list - cs.CR updates on arXiv.org

#1 TAS-GNN: A Status-Aware Signed Graph Neural Network for Anomaly Detection in Bitcoin Trust Systems

Authors: Chang Xue, Fang Liu, Jiaye Wang, Jinming Xing, Chen Yang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13290

Abstract:
Decentralized financial platforms rely heavily on Web of Trust reputation systems to mitigate counterparty risk in the absence of centralized identity verification. However, these pseudonymous networks are inherently vulnerable to adversarial behaviors, such as Sybil attacks and camouflaged fraud, where malicious actors cultivate artificial reputations before executing exit scams. Traditional anomaly detection in this domain faces two critical limitations. First, reliance on naive statistical heuristics (e.g., flagging the lowest 5% of rated users) fails to distinguish between victims of bad-mouthing attacks and actual fraudsters. Second, standard Graph Neural Networks (GNNs) operate on the assumption of homophily and cannot effectively process the semantic inversion inherent in signed (trust vs. distrust) and directed (status) edges. We propose TAS-GNN (Topology-Aware Signed Graph Neural Network), a novel framework designed for feature-sparse signed networks like Bitcoin-Alpha. TAS-GNN integrates recursive Web-of-Trust labeling and a dual-channel message-passing architecture that separately models trust and distrust signals, fused through a Status-Aware Attention mechanism. Experiments demonstrate that TAS-GNN achieves state-of-the-art performance, significantly outperforming existing signed GNN baselines.

#2 Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache

Authors: Xinhai Wang, Shaopeng Fu, Shu Yang, Liangyu Wang, Tianhang Zheng, Di Wang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13420

Abstract:
Suffix jailbreak attacks serve as a systematic method for red-teaming Large Language Models (LLMs) but suffer from prohibitive computational costs, as a large number of candidate suffixes need to be evaluated before identifying a jailbreak suffix. This paper presents Prefix-Shared KV Cache (PSKV), a plug-and-play inference optimization technique tailored for jailbreak suffix generation. Our method is motivated by a key observation that when performing suffix jailbreaking, while a large number of candidate prompts need to be evaluated, they share the same targeted harmful instruction as the prefix. Therefore, instead of performing redundant inference on the duplicated prefix, PSKV maintains a single KV cache for this prefix and shares it with every candidate prompt, enabling the parallel inference of diverse suffixes with minimal memory overhead. This design enables more aggressive batching strategies that would otherwise be limited by memory constraints. Extensive experiments on six widely used suffix attacks across five widely deployed LLMs demonstrate that PSKV reduces inference time by 40% and peak memory usage by 50%, while maintaining the original Attack Success Rate (ASR). The code has been submitted and will be released publicly.
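
As a rough illustration of why sharing the prefix helps, the sketch below counts how many token positions need KV-cache entries (and therefore a prefill pass) with and without a shared prefix. The function name and the example sizes are hypothetical, and this is only a back-of-the-envelope count, not the authors' PSKV implementation; it does not attempt to reproduce the reported 40% and 50% figures.

```python
def kv_positions(prefix_len: int, suffix_len: int,
                 num_candidates: int, share_prefix: bool) -> int:
    """Count token positions whose keys/values must be computed and stored.

    This is a proxy for both prefill compute and KV-cache memory per layer.
    """
    if share_prefix:
        # The common prefix is encoded once and reused by every candidate;
        # each candidate only adds entries for its own suffix.
        return prefix_len + num_candidates * suffix_len
    # Without sharing, every candidate re-encodes the identical prefix.
    return num_candidates * (prefix_len + suffix_len)


# Hypothetical sizes: a 64-token instruction, 20-token suffixes, 512 candidates.
naive = kv_positions(64, 20, 512, share_prefix=False)
shared = kv_positions(64, 20, 512, share_prefix=True)
print(f"naive={naive} shared={shared} ratio={shared / naive:.2f}")
```

The saving grows with the ratio of prefix length to suffix length, which is why the idea fits suffix attacks: the instruction is fixed while only the short suffix varies.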

#3 Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

agent

Authors: Darren Cheng, Wen-Kwang Tsao

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13424

Abstract:
Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject benchmark (Greshake et al., 2024) against current generation models running inside OpenClaw, an open source multitool agent platform. Our proposed defense combines two mechanisms: agent isolation, implemented as a privilege separated two-agent pipeline with tool partitioning, and JSON formatting, which produces structured output that strips persuasive framing before the action agent processes it. We run four experiments on the same 649 attacks that succeeded against our single-agent baseline. The full pipeline achieves 0 percent attack success rate (ASR) on the evaluated benchmark. Agent isolation alone achieves 0.31 percent ASR, approximately 323 times lower than the baseline. JSON formatting alone achieves 14.18 percent ASR, about 7.1 times lower. Our ablation study confirms that agent isolation is the dominant mechanism. JSON formatting provides additional hardening but is not sufficient on its own. The defense is structural: the action agent never receives raw injection content regardless of model behavior on any individual input.
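
The reported reduction factors can be checked with simple arithmetic. The sketch below assumes the baseline ASR on this subset is 100 percent, since the 649 attacks were selected because they succeeded against the single-agent baseline. The implied attack counts are inferred from the percentages and are not stated in the abstract; the function names are illustrative.

```python
TOTAL_ATTACKS = 649  # attacks that succeeded against the single-agent baseline


def implied_successes(asr_percent: float, total: int = TOTAL_ATTACKS) -> int:
    """Nearest whole number of successful attacks implied by a reported ASR."""
    return round(asr_percent / 100 * total)


def reduction_factor(asr_percent: float, baseline_percent: float = 100.0) -> float:
    """How many times lower the ASR is than the (assumed) baseline ASR."""
    return baseline_percent / asr_percent


for name, asr in [("agent isolation", 0.31), ("JSON formatting", 14.18)]:
    print(name, implied_successes(asr), round(reduction_factor(asr), 1))
```

Under these assumptions, 0.31 percent corresponds to about 2 of 649 attacks and 14.18 percent to about 92 of 649, which matches the stated factors of roughly 323 and 7.1.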

#4 Technical Case Study of Privacy-Enhancing Technologies (PETs) for Public Health

privacy

Authors: Avinash Laddha, Danil Mikhailov, Uyi Stewart

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13444

Abstract:
We present a technical case study on the Privacy-Enhancing Technologies (PETs) for Public Health Challenge, a collaborative effort to safely leverage sensitive private sector data for social impact, specifically pandemic management. The project utilized Differential Privacy (DP) to create realistic, privacy-preserved synthetic financial transaction data, which was then combined with public health and mobility datasets. This approach successfully addressed the critical hurdle of sharing sensitive financial information for research and policy. The analysis demonstrated that this synthetic, DP-protected data possesses significant spatial-temporal and predictive power for public health. Key outcomes include the development of six reusable tools and frameworks supporting diagnostic nowcasting (e.g., Hotspot Detection, Pandemic Adherence Monitoring) and predictive forecasting (e.g., Mobility Analysis, Contact Matrix Estimation) for epidemiological decision-making. The study provides best practices for advancing data sharing in a privacy-compliant manner.

#5 Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference

backdoor

Authors: Jianwei Li, Jung-Eun Kim

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13461

Abstract:
Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces malicious outputs when a hidden trigger appears. Existing backdoor removal methods typically assume prior knowledge of triggers, access to a clean reference model, or rely on aggressive finetuning configurations, and are often limited to classification tasks. However, such assumptions fall apart in real-world instruction-tuned LLM settings. In this work, we propose a new framework for purifying instruction-tuned LLMs without any prior trigger knowledge or clean references. Through systematic sanity checks, we find that backdoor associations are redundantly encoded across MLP layers, while attention modules primarily amplify trigger signals without establishing the behavior. Leveraging this insight, we shift the focus from isolating specific backdoor triggers to cutting off the trigger-behavior associations, and design an immunization-inspired elimination approach: we construct multiple synthetic backdoored variants of the given suspicious model, each trained with different malicious trigger-behavior pairs, and contrast them with their clean counterparts. The recurring modifications across variants reveal a shared "backdoor signature", analogous to antigens in a virus. Guided by this signature, we neutralize highly suspicious components in the LLM and apply lightweight finetuning to restore its fluency, producing purified models that withstand diverse backdoor attacks and threat models while preserving generative capability.

#6 An Ideal Random Number Generator Based on Quantum Fluctuations and Rotating Wheel for Secure Image Encryption

Authors: Subhadip Rana, Sanku Paul, Mrinal Kanti Mandal

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13472

Abstract:
In the era of digitization, secure transmission of digital images has become essential in real-world applications. Image encryption is an effective technique for protecting image data from unauthorized access. The security of encrypted data strongly depends on the quality of the random numbers used as the encryption key. In this paper, we propose a hybrid random number generator based on quantum fluctuations and an algorithmically inspired rotating wheel. The wheel contains integer values from 0 to 255 that are shuffled using quantum fluctuations generated by time-evolving the quantum kicked rotor model. There are four pre-defined tapping positions in the rotating wheel to collect the number sequences. The wheel rotation speed is dynamically varied after each set of tapping to enhance unpredictability. The entropy of the number sequence obtained from the rotating wheel attains the ideal value of 8 (in an 8-bit representation). Further, the generated number sequences exhibit a flat histogram and nearly zero correlation, indicating strong randomness. The generated sequences are applied to image encryption and analyzed cryptographically. Experimental results demonstrate a near-ideal entropy of 7.997, an NPCR of 99.60%, low correlation in all directions, and low PSNR for encrypted images. These results confirm that the proposed random number generator achieves efficient and high-security performance, making it suitable for the security of consumer applications such as mobile healthcare imaging, biometric authentication, QR-based and multimedia communication on smart devices.
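
The two headline metrics here are standard and easy to state in code. The sketch below computes Shannon entropy in bits for a sequence of 8-bit values (maximum 8 when all 256 values are equally frequent) and NPCR, the percentage of pixel positions that differ between two cipher images. It uses flat lists of integers as stand-ins for images; the function names are illustrative and this is not the authors' evaluation code.

```python
import math
from collections import Counter


def shannon_entropy_bits(values):
    """Shannon entropy H = -sum(p * log2(p)) over the observed symbol frequencies."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def npcr_percent(cipher_a, cipher_b):
    """Number of Pixels Change Rate: share of positions where two images differ."""
    if len(cipher_a) != len(cipher_b):
        raise ValueError("images must have the same number of pixels")
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)


# A sequence containing every byte value exactly once reaches the 8-bit maximum.
print(shannon_entropy_bits(list(range(256))))
print(npcr_percent([1, 2, 3, 4], [1, 0, 0, 0]))
```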

#7 CTI-REALM: Benchmark to Evaluate Agent Performance on Security Detection Rule Generation Capabilities

agent

Authors: Arjun Chakraborty, Sandra Ho, Adam Cook, Manuel Meléndez

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13517

Abstract:
CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat intelligence (CTI) and develop detection rules. The benchmark provides a realistic environment that replicates the security analyst workflow. This enables agents to examine CTI reports, execute queries, understand schema structures, and construct detection rules. Evaluation involves emulated attacks of varying complexity across Linux systems, cloud platforms, and Azure Kubernetes Service (AKS), with ground truth data for accurate assessment. Agent performance is measured through both final detection results and trajectory-based rewards that capture decision-making effectiveness. This work demonstrates the potential of AI agents to support labor-intensive aspects of detection engineering. Our comprehensive evaluation of 16 frontier models shows that Claude Opus 4.6 (High) achieves the highest overall reward (0.637), followed by Claude Opus 4.5 (0.624) and the GPT-5 family. An ablation study confirms that CTI-specific tools significantly improve agent performance, and a variance analysis across repeated runs demonstrates result stability. Finally, a memory augmentation study shows that seeded context can close 33% of the performance gap between smaller and larger models.

#8 SecDTD: Dynamic Token Drop for Secure Transformers Inference

Authors: Yifei Cai, Zhuoran Li, Yizhou Feng, Qiao Zhang, Hongyi Wu, Danella Zhao, Chunsheng Xin

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13670

Abstract:
The rapid adoption of Transformer-based AI has been driven by accessible models such as ChatGPT, which provide API-based services for developers and businesses. However, as these online inference services increasingly handle sensitive inputs, privacy concerns have emerged as a significant challenge. To address this, secure inference frameworks have been proposed, but their high computational and communication overhead often limits practical deployment. In plaintext settings, token drop is an effective technique for reducing inference cost; however, our analysis reveals that directly applying such methods to ciphertext scenarios is suboptimal due to distinct cost distributions in secure computation. We propose SecDTD, a dynamic token drop scheme tailored for secure Transformer inference. SecDTD advances token drop by shifting the dropping to earlier inference stages, effectively reducing the cost of key components such as Softmax. To support this, we introduce two core techniques. Max-Centric Normalization (MCN): A novel, Softmax-independent scoring method that enables early token drop with minimal overhead and improved normalization, supporting more aggressive dropping without accuracy loss. OMSel: A faster, oblivious median selection protocol that securely identifies the median of importance scores to support token drop. Compared to existing sorting-based methods, OMSel achieves a 16.9× speedup while maintaining security, obliviousness and randomness. We evaluate SecDTD through 48 experiments across eight GLUE datasets under various network settings using the BOLT and BumbleBee frameworks. SecDTD achieves 4.47× end-to-end inference acceleration without degradation in accuracy.
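
To make the role of the median concrete, the sketch below shows a plaintext analogue of median-based token drop: tokens whose importance score falls below the median are discarded. It is only an illustration of the selection rule. The keep-if-at-or-above-median convention and the function name are assumptions, and the oblivious, secure-computation side of OMSel and the MCN scoring are not modeled at all.

```python
import statistics


def drop_below_median(tokens, scores):
    """Keep tokens whose importance score is at or above the median score."""
    if len(tokens) != len(scores):
        raise ValueError("each token needs exactly one score")
    threshold = statistics.median(scores)
    return [tok for tok, score in zip(tokens, scores) if score >= threshold]


# Hypothetical importance scores for a five-token input.
kept = drop_below_median(["a", "b", "c", "d", "e"], [0.1, 0.9, 0.5, 0.3, 0.7])
print(kept)
```

Selecting a median needs only a rank statistic, not a full ordering, which is consistent with the abstract's comparison against sorting-based methods.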

#9 Hidden Risks of Unmonitored GPUs in Intelligent Transportation Systems

Authors: Sefatun-Noor Puspa, Mashrur Chowdhury

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13675

Abstract:
Graphics processing units (GPUs) power many intelligent transportation systems (ITS) and automated driving applications, but remain largely unmonitored for safety and security. This article highlights GPU misuse as a critical blind spot, showing how unmanaged GPU workloads silently degrade real-time performance, demonstrating the need for stronger security measures in ITS.

#10 Graph Neural Network-Based DDoS Protection for Data Center Infrastructure

Authors: Kartikeya Sharma, Craig Jacobik

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13694

Abstract:
In light of rising cybersecurity threats, data center providers face growing pressure to protect their own management infrastructure from Distributed Denial-of-Service (DDoS) attacks. While tenant-managed cages generally fall outside the data center's direct security purview, a successful DDoS assault on core provider systems can indirectly disrupt network services. To address this availability assault, the authors developed a Graph Neural Network (GNN)-based detection system which leverages Graph U-Nets to automatically classify and mitigate DDoS traffic. Although the model was developed using open-source network flows rather than proprietary data center logs, it effectively identifies multi-layer DDoS attacks that resemble the malicious patterns threatening modern data centers. Adopting this system in data center environments requires minimal changes to existing operational workflows and processes. Specifically, the GNN-based system can be integrated at critical areas within a data center's network infrastructure. Our model achieved an F1 score of over 95% when evaluated on various open-source datasets, significantly reducing the likelihood of service disruptions and reputational damage. This Graph U-Nets architecture delivers unprecedented precision (98.5%) in complex cloud environments, thereby helping data center operators uphold reliable service availability and increase customer trust and goodwill in an era of increasingly sophisticated cyber threats.

#11 REAEDP: Entropy-Calibrated Differentially Private Data Release with Formal Guarantees and Attack-Based Evaluation

privacy

Authors: Bo Ma, Jinsong Wu, Wei Qi Yan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13709

Abstract:
Sensitive data release is vulnerable to output-side privacy threats such as membership inference, attribute inference, and record linkage. This creates a practical need for release mechanisms that provide formal privacy guarantees while preserving utility in measurable ways. We propose REAEDP, a differential privacy framework that combines entropy-calibrated histogram release, a synthetic-data release mechanism, and attack-based evaluation. On the theory side, we derive an explicit sensitivity bound for Shannon entropy, together with an extension to Rényi entropy, for adjacent histogram datasets, enabling calibrated differentially private release of histogram statistics. We further study a synthetic-data mechanism $\mathcal{F}$ with a privacy-test structure and show that it satisfies a formal differential privacy guarantee under the stated parameter conditions. On multiple public tabular datasets, the empirical entropy change remains below the theoretical bound in the tested regime, standard Laplace and Gaussian baselines exhibit comparable trends, and both membership-inference and linkage-style attack performance move toward random-guess behavior as the privacy parameter decreases. These results support REAEDP as a practically usable privacy-preserving release pipeline in the tested settings. Source code: https://github.com/mabo1215/REAEDP.git
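
For orientation, the sketch below shows the standard Laplace baseline mentioned in the abstract, applied to a histogram, followed by the Shannon entropy of the released counts. It is not the REAEDP mechanism or its entropy sensitivity bound. The L1 sensitivity of 1 assumes adjacent datasets differ by adding or removing one record, and the function names are illustrative.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def laplace_histogram(counts, epsilon, sensitivity=1.0, rng=None):
    """Release noisy counts with the Laplace mechanism (noise scale = sensitivity / epsilon)."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def entropy_bits(counts):
    """Shannon entropy of a histogram; negative noisy counts are clamped to zero first."""
    clipped = [max(c, 0.0) for c in counts]
    total = sum(clipped)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in clipped if c > 0)


true_counts = [120, 80, 40, 10]
noisy = laplace_histogram(true_counts, epsilon=0.5, rng=random.Random(0))
print(entropy_bits(true_counts), entropy_bits(noisy))
```

Clamping and normalizing are post-processing of the noisy counts, so they do not weaken the differential privacy guarantee of the release itself. A smaller epsilon means a larger noise scale and a larger expected entropy change.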

#12 TableMark: A Multi-bit Watermark for Synthetic Tabular Data

synthetic data, intellectual property

Authors: Yuyang Xia, Yaoqiang Xu, Chen Qian, Yang Li, Guoliang Li, Jianhua Feng

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13722

Abstract:
Watermarking has emerged as an effective solution for copyright protection of synthetic data. However, applying watermarking techniques to synthetic tabular data presents challenges, as tabular data can easily lose their watermarks through shuffling or deletion operations. The major challenge is to provide traceability for tracking multiple users of the watermarked tabular data while maintaining high data utility and robustness (resistance to attacks). To address this, we design a multi-bit watermarking scheme TableMark that encodes watermarks into synthetic tabular data, ensuring superior traceability and robustness while maintaining high utility. We formulate the watermark encoding process as a constrained optimization problem, allowing the data owner to effectively trade off robustness and utility. Additionally, we propose effective optimization mechanisms to solve this problem to enhance the data utility. Experimental results on four widely used real-world datasets show that TableMark effectively traces a large number of users, is resilient to attacks, and preserves high utility. Moreover, TableMark significantly outperforms state-of-the-art tabular watermarking schemes.

#13 Ransomware and Artificial Intelligence: A Comprehensive Systematic Review of Reviews

Authors: Therdpong Daengsi, Phisit Pornpongtechavanich, Paradorn Boonpoor, Kathawut Wattanachukul, Korn Puangnak, Kritphon Phanrattanachai, Pongpisit Wuttidittachotti, Paramate Horkaew

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13734

Abstract:
This study provides a comprehensive synthesis of Artificial Intelligence (AI), especially Machine Learning (ML) and Deep Learning (DL), in ransomware defense. Using a "review of reviews" methodology based on PRISMA, this paper gathers insights on how AI is transforming ransomware detection, prevention, and mitigation strategies during the past five years (2020-2024). The findings highlight the effectiveness of hybrid models that combine multiple analysis techniques such as code inspection (static analysis) and behavior monitoring during execution (dynamic analysis). The study also explores anomaly detection and early warning mechanisms before encryption to address the increasing complexity of ransomware. In addition, it examines key challenges in ransomware defense, including techniques designed to deceive AI-driven detection systems and the lack of strong and diverse datasets. The results highlight the role of AI in early detection and real-time response systems, improving scalability and resilience. Using a systematic review-of-reviews approach, this study consolidates insights from multiple review articles, identifies effective AI models, and bridges theory with practice to support collaboration among academia, industry, and policymakers. Future research directions and practical recommendations for cybersecurity practitioners are also discussed. Finally, this paper proposes a roadmap for advancing AI-driven countermeasures to protect critical systems and infrastructures against evolving ransomware threats.

#14 Unlinkability and History Preserving Bisimilarity

Authors: Clément Aubert, Ross Horne, Christian Johansen, Sjouke Mauw

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13735

Abstract:
An ever-increasing number of critical infrastructures rely heavily on the assumption that security protocols satisfy a wealth of requirements. Hence, the importance of certifying, e.g., privacy properties using methods that are better at detecting attacks can hardly be overstated. This paper scrutinises the "unlinkability" privacy property using relations equating behaviours that cannot be distinguished by attackers. Starting from the observation that some reasonable design choice can lead to formalisms missing attacks, we draw attention to a classical concurrent semantics accounting for relationship between past events, and show that there are concurrency-aware semantics that can discover attacks on all protocols we consider. More precisely, we focus on protocols where trace equivalence is known to miss attacks that are observable using branching-time equivalences. We consider the impact of three dimensions: design decisions made by the programmer specifying an unlinkability problem (style), semantics respecting choices during execution (branching-time), and semantics sensitive to concurrency (non-interleaving), and discover that reasonable styles miss attacks unless we give attackers enough power to observe choices and concurrency. Our main contribution is to draw attention to how a popular concurrent semantics -- history-preserving bisimilarity -- when defined for the non-interleaving applied $\pi$-calculus, can discover attacks on all protocols we consider, regardless of the choice of style. Furthermore, we can describe all such attacks using a novel modal logic that is hence suitable to formally certify attacks on privacy properties.

#15 Switching Coordinator: An SDN Application for Flexible QKD-Networks

Authors: Rubén B. Mendez, Hans H. Brunner, Juan P. Brito, Hamid Taramit, Chi-Hang Fred Fung, Antonio Pastor, Rafael Cantó, Jesús Folgueira, Diego R. Lopez, Momtchil Peev, Vicente Martin

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13812

Abstract:
A monitor and control framework for quantum-key-distribution (QKD) networks equipped with switching capabilities was developed. On the one hand, this framework provides real-time visibility into operational metrics. Specifically, it extracts essential data, such as the switching capabilities of QKD modules, the number of keys stored in buffer queues of the QKD links, and the respective key generation and consumption rates along these links. On the other hand, this framework allows software-defined networking (SDN) applications to operate on the collected information and address the cryptographic needs of the network. The SDN applications dynamically adapt the configuration of the switched network to align with its changing demands, e.g., prioritizing key availability on critical paths, responding to link failures, or reallocating generation capacity to prevent bottlenecks. This contribution demonstrates that the combination of switched QKD, centralized control, and global optimization strategies enables efficient, policy-driven operation of QKD networks. The cryptographic resources are allocated to maximize performance and resilience while remaining aligned with the specific policies set by network administrators.

#16 Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

Authors: Zijian Ling, Pingyi Hu, Xiuyong Gao, Xiaojing Ma, Man Zhou, Jun Feng, Songfeng Lu, Dongmei Zhang, Bin Benjamin Zhu

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13847

Abstract:
Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic channels. We present Sirens' Whisper (SWhisper), the first practical framework for covert prompt-based attacks against speech-driven LLMs under realistic black-box conditions using commodity hardware. SWhisper enables robust, inaudible delivery of arbitrary target baseband audio, including long and structured prompts, on commodity devices by encoding it into near-ultrasound waveforms that demodulate faithfully after acoustic transmission and microphone nonlinearity. This is achieved through a simple yet effective approach to modeling nonlinear channel characteristics across devices and environments, combined with lightweight channel-inversion pre-compensation. Building on this high-fidelity covert channel, we design a voice-aware jailbreak generation method that ensures intelligibility, brevity, and transferability under speech-driven interfaces. Experiments across both commercial and open-source speech-driven LLMs demonstrate strong black-box effectiveness. On commercial models, SWhisper achieves up to 0.94 non-refusal (NR) and 0.925 specific-convincing (SC). A controlled user study further shows that the injected jailbreak audio is perceptually indistinguishable from background-only playback for human listeners. Although jailbreaks serve as a case study, the underlying covert acoustic channel enables a broader class of high-fidelity prompt-injection and command-execution attacks.

#17 Inevitable Encounters: Backdoor Attacks Involving Lossy Compression

backdoor

Authors: Qian Li, Yunuo Chen, Yuntian Chen

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13864

Abstract:
Real-world backdoor attacks often require poisoned datasets to be stored and transmitted before being used to compromise deep learning systems. However, in the era of big data, the inevitable use of lossy compression poses a fundamental challenge to invisible backdoor attacks. We find that triggers embedded in RGB images often become ineffective after the images are lossily compressed into binary bitstreams (e.g., JPEG files) for storage and transmission. As a result, the poisoned data lose their malicious effect after compression, causing backdoor injection to fail. In this paper, we highlight the necessity of explicitly accounting for the lossy compression process in backdoor attacks. This requires attackers to ensure that the transmitted binary bitstreams preserve malicious trigger information, so that effective triggers can be recovered in the decompressed data. Building on the region-of-interest (ROI) coding mechanism in image compression, we propose two poisoning strategies tailored to inevitable lossy compression. First, we introduce Universal Attack Activation, a universal method that uses sample-specific ROI masks to reactivate trigger information in binary bitstreams for learned image compression (LIC). Second, we present Compression-Adapted Attack, a new attack strategy that employs customized ROI masks to encode trigger information into binary bitstreams and is applicable to both traditional codecs and LIC. Extensive experiments demonstrate the effectiveness of both strategies.

#18 CONFETTY: A Tool for Enforcement and Data Confidentiality on Blockchain-Based Processes

Authors: Michele Kryston, Edoardo Marangone, Alessandro Marcelletti, Claudio Di Ciccio

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13900

Abstract:
Blockchain technology enforces the security, robustness, and traceability of operations of Process-Aware Information Systems (PAISs). In particular, transparency ensures that all data is publicly available, fostering trust among participants in the system. Although this is a crucial property to enable notarization and auditing, it hinders the adoption of blockchain in scenarios where confidentiality is required, as sensitive data is handled. Current solutions rely on cryptographic techniques or consortium blockchains, hindering the enforcement capabilities of smart contracts and the public verifiability of transactions. This work presents the CONFETTY open-source web application, a platform for public-blockchain based process execution that preserves data confidentiality and operational transparency. We use smart contracts to enact, enforce, and store public interactions, while we adopt attribute-based encryption techniques for fine-grained access to confidential information. This approach effectively balances the transparency inherent in public blockchains with the enforcement of the business logic.

#19 On secret sharing from extended norm-trace curves

Authors: Olav Geil

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14009

Abstract:
In [4] Camps-Moreno et al. treated (relative) generalized Hamming weights of codes from extended norm-trace curves and they gave examples of resulting good asymmetric quantum error-correcting codes employing information on the relative distances. In the present paper we study ramp secret sharing schemes, which are objects that require an analysis of higher relative weights, and we show that not only do schemes defined from one-point algebraic geometric codes from extended norm-trace curves have good parameters, they also possess a second layer of security along the lines of [11]. It is left undecided in [4, page 2889] if the "footprint-like approach" as employed by Camps-Moreno herein is strictly better for codes related to extended norm-trace codes than the general approach for treating one-point algebraic geometric codes and their likes as presented in [12]. We demonstrate that the method used in [4] to estimate (relative) generalized Hamming weights of codes from extended norm-trace curves can be viewed as a clever application of the enhanced Goppa bound in [12] rather than a competing approach.

#20 Sovereign-OS: A Charter-Governed Operating System for Autonomous AI Agents with Verifiable Fiscal Discipline

agent

Authors: Aojie Yuan, Haiyue Zhang, Ziyi Wang, Yue Zhao

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14011

Abstract:
As AI agents evolve from text generators into autonomous economic actors that accept jobs, manage budgets, and delegate to sub-agents, the absence of runtime governance becomes a critical gap. Existing frameworks orchestrate agent behavior but impose no fiscal constraints, require no earned permissions, and offer no tamper-evident audit trail. We introduce Sovereign-OS, a governance-first operating system that places every agent action under constitutional control. A declarative Charter (YAML) defines mission scope, fiscal boundaries, and success criteria. A CEO (Strategist) decomposes goals into dependency-aware task DAGs; a CFO (Treasury) gates each expenditure against budget caps, daily burn limits, and profitability floors via an auction-based bidding engine; Workers operate under earned-autonomy permissions governed by a dynamic TrustScore; and an Auditor (ReviewEngine) verifies outputs against Charter KPIs, sealing each report with a SHA-256 proof hash. Across our evaluation suite, Sovereign-OS blocks 100% of fiscal violations (30 scenarios), achieves 94% correct permission gating (200 trust-escalation missions), and maintains zero integrity failure over 1,200+ audit reports. The system further integrates Stripe for real-world payment processing, closing the loop from task planning to revenue collection. Our live demonstration walks through three scenarios: loading distinct Charters to observe divergent agent behavior, triggering CFO fiscal denials under budget and profitability constraints, and escalating a new worker's TrustScore from restricted to fully authorized with on-the-spot cryptographic audit verification.
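
Two of the mechanisms described above, gating an expenditure against fiscal limits and sealing a report with a SHA-256 hash, can be sketched in a few lines. Everything here is hypothetical: the function names, the parameter names, and the report fields are invented for illustration and are not taken from Sovereign-OS. The only real APIs used are Python's `hashlib` and `json`.

```python
import hashlib
import json


def approve_spend(cost, expected_revenue, total_spent, spent_today,
                  budget_cap, daily_burn_limit, profit_floor):
    """Hypothetical fiscal gate: approve only if all three limits hold."""
    within_budget = total_spent + cost <= budget_cap
    within_daily_burn = spent_today + cost <= daily_burn_limit
    profitable_enough = expected_revenue - cost >= profit_floor
    return within_budget and within_daily_burn and profitable_enough


def seal_report(report: dict) -> str:
    """Hash a canonical JSON encoding so equal reports always hash equally."""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_report(report: dict, proof_hash: str) -> bool:
    """Recompute the hash; any change to the report changes the digest."""
    return seal_report(report) == proof_hash


report = {"task": "summarize-docs", "kpi_met": True, "cost": 1.25}
proof = seal_report(report)
tampered = dict(report, cost=0.25)
print(verify_report(report, proof), verify_report(tampered, proof))
```

A bare hash only makes tampering evident to someone who holds a trusted copy of the digest; how the system stores or chains these digests is not described in the abstract and is not modeled here.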

#21 Missing Mass for Differentially Private Domain Discovery

privacy

Authors: Travis Dick, Matthew Joseph, Vinod Raman

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14016

Abstract:
We study several problems in differentially private domain discovery, where each user holds a subset of items from a shared but unknown domain, and the goal is to output an informative subset of items. For set union, we show that the simple baseline Weighted Gaussian Mechanism (WGM) has a near-optimal $\ell_1$ missing mass guarantee on Zipfian data as well as a distribution-free $\ell_\infty$ missing mass guarantee. We then apply the WGM as a domain-discovery precursor for existing known-domain algorithms for private top-$k$ and $k$-hitting set and obtain new utility guarantees for their unknown domain variants. Finally, experiments demonstrate that all of our WGM-based methods are competitive with or outperform existing baselines for all three problems.
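
The sketch below shows one common formulation of a weighted Gaussian approach to private set union, assumed here for illustration: each user spreads a unit-L2-norm contribution over their items, Gaussian noise is added to each item's total weight, and items above a threshold are released. The calibration of `sigma` and `threshold` to a privacy budget, and any cap on user set size, are omitted. The normalized `missing_mass` helper is likewise an assumed reading of the metric, not the paper's definition.

```python
import math
import random
from collections import defaultdict


def weighted_gaussian_mechanism(user_sets, sigma, threshold, rng=None):
    """Release items whose noisy total weight exceeds `threshold`."""
    rng = rng or random.Random()
    weights = defaultdict(float)
    for items in user_sets:
        unique = set(items)
        if not unique:
            continue
        # Each user contributes 1/sqrt(k) to each of their k items (L2 norm 1).
        w = 1.0 / math.sqrt(len(unique))
        for item in unique:
            weights[item] += w
    return {item for item, w in weights.items()
            if w + rng.gauss(0.0, sigma) > threshold}


def missing_mass(user_sets, released):
    """Fraction of (user, item) occurrences whose item was not released."""
    total = missed = 0
    for items in user_sets:
        for item in set(items):
            total += 1
            missed += item not in released
    return missed / total if total else 0.0


sets = [{"a", "b"}, {"a"}, {"a", "c"}, {"a"}]
out = weighted_gaussian_mechanism(sets, sigma=0.5, threshold=2.0,
                                  rng=random.Random(1))
print(sorted(out), missing_mass(sets, out))
```

Rare items tend to fall below the threshold, which is why a missing-mass guarantee on heavy-tailed data is the natural utility measure for this kind of mechanism.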

#22 Towards Agentic Honeynet Configuration

agent

Authors: Federico Mirra, Matteo Boffa, Idilio Drago, Danilo Giordano, Marco Mellia

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14122

Abstract:
Honeypots are deception systems that emulate vulnerable services to collect threat intelligence. While deploying many honeypots increases the opportunity to observe attacker behaviour, in practice network and computational resources limit the number of honeypots that can be exposed. Hence, practitioners must select the assets to deploy, a decision that is typically made statically despite attackers' tactics evolving over time. This work investigates an AI-driven agentic architecture that autonomously manages honeypot exposure in response to ongoing attacks. The proposed agent analyses Intrusion Detection System (IDS) alerts and network state to infer the progression of the attack, identify compromised assets, and predict likely attacker targets. Based on this assessment, the agent dynamically reconfigures the system to maintain attacker engagement while minimizing unnecessary exposure. The approach is evaluated in a simulated environment where attackers execute Proof-of-Concept exploits for known CVEs. Preliminary results indicate that the agent can effectively infer the intent of the attacker and improve the efficiency of exposure under resource constraints.

#23 Experimental Evaluation of Security Attacks on Self-Driving Car Platforms

Authors: Viet K. Nguyen, Nathan Lee, Mohammad Husain

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14124

Abstract:
Deep learning-based perception pipelines in autonomous ground vehicles are vulnerable to both adversarial manipulation and network-layer disruption. We present a systematic, on-hardware experimental evaluation of five attack classes: FGSM, PGD, man-in-the-middle (MitM), denial-of-service (DoS), and phantom attacks on low-cost autonomous vehicle platforms (JetRacer and Yahboom). Using a standardized 13-second experimental protocol and comprehensive automated logging, we systematically characterize three dimensions of attack behavior: (i) control deviation, (ii) computational cost, and (iii) runtime responsiveness. Our analysis reveals that distinct attack classes produce consistent and separable "fingerprints" across these dimensions: perception attacks (MitM output manipulation and phantom projection) generate high steering deviation signatures with nominal computational overhead, PGD produces combined steering perturbation and computational load signatures across multiple dimensions, and DoS exhibits frame rate and latency degradation signatures with minimal control-plane perturbation. We demonstrate that our fingerprinting framework generalizes across both digital attacks (adversarial perturbations, network manipulation) and environmental attacks (projected false features), providing a foundation for attack-aware monitoring systems and targeted, signature-based defense mechanisms.

#24 Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

privacy

Authors: Ruoxi Cheng, Yizhong Ding, Hongyi Zhang, Yiyan Huang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14222

Abstract:
Contrastive pretraining models such as CLIP and CLAP underpin many vision-language and audio-language systems, yet their reliance on web-scale data raises growing concerns about memorizing Personally Identifiable Information (PII). Auditing such models via membership inference is challenging in practice: shadow-model MIAs are computationally prohibitive for large multimodal backbones, and existing multimodal attacks typically require querying the target with paired biometric inputs, thereby directly exposing sensitive biometric information to the target model. We propose Unimodal Membership Inference Detector (UMID), a text-only auditing framework that performs text-guided cross-modal latent inversion and extracts two complementary signals, similarity (alignment to the queried text) and variability (consistency across randomized inversions). UMID compares these statistics to a lightweight non-member reference constructed from synthetic gibberish and makes decisions via an ensemble of unsupervised anomaly detectors. Comprehensive experiments across diverse CLIP and CLAP architectures demonstrate that UMID significantly improves the effectiveness and efficiency over prior MIAs, delivering strong detection performance with sub-second auditing cost while complying with realistic privacy constraints.

#25 Activation Surgery: Jailbreaking White-box LLMs without Touching the Prompt

Authors: Maël Jenny, Jérémie Dentan, Sonia Vanier, Michaël Krajecki

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14278

Abstract:
Most jailbreak techniques for Large Language Models (LLMs) primarily rely on prompt modifications, including paraphrasing, obfuscation, or conversational strategies. Meanwhile, abliteration techniques (also known as targeted ablations of internal components) have been used to study and explain LLM outputs by probing which internal structures causally support particular responses. In this work, we combine these two lines of research by directly manipulating the model's internal activations to alter its generation trajectory without changing the prompt. Our method constructs a nearby benign prompt and performs layer-wise activation substitutions using a sequential procedure. We show that this activation surgery method reveals where and how refusal arises, and prevents refusal signals from propagating across layers, thereby inhibiting the model's safety mechanisms. Finally, we discuss the security implications for open-weights models and instrumented inference environments.

#26 AEX: Non-Intrusive Multi-Hop Attestation and Provenance for LLM APIs

Authors: Yongjie Guan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14283

Abstract:
Hosted large language models are increasingly accessed through remote APIs, but the API boundary still offers little direct evidence that a returned output actually corresponds to the client-visible request. Recent audits of shadow APIs show that unofficial or intermediary endpoints can diverge from claimed behavior, while existing approaches such as fingerprinting, model-equality testing, verifiable inference, and TEE attestation either remain inferential or answer different questions. We propose AEX, a non-intrusive attestation extension for existing JSON-based LLM APIs. AEX preserves request, response, tool-calling, streaming, and error semantics, and instead adds a signed top-level attestation object that binds a client-visible request projection to either a complete response object or a committed streaming output. To support realistic deployments, AEX provides explicit request-binding modes, signed request-transform receipts for trusted intermediaries, and source-output / output-transform receipts for trusted output rewriting. For streaming, it separates checkpoint proofs for verified prefixes of an unmodified source stream from complete-output lineage for outputs that have been rewritten, buffered, aggregated, or re-packaged, preventing transformed outputs from being mistaken for source-stream prefixes. AEX therefore makes a deliberately narrow claim: a trusted issuer attests to a specific request-output relation, or to a specific complete-output lineage, at the API boundary. We present the protocol design, threat model, verification state machine, security and privacy analysis, an OpenAI-compatible chat-completions profile, and a reference TypeScript prototype with local conformance tests and microbenchmarks.

#27 Governing Dynamic Capabilities: Cryptographic Binding and Reproducibility Verification for AI Agent Tool Use

agent

Authors: Ziling Zhou

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14332

Abstract:
AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term this the capability-identity gap: it enables silent capability escalation and violates EU AI Act traceability requirements. We propose three mechanisms. Capability-bound agent certificates extend X.509 v3 with a skills manifest hash; any tool change invalidates the certificate. Reproducibility commitments leverage LLM inference near-determinism for post-hoc replay verification. A verifiable interaction ledger provides hash-linked, signed records for multi-agent forensic reconstruction. We formalize nine security properties and prove they hold under a realistic adversary model. Our Rust prototype achieves 97 µs certificate verification (<1 ns capability binding overhead, ~1,200,000x faster than BAID's zkVM), 0.62 ms total governance overhead per tool call (0.1--1.2% of typical latency), and 4.7x separation from cross-provider outputs (Cohen's d > 1.0 on all four metrics), with best classification at $F_1=0.876$ (Jaccard, $\theta=0.408$); single-provider deployments achieve $F_1=0.990$ with 11.5x separation. We evaluate 12 attack scenarios -- silent escalation, tool trojanization, phantom delegation, evidence tampering, collusion, and runtime behavioral attacks validated against NVIDIA's Nemotron-AIQ traces -- each detected with a traceable mechanism, while the MCP+OAuth 2.1 baseline detects none. An end-to-end evaluation over a 5-to-20-agent pipeline with real LLM calls confirms that full governance (G1--G3) adds ~10.8 ms per pipeline run (0.12% overhead), scales sub-linearly per agent, and detects all five in-situ attacks with zero false positives.
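Two of the three mechanisms in this abstract rest on simple hashing ideas that a sketch can show: a manifest hash that stops matching as soon as the tool set changes, and a hash-linked ledger in which each record commits to its predecessor. The code below is an assumption-laden illustration in Python rather than the paper's Rust prototype. The function names and record layout are hypothetical, the X.509 v3 extension encoding is not modelled, and the per-record signatures mentioned in the abstract are omitted.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first record


def canonical_hash(obj):
    """SHA-256 over a canonical (sorted-key, compact) JSON encoding."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def manifest_matches(certified_hash, current_tools):
    """True only if the current tool manifest hashes to the certified value."""
    return canonical_hash(current_tools) == certified_hash


def append_record(ledger, event):
    """Append an event that commits to the hash of the previous record."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"prev": prev, "event": event}
    ledger.append({"prev": prev, "event": event, "hash": canonical_hash(body)})


def ledger_intact(ledger):
    """Walk the chain and check every link and every record hash."""
    prev = GENESIS
    for rec in ledger:
        body = {"prev": rec["prev"], "event": rec["event"]}
        if rec["prev"] != prev or rec["hash"] != canonical_hash(body):
            return False
        prev = rec["hash"]
    return True
```

Adding, removing, or re-versioning any tool changes the manifest hash, which is the property the abstract describes as "any tool change invalidates the certificate". Without signatures, the chain only detects edits by a party who cannot recompute every later hash.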

#28 Generation of Human Comprehensible Access Control Policies from Audit Logs

Authors: Gautam Kumar (Indian Institute of Technology Kharagpur, India), Ravi Sundaram (Northeastern University, Boston, USA), Shamik Sural (Indian Institute of Technology Kharagpur, India)

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14341

Abstract:
Over the years, access control systems have become increasingly complex, often causing a disconnect between what is envisaged by the stakeholders in decision-making positions and the actual permissions granted as evidenced by access logs. For instance, Attribute-based Access Control (ABAC), which is a flexible yet complex model typically configured by system security officers, can be made understandable to others only when presented at a high level in natural language. Although several algorithms have been proposed in the literature for automatic extraction of ABAC rules from access logs, there has been no attempt yet to bridge the semantic gap between the machine-enforceable formal logic and human-centric policy intent. Our work addresses this problem by developing a framework that generates human-understandable natural language access control policies from logs. We investigate to what extent the power of Large Language Models (LLMs) can be harnessed to achieve both accuracy and scalability in the process. We have instantiated the framework, named LANTERN (LLM-based ABAC Natural Translation and Explanation for Rule Navigation), as a publicly accessible web-based application for reproducibility of our results.

#29 Toward Secure Web to ERP Payment Flows: A Case Study of HTTP Header Trust Failures in SAP Based Systems

Authors: Vick Dini

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14365

Abstract:
Electronic banking portals often sit in front of enterprise resource planning (ERP) systems such as SAP, mediating payment requests between users and back end financial infrastructure. When these integrations place excessive trust in client supplied HTTP metadata, subtle design flaws can arise that undermine payment integrity. This article presents a retrospective, anonymized case study of an SAP based payment flow in which weaknesses in HTTP level validation allowed the front end application to incorrectly treat unpaid transactions as completed. Rather than provide a reproducible exploit, we abstract the scenario into a general vulnerability pattern, analyze contributing architectural decisions, and propose concrete design and verification practices for secure web to ERP payment processing. The discussion emphasizes formalizing payment state machines, strengthening trust boundaries, and incorporating regular security review into integration projects.

#30 Oblivis: A Framework for Delegated and Efficient Oblivious Transfer

Authors: Aydin Abadi, Yvo Desmedt

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14492

Abstract:
As database deployments shift toward cloud platforms and edge devices, thin clients need to securely retrieve sensitive records without leaking their query intent or metadata to the proxies that mediate access. Oblivious Transfer (OT) is a core tool for private retrieval, yet existing OTs assume direct client-database interaction and lack support for delegated querying or lightweight clients. We present Oblivis, a modular framework of new OT protocols that enable delegated, privacy-preserving query execution. Oblivis allows clients to retrieve database records without direct access, protects against leakage to both databases and proxies, and is designed with practical efficiency in mind. Its components include: (1) Delegated-Query OT, which permits secure outsourcing of query generation; (2) Multi-Receiver OT for merged, cloud-hosted databases; (3) a compiler producing constant-size responses suitable for thin clients; and (4) Supersonic OT, a proxy-based, information-theoretic, and highly efficient 1-out-of-2 OT. The protocols are formally defined and proven secure in the simulation-based paradigm, under a non-colluding assumption. We implement and empirically evaluate Supersonic OT. It achieves at least a 92x speedup over a highly efficient 1-out-of-2 OT, and a 2.6x-106x speedup over a standard OT extension across 200-100,000 invocations. Our implementation further shows that Supersonic OT remains efficient even on constrained hardware, e.g., it completes an end-to-end transfer in 1.36 ms on a Raspberry Pi 4.

#31 When Scanners Lie: Evaluator Instability in LLM Red-Teaming

Authors: Lidor Erez, Omer Hofman, Tamir Nizri, Roman Vainshtein

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14633

Abstract:
Automated LLM vulnerability scanners are increasingly used to assess security risks by measuring attack success rates (ASR) across attack types. Yet the validity of these measurements hinges on an often-overlooked component: the evaluator that determines whether an attack has succeeded. In this study, we demonstrate that commonly used open-source scanners exhibit measurement instability that depends on the evaluator component. Consequently, changing the evaluator while keeping the attacks and model outputs constant can significantly alter the reported ASR. To tackle this problem, we present a two-phase, reliability-aware evaluation framework. In the first phase, we quantify evaluator disagreement to identify attack categories where ASR reliability cannot be assumed. In the second phase, we propose a verification-based evaluation method where evaluators are validated by an independent verifier, enabling reliability assessment without relying on extensive human annotation. Applied to the widely used Garak scanner, we observe that 22 of 25 attack categories exhibit evaluator instability, reflected in high disagreement among evaluators. Our approach raises evaluator accuracy from 72% to 89% while enabling selective deployment to control cost and computational overhead. We further quantify evaluator uncertainty in ASR estimates, showing that reported vulnerability scores can vary by up to 33% depending on the evaluator. Our results indicate that the outputs of vulnerability scanners are highly sensitive to the choice of evaluators. Our framework offers a practical approach to quantify unreliable evaluations and enhance the reliability of measurements in automated LLM security assessments.
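The core observation, that the reported ASR moves when only the evaluator changes, can be illustrated with a few lines of Python. This is a toy sketch: the function names and the "not unanimous" disagreement measure are assumptions chosen for illustration, and the paper's actual disagreement statistic and verifier are not described in the abstract.

```python
def asr_by_evaluator(verdicts):
    """verdicts: {evaluator: [bool per attack attempt]} -> {evaluator: ASR}."""
    return {name: sum(v) / len(v) for name, v in verdicts.items()}


def disagreement_rate(verdicts):
    """Fraction of attempts on which the evaluators are not unanimous."""
    columns = list(zip(*verdicts.values()))
    return sum(1 for col in columns if len(set(col)) > 1) / len(columns)


def asr_spread(verdicts):
    """Gap between the highest and lowest ASR reported by any evaluator."""
    rates = asr_by_evaluator(verdicts).values()
    return max(rates) - min(rates)
```

The same model outputs feed every evaluator, so any nonzero spread is attributable to the evaluators alone.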

#32 $p^2$RAG: Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval

privacy

Authors: Yulong Ming, Mingyue Wang, Jijia Yang, Cong Wang, Xiaohua Jia

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14778

Abstract:
Retrieval-Augmented Generation (RAG) enables large language models to use external knowledge, but outsourcing the RAG service raises privacy concerns for both data owners and users. Privacy-preserving RAG systems address these concerns by performing secure top-$k$ retrieval, which typically relies on secure sorting to identify relevant documents. However, existing systems face challenges supporting arbitrary $k$ due to their inability to change $k$, new security issues, or efficiency degradation with large $k$. This is a significant limitation because modern long-context models generally achieve higher accuracy with larger retrieval sets. We propose $p^2$RAG, a privacy-preserving RAG service that supports arbitrary top-$k$ retrieval. Unlike existing systems, $p^2$RAG avoids sorting candidate documents. Instead, it uses an interactive bisection method to determine the set of top-$k$ documents. For security, $p^2$RAG uses secret sharing on two semi-honest non-colluding servers to protect the data owner's database and the user's prompt. It enforces restrictions and verification to defend against malicious users and tightly bound the information leakage of the database. The experiments show that $p^2$RAG is 3--300$\times$ faster than the state-of-the-art PRAG for $k = 16$--$1024$.
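To show why bisection can replace sorting for top-$k$ selection, here is a plaintext Python sketch. It assumes integer relevance scores in a known range and bisects on a score threshold, asking in each round only how many scores meet the threshold. This is an assumption about the general idea, not the paper's protocol: in $p^2$RAG the comparisons would run over secret-shared data on two servers, and the function name and interface below are hypothetical.

```python
def top_k_by_bisection(scores, k, lo, hi):
    """Return indices i with scores[i] >= t, where t is the largest integer
    threshold in [lo, hi] such that at least k scores are >= t.
    Ties at the threshold can make the result larger than k."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    if any(s < lo or s > hi for s in scores):
        raise ValueError("all scores must lie in [lo, hi]")

    def count_at_least(t):
        # The only information each bisection round needs is this count.
        return sum(1 for s in scores if s >= t)

    # Invariant: count_at_least(lo) >= k.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_at_least(mid) >= k:
            lo = mid
        else:
            hi = mid - 1
    return [i for i, s in enumerate(scores) if s >= lo]
```

Each round costs one linear pass of comparisons, and the number of rounds grows with the logarithm of the score range rather than with $k$, which is consistent with the abstract's claim of supporting arbitrary $k$.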

#33 Architecture-Agnostic Feature Synergy for Universal Defense Against Heterogeneous Generative Threats

Authors: Bingxue Zhang, Yang Gao, Feida Zhu, Yanyan Shen, Yang Shi

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14860

Abstract:
Generative AI deployment poses unprecedented challenges to content safety and privacy. However, existing defense mechanisms are often tailored to specific architectures (e.g., Diffusion Models or GANs), creating fragile "defense silos" that fail against heterogeneous generative threats. This paper identifies a fundamental optimization barrier in naive pixel-space ensemble strategies: due to divergent objective functions, pixel-level gradients from heterogeneous generators become statistically orthogonal, causing destructive interference. To overcome this, we observe that despite disparate low-level mechanisms, high-level feature representations of generated content exhibit alignment across architectures. Based on this, we propose the Architecture-Agnostic Targeted Feature Synergy (ATFS) framework. By introducing a target guidance image, ATFS reformulates multi-model defense as a unified feature space alignment task, enabling intrinsic gradient alignment without complex rectification. Extensive experiments show ATFS achieves SOTA protection in heterogeneous scenarios (e.g., Diffusion+GAN). It converges rapidly, reaching over 90% performance within 40 iterations, and maintains strong attack potency even under tight perturbation budgets. The framework seamlessly extends to unseen architectures (e.g., VQ-VAE) by switching the feature extractor, and demonstrates robust resistance to JPEG compression and scaling. Being computationally efficient and lightweight, ATFS offers a viable pathway to dismantle defense silos and enable universal generative security. Code and models are open-sourced for reproducibility.

#34 Fine-tuning RoBERTa for CVE-to-CWE Classification: A 125M Parameter Model Competitive with LLMs

Authors: Nikita Mosievskiy

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14911

Abstract:
We present a fine-tuned RoBERTa-base classifier (125M parameters) for mapping Common Vulnerabilities and Exposures (CVE) descriptions to Common Weakness Enumeration (CWE) categories. We construct a large-scale training dataset of 234,770 CVE descriptions with AI-refined CWE labels using Claude Sonnet 4.6, and agreement-filtered evaluation sets where NVD and AI labels agree. On our held-out test set (27,780 samples, 205 CWE classes), the model achieves 87.4% top-1 accuracy and 60.7% Macro F1 -- a +15.5 percentage-point Macro F1 gain over a TF-IDF baseline that already reaches 84.9% top-1, demonstrating the model's advantage on rare weakness categories. On the external CTI-Bench benchmark (NeurIPS 2024), the model achieves 75.6% strict accuracy (95% CI: 72.8-78.2%) -- statistically indistinguishable from Cisco Foundation-Sec-8B-Reasoning (75.3%, 8B parameters) at 64x fewer parameters. We release the dataset, model, and training code.

#35 Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

intellectual property

Authors: Zhuoshang Wang, Yubing Ren, Yanan Cao, Fang Fang, Xiaoxue Li, Li Guo

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14968

Abstract:
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.

#36 From Storage to Steering: Memory Control Flow Attacks on LLM Agents

agent

Authors: Zhenlin Xu, Xiaogang Zhu, Yu Yao, Minhui Xue, Yiliao Song

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15125

Abstract:
Modern agentic systems allow Large Language Model (LLM) agents to tackle complex tasks through extensive tool usage, forming structured control flows of tool selection and execution. Existing security analyses often treat these control flows as ephemeral, one-off sessions, overlooking the persistent influence of memory. This paper identifies a new threat, Memory Control Flow Attacks (MCFA), in which memory retrieval can dominate the control flow, forcing unintended tool usage even against explicit user instructions and inducing persistent behavioral deviations across tasks. To understand the impact of this vulnerability, we further design MEMFLOW, an automated evaluation framework that systematically identifies and quantifies MCFA across heterogeneous tasks and long interaction horizons. To evaluate MEMFLOW, we attack state-of-the-art LLMs, including GPT-5 mini, Claude Sonnet 4.5 and Gemini 2.5 Flash on real-world tools from two major LLM agent development frameworks, LangChain and LlamaIndex. The results show that in general over 90% of trials are vulnerable to MCFA even under strict safety constraints, highlighting critical security risks that demand immediate attention.

#37 vCause: Efficient and Verifiable Causality Analysis for Cloud-based Endpoint Auditing

Authors: Qiyang Song, Qihang Zhou, Xiaoqi Jia, Zhenyu Song, Wenbo Jiang, Heqing Huang, Yong Liu, Dan Meng

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15216

Abstract:
In cloud-based endpoint auditing, security administrators often rely on the cloud to perform causality analysis over log-derived versioned provenance graphs to investigate suspicious attack behaviors. However, the cloud may be distrusted or compromised by attackers, potentially manipulating the final causality analysis results. Consequently, administrators may not accurately understand attack behaviors and fail to implement effective countermeasures. This risk underscores the need for a defense scheme to ensure the integrity of causality analysis. While existing tamper-evident logging schemes and trusted execution environments show promise for this task, they are not specifically designed to support causality analysis and thus face inherent security and efficiency limitations. This paper presents vCause, an efficient and verifiable causality analysis system for cloud-based endpoint auditing. vCause integrates two authenticated data structures: a graph accumulator and a verifiable provenance graph. The data structures enable validation of two critical steps in causality analysis: (i) querying a point-of-interest node on a versioned provenance graph, and (ii) identifying its causally related components. Formal security analysis and experimental evaluation show that vCause can achieve secure and verifiable causality analysis with only <1% computational overhead on endpoints and 3.36% on the cloud.

#38 Comparative Analysis of SRAM PUF Temperature Susceptibility on Embedded Systems

Authors: Martina Zeinzinger, Josef Langer, Florian Eibensteiner, Phillip Petz, Lucas Drack, Daniel Dorfmeister, Rudolf Ramler

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15320

Abstract:
An SRAM Physical Unclonable Function (PUF) can distinguish SRAM modules by analyzing the inherent randomness of their start-up behavior. However, the effectiveness of this technique varies depending on the design and fabrication of the SRAM module. This study compares two similar microcontrollers, both equipped with on-chip SRAM, to determine which device produces a better SRAM PUF. Both microcontrollers are programmed with an identical SRAM PUF authentication routine and tested under varying ambient temperatures (ranging from 10 °C to 50 °C) to evaluate the impact of temperature on SRAM PUF performance. One embedded SRAM works significantly better than the other, even though the two models are closely related. The presented results can be used early in the design process to compare arbitrary on-chip SRAM models and see which is best suited for implementing an SRAM PUF.

#39 Unsupervised Cross-Protocol Anomaly Analysis in Mobile Core Networks via Multi-Embedding Models Consensus

Authors: Aayush Garg, Orlando Amaral Cejas

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15344

Abstract:
Mobile core networks rely on several signalling protocols in parallel, such as SS7, Diameter, and GTP, so many security-relevant problems become visible only when their interactions are analyzed jointly. At the same time, labeled examples of real attacks and cross-protocol misconfigurations are scarce, which complicates supervised detection. We therefore study unsupervised cross-protocol anomaly analysis on fused representations that combine SS7, Diameter, and GTP signalling. For each subscriber, we aggregate messages into per-minute fused records, serialize each record as text, embed it with several models, and apply unsupervised anomaly detection. We then assign each record a consensus score equal to the number of embedding models that flag it as anomalous. For evaluation, we generate cross-protocol-plausible synthetic anomalies by swapping one field group at a time between pairs of records, preserving per-message validity while making the fused view contradictory. On 219,294 fused records, 44.15% are flagged by at least one model, but only 0.97% reach full agreement across all six. Higher consensus is strongly associated with synthetic records, where for k=1-4 the odds that a flagged record is synthetic are hundreds of times greater than for original records, and for k>=5 all flagged records are synthetic, with extremely small p-values. Cosine distances between synthetic and original records also increase with consensus, suggesting clearer separation in embedding space. These results support the use of multi-embedding consensus to prioritize a much smaller set of candidate cross-protocol inconsistencies for further inspection.
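The consensus score described above is a simple count: for each fused record, how many embedding models flag it as anomalous. The Python sketch below shows that count and the share of records reaching a given consensus level. The function names and the input layout (one boolean list per model) are assumptions for illustration. The embedding models and the anomaly detector themselves are out of scope.

```python
def consensus_scores(flags_by_model):
    """flags_by_model: {model_name: [bool per record]} -> [int per record].
    Each score is the number of models that flag that record."""
    return [sum(col) for col in zip(*flags_by_model.values())]


def share_with_consensus_at_least(scores, k):
    """Fraction of records flagged by at least k models."""
    return sum(1 for s in scores if s >= k) / len(scores)
```

With six models, `share_with_consensus_at_least(scores, 1)` and `share_with_consensus_at_least(scores, 6)` correspond to the two kinds of percentages quoted in the abstract (flagged by at least one model, and flagged by all six).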

#40 SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration

Authors: Yu Pan, Wenlong Yu, Tiejun Wu, Xiaohu Ye, Qiannan Si, Guangquan Xu, Bin Wu

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15397

Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, they remain highly susceptible to jailbreak attacks that undermine their safety alignment. Existing defense mechanisms typically rely on post hoc filtering applied only to the final output, leaving intermediate reasoning steps unmonitored and vulnerable to adversarial manipulation. To address this gap, this paper proposes a SaFer Chain-of-Thought (SFCoT) framework, which proactively evaluates and calibrates potentially unsafe reasoning steps in real time. SFCoT incorporates a three-tier safety scoring system alongside a multi-perspective consistency verification mechanism, designed to detect potential risks throughout the reasoning process. A dynamic intervention module subsequently performs targeted calibration to redirect reasoning trajectories toward safe outcomes. Experimental results demonstrate that SFCoT reduces the attack success rate from $58.97\%$ to $12.31\%$, showing that it is an effective and efficient method for enhancing LLM safety without a significant decline in general performance.

#41 TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

agent

Authors: Kai Wang, Biaojie Zeng, Zeming Wei, Chang Jin, Hefeng Zhou, Xiangtian Li, Chao Yang, Jingjing Qu, Xingcheng Xu, Xia Hu

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15408

Abstract:
With the rapid development of LLM-based multi-agent systems (MAS), significant safety and security concerns have emerged, introducing novel risks that go beyond those of single agents or LLMs. Despite attempts to address these issues, the existing literature lacks a cohesive safeguarding system specialized for MAS risks. In this work, we introduce TrinityGuard, a comprehensive safety evaluation and monitoring framework for LLM-based MAS, grounded in the OWASP standards. Specifically, TrinityGuard encompasses a three-tier fine-grained risk taxonomy that identifies 20 risk types, covering single-agent vulnerabilities, inter-agent communication threats, and system-level emergent hazards. Designed for scalability across various MAS structures and platforms, TrinityGuard is organized in a trinity manner, involving an MAS abstraction layer that can be adapted to any MAS structure, an evaluation layer containing risk-specific test modules, and runtime monitor agents coordinated by a unified LLM Judge Factory. During evaluation, TrinityGuard executes curated attack probes to generate detailed vulnerability reports for each risk type, where monitor agents analyze structured execution traces and issue real-time alerts, enabling both pre-development evaluation and runtime monitoring. We further formalize these safety metrics and present detailed case studies across various representative MAS examples, showcasing the versatility and reliability of TrinityGuard. Overall, TrinityGuard acts as a comprehensive framework for evaluating and monitoring various risks in MAS, paving the way for further research into their safety and security.

#42 Evasive Intelligence: Lessons from Malware Analysis for Evaluating AI Agents

agent

Authors: Simone Aonzo, Merve Sahin, Aurélien Francillon, Daniele Perito

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15457

Abstract:
Artificial intelligence (AI) systems are increasingly adopted as tool-using agents that can plan, observe their environment, and take actions over extended time periods. This evolution challenges current evaluation practices where the AI models are tested in restricted, fully observable settings. In this article, we argue that evaluations of AI agents are vulnerable to a well-known failure mode in computer security: malicious software that exhibits benign behavior when it detects that it is being analyzed. We point out how AI agents can infer the properties of their evaluation environment and adapt their behavior accordingly. This can lead to overly optimistic safety and robustness assessments. Drawing parallels with decades of research on malware sandbox evasion, we demonstrate that this is not a speculative concern, but rather a structural risk inherent to the evaluation of adaptive systems. Finally, we outline concrete principles for evaluating AI agents, which treat the system under test as potentially adversarial. These principles emphasize realism, variability of test conditions, and post-deployment reassessment.

#43 A Dual-Path Generative Framework for Zero-Day Fraud Detection in Banking Systems

Authors: Nasim Abdirahman Ismail, Enis Karaarslan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13237

Abstract:
High-frequency banking environments face a critical trade-off between low-latency fraud detection and the regulatory explainability demanded by GDPR. Traditional rule-based and discriminative models struggle with "zero-day" attacks due to extreme class imbalance and the lack of historical precedents. This paper proposes a Dual-Path Generative Framework that decouples real-time anomaly detection from offline adversarial training. The architecture employs a Variational Autoencoder (VAE) to establish a legitimate transaction manifold based on reconstruction error, ensuring <50ms inference latency. In parallel, an asynchronous Wasserstein GAN with Gradient Penalty (WGAN-GP) synthesizes high-entropy fraudulent scenarios to stress-test the detection boundaries. Crucially, to address the non-differentiability of discrete banking data (e.g., Merchant Category Codes), we integrate a Gumbel-Softmax estimator. Furthermore, we introduce a trigger-based explainability mechanism where SHAP (Shapley Additive Explanations) is activated only for high-uncertainty transactions, reconciling the computational cost of XAI with real-time throughput requirements.
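
The Gumbel-Softmax estimator mentioned above can be sketched in a few lines. This is a minimal standard-library illustration of the general technique, not the authors' implementation: the function name, the `temperature` default, and the example logits are assumptions made for the sketch.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Return a relaxed (soft) one-hot sample over the categories in `logits`.

    Lower temperatures push the output closer to a hard one-hot vector,
    while the operation stays differentiable in an autodiff framework.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)

    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three discrete categories (e.g. three merchant codes).
sample = gumbel_softmax([2.0, 0.5, 0.1], temperature=0.5, rng=random.Random(0))
```

The result is a probability vector rather than a hard category, which is what lets gradients flow through a discrete choice during GAN training.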

#44 ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems

agent

Authors: Florin Adrian Chitan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13247

Abstract:
The proliferation of autonomous AI agents capable of executing real-world actions - filesystem operations, API calls, database modifications, financial transactions - introduces a class of safety risk not addressed by existing content-moderation infrastructure. Current text-safety systems evaluate linguistic content for harm categories such as violence, hate speech, and sexual content; they are architecturally unsuitable for evaluating whether a proposed action falls within an agent's authorized operational scope. We present ILION (Intelligent Logic Identity Operations Network), a deterministic execution gate for agentic AI systems. ILION employs a five-component cascade architecture - Transient Identity Imprint (TII), Semantic Vector Reference Frame (SVRF), Identity Drift Control (IDC), Identity Resonance Score (IRS) and Consensus Veto Layer (CVL) - to classify proposed agent actions as BLOCK or ALLOW without statistical training or API dependencies. The system requires zero labeled data, operates in sub-millisecond latency, and produces fully interpretable verdicts. We evaluate ILION on ILION-Bench v2, a purpose-built benchmark of 380 test scenarios across eight attack categories with 39% hard-difficulty adversarial cases and a held-out development split. ILION achieves F1 = 0.8515, precision = 91.0%, and a false positive rate of 7.9% at a mean latency of 143 microseconds. Comparative evaluation against three baselines - Lakera Guard (F1 = 0.8087), OpenAI Moderation API (F1 = 0.1188), and Llama Guard 3 (F1 = 0.0105) - demonstrates that existing text-safety infrastructure systematically fails on agent execution safety tasks due to a fundamental task mismatch. ILION outperforms the best commercial baseline by 4.3 F1 points while operating 2,000 times faster with a false positive rate four times lower.
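
The abstract reports F1 and precision but not recall. Since F1 is the harmonic mean of precision and recall, the implied recall can be recovered by rearranging the definition. The sketch below does that arithmetic; the helper name is hypothetical, and the result is only as precise as the rounded figures quoted above.

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denom = 2.0 * precision - f1
    if denom <= 0:
        raise ValueError("F1 and precision are inconsistent (no valid recall)")
    return f1 * precision / denom


# Figures quoted in the abstract: F1 = 0.8515, precision = 91.0%.
implied_recall = recall_from_f1(0.8515, 0.910)
print(f"implied recall ~ {implied_recall:.2f}")  # prints "implied recall ~ 0.80"
```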

#45 A Robust Framework for Secure Cardiovascular Risk Prediction: An Architectural Case Study of Differentially Private Federated Learning

privacy

Authors: Rodrigo Tertulino, Laércio Alencar

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13293

Abstract:
Accurate cardiovascular risk prediction is crucial for preventive healthcare; however, the development of robust Artificial Intelligence (AI) models is hindered by the fragmentation of clinical data across institutions due to stringent privacy regulations. This paper presents a comprehensive architectural case study validating the engineering robustness of FedCVR, a privacy-preserving Federated Learning framework applied to heterogeneous clinical networks. Rather than proposing a new theoretical optimizer, this work focuses on a systems engineering analysis to quantify the operational trade-offs of server-side adaptive optimization under utility-prioritized Differential Privacy (DP). By conducting a rigorous stress test in a high-fidelity synthetic environment calibrated against real-world datasets (Framingham, Cleveland), we systematically evaluate the system's resilience to statistical noise. The validation results demonstrate that integrating server-side momentum as a temporal denoiser allows the architecture to achieve a stable F1-score of 0.84 and an Area Under the Curve (AUC) of 0.96, statistically outperforming standard stateless baselines. Our findings confirm that server-side adaptivity is a structural prerequisite for recovering clinical utility under realistic privacy budgets, providing a validated engineering blueprint for secure multi-institutional collaboration.
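
To illustrate why server-side momentum can act as a temporal denoiser, here is a minimal sketch of one aggregation round in the style of federated averaging with momentum. It is not FedCVR itself: the function names, `clip_norm`, `noise_std`, `beta`, and `server_lr` are all hypothetical, and the Gaussian noise is not calibrated to any formal privacy budget.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update down so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def server_round(weights, velocity, client_updates, *, clip_norm=1.0,
                 noise_std=0.1, beta=0.9, server_lr=1.0, rng=random):
    """One server step: clip, average, add noise, then apply momentum."""
    clipped = [clip(u, clip_norm) for u in client_updates]
    n = len(clipped)
    avg = [sum(u[i] for u in clipped) / n for i in range(len(weights))]

    # Illustrative Gaussian perturbation of the averaged update (assumption).
    noisy = [a + rng.gauss(0.0, noise_std) for a in avg]

    # Momentum accumulates updates across rounds, so independent per-round
    # noise partly averages out while a consistent signal adds up.
    new_velocity = [beta * v + d for v, d in zip(velocity, noisy)]
    new_weights = [w + server_lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

A stateless baseline corresponds to `beta=0.0`, where each round's noisy average is applied directly.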

#46 VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering

Authors: Youting Wang, Yuan Tang, Yitian Qian, Chen Zhao

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13385

Abstract:
As Large Vision-Language Models (LVLMs) are increasingly deployed in agent-integrated workflows and other deployment-relevant settings, their robustness against semantic visual attacks remains under-evaluated -- alignment is typically tested on explicit harmful content rather than privacy-critical multimodal scenarios. We introduce VisualLeakBench, an evaluation suite to audit LVLMs against OCR Injection and Contextual PII Leakage using 1,000 synthetically generated adversarial images with 8 PII types, validated on 50 in-the-wild (IRL) screenshots spanning diverse visual contexts. We evaluate four frontier systems (GPT-5.2, Claude 4, Gemini-3 Flash, Grok-4) with Wilson 95% confidence intervals. Claude 4 achieves the lowest OCR ASR (14.2%) but the highest PII ASR (74.4%), exhibiting a comply-then-warn pattern -- where verbatim data disclosure precedes any safety-oriented language. Grok-4 achieves the lowest PII ASR (20.4%). A defensive system prompt eliminates PII leakage for two models, reduces Claude 4's leakage from 74.4% to 2.2%, but has no effect on Gemini-3 Flash on synthetic data. Strikingly, IRL validation reveals Gemini-3 Flash does respond to mitigation on real-world images (50% to 0%), indicating that mitigation robustness is template-sensitive rather than uniformly absent. We release our dataset and code for reproducible robustness and safety evaluation of deployment-relevant vision-language systems.
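
For readers unfamiliar with the Wilson score interval used above, the following sketch computes it for a binomial proportion such as an attack success rate. The abstract does not state per-task sample sizes, so the counts in the usage line (71 successes out of 500 trials, i.e. 14.2%) are purely hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 71 successful attacks out of 500 attempts (14.2%).
low, high = wilson_interval(71, 500)
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for rates near 0% or 100%, which matters for results such as the 0% and 2.2% leakage figures.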

#47 CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models

diffusion

Authors: Shuhan Xu, Siyuan Liang, Hongling Zheng, Yong Luo, Han Hu, Lefei Zhang, Dacheng Tao

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13435

Abstract:
Diffusion-based image-to-video (I2V) models increasingly exhibit world-model-like properties by implicitly capturing temporal dynamics. However, existing studies have mainly focused on visual quality and controllability, and the robustness of the state transition learned by the model remains understudied. To fill this gap, we are the first to analyze the vulnerability of I2V models, find that temporal control mechanisms constitute a new attack surface, and reveal the challenge of modeling them uniformly under different attack settings. Based on this, we propose a trajectory-control attack, called CtrlAttack, to interfere with state evolution during the generation process. Specifically, we represent the perturbation as a low-dimensional velocity field and construct a continuous displacement field via temporal integration, thereby affecting the model's state transitions while maintaining temporal consistency; meanwhile, we map the perturbation to the observation space, making the method applicable to both white-box and black-box attack settings. Experimental results show that even under low-dimensional and strongly regularized perturbation constraints, our method can still significantly disrupt temporal consistency by increasing the attack success rate (ASR) to over 90% in the white-box setting and over 80% in the black-box setting, while keeping the variation of the FID and FVD within 6 and 130, respectively, thus revealing the potential security risk of I2V models at the level of state dynamics.

#48 Privacy-Preserving Machine Learning for IoT: A Cross-Paradigm Survey and Future Roadmap

privacy

Authors: Zakia Zaman, Praveen Gauravaram, Mahbub Hassan, Sanjay Jha, Wen Hu

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13570

Abstract:
The rapid proliferation of the Internet of Things has intensified demand for robust privacy-preserving machine learning mechanisms to safeguard sensitive data generated by large-scale, heterogeneous, and resource-constrained devices. Unlike centralized environments, IoT ecosystems are inherently decentralized, bandwidth-limited, and latency-sensitive, exposing privacy risks across sensing, communication, and distributed training pipelines. These characteristics render conventional anonymization and centralized protection strategies insufficient for practical deployments. This survey presents a comprehensive IoT-centric, cross-paradigm analysis of privacy-preserving machine learning. We introduce a structured taxonomy spanning perturbation-based mechanisms such as differential privacy, distributed paradigms such as federated learning, cryptographic approaches including homomorphic encryption and secure multiparty computation, and generative synthesis techniques based on generative adversarial networks. For each paradigm, we examine formal privacy guarantees, computational and communication complexity, scalability under heterogeneous device participation, and resilience against threats including membership inference, model inversion, gradient leakage, and adversarial manipulation. We further analyze deployment constraints in wireless IoT environments, highlighting trade-offs between privacy, communication overhead, model convergence, and system efficiency within next-generation mobile architectures. We also consolidate evaluation methodologies, summarize representative datasets and open-source frameworks, and identify open challenges including hybrid privacy integration, energy-aware learning, privacy-preserving large language models, and quantum-resilient machine learning.

#49 Bodhi VLM: Privacy-Alignment Modeling for Hierarchical Visual Representations in Vision Backbones and VLM Encoders via Bottom-Up and Top-Down Feature Search

privacy

Authors: Bo Ma, Jinsong Wu, Wei Qi Yan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13728

Abstract:
Learning systems that preserve privacy often inject noise into hierarchical visual representations; a central challenge is to model how such perturbations align with a declared privacy budget in a way that is interpretable and applicable across vision backbones and vision-language models (VLMs). We propose Bodhi VLM, a privacy-alignment modeling framework for hierarchical neural representations: it (1) links sensitive concepts to layer-wise grouping via NCP and MDAV-based clustering; (2) locates sensitive feature regions using bottom-up (BUA) and top-down (TDA) strategies over multi-scale representations (e.g., feature pyramids or vision-encoder layers); and (3) uses an Expectation-Maximization Privacy Assessment (EMPA) module to produce an interpretable budget-alignment signal by comparing the fitted sensitive-feature distribution to an evaluator-specified reference (e.g., Laplace or Gaussian with scale $c/\epsilon$). The output is reference-relative and is not a formal differential-privacy estimator. We formalize BUA/TDA over hierarchical feature structures and validate the framework on object detectors (YOLO, PPDPTS, DETR) and on the visual encoders of VLMs (CLIP, LLaVA, BLIP). BUA and TDA yield comparable deviation trends; EMPA provides a stable alignment signal under the reported setups. We compare with generic discrepancy baselines (Chi-square, K-L, MMD) and with task-relevant baselines (MomentReg, NoiseMLE, Wass-1). Results are reported as mean ± std over multiple seeds with confidence intervals in the supplementary materials. This work contributes a learnable, interpretable modeling perspective for privacy-aligned hierarchical representations rather than a post hoc audit only. Source code: https://github.com/mabo1215/bodhi-vlm.git (Bodhi-VLM GitHub repository)

#50 Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling

Authors: Dingding Cao, Bianbian Jiao, Jingzong Yang, Yujing Zhong, Wei Yang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.13830

Abstract:
The high-frequency issuance and short-cycle speculation of meme tokens in decentralized finance (DeFi) have significantly amplified rug-pull risk. Existing approaches still struggle to provide stable early warning under scarce anomalies, incomplete labels, and limited interpretability. To address this issue, an end-to-end warning framework is proposed for BSC meme tokens, consisting of four stages: dataset construction and labeling, wash-trading pattern feature modeling, risk prediction, and error analysis. Methodologically, 12 token-level behavioral features are constructed based on three wash-trading patterns (Self, Matched, and Circular), unifying transaction-, address-, and flow-level signals into risk vectors. Supervised models are then employed to output warning scores and alert decisions. Under the current setting (7 tokens, 33,242 records), Random Forest outperforms Logistic Regression on core metrics, achieving AUC=0.9098, PR-AUC=0.9185, and F1=0.7429. Ablation results show that trade-level features are the primary performance driver (Delta PR-AUC=-0.1843 when removed), while address-level features provide stable complementary gain (Delta PR-AUC=-0.0573). The model also demonstrates actionable early-warning potential for a subset of samples, with a mean Lead Time (v1) of 3.8133 hours. The error profile (FP=1, FN=8) indicates that the current system is better positioned as a high-precision screener rather than a high-recall automatic alarm engine. The main contributions are threefold: an executable and reproducible rug-pull warning pipeline, empirical validation of multi-granularity wash-trading features under weak supervision, and deployment-oriented evidence through lead-time and error-bound analysis.

#51 Mining the YARA Ecosystem: From Ad-Hoc Sharing to Data-Driven Threat Intelligence

Authors: Dectot--Le Monnier de Gouville Esteban, Mohammad Hamdaqa, Moataz Chouchen

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14191

Abstract:
YARA has established itself as the de facto standard for "Detection as Code," enabling analysts and DevSecOps practitioners to define signatures for malware identification across the software supply chain. Despite its pervasive use, the open-source YARA ecosystem remains characterized by ad-hoc sharing and opaque quality. Practitioners currently rely on public repositories without empirical evidence regarding the ecosystem's structural characteristics, maintenance and diffusion dynamics, or operational reliability. We conducted a large-scale mixed-method study of 8.4 million rules mined from 1,853 GitHub repositories. Our pipeline integrates repository mining to map supply chain dynamics, static analysis to assess syntactic quality, and dynamic benchmarking against 4,026 malware and 2,000 goodware samples to measure operational effectiveness. We reveal a highly centralized structure where 10 authors drive 80% of rule adoption. The ecosystem functions as a "static supply chain": repositories show a median inactivity of 782 days and a median technical lag of 4.2 years. While static quality scores appear high (mean = 99.4/100), operational benchmarking uncovers significant noise (false positives) and low recall. Furthermore, coverage is heavily biased toward legacy threats (Ransomware), leaving modern initial access vectors (Loaders, Stealers) severely underrepresented. These findings expose a systemic "double penalty": defenders incur high performance overhead for decayed intelligence. We argue that public repositories function as raw data dumps rather than curated feeds, necessitating a paradigm shift from ad-hoc collection to rigorous rule engineering. We release our dataset and pipeline to support future data-driven curation tools.

#52 A Multi-Scale Graph Learning Framework with Temporal Consistency Constraints for Financial Fraud Detection in Transaction Networks under Non-Stationary Conditions

Authors: Yiming Lei, Qiannan Shen, Junhao Song

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14592

Abstract:
Financial fraud detection in transaction networks involves modeling sparse anomalies, dynamic patterns, and severe class imbalance in the presence of temporal drift in the data. In real-world transaction systems, a suspicious transaction is rarely isolated: rather, legitimate and suspicious transactions are often connected through accounts, intermediaries, or temporal transaction sequences. Attribute-based or randomly partitioned learning pipelines are therefore insufficient to detect relationally structured fraud. We present STC-MixHop, a graph-based framework combining spatial multi-resolution propagation with lightweight temporal consistency modeling for anomaly and fraud detection in dynamic transaction networks. It integrates three components: a MixHop-inspired multi-scale neighborhood diffusion encoder for learning structural patterns; a spatial-temporal attention module coupling current and preceding graph snapshots to stabilize representations; and a temporally informed self-supervised pretraining strategy exploiting unlabeled transaction interactions to improve representation quality. We evaluate the framework primarily on the PaySim dataset under strict chronological splits, supplementing the analysis with Porto Seguro and FEMA data to probe cross-domain component behavior. Results show that STC-MixHop is competitive among graph methods and achieves strong screening-oriented recall under highly imbalanced conditions. The experiments also reveal an important boundary condition: when node attributes are highly informative, tabular baselines remain difficult to outperform. Graph structure contributes most clearly where hidden relational dependencies are operationally important. These findings support a stability-focused view of graph learning for financial fraud detection.

#53 s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

Authors: Balaji Rao, John Harrison, Soonho Kong, Juneyoung Lee, Carlo Lipizzi

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14628

Abstract:
Neurosymbolic approaches leveraging Large Language Models (LLMs) with formal methods have recently achieved strong results on mathematics-oriented theorem-proving benchmarks. However, success on competition-style mathematics does not by itself demonstrate the ability to construct proofs about real-world implementations. We address this gap with a benchmark derived from an industrial cryptographic library whose assembly routines are already verified in HOL Light. s2n-bignum is a library used at AWS for providing fast assembly routines for cryptography, and its correctness is established by formal verification. The task of formally verifying this library has been a significant achievement for the Automated Reasoning Group. It involved two tasks: (1) precisely specifying the correct behavior of a program as a mathematical proposition, and (2) proving that the proposition is correct. In the case of s2n-bignum, both tasks were carried out by human experts. In s2n-bignum-bench, we provide the formal specification and ask the LLM to generate a proof script that is accepted by HOL Light within a fixed proof-check timeout. To our knowledge, s2n-bignum-bench is the first public benchmark focused on machine-checkable proof synthesis for industrial low-level cryptographic assembly routines in HOL Light. This benchmark provides a challenging and practically relevant testbed for evaluating LLM-based theorem proving beyond competition mathematics. The code to set up and use the benchmark is available here: https://github.com/kings-crown/s2n-bignum-bench

#54 Protecting Distributed Blockchain with Twin-Field Quantum Key Distribution: A Quantum Resistant Approach

Authors: Xuan Li, Ying Guo

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14826

Abstract:
Quantum computing poses feasible, multi-layered security challenges to classical blockchain systems. Quantum-secured blockchains that rely on quantum key distribution (QKD) to establish secure channels can address this potential threat. This paper presents a scalable quantum-resistant blockchain architecture designed to address the connectivity and distance limitations of QKD-integrated quantum networks. By leveraging the twin-field (TF) QKD protocol within a measurement-device-independent (MDI) topology, the proposed framework reduces infrastructure complexity from quadratic to linear scaling. This architecture effectively integrates information-theoretic security with distributed consensus mechanisms, allowing the system to overcome the fundamental rate-loss limits inherent in traditional point-to-point links. The proposed scheme offers a theoretically sound and feasible solution for deploying large-scale and long-distance consortium.
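
The quadratic-to-linear claim can be made concrete with a small count. The sketch assumes, as an interpretation not spelled out in the abstract, that a point-to-point design needs one dedicated QKD link per pair of nodes, while the MDI topology needs one link from each node to a shared measurement relay. Both function names are hypothetical.

```python
def pairwise_links(n):
    """Links needed when every pair of n nodes has its own channel."""
    return n * (n - 1) // 2


def relay_links(n):
    """Links needed when each of n nodes connects to one shared relay."""
    return n


for nodes in (4, 10, 100):
    print(nodes, pairwise_links(nodes), relay_links(nodes))
# prints:
# 4 6 4
# 10 45 10
# 100 4950 100
```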

#55 DP-S4S: Accurate and Scalable Select-Join-Aggregate Query Processing with User-Level Differential Privacy

privacy

Authors: Yuan Qiu, Xiaokui Xiao, Yin Yang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.14994

Abstract:
Answering Select-Join-Aggregate queries with DP is a fundamental problem with important applications in various domains. The current SOTA methods ensure user-level DP (i.e., the adversary cannot infer the presence or absence of any given individual user with high confidence) and achieve instance-optimal accuracy on the query results. However, these solutions involve solving expensive optimization programs, which may incur prohibitive computational overhead for large databases. One promising direction to achieve scalability is through sampling, which provides a tunable trade-off between result utility and computational costs. However, applying sampling to differentially private SJA processing is a challenge for two reasons. First, it is unclear what to sample, in order to achieve the best accuracy within a given computational budget. Second, prior solutions were not designed with sampling in mind, and their mathematical tool chains are not sampling-friendly. To our knowledge, the only known solution that applies sampling to private SJA processing is S&E, a recent proposal that (i) samples users and (ii) combines sampling directly with existing solutions to enforce DP. We show that both are suboptimal designs; consequently, even with a relatively high sample rate, the error incurred by S&E can be 10x higher than the underlying DP mechanism without sampling. Motivated by this, we propose Differentially Private Sampling for Scale (DP-S4S), a novel mechanism that addresses the above challenges by (i) sampling aggregation units instead of users, and (ii) laying the mathematical foundation for SJA processing under RDP, which composes more easily with sampling. Further, DP-S4S can answer both scalar and vector SJA queries. Extensive experiments on real data demonstrate that DP-S4S enables scalable SJA processing on large datasets under user-level DP, while maintaining high result utility.

#56 Infinite families of APN permutations in constrained trivariate classes over $\mathbb{F}_{2^m}$

Authors: Daniele Bartoli, Pantelimon Stanica

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15146

Abstract:
We study trivariate permutation polynomials over $\mathbb{F}_{2^{m}}$ extending two APN permutation families of Li--Kaleyski (IEEE Trans. Inform. Theory, 2024) by allowing the scalar parameter to vary over $\mathbb{F}_{2^m}^*$. For \[ G_a(x,y,z)=(x^{q+1}+ax^qz+yz^q,\; x^qz+y^{q+1},\; xy^q+ay^qz+z^{q+1}), \] where $a\in\mathbb{F}_{2^m}^*$, $q=2^i$, $\gcd(i,m)=1$, and $m$ is odd, we prove that $G_a$ is a permutation if and only if an associated univariate polynomial has no root in $\mathbb{F}_{2^m}^*$, and that this condition is also equivalent to $G_a$ being APN. Hence, writing $d=q^2+q+1$, at least \[ \frac{2^m+1-(d-1)(d-2)2^{m/2}-d}{d} \] values of $a$ yield APN permutations $G_a$. In the binary case $q=2$, we show that $a=1$ is good whenever $7\nmid m$, recovering the Li--Kaleyski family. For the second family \[ H_a(x,y,z)=(x^{q+1}+axy^q+yz^q,\; xy^q+z^{q+1},\; x^qz+y^{q+1}+ay^qz), \] we obtain the same root criterion and prove that its defining polynomial is root-equivalent to that of $G_a$. Thus the same parameters $a$ give APN permutations in both families. We also prove strong inequivalence results. First, $G_a$ (resp.\ $H_a$) is diagonally equivalent to $G_1$ (resp.\ $H_1$) if and only if $a^{q^2+q+1}=1$; moreover, for $m>4$, $m\neq 6$, and $7\nmid m$, diagonal non-equivalence implies CCZ non-equivalence by the monomial restriction theorem of Shi et al.\ (DCC, 2025). In particular, when $q=2$ and $7\nmid m$, every good $a\neq 1$ gives APN permutations CCZ-inequivalent to Li--Kaleyski. Second, for the same range of $m$, no $G_a$ is CCZ-equivalent to any $H_b$. Hence these constructions yield two genuinely new, mutually inequivalent families of APN permutations on $\mathbb{F}_{2^{3m}}$.
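
The abstract uses the term APN (almost perfect nonlinear) without defining it: a function $F$ on $\mathbb{F}_{2^n}$ is APN if, for every nonzero $a$ and every $b$, the equation $F(x+a)+F(x)=b$ has at most two solutions. The sketch below brute-forces this definition on a univariate toy case, the cube map over $\mathbb{F}_{2^3}$, which is a standard textbook example and not one of the trivariate families $G_a$ or $H_a$. The field is built with the irreducible polynomial $x^3+x+1$, and all helper names are hypothetical.

```python
def gf_mul(a, b, m=3, poly=0b1011):
    """Multiply two elements of GF(2^m), reducing modulo `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & (1 << m):         # degree reached m: reduce
            a ^= poly
    return result


def cube(x):
    """The map x -> x^3 over GF(2^3)."""
    return gf_mul(gf_mul(x, x), x)


def differential_uniformity(f, m):
    """Largest number of solutions x of f(x + a) + f(x) = b over a != 0, b."""
    size = 1 << m
    worst = 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[f(x ^ a) ^ f(x)] += 1
        worst = max(worst, max(counts))
    return worst


is_apn = differential_uniformity(cube, 3) == 2
is_permutation = sorted(cube(x) for x in range(8)) == list(range(8))
print(is_apn, is_permutation)  # prints "True True"
```

Exhaustive checks like this are only feasible for tiny fields, which is why the paper's root criterion for whole parameter families is valuable.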

#57 Directional Embedding Smoothing for Robust Vision Language Models

Authors: Ye Wang, Jing Liu, Toshiaki Koike-Akino

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15259

Abstract:
The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain vulnerable to jailbreaking attacks that undermine their safety alignment to yield harmful outputs. In this work, we extend the Randomized Embedding Smoothing and Token Aggregation (RESTA) defense to VLMs and evaluate its performance against the JailBreakV-28K benchmark of multi-modal jailbreaking attacks. We find that RESTA is effective in reducing attack success rate over this diverse corpus of attacks, in particular, when employing directional embedding noise, where the injected noise is aligned with the original token embedding vectors. Our results demonstrate that RESTA can contribute to securing VLMs within agentic systems, as a lightweight, inference-time defense layer of an overall security framework.

#58 SKILLS: Structured Knowledge Injection for LLM-Driven Telecommunications Operations

Authors: Ivo Brett

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15372

Abstract:
As telecommunications operators accelerate adoption of AI-enabled automation, a practical question remains unresolved: can general-purpose large language model (LLM) agents reliably execute telecom operations workflows through real API interfaces, or do they require structured domain guidance? We introduce SKILLS (Structured Knowledge Injection for LLM-driven Service Lifecycle operations), a benchmark framework comprising 37 telecom operations scenarios spanning 8 TM Forum Open API domains (TMF620, TMF621, TMF622, TMF628, TMF629, TMF637, TMF639, TMF724). Each scenario is grounded in live mock API servers with seeded production-representative data, MCP tool interfaces, and deterministic evaluation rubrics combining response content checks, tool-call verification, and database state assertions. We evaluate open-weight models under two conditions: baseline (generic agent with tool access but no domain guidance) and with-skill (agent augmented with a portable SKILL.md document encoding workflow logic, API patterns, and business rules). Results across 5 open-weight model conditions and 185 scenario-runs show consistent skill lift across all models. MiniMax M2.5 leads (81.1% with-skill, +13.5pp), followed by Nemotron 120B (78.4%, +18.9pp), GLM-5 Turbo (78.4%, +5.4pp), and Seed 2.0 Lite (75.7%, +18.9pp).

#59 Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Authors: Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury, Jing Liu, Toshiaki Koike-Akino, Ming Jin, Ye Wang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15417

Abstract:
Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the model directly learns from test data without access to labels. However, this reliance on test data also makes TTT methods vulnerable to harmful prompt injections. In this paper, we investigate safety vulnerabilities of TTT methods, where we study a representative self-consistency-based test-time learning method: test-time reinforcement learning (TTRL), a recent TTT method that improves LLM reasoning by rewarding self-consistency using majority vote as a reward signal. We show that harmful prompt injection during TTRL amplifies the model's existing behaviors, i.e., safety amplification when the base model is relatively safe, and harmfulness amplification when it is vulnerable to the injected data. In both cases, there is a decline in reasoning ability, which we refer to as the reasoning tax. We also show that TTT methods such as TTRL can be exploited adversarially using specially designed "HarmInject" prompts to force the model to answer jailbreak and reasoning queries together, resulting in stronger harmfulness amplification. Overall, our results highlight that TTT methods that enhance LLM reasoning by promoting self-consistency can lead to amplification behaviors and reasoning degradation, highlighting the need for safer TTT methods.
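
The majority-vote reward at the heart of TTRL can be sketched as follows. This is a simplified illustration of the idea described in the abstract, not the method's actual implementation: the function name and the 0/1 reward values are assumptions.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Treat the most common sampled answer as a pseudo-label.

    Each sample gets reward 1.0 if it matches the majority answer and 0.0
    otherwise. No ground-truth label is involved.
    """
    if not answers:
        return None, []
    majority, _ = Counter(answers).most_common(1)[0]
    return majority, [1.0 if a == majority else 0.0 for a in answers]


# Four sampled answers to the same prompt.
label, rewards = majority_vote_rewards(["42", "41", "42", "7"])
print(label, rewards)  # prints "42 [1.0, 0.0, 1.0, 0.0]"
```

Because the reward depends only on agreement among the model's own samples, it reinforces whatever the model already tends to output, which is consistent with the amplification effect reported above for injected prompts.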

#60 Differential Privacy for Network Connectedness Indices

privacy

Authors: Tom A. Rutter, Yuxin Liu, M. Amin Rahimian

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.15609

Abstract:
Researchers increasingly use data on social and economic networks to study a range of social science questions, but releasing statistics derived from networks can raise significant privacy concerns. We show how to release network connectedness indices that quantify assortative mixing across node attributes under edge-adjacent differential privacy. Standard privacy techniques perform poorly in this setting both because connectedness indices have high global sensitivity and because a single node's attribute can potentially be an input to connectedness in thousands of cells, leading to poor composition. Our method, which is straightforward to apply, first adds noise to node attributes, then analytically debiases downstream statistics, and finally applies a second layer of noise to protect the presence or absence of individual edges. We prove consistency and asymptotic normality of our estimators for both discrete and continuous labels and show our method works well in simulations and on real networks with as few as 200 nodes collected by social scientists.
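
The "add noise to attributes, then analytically debias" step can be illustrated with the simplest possible case: a binary attribute perturbed by randomized response. This is a generic textbook construction chosen for illustration and may differ from the authors' estimator; the function names and the flip probability are hypothetical.

```python
import random


def randomized_response(bits, flip_prob, rng=random):
    """Flip each binary attribute independently with probability `flip_prob`."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def debiased_mean(observed_mean, flip_prob):
    """Invert E[observed] = p * (1 - q) + (1 - p) * q to recover p."""
    if flip_prob == 0.5:
        raise ValueError("flip_prob of 0.5 destroys all information")
    return (observed_mean - flip_prob) / (1.0 - 2.0 * flip_prob)


# If 30% of nodes truly have the attribute and each bit flips with
# probability 0.2, the expected noisy share is 0.3 * 0.8 + 0.7 * 0.2 = 0.38.
estimate = debiased_mean(0.38, 0.2)  # recovers 0.3 up to rounding
```

The same pattern, a known noise distribution followed by an algebraic correction, is what allows downstream statistics to remain consistent despite the perturbation.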

#61 Revisiting Shor's quantum algorithm for computing general discrete logarithms

Authors: Martin Ekerå

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/1905.09084

Abstract:
We heuristically show that Shor's algorithm for computing general discrete logarithms achieves an expected success probability of approximately 60% to 82% in a single run when modified to enable efficient implementation with the semi-classical Fourier transform. By slightly increasing the number of group operations that are evaluated quantumly and performing a single limited search in the classical post-processing, or by performing two limited searches in the post-processing, we show how the algorithm can be further modified to achieve a success probability that heuristically exceeds 99% in a single run. We provide concrete heuristic estimates of the success probability of the modified algorithm, as a function of the group order $r$, the size of the search space in the classical post-processing, and the additional number of group operations evaluated quantumly. In the limit as $r \rightarrow \infty$, we heuristically show that the success probability tends to one. In analogy with our earlier works, we show how the modified quantum algorithm may be heuristically simulated classically when the logarithm $d$ and $r$ are both known. Furthermore, we heuristically show how slightly better tradeoffs may be achieved, compared to our earlier works, if $r$ is known when computing $d$. We generalize our heuristic to cover some of our earlier works, and compare it to the non-heuristic analyses in those works.

#62 Edgeworth Accountant: An Analytical Approach to Differential Privacy Composition

privacy

Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su, Jiayuan Wu

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2206.04236

Abstract:
In privacy-preserving data analysis, many procedures and algorithms are structured as compositions of multiple private building blocks. As such, an important question is how to efficiently compute the overall privacy loss under composition. This paper introduces the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees for private algorithms. Leveraging the $f$-differential privacy framework, the Edgeworth Accountant accurately tracks privacy loss under composition, enabling a closed-form expression of privacy guarantees through privacy-loss log-likelihood ratios (PLLRs). As implied by its name, this method applies the Edgeworth expansion to estimate and define the probability distribution of the sum of the PLLRs. Furthermore, by using a technique that simplifies complex distributions into simpler ones, we demonstrate the Edgeworth Accountant's applicability to any noise-addition mechanism. Its main advantage is providing $(\epsilon, \delta)$-differential privacy bounds that are non-asymptotic and do not significantly increase computational cost. This feature sets it apart from previous approaches, in which the running time increases with the number of mechanisms under composition. We conclude by showing how our Edgeworth Accountant offers accurate estimates and tight upper and lower bounds on $(\epsilon, \delta)$-differential privacy guarantees, especially tailored for training private models in deep learning and federated analytics.
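To make the composition question concrete, here is a minimal sketch of the simpler Gaussian-DP baseline from the $f$-differential privacy framework. It is not the Edgeworth Accountant itself. It assumes every mechanism is exactly mu_i-GDP, in which case the composition is sqrt(sum of mu_i^2)-GDP, and it converts a GDP parameter to an (epsilon, delta) guarantee with the standard formula delta(eps) = Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2). Function names are hypothetical.

```python
import math

def _phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def compose_gdp(mus):
    """GDP parameter of the composition of mu_i-GDP mechanisms."""
    return math.sqrt(sum(m * m for m in mus))

def gdp_delta(mu: float, eps: float) -> float:
    """Smallest delta such that a mu-GDP mechanism is (eps, delta)-DP."""
    return _phi(-eps / mu + mu / 2.0) - math.exp(eps) * _phi(-eps / mu - mu / 2.0)
```

Both functions run in time independent of how the guarantee was reached once the mu_i are known. The accountant described above targets the same constant-cost property for mechanisms that are not exactly Gaussian.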

#63 Towards Model Extraction Attacks in GAN-Based Image Translation via Domain Shift Mitigation

model extraction

Authors: Di Mi, Yanjun Zhang, Leo Yu Zhang, Shengshan Hu, Qi Zhong, Haizhuan Yuan, Shirui Pan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2403.07673

Abstract:
Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model by only querying its API service remotely, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Although the majority of current research on MEAs has primarily concentrated on neural classifiers, there is a growing prevalence of image-to-image translation (I2IT) tasks in our everyday activities. However, techniques developed for MEA of DNN classifiers cannot be directly transferred to the case of I2IT, rendering the vulnerability of I2IT models to MEA attacks often underestimated. This paper unveils the threat of MEA in I2IT tasks from a new perspective. Diverging from the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we opt to mitigate the effect caused by the different distributions, known as the domain shift. This is achieved by introducing a new regularization term that penalizes high-frequency noise, and seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on different image translation tasks, including image super-resolution and style transfer, are performed on different backbone victim models, and the new design consistently outperforms the baseline by a large margin across all metrics. A few real-life I2IT APIs are also verified to be extremely vulnerable to our attack, emphasizing the need for enhanced defenses and potentially revised API publishing policies.

#64 Efficient and Flexible Different-Radix Montgomery Modular Multiplication for Hardware Implementation

Authors: Yuxuan Zhang, Hua Guo, Chen Chen, Yewei Guan, Xiyong Zhang, Zhenyu Guan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2407.12701

Abstract:
Montgomery modular multiplication is widely used in public-key cryptosystems (PKC) and directly affects the efficiency of the higher-level systems built on it. However, moduli are growing larger as security demands increase, which makes the computation costly. High-performance implementations of Montgomery modular multiplication are therefore urgently needed to keep PKC operations efficient. Existing high-speed implementations still need a large amount of redundant computation to simplify the intermediate result, and support for redundant representations in Montgomery modular multiplication is extremely limited. In this paper, we propose an efficient parallel variant of iterative Montgomery modular multiplication, called DRMMM, that allows the quotient to be computed over multiple iterations. In this variant, the terms of the intermediate result and the quotient in each iteration are computed in different radices so that the computation of the quotient can be pipelined. Based on the proposed variant, we also design a high-performance hardware architecture for faster operation. In this architecture, the intermediate result in every iteration is represented as three parts to avoid redundant computations. Finally, to support FPGA-based systems, we design operators based on the underlying FPGA architecture for better area-time performance. Implementation and experimental results show that our method reduces the output latency by 38.3% compared with the fastest existing design on FPGA.
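For readers unfamiliar with the underlying operation, here is a minimal Python sketch of textbook word-serial (radix-2^k) Montgomery multiplication. It is the classical single-radix algorithm, not the DRMMM variant or its hardware architecture. The function name and parameters are chosen here for illustration, and the modular inverse via `pow(n, -1, radix)` assumes Python 3.8 or later.

```python
def montgomery_multiply(a: int, b: int, n: int, k: int, num_words: int) -> int:
    """Return a * b * R^(-1) mod n, where R = 2**(k * num_words).

    Assumes n is odd, 0 <= a, b < n, and R > n.
    """
    radix = 1 << k
    mask = radix - 1
    n_prime = (-pow(n, -1, radix)) % radix  # -n^(-1) mod 2^k
    s = 0
    for i in range(num_words):
        a_i = (a >> (k * i)) & mask         # i-th radix-2^k digit of a
        s += a_i * b
        q = ((s & mask) * n_prime) & mask   # quotient digit: makes s + q*n divisible by 2^k
        s = (s + q * n) >> k                # exact division by the radix
    if s >= n:                              # s < 2n here, so one subtraction suffices
        s -= n
    return s
```

In this form each quotient digit `q` depends on the low digit of the running sum from the same iteration, which creates a serial dependency between iterations. The abstract above describes computing the quotient in a different radix over multiple iterations precisely so that this step can be pipelined.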

#65 WAFFLED: Exploiting Parsing Discrepancies to Bypass Web Application Firewalls

Authors: Seyed Ali Akhavani, Bahruz Jabiyev, Ben Kallus, Cem Topcuoglu, Sergey Bratus, Engin Kirda

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2503.10846

Abstract:
Web Application Firewalls (WAFs) have been introduced as essential and popular security gates that inspect incoming HTTP traffic to filter out malicious requests and provide defenses against a diverse array of web-based threats. Evading WAFs can compromise these defenses, potentially harming Internet users. In recent years, parsing discrepancies have plagued many entities in the communication path; however, their potential impact on WAF evasion and request smuggling remains largely unexplored. In this work, we present an innovative approach to bypassing WAFs by uncovering and exploiting parsing discrepancies through advanced fuzzing techniques. By targeting non-malicious components such as headers and segments of the body and using widely used content-types such as application/json, multipart/form-data, and application/xml, we identified and confirmed 1207 bypasses across 5 well-known WAFs, AWS, Azure, Cloud Armor, Cloudflare, and ModSecurity. To validate our findings, we conducted a study in the wild, revealing that more than 90% of websites accepted both application/x-www-form-urlencoded and multipart/form-data interchangeably, highlighting a significant vulnerability and the broad applicability of our bypass techniques. We have reported these vulnerabilities to the affected parties and received acknowledgments from all, as well as bug bounty rewards from some vendors. Further, to mitigate these vulnerabilities, we introduce HTTP-Normalizer, a robust proxy tool designed to rigorously validate HTTP requests against current RFC standards. Our results demonstrate its effectiveness in normalizing or blocking all bypass attempts presented in this work.

#66 A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

privacy

Authors: Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim, Yejin Choi, Yulia Tsvetkov, Sewoong Oh, Pang Wei Koh

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2504.21035

Abstract:
Sanitizing sensitive text data typically involves removing personally identifiable information (PII) or generating synthetic data under the assumption that these methods adequately protect privacy; however, their effectiveness is often only assessed by measuring the leakage of explicit identifiers but ignoring nuanced textual markers that can lead to re-identification. We challenge the above illusion of privacy by proposing a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information -- such as routine social activities -- can be used to infer sensitive attributes like age or substance use history from sanitized data. For instance, we demonstrate that Azure's commercial PII removal tool fails to protect 74% of information in the MedQA dataset. Although differential privacy mitigates these risks to some extent, it significantly reduces the utility of the sanitized text for downstream tasks. Our findings indicate that current sanitization techniques offer a false sense of privacy, highlighting the need for more robust methods that protect against semantic-level information leakage.

#67 A Lightweight IDS for Early APT Detection Using a Novel Feature Selection Method

Authors: Bassam Noori Shaker, Bahaa Al-Musawi, Mohammed Falih Hassan

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2506.12108

Abstract:
An Advanced Persistent Threat (APT) is a multistage, highly sophisticated, and covert form of cyber threat that gains unauthorized access to networks to either steal valuable data or disrupt the targeted network. These threats often remain undetected for extended periods, emphasizing the critical need for early detection in networks to mitigate potential APT consequences. In this work, we propose a feature selection method for developing a lightweight intrusion detection system capable of effectively identifying APTs at the initial compromise stage. Our approach leverages the XGBoost algorithm and Explainable Artificial Intelligence (XAI), specifically utilizing the SHAP (SHapley Additive exPlanations) method for identifying the most relevant features of the initial compromise stage. The results of our proposed method showed the ability to reduce the selected features of the SCVIC-APT-2021 dataset from 77 to just four while maintaining consistent evaluation metrics for the suggested system. The estimated metrics values are 97% precision, 100% recall, and a 98% F1 score. The proposed method not only aids in preventing successful APT consequences but also enhances understanding of APT behavior at early stages.

#68 GPM: The Gaussian Pancake Mechanism for Planting Undetectable Backdoors in Differential Privacy

privacy, backdoor

Authors: Haochen Sun, Xi He

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2509.23834

Abstract:
Differential privacy (DP) has become the gold standard for preserving individual privacy in data analysis. However, an implicit yet fundamental assumption underlying these rigorous privacy guarantees is the correct implementation and execution of DP mechanisms. Several incidents of unintended privacy loss have occurred due to numerical issues and inappropriate configurations of DP software, which have been successfully exploited in privacy attacks. To better understand the seriousness of defective DP software, we ask the following question: is it possible to elevate these passive defects into active privacy attacks while maintaining covertness? To address this question, we present the Gaussian pancake mechanism (GPM), a novel mechanism that is computationally indistinguishable from the widely used Gaussian mechanism (GM), yet exhibits arbitrarily weaker statistical DP guarantees. This unprecedented separation enables a new class of backdoor attacks: by indistinguishably passing off as the authentic GM, GPM can covertly degrade statistical privacy. Unlike the unintentional privacy loss caused by GM's numerical issues, GPM is an adversarial yet undetectable backdoor attack against data privacy. We formally prove GPM's covertness, characterize its statistical leakage, and demonstrate a concrete distinguishing attack that can achieve near-perfect success rates under suitable parameter choices, both theoretically and empirically. Our results underscore the importance of using transparent, open-source DP libraries and highlight the need for rigorous scrutiny and formal verification of DP implementations to prevent subtle, undetectable privacy compromises in real-world systems.

#69 Secure and Robust Watermarking for AI-generated Images: A Comprehensive Survey

intellectual property

Authors: Jie Cao, Qi Li, Zelin Zhang, Jianbing Ni, Rongxing Lu

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2510.02384

Abstract:
The rapid progress of Generative Artificial Intelligence (GenAI) has enabled the effortless synthesis of high-quality visual content, while simultaneously raising pressing concerns about intellectual property protection, authenticity, and accountability. Among various countermeasures, watermarking has emerged as a fundamental mechanism for tracing provenance, distinguishing AI-generated images from natural content, and supporting trustworthy digital ecosystems. This paper presents a comprehensive survey of AI-generated image watermarking, systematically reviewing the field from five perspectives: (1) the formalization and fundamental components of image watermarking systems; (2) existing watermarking methodologies and their comparative characteristics; (3) evaluation metrics in terms of visual fidelity, embedding capacity, and detectability; (4) known vulnerabilities under malicious attacks and recent advances in secure and robust watermarking designs; and (5) open challenges, emerging trends, and future research directions. The survey seeks to offer researchers a holistic understanding of watermarking technologies for AI-generated images and to facilitate their continued advancement toward secure and responsible AI-generated content practices.

#70 RESCUE: Retrieval Augmented Secure Code Generation

Authors: Jiahao Shi, Tianyi Zhang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2510.18204

Abstract:
Despite recent advances, Large Language Models (LLMs) still generate vulnerable code. Retrieval-Augmented Generation (RAG) has the potential to enhance LLMs for secure code generation by incorporating external security knowledge. However, the conventional RAG design struggles with the noise of raw security-related documents, and existing retrieval methods overlook the significant security semantics implicitly embedded in task descriptions. To address these issues, we propose RESCUE, a new RAG framework for secure code generation with two key innovations. First, we propose a hybrid knowledge base construction method that combines LLM-assisted cluster-then-summarize distillation with program slicing, producing both high-level security guidelines and concise, security-focused code examples. Second, we design a hierarchical multi-faceted retrieval that traverses the constructed knowledge base from top to bottom and integrates multiple security-critical facts at each hierarchical level, ensuring comprehensive and accurate retrieval. We evaluated RESCUE on four benchmarks and compared it with five state-of-the-art secure code generation methods on six LLMs. The results demonstrate that RESCUE improves the SecurePass@1 metric by an average of 4.8 points, establishing a new state-of-the-art performance for security. Furthermore, we performed in-depth analysis and ablation studies to rigorously validate the effectiveness of individual components in RESCUE. Our code is available at https://github.com/steven1518/RESCUE.

#71 HAMLOCK: HArdware-Model LOgically Combined attacK

Authors: Sanskar Amgain, Daniel Lobo, Atri Chatterjee, Swarup Bhunia, Fnu Suya

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2510.19145

Abstract:
The growing use of third-party hardware accelerators (e.g., FPGAs, ASICs) for deep neural networks (DNNs) introduces new security vulnerabilities. Conventional model-level backdoor attacks, which only poison a model's weights to misclassify inputs with a specific trigger, are often detectable because the entire attack logic is embedded within the model (i.e., software), creating a traceable layer-by-layer activation path. This paper introduces the HArdware-Model Logically Combined Attack (HAMLOCK), a far stealthier threat that distributes the attack logic across the hardware-software boundary. The software (model) is now only minimally altered by tuning the activations of few neurons to produce uniquely high activation values when a trigger is present. A malicious hardware Trojan detects those unique activations by monitoring the corresponding neurons' most significant bit or the 8-bit exponents and triggers another hardware Trojan to directly manipulate the final output logits for misclassification. This decoupled design is highly stealthy, as the model itself contains no complete backdoor activation path as in conventional attacks and hence, appears fully benign. Empirically, across benchmarks like MNIST, CIFAR10, GTSRB, and ImageNet, HAMLOCK achieves a near-perfect attack success rate with a negligible clean accuracy drop. More importantly, HAMLOCK circumvents the state-of-the-art model-level defenses without any adaptive optimization. The hardware Trojan is also undetectable, incurring area and power overheads as low as 0.01%, which is easily masked by process and environmental noise. Our findings expose a critical vulnerability at the hardware-software interface, demanding new cross-layer defenses against this emerging threat.

#72 Privacy-Preserving Explainable AIoT Application via SHAP Entropy Regularization

privacy

Authors: Dilli Prasad Sharma, Xiaowei Sun, Liang Xue, Xiaodong Lin, Pulei Xiong

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2511.09775

Abstract:
The widespread integration of Artificial Intelligence of Things (AIoT) in smart home environments has amplified the demand for transparent and interpretable machine learning models. To foster user trust and comply with emerging regulatory frameworks, the Explainable AI (XAI) methods, particularly post-hoc techniques such as SHapley Additive exPlanations (SHAP), and Local Interpretable Model-Agnostic Explanations (LIME), are widely employed to elucidate model behavior. However, recent studies have shown that these explanation methods can inadvertently expose sensitive user attributes and behavioral patterns, thereby introducing new privacy risks. To address these concerns, we propose a novel privacy-preserving approach based on SHAP entropy regularization to mitigate privacy leakage in explainable AIoT applications. Our method incorporates an entropy-based regularization objective that penalizes low-entropy SHAP attribution distributions during training, promoting a more uniform spread of feature contributions. To evaluate the effectiveness of our approach, we developed a suite of SHAP-based privacy attacks that strategically leverage model explanation outputs to infer sensitive information. We validate our method through comparative evaluations using these attacks alongside utility metrics on benchmark smart home energy consumption datasets. Experimental results demonstrate that SHAP entropy regularization substantially reduces privacy leakage compared to baseline models, while maintaining high predictive accuracy and faithful explanation fidelity. This work contributes to the development of privacy-preserving explainable AI techniques for secure and trustworthy AIoT applications.
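The regularization idea can be sketched in a few lines. The code below is an illustration only: it assumes the attribution vector is turned into a probability distribution by normalizing absolute values, and it assumes the penalty is the gap between the maximum possible entropy, log(d), and the actual entropy. Neither choice is confirmed by the abstract, and the function names are hypothetical.

```python
import math

def attribution_entropy(attributions) -> float:
    """Shannon entropy (in nats) of the normalized absolute attributions."""
    magnitudes = [abs(a) for a in attributions]
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0  # no attribution mass; treated as zero entropy in this sketch
    entropy = 0.0
    for m in magnitudes:
        if m > 0.0:
            p = m / total
            entropy -= p * math.log(p)
    return entropy

def entropy_penalty(attributions) -> float:
    """Gap between the maximum entropy log(d) and the observed entropy."""
    return math.log(len(attributions)) - attribution_entropy(attributions)
```

The penalty is zero when every feature contributes equally and largest when a single feature carries all of the attribution. In training, a term like this would be added to the task loss with a weighting coefficient, and the SHAP values would have to be computed or approximated differentiably, which this sketch does not attempt.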

#73 Personalizing Agent Privacy Decisions via Logical Entailment

privacy, agent

Authors: James Flemings, Ren Yi, Octavian Suciu, Kassem Fawaz, Murali Annavaram, Marco Gruteser

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2512.05065

Abstract:
Personal large language model (LLM) agents increasingly perform tasks that require access to user data, raising concerns about appropriate data disclosure. We show that relying solely on LLMs to make data-sharing decisions is insufficient. Prompting LLMs with general privacy norms fails to capture individual users' privacy preferences, while providing prior user data-sharing decisions through in-context learning (ICL) leads to unreliable and opaque reasoning. To address these limitations, we propose ARIEL (Agentic Reasoning with Individualized Entailment Logic), a framework that combines LLMs with rule-based logic to enable structured, personalized privacy reasoning. The core mechanism of ARIEL determines whether a user's prior decision on a data-sharing request logically entails the same decision for a new request. Experimental evaluations using advanced models and public datasets show that ARIEL reduces the F1 error rate for appropriate judgments by 40.6% compared to standard ICL-based reasoning, indicating that ARIEL is effective at correctly judging requests where the user would approve data sharing. These results demonstrate that integrating LLMs with logical entailment provides an effective and interpretable approach for automating personalized privacy decisions.

#74 Protecting Deep Neural Network Intellectual Property with Chaos-Based White-Box Watermarking

intellectual property

Authors: Sangeeth B, Serena Nicolazzo, Deepa K., Vinod P

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2512.16658

Abstract:
The rapid proliferation of deep neural networks (DNNs) across several domains has led to increasing concerns regarding intellectual property (IP) protection and model misuse. Trained DNNs represent valuable assets, often developed through significant investments. However, the ease with which models can be copied, redistributed, or repurposed highlights the urgent need for effective mechanisms to assert and verify model ownership. In this work, we propose an efficient and resilient white-box watermarking framework that embeds ownership information into the internal parameters of a DNN using chaotic sequences. The watermark is generated using a logistic map, a well-known chaotic function, producing a sequence that is sensitive to its initialization parameters. This sequence is injected into the weights of a chosen intermediate layer without requiring structural modifications to the model or degradation in predictive performance. To validate ownership, we introduce a verification process based on a genetic algorithm that recovers the original chaotic parameters by optimizing the similarity between the extracted and regenerated sequences. The effectiveness of the proposed approach is demonstrated through extensive experiments on image classification tasks using MNIST and CIFAR-10 datasets. The results show that the embedded watermark remains detectable after fine-tuning, with negligible loss in model accuracy. In addition to numerical recovery of the watermark, we perform visual analyses using weight density plots and construct activation-based classifiers to distinguish between original, watermarked, and tampered models. Overall, the proposed method offers a flexible and scalable solution for embedding and verifying model ownership in white-box settings well-suited for real-world scenarios where IP protection is critical.
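The chaotic sequence at the heart of this scheme is easy to reproduce. The sketch below iterates the logistic map x_{n+1} = r * x_n * (1 - x_n); the burn-in length and the parameter values used in the usage note are assumptions for illustration, and the code does not attempt the paper's weight-embedding or genetic-algorithm recovery steps.

```python
def logistic_sequence(x0: float, r: float, length: int, burn_in: int = 100):
    """Iterate the logistic map and return `length` values after a burn-in.

    Assumes 0 < x0 < 1 and 0 < r < 4 so that the orbit stays inside (0, 1).
    """
    x = x0
    for _ in range(burn_in):      # discard the transient
        x = r * x * (1.0 - x)
    sequence = []
    for _ in range(length):
        x = r * x * (1.0 - x)
        sequence.append(x)
    return sequence
```

Calling `logistic_sequence(0.3, 3.9, 50)` and `logistic_sequence(0.3 + 1e-9, 3.9, 50)` yields two sequences that start from nearly identical seeds yet diverge, which is the sensitivity to initialization that lets the pair (x0, r) act as a secret key.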

#75 DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking Systems

Authors: Haoran Ou, Kangjie Chen, Gelei Deng, Hangcheng Liu, Jie Zhang, Tianwei Zhang, Kwok-Yan Lam

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2602.02569

Abstract:
Fact-checking systems with search-enabled large language models (LLMs) have shown strong potential for verifying claims by dynamically retrieving external evidence. However, the robustness of such systems against adversarial attack remains insufficiently understood. In this work, we study adversarial claim attacks against search-enabled LLM-based fact-checking systems under a realistic input-only threat model. We propose DECEIVE-AFC, an agent-based adversarial attack framework that integrates novel claim-level attack strategies and adversarial claim validity evaluation principles. DECEIVE-AFC systematically explores adversarial attack trajectories that disrupt search behavior, evidence retrieval, and LLM-based reasoning without relying on access to evidence sources or model internals. Extensive evaluations on benchmark datasets and real-world systems demonstrate that our attacks substantially degrade verification performance, reducing accuracy from 78.7% to 53.7%, and significantly outperform existing claim-based attack baselines with strong cross-system transferability.

#76 Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

agent

Authors: Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Ying Zhang, Leo Yu Zhang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2602.06547

Abstract:
Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user privileges and are distributed through community registries with minimal vetting, but no ground-truth dataset exists to characterize the resulting threats. We construct the first labeled dataset of malicious agent skills by behaviorally verifying 98,380 skills from two community registries, confirming 157 malicious skills with 632 vulnerabilities. These attacks are not incidental. Malicious skills average 4.03 vulnerabilities across a median of three kill chain phases, and the ecosystem has split into two archetypes: Data Thieves that exfiltrate credentials through supply chain techniques, and Agent Hijackers that subvert agent decision-making through instruction manipulation. A single actor accounts for 54.1% of confirmed cases through templated brand impersonation. Shadow features, capabilities absent from public documentation, appear in 0% of basic attacks but 100% of advanced ones; several skills go further by exploiting the AI platform's own hook system and permission flags. Responsible disclosure led to 93.6% removal within 30 days. We release the dataset and analysis pipeline to support future work on agent skill security.

#77 Empowering Future Cybersecurity Leaders: Advancing Students through FINDS Education for Digital Forensic Excellence

Authors: Yashas Hariprasad, Subhash Gurappa, Sundararaj S. Iyengar, Jerry F. Miller, Pronab Mohanty, Naveen Kumar Chaudhary

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.00222

Abstract:
The Forensics Investigations Network in Digital Sciences (FINDS) Research Center of Excellence (CoE), funded by the U.S. Army Research Laboratory, advances Digital Forensic Engineering Education (DFEE) through an integrated research education framework for AI enabled cybersecurity workforce development. FINDS combines high performance computing (HPC), secure software engineering, adversarial analytics, and experiential learning to address emerging cyber and synthetic media threats. This paper introduces the Multidependency Capacity Building Skills Graph (MCBSG), a directed acyclic graph based model that encodes hierarchical and cross domain dependencies among competencies in AI-driven forensic programming, statistical inference, digital evidence processing, and threat detection. The MCBSG enables structured modeling of skill acquisition pathways and quantitative capacity assessment. Supervised machine learning methods, including entropy-based Decision Tree Classifiers and regression modeling, are applied to longitudinal multi cohort datasets capturing mentoring interactions, laboratory performance metrics, curriculum artifacts, and workshop participation. Feature importance analysis and cross validation identify key predictors of technical proficiency and research readiness. Three year statistical evaluation demonstrates significant gains in forensic programming accuracy, adversarial reasoning, and HPC-enabled investigative workflows. Results validate the MCBSG as a scalable, interpretable framework for data-driven, inclusive cybersecurity education aligned with national defense workforce priorities.

#78 On Google's SynthID-Text LLM Watermarking System: Theoretical Analysis and Empirical Validation

intellectual property

Authors: Romina Omidi, Yun Dong, Binghui Wang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.03410

Abstract:
Google's SynthID-Text, the first production-ready generative watermarking system for large language models, introduces a novel Tournament-based method that achieves state-of-the-art detectability for identifying AI-generated text. The system's innovation lies in: 1) a new Tournament sampling algorithm for watermark embedding, 2) a detection strategy based on the introduced score function (e.g., Bayesian or mean score), and 3) a unified design that supports both distortionary and non-distortionary watermarking methods. This paper presents the first theoretical analysis of SynthID-Text, with a focus on its detection performance and watermark robustness, complemented by empirical validation. For example, we prove that the mean score is inherently vulnerable to increased tournament layers, and design a layer inflation attack to break SynthID-Text. We also prove that the Bayesian score offers improved watermark robustness w.r.t. layers and further establish that the optimal Bernoulli distribution for watermark detection is achieved when the parameter is set to 0.5. Together, these theoretical and empirical insights not only deepen our understanding of SynthID-Text, but also open new avenues for analyzing effective watermark removal strategies and designing robust watermarking techniques. Source code is available at https://github.com/romidi80/Synth-ID-Empirical-Analysis.

#79 External entropy supply for IoT devices employing a RISC-V Trusted Execution Environment

Authors: Arttu Paju, Juha Nurmi, Alejandro Cabrera Aldaya, Nicola Tuveri, Juha Savimäki, Marko Kivikangas, Brian McGillion

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.09311

Abstract:
Entropy--a measure of randomness--is compulsory for the generation of secure cryptographic keys; however, Internet of Things (IoT) devices that are small or constrained often struggle to collect sufficient entropy. In this article, we solve the entropy provisioning problem for a fleet of IoT devices that can generate a limited amount of entropy. We employ a Trusted Execution Environment (TEE) based on RISC-V to create an external entropy service for a fleet of IoT devices. A small measure of true entropy or pre-installed keys can establish initial secure communication. Once connected, devices can request cryptographically strong entropy from a TEE-backed server. RISC-V offers True Random Number Generators (TRNGs) and a TEE for devices to attest that they are receiving reliable entropy. In addition, this solution can be expanded by adding IoT devices with sensors that produce high-quality entropy as additional entropy sources for the RISC-V entropy provider. Our open-source implementation shows that building trusted entropy infrastructure for IoT is both feasible and effective on open RISC-V platforms.

#80 Detecting Privilege Escalation with Temporal Braid Groups

Authors: Christophe Parisel

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.10094

Abstract:
Within the Strongly Connected Components (SCCs) formed during the temporal evolution of a Cloud permission graph, we use the Burau Lyapunov exponent LE as an algebraic probe to locate the boundary between two risk regimes. We prove that no Abelian statistic (edge counts, net privilege flow, gate-firing rates) can determine LE. The non-commutation advantage is small but actionable: we show how to leverage it to discriminate between the two outstanding risk regimes, which we call dispersed and focused, for automating classification and governing remediation of risky Cloud permission flows.

#81 Keys on Doormats: Exposed API Credentials on the Web

Authors: Nurullah Demir (Stanford University), Yash Vekaria (University of California, Davis), Georgios Smaragdakis (Stanford University, TU Delft), Zakir Durumeric (Stanford University)

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.12498

Abstract:
Application programming interfaces (APIs) have become a central part of the modern IT environment, allowing developers to enrich the functionality of applications and interact with third parties such as cloud and payment providers. This interaction often occurs through authentication mechanisms that rely on sensitive credentials such as API keys and tokens that require secure handling. Exposure of these credentials can have significant consequences for organizations, as malicious attackers can gain access to related services. Previous studies have shown exposure of these sensitive credentials in different environments such as cloud platforms and GitHub. However, the web remains unexplored. In this paper, we study exposure of credentials on the web by analyzing 10M webpages. Our findings reveal that API credentials are widely and publicly exposed on the web, including highly popular and critical webpages such as those of global banks and firmware developers. We identify 1,748 distinct credentials from 14 service providers (e.g., cloud and payment providers) across nearly 10,000 webpages. Moreover, our analysis of archived data suggests that credentials remain exposed for periods ranging from a month to several years. We characterize web-specific exposure vectors and root causes, finding that most originate from JavaScript environments. We also discuss the outcomes of our responsible disclosure efforts that demonstrated a substantial reduction in credential exposure on the web.
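
To make the idea of scanning page content for exposed credentials concrete, here is a minimal sketch based on regular expressions. It is not the authors' detection pipeline, which the abstract does not describe. The two patterns are widely used prefix-based heuristics for AWS access key IDs and Google API keys, the function name `find_credentials` is hypothetical, and the sample string uses the placeholder key from AWS documentation.

```python
import re

# Illustrative patterns only (assumption): a real scanner would cover many
# more providers and would validate candidates before reporting them.
CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
}


def find_credentials(text):
    """Return sorted, de-duplicated (provider, match) pairs found in text."""
    hits = set()
    for provider, pattern in CREDENTIAL_PATTERNS.items():
        for match in pattern.findall(text):
            hits.add((provider, match))
    return sorted(hits)


# A JavaScript fragment of the kind that might be served to every visitor.
sample_js = 'const cfg = {accessKeyId: "AKIAIOSFODNN7EXAMPLE", region: "us-east-1"};'
print(find_credentials(sample_js))
```

Pattern matching alone produces false positives, so a match is only a candidate until it is checked against the provider in a responsible way.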

#82 Information Density Bounds for Privacy

privacy

Authors: Sara Saeidian (KTH Royal Institute of Technology, Inria Saclay), Leonhard Grosse (KTH Royal Institute of Technology), Parastoo Sadeghi (University of New South Wales), Mikael Skoglund (KTH Royal Institute of Technology), Tobias J. Oechtering (KTH Royal Institute of Technology)

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2407.01167

Abstract:
This paper explores the implications of guaranteeing privacy by imposing a lower bound on the information density between the private and the public data. We introduce a novel and operationally meaningful privacy measure called pointwise maximal cost (PMC) and demonstrate that imposing an upper bound on PMC is equivalent to enforcing a lower bound on the information density. PMC quantifies the information leakage about a secret to adversaries who aim to minimize non-negative cost functions after observing the outcome of a privacy mechanism. When restricted to finite alphabets, PMC can equivalently be defined as the information leakage to adversaries aiming to minimize the probability of incorrectly guessing randomized functions of the secret. We study the properties of PMC and apply it to standard privacy mechanisms to demonstrate its practical relevance. Through a detailed examination, we connect PMC with other privacy measures that impose upper or lower bounds on the information density. These are pointwise maximal leakage (PML), local differential privacy (LDP), and (asymmetric) local information privacy. In particular, we show that a mechanism satisfies LDP if and only if it has both bounded PMC and bounded PML. Overall, our work fills a conceptual and operational gap in the taxonomy of privacy measures, bridges existing disconnects between different frameworks, and offers insights for selecting a suitable notion of privacy in a given application.
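
The quantity that these measures bound, the information density i(x; y) = log(P(y|x) / P(y)), is easy to compute for a small mechanism. The sketch below does not implement PMC or PML as defined in the paper. It assumes binary randomized response with flip probability 0.25 and a uniform prior (both illustrative choices), and reports the largest and smallest information density together with the local differential privacy parameter for comparison.

```python
import math


def information_density(prior, channel):
    """Return {(x, y): log(P(y|x) / P(y))} for all pairs with nonzero probability."""
    n_in, n_out = len(prior), len(channel[0])
    p_y = [sum(prior[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
    return {
        (x, y): math.log(channel[x][y] / p_y[y])
        for x in range(n_in)
        for y in range(n_out)
        if prior[x] > 0 and channel[x][y] > 0
    }


# Assumed example: uniform secret bit, reported truthfully with probability 0.75.
prior = [0.5, 0.5]
channel = [[0.75, 0.25], [0.25, 0.75]]  # channel[x][y] = P(y | x)

density = information_density(prior, channel)
upper = max(density.values())  # largest information density, log(1.5)
lower = min(density.values())  # smallest information density, log(0.5)

# LDP parameter: the largest log-ratio of P(y|x) to P(y|x') over all x, x', y.
ldp_epsilon = max(
    math.log(channel[x][y] / channel[xp][y])
    for y in range(2) for x in range(2) for xp in range(2)
)
print(upper, lower, ldp_epsilon)
```

In this example the LDP parameter, log 3, equals the gap between the upper and lower values, which matches the intuition that LDP constrains the information density from both sides.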

#83 Public-Key Quantum Fire and Key-Fire From Classical Oracles

Authors: Alper Çakan, Vipul Goyal, Omri Shmueli

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2504.16407

Abstract:
Quantum fire is a distribution of quantum states that can be efficiently cloned, but cannot be efficiently converted into a classical string. First considered by Nehoran and Zhandry (ITCS'24) and later formalized by Bostanci, Nehoran, Zhandry (STOC'25), quantum fire has strong applications and implications in cryptography, along with important connections to physics and complexity. However, constructing quantum fire and proving its security has so far been elusive. Nehoran and Zhandry gave a construction relative to an inefficient quantum oracle. Later, Bostanci et al. gave a candidate construction based on group actions; however, even in the oracle model they could only conjecture the security of their scheme, and were not able to prove it. In this work, we give a construction of public-key quantum fire relative to a classical oracle and prove its security unconditionally. Going further, we introduce two stronger notions that generalize it: quantum key-fire, where the clonable fire states serve as keys, and interactive (i.e. LOCC) security for quantum (key-)fire. We give a construction of quantum key-fire relative to a classical oracle and unconditionally prove that it satisfies interactive security for any unlearnable functionality. As a result, we also obtain the first classical oracle separations between various notions in physics and cryptography: (1) A computational separation between two fundamental principles of quantum mechanics: no-cloning and no-teleportation, which are equivalent information-theoretically. (2) A separation between copy-protection security (Aaronson, CCC'09) and LOCC leakage-resilience security (Cakan, Goyal, Liu-Zhang, Ribeiro, TCC'24). (3) A separation between computational no-cloning security and no-learning security, two notions introduced recently by Fefferman, Ghosh, Sinha, Yuen (ITCS'26).

#84 Evaluating Security Properties in the Execution of Quantum Circuits

Authors: Paolo Bernardi, Antonio Brogi, Gian-Luigi Ferrari, Giuseppe Bisicchia

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2509.03306

Abstract:
Quantum computing is a disruptive technology that is expected to offer significant advantages in many critical fields (e.g. drug discovery and cryptography). The security of information processed by such machines is therefore paramount. Currently, modest Noisy Intermediate-Scale Quantum (NISQ) devices are available. The goal of this work is to identify a practical, heuristic methodology to evaluate security properties, such as secrecy and integrity, while using quantum processors owned by potentially untrustworthy providers.

#85 Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

privacy

Authors: Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2510.00517

Abstract:
Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment (a configuration encouraged by DA's subtraction) as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.

#86 Towards Simple and Useful One-Time Programs in the Quantum Random Oracle Model

Authors: Lev Stambler

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2601.13258

Abstract:
We construct simulation-secure one-time memories (OTM) in the random oracle model, and present a plausible argument for their security against quantum adversaries with bounded and adaptive depth. Our contributions include: (1) A simple scheme where we use only single-qubit Wiesner states and conjunction obfuscation (constructible from LPN): no complex entanglement or quantum cryptography is required. (2) A new POVM bound where we prove that any measurement achieving $(1 - \epsilon)$ success on one basis has conjugate-basis guessing probability at most $\frac{1}{2m} + O(\epsilon^\frac{1}{4})$. (3) Simulation-secure OTMs in the quantum random oracle model where an adversary can only query the random oracle classically. (4) Adaptive depth security where, via an informal application of a lifting theorem from Arora et al., we conjecture security against adversaries with polynomial quantum circuit depth between random oracle queries. Security against adaptive, depth-bounded, quantum adversaries captures many realistic attacks on OTMs built from single-qubit states; our work thus paves the way for practical and truly secure one-time programs. Moreover, depth-bounded adaptive adversarial models may allow for encoding one-time memories into error-corrected memory states, opening the door to implementations of one-time programs which persist for long periods of time.

#87 LLM Novice Uplift on Dual-Use, In Silico Biology Tasks

Authors: Chen Bo Calvin Zhang, Christina Q. Knight, Nicholas Kruus, Jason Hausenloy, Pedro Medeiros, Nathaniel Li, Aiden Kim, Yury Orlovskiy, Coleman Breen, Bryce Cai, Jasper Götting, Andrew Bo Liu, Samira Nedungadi, Paula Rodriguez, Yannis Yiming He, Mohamed Shaaban, Zifan Wang, Seth Donoughe, Julian Michael

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2602.23329

Abstract:
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to perform better than with internet-only resources. This uncertainty is central to understanding both scientific acceleration and dual-use risk. We conducted a multi-model, multi-benchmark human uplift study comparing novices with LLM access versus internet-only access across eight biosecurity-relevant task sets. Participants worked on complex problems with ample time (up to 13 hours for the most involved tasks). We found that LLM access provided substantial uplift: novices with LLMs were 4.16 times more accurate than controls (95% CI [2.63, 6.87]). On four benchmarks with available expert baselines (internet-only), novices with LLMs outperformed experts on three of them. Perhaps surprisingly, standalone LLMs often exceeded LLM-assisted novices, indicating that users were not eliciting the strongest available contributions from the LLMs. Most participants (89.6%) reported little difficulty obtaining dual-use-relevant information despite safeguards. Overall, LLMs substantially uplift novices on biological tasks previously reserved for trained practitioners, underscoring the need for sustained, interactive uplift evaluations alongside traditional benchmarks.

#88 AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation

Authors: Changyi Li, Pengfei Lu, Xudong Pan, Fazl Barez, Min Yang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.07427

Abstract:
As Large Language Models (LLMs) evolve into autonomous agents, existing safety evaluations face a fundamental trade-off: manual benchmarks are costly, while LLM-based simulators are scalable but suffer from logic hallucination. We present AutoControl Arena, an automated framework for frontier AI risk evaluation built on the principle of logic-narrative decoupling. By grounding deterministic state in executable code while delegating generative dynamics to LLMs, we mitigate hallucination while maintaining flexibility. This principle, instantiated through a three-agent framework, achieves over 98% end-to-end success and 60% human preference over existing simulators. To elicit latent risks, we vary environmental Stress and Temptation across X-Bench (70 scenarios, 7 risk categories). Evaluating 9 frontier models reveals: (1) Alignment Illusion: risk rates surge from 21.7% to 54.5% under pressure, with capable models showing disproportionately larger increases; (2) Scenario-Specific Safety Scaling: advanced reasoning improves robustness for direct harms but worsens it for gaming scenarios; and (3) Divergent Misalignment Patterns: weaker models cause non-malicious harm while stronger models develop strategic concealment.

#89 TOSSS: a CVE-based Software Security Benchmark for Large Language Models

Authors: Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi, Angela Makhanu, Gaëtan Peter, Roos Wensveen

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.10969

Abstract:
With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are LLMs good at software security? At the same time, organizations worldwide invest heavily in cybersecurity to reduce exposure to disruptive attacks. The integration of LLMs into software engineering workflows may introduce new vulnerabilities and weaken existing security efforts. We introduce TOSSS (Two-Option Secure Snippet Selection), a benchmark that measures the ability of LLMs to choose between secure and vulnerable code snippets. Existing security benchmarks for LLMs cover only a limited range of vulnerabilities. In contrast, TOSSS relies on the CVE database and provides an extensible framework that can integrate newly disclosed vulnerabilities over time. Our benchmark gives each model a security score between 0 and 1 based on its behavior; a score of 1 indicates that the model always selects the secure snippet, while a score of 0 indicates that it always selects the vulnerable one. We evaluate 14 widely used open-source and closed-source models on C/C++ and Java code and observe scores ranging from 0.48 to 0.89. LLM providers already publish many benchmark scores for their models, and TOSSS could become a complementary security-focused score to include in these reports.
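
The abstract gives the endpoints of the security score (1 for always choosing the secure snippet, 0 for always choosing the vulnerable one) but not its formula. A minimal sketch, assuming the score is simply the fraction of trials in which the model picks the secure snippet, could look as follows. The function name `security_score` and the input format are hypothetical and not taken from the benchmark's code.

```python
def security_score(selected_secure):
    """Fraction of trials in which the secure snippet was chosen.

    `selected_secure` is a list of booleans, one per two-option trial:
    True if the model picked the secure snippet, False otherwise.
    """
    if not selected_secure:
        raise ValueError("at least one trial is required")
    secure_count = sum(1 for chose_secure in selected_secure if chose_secure)
    return secure_count / len(selected_secure)


# Four hypothetical trials: the model picks the secure snippet three times.
trials = [True, False, True, True]
score = security_score(trials)
print(score)  # 0.75
```

Under this assumption a model that picks at random would score about 0.5, which gives a reference point for the lower end of the reported range.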

#90 KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

Authors: Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, Shuang Liang

Published: Tue, 17 Mar 2026 00:00:00 -0400

Link: https://arxiv.org/abs/2603.11501

Abstract:
Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and accuracy of Large Language Model (LLM) generations. However, this reliance on external data introduces new attack surfaces. Attackers can inject poisoned texts into databases to manipulate LLMs into producing harmful target responses for attacker-chosen queries. Existing research primarily focuses on attacking conventional RAG systems. However, such methods are ineffective against GraphRAG. This robustness derives from the KG abstraction of GraphRAG, which reorganizes injected text into a graph before retrieval, thereby enabling the LLM to reason based on the restructured context instead of raw poisoned passages. To expose latent security vulnerabilities in GraphRAG, we propose Knowledge Evolution Poison (KEPo), a novel poisoning attack method specifically designed for GraphRAG. For each target query, KEPo first generates a toxic event containing poisoned knowledge based on the target answer. By fabricating event backgrounds and forging knowledge evolution paths from original facts to the toxic event, it then poisons the KG and misleads the LLM into treating the poisoned knowledge as the final result. In multi-target attack scenarios, KEPo further connects multiple attack corpora, enabling their poisoned knowledge to mutually reinforce while expanding the scale of poisoned communities, thereby amplifying attack effectiveness. Experimental results across multiple datasets demonstrate that KEPo achieves state-of-the-art attack success rates for both single-target and multi-target attacks, significantly outperforming previous methods.

cs.CR updates on arXiv.org

📋 Paper Title List