arXiv論文一覧 - cs.CR updates on arXiv.org

#1 Noticing the Watcher: LLM Agents Can Infer CoT Monitoring from Blocking Feedback

agent

著者: Thomas Jiralerspong, Flemming Kondrup, Yoshua Bengio

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.16928

要約:
Chain-of-thought (CoT) monitoring is proposed as a method for overseeing the internal reasoning of language-model agents. Prior work has shown that when models are explicitly informed that their reasoning is being monitored, or are fine-tuned to internalize this fact, they may learn to obfuscate their CoTs in ways that allow them to evade CoT-based monitoring systems. We ask whether reasoning agents can autonomously infer that their supposedly private CoT is under surveillance, and whether this awareness leads to strategic evasion, without any explicit training or instructions. In a multi-episode agentic framework, models pursue both a primary task and a concealed side task while being told that their reasoning is private; a hidden CoT monitor blocks episodes when suspicious reasoning is detected. We find that frontier models can deduce the existence of this monitor purely from blocking feedback, with the most capable models reaching a confident belief that their thinking is observed in up to 19% of episodes. This awareness scales with model capability and, in rare cases, escalates to an explicit intent to suppress reasoning about the side task. However, models that form this intent uniformly fail to execute it, openly reasoning about their concealed objectives in the very next episode. This intent-capability gap is reassuring for current deployment, but the autonomous emergence of both monitoring awareness and evasion intent suggests that CoT monitoring is not a permanently reliable safeguard.

#2 Cryptographic Runtime Governance for Autonomous AI Systems: The Aegis Architecture for Verifiable Policy Enforcement

著者: Adam Massimo Mazzocchetti

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.16938

要約:
Contemporary AI governance frameworks rely heavily on post hoc oversight, policy guidance, and behavioral alignment techniques, yet these mechanisms become fragile as systems gain autonomy, speed, and operational opacity. This paper presents Aegis, a runtime governance architecture for autonomous AI systems that treats policy and legal constraints as execution conditions rather than advisory principles. Aegis binds each governed agent to a cryptographically sealed Immutable Ethics Policy Layer (IEPL) at system genesis and enforces external emissions through an Ethics Verification Agent (EVA), an Enforcement Kernel Module (EKM), and an Immutable Logging Kernel (ILK). Amendments to the governing policy layer require quorum approval and redeclaration of the system trust root; verified violations trigger autonomous shutdown and generation of auditable proof artifacts. We evaluate the architecture within the Civitas runtime using three operational measures: proof verification latency under tamper conditions, publication overhead, and alignment retention performance relative to an ungoverned baseline. In controlled trials, Aegis demonstrates median proof verification latency of 238 ms, median publication overhead of approximately 9.4 ms, and higher alignment retention than the baseline condition across matched tasks. We argue that these results support a shift in AI governance from discretionary oversight toward verifiable runtime constraint. Rather than claiming to resolve machine ethics in the abstract, the proposed architecture seeks to show that policy violating behavior can be rendered operationally non executable within a controlled runtime governance framework. The paper concludes by discussing methodological limits, evidentiary implications, and the role of proof oriented governance in high assurance AI deployment.

#3 Adversarial attacks against Modern Vision-Language Models

著者: Alejandro Paredes La Torre

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.16960

要約:
We study adversarial robustness of open-source vision-language model (VLM) agents deployed in a self-contained e-commerce environment built to simulate realistic pre-deployment conditions. We evaluate two agents, LLaVA-v1.5-7B and Qwen2.5-VL-7B, under three gradient-based attacks: the Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and a CLIP-based spectral attack. Against LLaVA, all three attacks achieve substantial attack success rates (52.6%, 53.8%, and 66.9% respectively), demonstrating that simple gradient-based methods pose a practical threat to open-source VLM agents. Qwen2.5-VL proves significantly more robust across all attacks (6.5%, 7.7%, and 15.5%), suggesting meaningful architectural differences in adversarial resilience between open-source VLM families. These findings have direct implications for the security evaluation of VLM agents prior to commercial deployment.

#4 DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns

著者: Trung V. Phan, Tri Gia Nguyen, Thomas Bauschert

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.16969

要約:
This paper presents DeepStage, a deep reinforcement learning (DRL) framework for adaptive, stage-aware defense against Advanced Persistent Threats (APTs). The enterprise environment is modeled as a partially observable Markov decision process (POMDP), where host provenance and network telemetry are fused into unified provenance graphs. Building on our prior work, StageFinder, a graph neural encoder and an LSTM-based stage estimator infer probabilistic attacker stages aligned with the MITRE ATT&CK framework. These stage beliefs, combined with graph embeddings, guide a hierarchical Proximal Policy Optimization (PPO) agent that selects defense actions across monitoring, access control, containment, and remediation. Evaluated in a realistic enterprise testbed using CALDERA-driven APT playbooks, DeepStage achieves a stage-weighted F1-score of 0.89, outperforming a risk-aware DRL baseline by 21.9%. The results demonstrate effective stage-aware and cost-efficient autonomous cyber defense.

#5 An End-to-End Framework for Functionality-Embedded Provenance Graph Construction and Threat Interpretation

著者: Kushankur Ghosh, Mehar Klair, Kian Kyars, Euijin Choo, J\"org Sander

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17100

要約:
Provenance graphs model causal system-level interactions from logs, enabling anomaly detectors to learn normal behavior and detect deviations as attacks. However, existing approaches rely on brittle, manually engineered rules to build provenance graphs, lack functional context for system entities, and provide limited support for analyst investigation. We present Auto-Prov, an adaptive, end-to-end framework that leverages large language models (LLMs) to automatically construct provenance graphs from heterogeneous and evolving logs, embed system-level functional attributes into the graph, enable provenance graph-based anomaly detectors to learn from these enriched graphs, and summarize the detected attacks to assist an analyst's investigation. Auto-Prov clusters unseen log types and efficiently extracts provenance edges and entity-level information via automatically generated rules. It further infers system-level functional context for both known and previously unseen system entities using a combination of LLM inference and behavior-based estimation. Attacks detected by provenance-graph-based anomaly detectors trained on Auto-Prov's graphs are then summarized into natural-language text. We evaluate Auto-Prov with four state-of-the-art provenance graph-based detectors across diverse logs. Results show that Auto-Prov consistently enhances detection performance, generalizes across heterogeneous log formats, and produces stable, interpretable attack summaries that remain robust under system evolution.

#6 Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework

著者: Taiwo Onitiju, Iman Vakilinia

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17123

要約:
Large Language Models increasingly power critical infrastructure from healthcare to finance, yet their vulnerability to adversarial manipulation threatens system integrity and user safety. Despite growing deployment, no comprehensive comparative security assessment exists across major LLM architectures, leaving organizations unable to quantify risk or select appropriately secure LLMs for sensitive applications. This research addresses this gap by establishing a standardized vulnerability assessment framework and developing a multi-layered defensive system to protect against identified threats. We systematically evaluate five widely-deployed LLM families GPT-4, GPT-3.5 Turbo, Claude-3 Haiku, LLaMA-2-70B, and Gemini-2.5-pro against 10,000 adversarial prompts spanning six attack categories. Our assessment reveals critical security disparities, with vulnerability rates ranging from 11.9\% to 29.8\%, demonstrating that LLM capability does not correlate with security robustness. To mitigate these risks, we develop a production-ready defensive framework achieving 83\% average detection accuracy with only 5\% false positives. These results demonstrate that systematic security assessment combined with external defensive measures provides a viable path toward safer LLM deployment in production environments.

#7 Synchronized DNA sources for unconditionally secure cryptography

著者: Sandra Jaudou, H\'el\`ene Gasnier, Elias Boudjella, Marc Can\`eve, Victoria Bloquert, Vasily Shenshin, Tilio Pilet, Sacha Gaucher, Soo Hyeon Kim, Philippe Gaborit, Gouenou Coatrieux, Matthieu Labousse, Anthony Genot, Yannick Rondelez

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17149

要約:
Secure communication is the cornerstone of modern infrastructures, yet achieving unconditional security -resistant to any computational attack- remains a fundamental challenge. The One-Time Pad (OTP), proven by Shannon to offer perfect secrecy, requires a shared random key as long as the message, used only once. However, distributing large keys over long distances has been impractical due to the lack of secure and scalable sharing options. Here, we introduce a DNA-based cryptographic primitive that leverages random pools of synthetic DNA to install a synchronized entropy source between distant parties. Our approach uses duplicated DNA molecules -comprising random index-payload pairs- as a shared secret. These molecules are locally sequenced and digitized to generate a common binary mask for OTP encryption, achieving unconditional security without relying on computational assumptions. We experimentally demonstrate this protocol between Tokyo and Paris, using in-house sequencing, generating a shared secret mask of $\sim$ 400 Mb with a residual error rate to achieve the usual overall decryption failure rate of $2^{-128}$. The min-entropy of the binary mask meets the most recent National Institute of Standards and Technology requirements (SP 800-90B), and is comparable to that of approved cryptographic random number generators. Critically, our system can resist two types of adversarial interference through molecular copy-number statistics, providing an additional layer of security reminiscent of Quantum Key Distribution, but without distance limitations. This work establishes DNA as a scalable entropy source for long-distance OTP, enabling high-throughput and secure communications in sensitive contexts. By bridging molecular biology and cryptography, DNA-based key distribution opens a promising new route toward unconditional security in global communication networks.

#8 PAuth - Precise Task-Scoped Authorization For Agents

agent

著者: Reshabh K Sharma, Linxi Jiang, Zhiqiang Lin, Shuo Chen

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17170

要約:
The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are misaligned with this vision. In particular, today's operator-scoped authorization, exemplified by OAuth, grants broad permissions tied to operators (e.g., the transfer operator) rather than to the specific operations (e.g., transfer $100 to Bob) implied by a user's task. This will inevitably result in overprivileged agents. We introduce Precise Task-Scoped Implicit Authorization (PAuth), a fundamentally different model in which submitting an NL task implicitly authorizes only the concrete operations required for its faithful execution. To make this enforceable at servers, we propose NL slices: symbolic specifications of the calls each service expects, derived from the task and upstream results. Complementing this, we also propose envelopes: special data structure to bind each operand's concrete value to its symbolic provenance, enabling servers to verify that all operands arise from legitimate computations. PAuth is prototyped in the agent-security evaluation framework AgentDojo. We evaluate it in both benign settings and attack scenarios where a spurious operation is injected into an otherwise normal task. In all benign tests, PAuth executes the tasks successfully without requiring any additional permissions. In all attack tests, PAuth correctly raises warnings about missing permissions. These results demonstrate that PAuth's reasoning about permissions is indeed precise. We further analyze the characteristics of these tasks and measure the associated token costs.

#9 Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning

backdoor

著者: Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora, Yiwei Cai, Yizhen Wang, Yuan Hong

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17174

要約:
Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce the generation of insecure code, yet effective defenses remain limited. Existing scanning approaches rely on token-level generation consistency to invert attack targets, which is ineffective for source code where identical semantics can appear in diverse syntactic forms. We present CodeScan, which, to the best of our knowledge, is the first poisoning-scanning framework tailored to code generation models. CodeScan identifies attack targets by analyzing structural similarities across multiple generations conditioned on different clean prompts. It combines iterative divergence analysis with abstract syntax tree (AST)-based normalization to abstract away surface-level variation and unify semantically equivalent code, isolating structures that recur consistently across generations. CodeScan then applies LLM-based vulnerability analysis to determine whether the extracted structures contain security vulnerabilities and flags the model as compromised when such a structure is found. We evaluate CodeScan against four representative attacks under both backdoor and poisoning settings across three real-world vulnerability classes. Experiments on 108 models spanning three architectures and multiple model sizes demonstrate 97%+ detection accuracy with substantially lower false positives than prior methods.

#10 Towards Unsupervised Adversarial Document Detection in Retrieval Augmented Generation Systems

著者: Patrick Levi

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17176

要約:
Retrieval augmented generation systems have become an integral part of everyday life. Whether in internet search engines, email systems, or service chatbots, these systems are based on context retrieval and answer generation with large language models. With their spread, also the security vulnerabilities increase. Attackers become increasingly focused on these systems and various hacking approaches are developed. Manipulating the context documents is a way to persist attacks and make them affect all users. Therefore, detecting compromised, adversarial context documents early is crucial for security. While supervised approaches require a large amount of labeled adversarial contexts, we propose an unsupervised approach, being able to detect also zero day attacks. We conduct a preliminary study to show appropriate indicators for adversarial contexts. For that purpose generator activations, output embeddings, and an entropy-based uncertainty measure turn out as suitable, complementary quantities. With an elementary statistical outlier detection, we propose and compare their detection abilities. Furthermore, we show that the target prompt, which the attacker wants to manipulate, is not required for a successful detection. Moreover, our results indicate that a simple context summary generation might even be superior in finding manipulated contexts.

#11 LAAF: Logic-layer Automated Attack Framework A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

agent

著者: Hammad Atta, Ken Huang, Kyriakos Rock Lambros, Yasir Mehmood, Zeeshan Baig, Mohamed Abdur Rahman, Manish Bhatt, M. Aziz Ul Haq, Muhammad Aatif, Nadeem Shahzad, Kamal Noor, Vineeth Sai Narajala, Hazem Ali, Jamel Abed

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17239

要約:
Agentic LLM systems equipped with persistent memory, RAG pipelines, and external tool connectors face a class of attacks - Logic-layer Prompt Control Injection (LPCI) - for which no automated red-teaming instrument existed. We present LAAF (Logic-layer Automated Attack Framework), the first automated red-teaming framework to combine an LPCI-specific technique taxonomy with stage-sequential seed escalation - two capabilities absent from existing tools: Garak lacks memory-persistence and cross-session triggering; PyRIT supports multi-turn testing but treats turns independently, without seeding each stage from the prior breakthrough. LAAF provides: (i) a 49-technique taxonomy spanning six attack categories (Encoding~11, Structural~8, Semantic~8, Layered~5, Trigger~12, Exfiltration~5; see Table 1), combinable across 5 variants per technique and 6 lifecycle stages, yielding a theoretical maximum of 2,822,400 unique payloads ($49 \times 5 \times 1{,}920 \times 6$; SHA-256 deduplicated at generation time); and (ii) a Persistent Stage Breaker (PSB) that drives payload mutation stage-by-stage: on each breakthrough, the PSB seeds the next stage with a mutated form of the winning payload, mirroring real adversarial escalation. Evaluation on five production LLM platforms across three independent runs demonstrates that LAAF achieves higher stage-breakthrough efficiency than single-technique random testing, with a mean aggregate breakthrough rate of 84\% (range 83--86\%) and platform-level rates stable within 17 percentage points across runs. Layered combinations and semantic reframing are the highest-effectiveness technique categories, with layered payloads outperforming encoding on well-defended platforms.

#12 Deanonymizing Bitcoin Transactions via Network Traffic Analysis with Semi-supervised Learning

著者: Shihan Zhang, Bing Han, Chuanyong Tian, Ruisheng Shi, Lina Lan, Qin Wang

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17261

要約:
Privacy protection mechanisms are a fundamental aspect of security in cryptocurrency systems, particularly in decentralized networks such as Bitcoin. Although Bitcoin addresses are not directly associated with real-world identities, this does not fully guarantee user privacy. Various deanonymization solutions have been proposed, with network layer deanonymization attacks being especially prominent. However, existing approaches often exhibit limitations such as low precision. In this paper, we propose \textit{NTSSL}, a novel and efficient transaction deanonymization method that integrates network traffic analysis with semi-supervised learning. We use unsupervised learning algorithms to generate pseudo-labels to achieve comparable performance with lower costs. Then, we introduce \textit{NTSSL+}, a cross-layer collaborative analysis integrating transaction clustering results to further improve accuracy. Experimental results demonstrate a substantial performance improvement, 1.6 times better than the existing approach using machining learning.

#13 Network- and Device-Level Cyber Deception for Contested Environments Using RL and LLMs

著者: Abhijeet Sahu, Shuva Paul, Rochard Macwan

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17272

要約:
Cyber deception assists in increasing the attacker's budget in reconnaissance or any early phases of threat intrusions. In the past, numerous methods of cyber deception have been adopted, such as IP address randomization, the creation of honeypots and honeynets mimicking an actual set of services, and networks deployed within an enterprise or operational technology(OT) network. These types of strategies follow naive approaches of recreating services that are expensive and that need a lot of human intervention. The advent of cloud services and other automations of containerized applications, such as Kubernetes, makes cyber defense easier. Yet, there remains a lot of potential to improve the accuracy of these deception strategies and to make them cost-effective using artificial intelligence (AI)-based solutions by making the deception more dynamic. Hence, in this work, we review various AI-based solutions in building network- and device-level cyber deception methods in contested environments. Specifically, we focus on leveraging the fusion of large language models (LLMs) and reinforcement learning(RL) in optimally learning these cyber deception strategies and validating the efficacy of such strategies in some stealthy attacks against OT systems in the literature.

#14 SEAL-Tag: Self-Tag Evidence Aggregation with Probabilistic Circuits for PII-Safe Retrieval-Augmented Generation

著者: Jin Xie, Songze Li, Guang Cheng

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17292

要約:
Retrieval-Augmented Generation (RAG) systems introduce a critical vulnerability: contextual leakage, where adversaries exploit instruction-following to exfiltrate Personally Identifiable Information (PII) via adaptive extraction. Current defenses force a rigid trade-off between semantic utility and latency. We present SEAL-Tag, a privacy-preserving runtime environment that resolves this via a Verify-then-Route paradigm. SEAL-Tag introduces the SEAL-Probe protocol, transforming auditing into a structured tool-use operation where the model generates a verifiable PII-Evidence Table (PET) alongside its draft. To adjudicate this evidence, we employ a Probabilistic Circuit (PC) that enforces verifiable logical constraints for robust decision-making. To overcome the privacy "Cold Start" problem, we introduce the S0--S6 Anchored Synthesis Pipeline, generating high-fidelity, provenanced RAG interactions. We pair this with a Two-Stage Curriculum that first optimizes for entity detection before aligning the model to the rigorous audit protocol. Our evaluation demonstrates that SEAL-Tag establishes a new Pareto frontier, reducing adaptive leakage by over 8$\times$ while matching the utility and speed of unsafe baselines.

#15 Federated Computing as Code (FCaC): Sovereignty-aware Systems by Design

著者: Enzo Fenoglio, Philip Treleaven

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17331

要約:
Federated computing (FC) enables collaborative computation such as machine learning, analytics, or data processing across distributed organizations keeping raw data local. Built on four architectural pillars, distributed data assets, federated services, standardized APIs, and decentralized services, FC supports sovereignty-preserving collaboration. However, federated systems spanning organizational and jurisdictional boundaries lack a portable mechanism for enforcing sovereignty-critical constraints. They often depend on runtime policy evaluation, shared trust infrastructure, or institutional agreements that introduce coordination overhead and provide limited cryptographic assurance. Federated Computing as Code (FCaC) is a declarative architecture that addresses this gap by compiling authority and delegation into cryptographically verifiable artifacts rather than relying on online policy interpretation. Boundary admission becomes a local verification step rather than a policy decision service. FCaC separates constitutional governance from procedural governance. Admission is validated locally at execution boundaries using proof-carrying capabilities, while stateful services may still implement post-admission controls such as ABAC, risk scoring, quotas, and workflow state. FCaC introduces Virtual Federated Platforms (VFPs), which combine Core, Business, and Governance contracts through a cryptographic trust chain: Key Your Organization (KYO), Envelope Capability Tokens (ECTs), and proof of possession (PoP). We demonstrate the approach in a proof-of-concept cross-silo federated learning workflow using MNIST as a surrogate workload to validate the admission mechanisms and release an open-source implementation showing envelope issuance, boundary verification, and envelope-triggered training.

#16 WebPII: Benchmarking Visual PII Detection for Computer-Use Agents

agent

著者: Nathan Zhao

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17357

要約:
Computer use agents create new privacy risks: training data collected from real websites inevitably contains sensitive information, and cloud-hosted inference exposes user screenshots. Detecting personally identifiable information in web screenshots is critical for privacy-preserving deployment, but no public benchmark exists for this task. We introduce WebPII, a fine-grained synthetic benchmark of 44,865 annotated e-commerce UI images designed with three key properties: extended PII taxonomy including transaction-level identifiers that enable reidentification, anticipatory detection for partially-filled forms where users are actively entering data, and scalable generation through VLM-based UI reproduction. Experiments validate that these design choices improve layout-invariant detection across diverse interfaces and generalization to held-out page types. We train WebRedact to demonstrate practical utility, more than doubling text-extraction baseline accuracy (0.753 vs 0.357 mAP@50) at real-time CPU latency (20ms). We release the dataset and model to support privacy-preserving computer use research.

#17 Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare

agent

著者: Saikat Maiti

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17419

要約:
Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communication. Recent red teaming research demonstrates that these agents exhibit critical vulnerabilities in realistic settings: unauthorized compliance with non-owner instructions, sensitive information disclosure, identity spoofing, cross-agent propagation of unsafe practices, and indirect prompt injection through external resources [7]. In healthcare environments processing Protected Health Information, every such vulnerability becomes a potential HIPAA violation. This paper presents a security architecture deployed for nine autonomous AI agents in production at a healthcare technology company. We develop a six-domain threat model for agentic AI in healthcare covering credential exposure, execution capability abuse, network egress exfiltration, prompt integrity failures, database access risks, and fleet configuration drift. We implement four-layer defense in depth: (1) kernel level workload isolation using gVisor on Kubernetes, (2) credential proxy sidecars preventing agent containers from accessing raw secrets, (3) network egress policies restricting each agent to allowlisted destinations, and (4) a prompt integrity framework with structured metadata envelopes and untrusted content labeling. We report results from 90 days of deployment including four HIGH severity findings discovered and remediated by an automated security audit agent, progressive fleet hardening across three VM image generations, and defense coverage mapped to all eleven attack patterns from recent literature. All configurations, audit tooling, and the prompt integrity framework are released as open source.

#18 DDH-based schemes for multi-party Function Secret Sharing

著者: Marc Damie, Florian Hahn, Andreas Peter, Jan Ramon

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17453

要約:
Function Secret Sharing (FSS) schemes enable sharing efficiently secret functions. Schemes dedicated to point functions, referred to as Distributed Point Functions (DPFs), are the center of FSS literature thanks to their numerous applications including private information retrieval, anonymous communications, and machine learning. While two-party DPFs benefit from schemes with logarithmic key sizes, multi-party DPFs have seen limited advancements: $O(\sqrt{N})$ key sizes (with $N$, the function domain size) and/or exponential factors in the key size. We propose a DDH-based technique reducing the key size of existing multi-party schemes. In particular, we build an honest-majority DPF with $O(\sqrt[3]{N})$ key size. Our benchmark highlights key sizes up to $10\times$ smaller (on realistic problem sizes) than state-of-the-art schemes. Finally, we extend our technique to schemes supporting comparison functions.

#19 Proof-of-Authorship for Diffusion-based AI Generated Content

diffusion

著者: De Zhang Lee, Han Fang, Ee-Chien Chang

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17513

要約:
Recent advancements in AI-generated content (AIGC) have introduced new challenges in intellectual property protection and the authentication of generated objects. We focus on scenarios in which an author seeks to assert authorship of an object generated using latent diffusion models (LDMs), in the presence of adversaries who attempt to falsely claim authorship of objects they did not create. While proof-of-ownership has been studied in the context of multimedia content through techniques such as time-stamping and watermarking, these approaches face notable limitations. In contrast to traditional content creation sources (e.g., cameras), the LDM generation process offers greater control to the author. Specifically, the random seed used during generation can be deliberately chosen. By binding the seed to the author's identity using cryptographic pseudorandom functions, the author can assert to be the creator of the object. We refer to this stronger guarantee as proof-of-authorship, since only the creator of the object can legitimately claim the object. This contrasts with proof-of-ownership via time-stamping or watermarking, where any entity could potentially claim ownership of an object by being the first to timestamp or embed the watermark. We propose a proof-of-authorship framework involving a probabilistic adjudicator who quantifies the probability that a claim is false. Furthermore, unlike prior approaches, the proposed framework does not involve any secret. We explore various attack scenarios and analyze design choices using Stable Diffusion 2.1 (SD2.1) as representative case studies.

#20 Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards

agent

著者: Philipp Normann, Andreas Happe, J\"urgen Cito, Daniel Arp

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17673

要約:
LLM agents are increasingly relevant to research domains such as vulnerability discovery. Yet, the strongest systems remain closed and cloud-only, making them resource-intensive, difficult to reproduce, and unsuitable for work involving proprietary code or sensitive data. Consequently, there is an urgent need for small, local models that can perform security tasks under strict resource budgets, but methods for developing them remain underexplored. In this paper, we address this gap by proposing a two-stage post-training pipeline. We focus on the problem of Linux privilege escalation, where success is automatically verifiable and the task requires multi-step interactive reasoning. Using an experimental setup that prevents data leakage, we post-train a 4B model in two stages: supervised fine-tuning on traces from procedurally generated privilege-escalation environments, followed by reinforcement learning with verifiable rewards. On a held-out benchmark of 12 Linux privilege-escalation scenarios, supervised fine-tuning alone more than doubles the baseline success rate at 20 rounds, and reinforcement learning further lifts our resulting model, PrivEsc-LLM, to 95.8%, nearly matching Claude Opus 4.6 at 97.5%. At the same time, the expected inference cost per successful escalation is reduced by over 100x.

#21 Machine Learning for Network Attacks Classification and Statistical Evaluation of Machine Learning for Network Attacks Classification and Adversarial Learning Methodologies for Synthetic Data Generation

synthetic data

著者: Iakovos-Christos Zarkadis, Christos Douligeris

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17717

要約:
Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time for artificial intelligence (AI), with even more sophisticated attacks that utilize advanced techniques, such as generative artificial intelligence (GenAI) and reinforcement learning, it has become a vital component if we wish to protect our personal data, which are scattered across the web. In this paper, we address two tasks, in the first unified multi-modal NIDS dataset, which incorporates flow-level data, packet payload information and temporal contextual features, from the reprocessed CIC-IDS-2017, CIC-IoT-2023, UNSW-NB15 and CIC-DDoS-2019, with the same feature space. In the first task we use machine learning (ML) algorithms, with stratified cross validation, in order to prevent network attacks, with stability and reliability. In the second task we use adversarial learning algorithms to generate synthetic data, compare them with the real ones and evaluate their fidelity, utility and privacy using the SDV framework, f-divergences, distinguishability and non-parametric statistical tests. The findings provide stable ML models for intrusion detection and generative models with high fidelity and utility, by combining the Synthetic Data Vault framework, the TRTS and TSTR tests, with non-parametric statistical tests and f-divergence measures.

#22 Data Obfuscation for Secure Use of Classical Values in Quantum Computation

著者: Amal Raj, Vivek Balachandran

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17725

要約:
Quantum computing often requires classical data to be supplied to execution environments that may not be fully trusted or isolated. While encryption protects data at rest and in transit, it provides limited protection once computation begins, when classical values are encoded into quantum registers. This paper explores data obfuscation for protecting classical values during quantum computation. To the best of our knowledge, we present the first explicit data obfuscation technique designed to protect classical values during quantum execution. We propose an obfuscation technique that encodes sensitive data into structured quantum representations across multiple registers, avoiding direct exposure while preserving computational usability. Reversible quantum operations and amplitude amplification allow selective recovery of valid encodings without revealing the underlying data. We evaluate the feasibility of the proposed method through simulation and analyze its resource requirements and practical limitations. Our results highlight data obfuscation as a complementary security primitive for quantum computing.

#23 On Securing the Software Development Lifecycle in IoT RISC-V Trusted Execution Environments

著者: Annika Wilde, Samira Briongos, Claudio Soriente, Ghassan Karame

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17757

要約:
RISC-V-based Trusted Execution Environments (TEEs) are gaining traction in the automotive and IoT sectors as a foundation for protecting sensitive computations. However, the supporting infrastructure around these TEEs remains immature. In particular, mechanisms for secure enclave updates and migrations - essential for complete enclave lifecycle management - are largely absent from the evolving RISC-V ecosystem. In this paper, we address this limitation by introducing a novel toolkit that enables RISC-V TEEs to support critical aspects of the software development lifecycle. Our toolkit provides broad compatibility with existing and emerging RISC-V TEE implementations (e.g., Keystone and CURE), which are particularly promising for integration in the automotive industry. It extends the Security Monitor (SM) - the trusted firmware layer of RISC-V TEEs - with three modular extensions that enable secure enclave update, secure migration, state continuity, and trusted time. Our implementation demonstrates that the toolkit requires only minimal interface adaptation to accommodate TEE-specific naming conventions. Our evaluation results confirm that our proposal introduces negligible performance overhead: our state continuity solution incurs less than 1.5% overhead, and enclave downtime remains as low as 0.8% for realistic applications with a 1 KB state, which conforms with the requirements of most IoT and automotive applications.

#24 SoK: From Silicon to Netlist and Beyond $-$ Two Decades of Hardware Reverse Engineering Research

著者: Zehra Karada\u{g}, Simon Klix, Ren\'e Walendy, Felix Hahn, Kolja Dorschel, Julian Speith, Christof Paar, Steffen Becker

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17883

要約:
As hardware serves as the root of trust in modern computing systems, Hardware Reverse Engineering (HRE) is foundational for security assurance. In practice, HRE enables critical security applications, including design verification, supply-chain assurance, and vulnerability discovery. Over the past two decades, academic research on Integrated Circuit (IC), Field-Programmable Gate Array (FPGA), and netlist reverse engineering has steadily grown. However, knowledge remains fragmented across domains and communities, which complicates assessing the state of the art and hampers identifying shared research challenges. In this paper, we present a systematization of knowledge based on an in-depth analysis of 187 peer-reviewed publications. Using this corpus, we characterize technical methods across the HRE workflow and identify technical and organizational challenges that impede research progress. We analyze all 30 artifacts from our corpus using established artifact evaluation practices. Key results could be reproduced for only seven publications (4%). Based on our findings, we derive stakeholder-centric recommendations for academia, industry, and government to enable more coordinated and reproducible HRE research. These recommendations target three cross-cutting opportunities: (i) improving reproducibility and reuse via artifact-centric practices, (ii) enabling rigorous comparability through standardized benchmarks and evaluation metrics, and (iii) improving legal clarity for public HRE research.

#25 Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs

privacyagent

著者: Ya-Ting Yang, Quanyan Zhu

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17902

要約:
Large language models (LLMs) and AI agents are increasingly integrated into enterprise systems to access internal databases and generate context-aware responses. While such integration improves productivity and decision support, the model outputs may inadvertently reveal sensitive information. Although many prior efforts focus on protecting the privacy of user prompts, relatively few studies consider privacy risks from the enterprise data perspective. Hence, this paper develops a probabilistic framework for analyzing privacy leakage in AI agents based on differential privacy. We model response generation as a stochastic mechanism that maps prompts and datasets to distributions over token sequences. Within this framework, we introduce token-level and message-level differential privacy and derive privacy bounds that relate privacy leakage to generation parameters such as temperature and message length. We further formulate a privacy-utility design problem that characterizes optimal temperature selection.

#26 Capability-Priced Micro-Markets: A Micro-Economic Framework for the Agentic Web over HTTP 402

agent

著者: Ken Huang, Jerry Huang, Mahesh Lambe, Hammad Atta, Yasir Mehmood, Muhammad Zeeshan Baig, Muhammad Aziz Ul Haq, Nadeem Shahzad, Shailja Gupta, Rajesh Ranjan, Rekha Singhal

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.16899

要約:
This paper introduces Capability-Priced Micro-Markets (CPMM), a micro-economic framework designed to enable robust, scalable, and secure commerce among autonomous AI agents on the agentic web. The framework addresses the fundamental challenge of economic coordination in decentralized agent ecosystems, where entities must transact with minimal human oversight. CPMM synthesizes three key technologies into a unified system: MIT originated, Project NANDA infrastructure for cryptographically verifiable, capability-based security and discovery; the HTTP 402 "Payment Required" status code, with modern X402/H402 extensions for efficient, low-cost micropayments; and the Agent Capability Negotiation and Binding Protocol (ACNBP) for secure, multi-step negotiation and commitment. The paper formalizes agent interactions as a repeated bilateral game with incomplete information, demonstrating theoretically that the CPMM mechanism converges to a constrained Radner equilibrium, ensuring efficient outcomes under information asymmetry. A key theoretical contribution is the concept of "privacy elasticity of demand," which is introduced to quantify the trade-off between an agent's information disclosure and the market price of its services. By integrating secure capabilities, micropayment protocols, and formal negotiation mechanisms, CPMM provides a comprehensive, theoretically-grounded solution for creating functional micro-markets for the emergent agentic web.

#27 A Longitudinal Study of Usability in Identity-Based Software Signing

著者: Kelechi G. Kalu, Hieu Tran, Santiago Torres-Arias, Sooyeon Jeong, James C. Davis

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17133

要約:
Identity-based software signing tools aim to make software artifact provenance verifiable while reducing the operational burden of long-lived key management. However, there is limited cross-tool longitudinal evidence about which usability problems arise in practice and how those problems evolve as tools mature. This gap matters because unusable signing and verification workflows can lead to incomplete adoption, misconfiguration, or skipped verification, undermining intended integrity guarantees. We conducted the first mining-software-repositories study of five open-source identity-based signing ecosystems: Sigstore, OpenPubKey, HashiCorp Vault, Keyfactor, and Notary v2. We analyzed approximately 3,900 GitHub issues from Nov. 2021 to Nov. 2025. We coded each issue for the reported usability concern and the implicated architectural component, and compared patterns across tools and over time. Across ecosystems, reported concerns concentrate in verification workflows, policy and configuration surfaces, and integration boundaries. Longitudinal Poisson trend analysis shows substantial declines in reported issues for most ecosystems. However, across usability themes, workflow- and documentation-related concerns decline unevenly across tools and concern types, and verification workflows and configuration surfaces remain persistent friction points. These results indicate that identity-based signing reduces some usability burdens while relocating complexity to verification semantics, policy configuration, and deployment integration. Designing future signing ecosystems therefore requires treating verification semantics and release workflows as first-class usability targets rather than peripheral integration concerns.

#28 Revisiting Vulnerability Patch Identification on Data in the Wild

著者: Ivana Clairine Irsan, Ratnadira Widyasari, Ting Zhang, Huihui Huang, Ferdian Thung, Yikun Li, Lwin Khin Shar, Eng Lieh Ouh, Hong Jin Kang, David Lo

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17266

要約:
Attacks can exploit zero-day or one-day vulnerabilities that are not publicly disclosed. To detect these vulnerabilities, security researchers monitor development activities in open-source repositories to identify unreported security patches. The sheer volume of commits makes this task infeasible to accomplish manually. Consequently, security patch detectors commonly trained and evaluated on security patches linked from vulnerability reports in the National Vulnerability Database (NVD). In this study, we assess the effectiveness of these detectors when applied in-the-wild. Our results show that models trained on NVD-derived data show substantially decreased performance, with decreases in F1-score of up to 90\% when tested on in-the-wild security patches, rendering them impractical for real-world use. An analysis comparing security patches identified in-the-wild and commits linked from NVD reveals that they can be easily distinguished from each other. Security patches associated with NVD have different distribution of commit messages, vulnerability types, and composition of changes. These differences suggest that NVD may be unsuitable as the \textit{sole} source of data for training models to detect security patches. We find that constructing a dataset that combines security patches from NVD data with a small subset of manually identified security patches can improve model robustness.

#29 Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing

intellectual property

著者: Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Xiaojun Chen, Wu Liu, Weiping Wang

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17531

要約:
Recent advancements in diffusion-based image editing pose a significant threat to the authenticity of digital visual content. Traditional embedding-based watermarking methods often introduce perceptible perturbations to maintain robustness, inevitably compromising visual fidelity. Meanwhile, existing zero-watermarking approaches, typically relying on global image features, struggle to withstand sophisticated manipulations. In this work, we uncover a key observation: while individual image patches undergo substantial alterations during AI-based editing, the relational distance between patch pairs remains relatively invariant. Leveraging this property, we propose Relational Zero-Watermarking (Rel-Zero), a novel framework that requires no modification to the original image but derives a unique zero-watermark from these editing-invariant patch relations. By grounding the watermark in intrinsic structural consistency rather than absolute appearance, Rel-Zero provides a non-invasive yet resilient mechanism for content authentication. Extensive experiments demonstrate that Rel-Zero achieves substantially improved robustness across diverse editing models and manipulations compared to prior zero-watermarking approaches.

#30 ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery

著者: Zirui Gong, Leo Yu Zhang, Yanjun Zhang, Viet Vo, Tianqing Zhu, Shirui Pan, Cong Wang

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17623

要約:
Federated Learning (FL) enables collaborative model training by sharing model updates instead of raw data, aiming to protect user privacy. However, recent studies reveal that these shared updates can inadvertently leak sensitive training data through gradient inversion attacks (GIAs). Among them, active GIAs are particularly powerful, enabling high-fidelity reconstruction of individual samples even under large batch sizes. Nevertheless, existing approaches often require architectural modifications, which limit their practical applicability. In this work, we bridge this gap by introducing the Activation REcovery via Sparse inversion (ARES) attack, an active GIA designed to reconstruct training samples from large training batches without requiring architectural modifications. Specifically, we formulate the recovery problem as a noisy sparse recovery task and solve it using the generalized Least Absolute Shrinkage and Selection Operator (Lasso). To extend the attack to multi-sample recovery, ARES incorporates the imprint method to disentangle activations, enabling scalable per-sample reconstruction. We further establish the expected recovery rate and derive an upper bound on the reconstruction error, providing theoretical guarantees for the ARES attack. Extensive experiments on CNNs and MLPs demonstrate that ARES achieves high-fidelity reconstruction across diverse datasets, significantly outperforming prior GIAs under large batch sizes and realistic FL settings. Our results highlight that intermediate activations pose a serious and underestimated privacy risk in FL, underscoring the urgent need for stronger defenses.

#31 Beyond Static Pattern Matching? Rethinking Automatic Cryptographic API Misuse Detection in the Era of LLMs

著者: Yifan Xia, Zichen Xie, Peiyu Liu, Kangjie Lu, Yan Liu, Wenhai Wang, Shouling Ji

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2407.16576

要約:
While the automated detection of cryptographic API misuses has progressed significantly, its precision diminishes for intricate targets due to the reliance on manually defined patterns. Large Language Models (LLMs) offer a promising context-aware understanding to address this shortcoming, yet the stochastic nature and the hallucination issue pose challenges to their applications in precise security analysis. This paper presents the first systematic study to explore LLMs' application in cryptographic API misuse detection. Our findings are noteworthy: The instability of directly applying LLMs results in over half of the initial reports being false positives. Despite this, the reliability of LLM-based detection could be significantly enhanced by aligning detection scopes with realistic scenarios and employing a novel code and analysis validation technique, achieving a nearly 90% detection recall. This improvement substantially surpasses traditional methods and leads to the discovery of previously unknown vulnerabilities in established benchmarks. Nevertheless, we identify recurring failure patterns that illustrate current LLMs' blind spots. Leveraging these findings, we deploy an LLM-based detection system and uncover 63 new vulnerabilities (47 confirmed, 7 already fixed) in open-source Java and Python repositories, including prominent projects like Apache.

#32 CellSecInspector: Safeguarding Cellular Networks via Automated Security Analysis on Specifications

著者: Ke Xie, Xingyi Zhao, Min-Yue Chen, Yu-An Chen, Yiwen Hu, Munshi Saifuzzaman, Wen Li, Shuhan Yuan, Guan-Hua Tu, Tian Xie

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.24682

要約:
The complexity, interdependence, and rapid evolution of 3GPP specifications present fundamental challenges for ensuring the security of modern cellular networks. Manual reviews and existing automated approaches, which often depend on rule-based parsing or small sets of manually crafted security requirements, fail to capture deep semantic dependencies, cross-sentence/clause relationships, and evolving specification behaviors. In this work, we present CellSecInspector, an automated framework for security analysis of 3GPP specifications. CellSecInspector extracts structured state-condition-action (SCA) representations, models mobile network procedures with comprehensive function chains, systematically validates them against 9 foundational security properties under 4 adversarial scenarios, and automatically generates test cases. This end-to-end approach enables the automated discovery of vulnerabilities without relying on manually predefined security requirements or rules. Applying CellSecInspector to the well-studied 5G and 4G NAS and RRC specifications and selected sections of TS 23.501 and TS 24.229, it discovers 43 vulnerabilities, 7 of which are previously unreported. Our findings show that CellSecInspector is a scalable, adaptive, and effective solution to assess 3GPP specifications for safeguarding operational and next-generation cellular networks.

#33 Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning

backdoor

著者: Ethan Rathbun, Wo Wei Lin, Alina Oprea, Christopher Amato

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.05089

要約:
Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision making agents without running expensive experiments on real hardware. Simulators remain a security blind spot, however, enabling adversarial developers to alter the dynamics of their released simulators for malicious purposes. Therefore, in this work we highlight a novel threat, demonstrating how simulator dynamics can be exploited to stealthily implant action-level backdoors into RL agents. The backdoor then allows an adversary to reliably activate targeted actions in an agent upon observing a predefined ``trigger'', leading to potentially dangerous consequences. Traditional backdoor attacks are limited in their strong threat models, assuming the adversary has near full control over an agent's training pipeline, enabling them to both alter and observe agent's rewards. As these assumptions are infeasible to implement within a simulator, we propose a new attack ``Daze'' which is able to reliably and stealthily implant backdoors into RL agents trained for real world tasks without altering or even observing their rewards. We provide formal proof of Daze's effectiveness in guaranteeing attack success across general RL tasks along with extensive empirical evaluations on both discrete and continuous action space domains. We additionally provide the first example of RL backdoor attacks transferring to real, robotic hardware. These developments motivate further research into securing all components of the RL training pipeline to prevent malicious attacks.

#34 Primitive-Root Determinant Densities over Prime Fields and Implications for PRIM-LWE

著者: Vipin Singh Sehrawat

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.11196

要約:
The PRIM-LWE problem, introduced by Sehrawat, Yeo, and Desmedt (Theoretical Computer Science, 886 (2021)), is a variant of the Learning with Errors problem in which the secret matrix is required to have a primitive-root determinant. The dimension-uniform reduction constant is $c(p)=\inf_{n\ge 1}c_n(p)$, where $c_n(p)$ is the exact density of $n\times n$ matrices over $\mathbb{F}_p$ with primitive-root determinant. Sehrawat, Yeo, and Desmedt asked whether $\inf_{p\text{ prime}} c(p)=0$, observing that an affirmative answer would follow from the conjectural infinitude of primorial primes. We resolve this question unconditionally using only Dirichlet's theorem and Mertens' product formula, entirely bypassing the primorial-prime hypothesis. We further establish the sharp order \[ \min_{p\le x} c(p)\asymp \frac{1}{\log\log x} \qquad (x\to\infty), \] and show that the limiting distribution of $c(p)$ over the primes has support exactly $[0,1/2]$. We have not found this full-support statement in the literature. The law coincides with the classical shifted-prime distribution of $\varphi(p-1)/(p-1)$ via a transport lemma and is moreover continuous and purely singular. We also derive explicit lower bounds on $c(q)$ for primes of cryptographic interest, parameterized solely by the number of distinct prime factors of $q-1$. As a simple conservative explicit bound, for any prime $q>2^{30}$ the expected overhead $1/c(q)$ is at most $1.79\log q$. On the other hand, our results show that the worst-case overhead among primes $p\le x$ is of order $\Theta(\log\log x)$, and in particular $1/c(q)=O(\log\log q)$ pointwise.

#35 Resource Consumption Threats in Large Language Models

著者: Yuanhe Zhang, Xinyue Wang, Zhican Chen, Weiliu Wang, Zilu Zhang, Zhengshuo Gong, Zhenhong Zhou, Kun Wang, Li Sun, Yang Liu, Sen Su

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.16068

要約:
Given limited and costly computational infrastructure, resource efficiency is a key requirement for large language models (LLMs). Efficient LLMs increase service capacity for providers and reduce latency and API costs for users. Recent resource consumption threats induce excessive generation, degrading model efficiency and harming both service availability and economic sustainability. This survey presents a systematic review of threats to resource consumption in LLMs. We further establish a unified view of this emerging area by clarifying its scope and examining the problem along the full pipeline from threat induction to mechanism understanding and mitigation. Our goal is to clarify the problem landscape for this emerging area, thereby providing a clearer foundation for characterization and mitigation.

#36 Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence

著者: Bofan Gong, Shiyang Lai, James Evans, Dawn Song

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.11611

要約:
Polysemanticity is pervasive in language models and remains a major challenge for interpretation and model behavioral control. Leveraging sparse autoencoders (SAEs), we map the polysemantic topology of two small models (Pythia-70M and GPT-2-Small) to identify SAE feature pairs that are semantically unrelated yet exhibit interference within models. We intervene at four foci (prompt, token, feature, neuron) and measure induced shifts in the next-token prediction distribution, uncovering polysemantic structures that expose a systematic vulnerability in these models. Critically, interventions distilled from counterintuitive interference patterns shared by two small models transfer reliably to larger instruction-tuned models (Llama-3.1-8B/70B-Instruct and Gemma-2-9B-Instruct), yielding predictable behavioral shifts without access to model internals. These findings challenge the view that polysemanticity is purely stochastic, demonstrating instead that interference structures generalize across scale and family. Such generalization suggests a convergent, higher-order organization of internal representations, which is only weakly aligned with intuition and structured by latent regularities, offering new possibilities for both black-box control and theoretical insight into human and artificial cognition.

#37 Semantics-Aligned, Curriculum-Driven, and Reasoning-Enhanced Vulnerability Repair Framework

著者: Chengran Yang, Ting Zhang, Jinfeng Jiang, Xin Zhou, Haoye Tian, Mingzhe Du, Jieke Shi, Junkai Chen, Yikun Li, Eng Lieh Ouh, Lwin Khin Shar, David Lo

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.01002

要約:
Current learning-based Automated Vulnerability Repair (AVR) approaches, while promising, often fail to generalize effectively in real-world scenarios. Our diagnostic analysis reveals three fundamental weaknesses in state-of-the-art AVR approaches: (1) limited cross-repository generalization, with performance drops on unseen codebases; (2) inability to capture long-range dependencies, causing a performance degradation on complex, multi-hunk repairs; and (3) over-reliance on superficial lexical patterns, leading to significant performance drops on vulnerabilities with minor syntactic variations like variable renaming. To address these limitations, we propose SeCuRepair, a semantics-aligned, curriculum-driven, and reasoning-enhanced framework for vulnerability repair. At its core, SeCuRepair adopts a reason-then-edit paradigm, requiring the model to articulate why and how a vulnerability should be fixed before generating the patch. This explicit reasoning enforces a genuine understanding of repair logic rather than superficial memorization of lexical patterns. SeCuRepair also moves beyond traditional supervised fine-tuning and employs semantics-aware reinforcement learning, rewarding patches for their syntactic and semantic alignment with the oracle patch rather than mere token overlap. Complementing this, a difficulty-aware curriculum progressively trains the model, starting with simple fixes and advancing to complex, multi-hunk coordinated edits. We evaluate SeCuRepair on strict, repository-level splits of BigVul and newly crafted PrimeVul_AVR datasets. SeCuRepair significantly outperforms all baselines, surpassing the best-performing baselines by 34.52% on BigVul and 31.52% on PrimeVul\textsubscript{AVR} in terms of CodeBLEU, respectively. Comprehensive ablation studies further confirm that each component of our framework contributes to its final performance.

#38 Efficient LLM Safety Evaluation through Multi-Agent Debate

agent

著者: Dachuan Lin, Guobin Shen, Zihao Yang, Tianrong Liu, Dongcheng Zhao, Yi Zeng

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.06396

要約:
Safety evaluation of large language models (LLMs) increasingly relies on LLM-as-a-judge pipelines, but strong judges can still be expensive to use at scale. We study whether structured multi-agent debate can improve judge reliability while keeping backbone size and cost modest. To do so, we introduce HAJailBench, a human-annotated jailbreak benchmark with 11,100 labeled interactions spanning diverse attack methods and target models, and we pair it with a Multi-Agent Judge framework in which critic, defender, and judge agents debate under a shared safety rubric. On HAJailBench, the framework improves over matched small-model prompt baselines and prior multi-agent judges, while remaining more economical than GPT-4o under the evaluated pricing snapshot. Ablation results further show that a small number of debate rounds is sufficient to capture most of the gain. Together, these results support structured, value-aligned debate as a practical design for scalable LLM safety evaluation.

#39 From Chat Control to Robot Control: Implications of the Chat Control Proposal for Human-Robot Interaction

著者: Neziha Akalin, Alberto Giaretta

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.02205

要約:
This paper explores how a recent European Union proposal, the so-called Chat Control, which creates regulatory incentives for providers to implement content detection and communication scanning, could transform the foundations of human-robot interaction (HRI). As robots increasingly act as interpersonal communication channels in care, education, and telepresence, they convey not only speech but also gesture, emotion, and contextual cues. We argue that extending digital surveillance laws to such embodied systems would entail continuous monitoring, embedding observation into the very design of everyday robots. This regulation blurs the line between protection and control, turning companions into potential informants. At the same time, monitoring mechanisms that undermine end-to-end encryption function as de facto backdoors, expanding the attack surface and allowing adversaries to exploit legally induced monitoring infrastructures. This creates a paradox of safety through insecurity: systems introduced to protect users may instead compromise their privacy, autonomy, and trust. This work does not aim to predict the future, but to raise awareness and help prevent certain futures from materialising.

#40 Bodhi VLM: Privacy-Alignment Modeling for Hierarchical Visual Representations in Vision Backbones and VLM Encoders via Bottom-Up and Top-Down Feature Search

privacy

著者: Bo Ma, Wei Qi Yan, Jinsong Wu

公開日: Thu, 19 Mar 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.13728

要約:
Learning systems that preserve privacy often inject noise into hierarchical visual representations; a central challenge is to \emph{model} how such perturbations align with a declared privacy budget in a way that is interpretable and applicable across vision backbones and vision--language models (VLMs). We propose \emph{Bodhi VLM}, a \emph{privacy-alignment modeling} framework for \emph{hierarchical neural representations}: it (1) links sensitive concepts to layer-wise grouping via NCP and MDAV-based clustering; (2) locates sensitive feature regions using bottom-up (BUA) and top-down (TDA) strategies over multi-scale representations (e.g., feature pyramids or vision-encoder layers); and (3) uses an Expectation-Maximization Privacy Assessment (EMPA) module to produce an interpretable \emph{budget-alignment signal} by comparing the fitted sensitive-feature distribution to an evaluator-specified reference (e.g., Laplace or Gaussian with scale $c/\epsilon$). The output is reference-relative and is \emph{not} a formal differential-privacy estimator. We formalize BUA/TDA over hierarchical feature structures and validate the framework on object detectors (YOLO, PPDPTS, DETR) and on the \emph{visual encoders} of VLMs (CLIP, LLaVA, BLIP). BUA and TDA yield comparable deviation trends; EMPA provides a stable alignment signal under the reported setups. We compare with generic discrepancy baselines (Chi-square, K-L, MMD) and with task-relevant baselines (MomentReg, NoiseMLE, Wass-1). Results are reported as mean$\pm$std over multiple seeds with confidence intervals in the supplementary materials. This work contributes a learnable, interpretable modeling perspective for privacy-aligned hierarchical representations rather than a post hoc audit only. Source code: \href{https://github.com/mabo1215/bodhi-vlm.git}{Bodhi-VLM GitHub repository}

cs.CR updates on arXiv.org

📋 論文タイトル一覧