arXiv論文一覧 - cs.CR updates on arXiv.org

#1 Can SOC Operators Explain their Decisions while Triaging Alarms? A Real-World Study

著者: Jessica Moosmann, Irdin Pekaric, Giovanni Apruzzese

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22001

要約:
Security Operations Centers (SOCs) are pivotal in modern enterprises. Tasked to monitor complex network environments constantly under attack, SOCs can be active 24/7 and can include hundreds of operators supported by state-of-the-art technologies. Abundant research has studied the internal processes of SOCs, highlighting their pros and cons, as well as the challenges faced by SOC analysts -- such as dealing with the overwhelming number of false alarms triggered by automated security mechanisms. In this context, we wonder: given that "someone" must triage the alarms, and that such triaging must be grounded on established knowledge or evidence-based reasoning, can SOC employees justify why a certain decision was taken while triaging alarms? Answering such a research question (RQ) can better guide future efforts. We hence tackle this RQs. First, via a systematic literature review across 257 research documents, we provide evidence that such RQ received limited attention so far. Then, we partner-up with a real-world SOC and carry out a field study (n=12) with SOC employees. We show them real alarms raised in their SOC, and inquire whether such alarms are indicative of true security problems or not. Then, we ask to explain their decision. We found that while most analysts were able to separate "true from false" alarms (the decision was correct in 83% of the cases), a correct justification was hardly provided (only 39% of the provided explanations reflected the actual root cause). Ultimately, our results highlight the need for decision-support systems that help SOC analysts not only make the right call -- but also understand and articulate why it is right.

#2 Who Audits the Auditor? Tamper-Proof Fraud Detection with Blockchain-Anchored Explainable ML

著者: Zhaohui Wang

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22096

要約:
In enterprise fraud detection, model accuracy alone is insufficient when insiders can tamper with audit logs or bypass approval workflows. Real-world incidents show that fraud often persists not because detection algorithms fail, but because the audit trail itself is controllable by privileged operators. This exposes a fundamental trust gap: *who audits the auditor?* We present a tamper-evident fraud detection system that anchors both ML predictions and workflow execution to an immutable blockchain ledger. Rather than using blockchain as passive storage, we enforce the entire approval process through smart contracts, ensuring that every transaction, prediction, and explanation is atomically recorded and cannot be retroactively modified. Our detection module achieves competitive accuracy (F1 = 0.895, PR-AUC = 0.974) while providing cryptographically verifiable decision trails that support regulatory auditability requirements (e.g., GDPR Article 22). System evaluation shows sub-25 ms inference latency and economically viable deployment on Layer-2 networks at under \$0.01 per transaction (validated against PolygonScan data), supporting enterprise-scale workloads of 10,000+ monthly payments.

#3 Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems

agent

著者: Jun He, Deying Yu

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22136

要約:
Large language model (LLM) agents increasingly issue API calls that mutate real systems, yet many current architectures pass stochastic model outputs directly to execution layers. We argue that this coupling creates a safety risk because model correctness, context awareness, and alignment cannot be assumed at execution time. We introduce Sovereign Agentic Loops (SAL), a control-plane architecture in which models emit structured intents with justifications, and the control plane validates those intents against true system state and policy before execution. SAL combines an obfuscation membrane, which limits model access to identity-sensitive state, with a cryptographically linked Evidence Chain for auditability and replay. We formalize SAL and show that, under the stated assumptions, it provides policy-bounded execution, identity isolation, and deterministic replay. In an OpenKedge prototype for cloud infrastructure, SAL blocks 93% of unsafe intents at the policy layer, rejects the remaining 7% via consistency checks, prevents unsafe executions in our benchmark, and adds 12.4 ms median latency.

#4 PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store

privacy

著者: Bhanuka Silva, Anirban Mahanti, Aruna Seneviratne, Suranga Senevirante

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22157

要約:
Existing research typically treats privacy policies as flat, uniform text, extracting information without regard for the document's logical hierarchy. Disregard for structural cues of section headings designed to guide the reader, often leads automated methods to entangle distinct data practices, particularly when linking sensitive data items to their specific purposes. To address this, we introduce PrivSTRUCT, a novel and systematic encoder and decoder combined framework that to untangle complex privacy disclosures. Benchmarking against the state-of-the-art tool PoliGrapher reveals that PrivSTRUCT robustly extracts more than x2 the number of data item and purpose excerpts while retaining developer-defined structural cues. By applying PrivSTRUCT to a large-scale dataset of 3,756 Android apps, we uncover a critical transparency gap: the probability of developers overstating a data purpose is 20.4% higher for first-party collection and 9.7% higher for third-party sharing when they rely on globally defined purposes rather than specific, locally scoped disclosures. Alarmingly, we find that sensitive third-party data flows such as sharing financial data for analytics are frequently diluted and entangled into generic or unrelated categories, highlighting a persistent failure in the current purpose disclosure landscape.

#5 FixV2W: Correcting Invalid CVE-CWE Mappings with Knowledge Graph Embeddings

著者: Sevval Simsek, Varsha Athreya, David Starobinski

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22176

要約:
Accurate mapping between Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE) entries is critical for effective vulnerability management and risk assessment. However, public databases, such as the National Vulnerability Database (NVD), suffer from inconsistent and incomplete CVE to CWE mappings, complicating automated analysis and remediation. We introduce FixV2W, a lightweight approach that leverages knowledge graph embeddings and longitudinal trends to improve mapping accuracy of the NVD. FixV2W systematically analyzes historical remapping patterns and leverages hierarchical relationships within NVD and CWE data to predict more precise CWE mappings for vulnerabilities linked to Prohibited or Discouraged categories. We run extensive experimental evaluation of FixV2W, based on test data set collected between August 2021 and December 2024. Considering the Top 10 ranked predictions, the results show that FixV2W predicts the correct CWE mappings for 69% of exploited vulnerabilities that had invalid CWEs before they were exploited. We also show that FixV2W significantly improves the performance of ML models relying on NVD data. For instance, for a model geared at uncovering unknown CVE-CWE mappings, FixV2W improves the Mean Reciprocal Rank (MRR) from 0.174 to 0.608. These results show that FixV2W is a promising approach to identify and thwart emerging threats.

#6 Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning

privacy

著者: Chaoran Chen, Dayu Yuan, Peter Kairouz

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22191

要約:
In agentic workflows, LLMs frequently process retrieved contexts that are legally protected from further training. However, auditors currently lack a reliable way to verify if a provider has violated the terms of service by incorporating these data into post-training, especially through Reinforcement Learning (RL). While standard auditing relies on verbatim memorization and membership inference, these methods are ineffective for RL-trained models, as RL primarily influences a model's behavioral style rather than the retention of specific facts. To bridge this gap, we introduce Behavioral Canaries, a new auditing mechanism for RLFT pipelines. The framework instruments preference data by pairing document triggers with feedback that rewards a distinctive stylistic response, inducing a latent trigger-conditioned preference if such data are used in training. Empirical results show that these behavioral signals enable detection of unauthorized document-conditioned training, achieving a 67% detection rate at a 10% false-positive rate (AUROC = 0.756) at a 1% canary injection rate. More broadly, our results establish behavioral canaries as a new auditing mechanism for RLFT pipelines, enabling auditors to test for training-time influence even when such influence manifests as distributional behavioral change rather than memorization.

#7 Train in Vain: Functionality-Preserving Poisoning to Prevent Unauthorized Use of Code Datasets

backdoor

著者: Yuan Xiao, Jiaming Wang, Yuchen Chen, Wei Song, Jun Sun, Shiqing Ma, Yanzhou Mu, Juan Zhai, Chunrong Fang, Jin Song Dong, Zhenyu Chen

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22291

要約:
The widespread availability of large-scale code datasets has accelerated the development of code large language models (CodeLLMs), raising concerns about unauthorized dataset usage. Dataset poisoning offers a proactive defense by reducing the utility of such unauthorized training. However, existing poisoning methods often require full dataset poisoning and introduce transformations that break code compilability. In this paper, we introduce FunPoison, a functionality-preserving poisoning approach that injects short, compilable weak-use fragments into executed code paths. FunPoison leverages reusable statement-level templates with automatic repair and conservative safety checking to ensure side-effect freedom, while a type-aware synthesis module suppresses static analysis warnings and enhances stealth. Extensive experiments show that FunPoison achieves effective poisoning by contaminating only 10% of the dataset, while maintaining 100% compilability and functional correctness, and remains robust against various advanced code sanitization techniques.

#8 Resource-Aware Layered Intrusion Detection Allocation Model

著者: Ioan P\u{a}durean, B\'ela Genge, Roland Bolboac\u{a}

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22304

要約:
This paper proposes a resource-aware allocation model for layered intrusion detection in het erogeneous networks. Monitoring traffic at higher protocol layers improves the ability to detect sophisticated attacks, but it also increases computational and storage costs. The problem is formu lated as an integer linear program that assigns a single monitoring depth, ranging from Ethernet to the application layer, to each device, while accounting for device importance, attack probability, layer-dependent detection rates, and per-layer monitoring costs. The model further enforces a global resource budget, a minimum monitoring level for critical devices, and maximum-feasibility limits for constrained devices such as simple IoT sensors. The formulation is solved with the SCIP optimization framework on a small heterogeneous network of six devices, and the resulting allocation illustrates how the model concentrates monitoring effort on important and high-risk devices while respecting feasibility and budget constraints.

#9 Introducing the Cyber-Physical Data Flow Diagram to Improve Threat Modelling of Internet of Things Devices

著者: Simon Liebl, Ian Ferguson, Andreas A{\ss}muth, Natalie Coull, George R. S. Weir

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22307

要約:
A growing number of Internet of Things (IoT) devices are used across consumer, medical, and industrial domains. They interact with their environment through sensors and actuators and connect to networks such as the Internet. Because sensors may collect sensitive data and actuators can trigger physical actions, security, privacy, and safety are major challenges. Threat modelling can help identify risks, but established IT-focused methods transfer to the IoT only to a limited extent. In this paper, a new modelling technique specifically for IoT devices called Cyber-Physical Data Flow Diagram (CPDFD) is proposed that also allows modelling of hardware with the aim to support manufacturers in identifying threats and developing countermeasures. The technique was examined through an experimental study and a survey with interviews. The results suggest that numerous other attack scenarios can be found through the modelling technique, improving the identification of threats to IoT devices.

#10 Automation-Exploit: A Multi-Agent LLM Framework for Adaptive Offensive Security with Digital Twin-Based Risk-Mitigated Exploitation

agent

著者: Biagio Andreucci, Arcangelo Castiglione

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22427

要約:
The offensive security landscape is highly fragmented: enterprise platforms avoid memory-corruption vulnerabilities due to Denial of Service (DoS) risks, Automatic Exploit Generation (AEG) systems suffer from semantic blindness, and Large Language Model (LLM) agents face safety alignment filters and "Live Fire" execution hazards. We introduce Automation-Exploit, a fully autonomous Multi-Agent System (MAS) framework designed for adaptive offensive security in complex black-box scenarios. It bridges the abstraction gap between reconnaissance and exploitation by autonomously exfiltrating executables and contextual intelligence across multiple protocols, using this data to fuel both logical and binary attack chains. The framework introduces an adaptive safety architecture to mitigate DoS risks. While it natively resolves logical and web-based vulnerabilities, it employs a conditional isomorphic validation for high-risk memory-corruption flaws: if the target binary is successfully exfiltrated, it dynamically instantiates a cross-platform digital twin. By enforcing strict state synchronization, including libc alignment and runtime file descriptor hooking, potentially destructive payloads are iteratively debugged in an isolated replica. This enables a highly risk-mitigated "one-shot" execution on the physical target. Empirical evaluations across eight scenarios, including undocumented zero-day environments to rule out LLM data contamination, validate the framework's architectural resilience, demonstrating its ability to prevent "live fire" crashes and execute risk-mitigated compromises on actual targets.

#11 Horizontal SCA Attacks on Binary kP Algorithms using Chevallier-Mames Atomic Blocks

著者: Gerald Isheanesu Matungamire, Alkistis Aikaterini Sigourou, Gerrit Schrock, Zoya Dyka, Peter Langendoerfer, Ievgen Kabin

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22429

要約:
Scalar multiplication kP is the operation most frequently targeted in Elliptic Curve (EC) cryptosystems. To protect against single-trace Side-Channel Analysis (SCA) attacks, the atomicity principle and various atomic block patterns have been proposed in the past. In this work we use our software and hardware implementations to demonstrate that binary right-to left and left-to-right kP algorithms, when implemented with Chevallier-Mames atomic block patterns, are still vulnerable to single-trace SCA attacks. The vulnerability remains true for the left-to-right kP algorithm with projective coordinate randomization.

#12 SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking

intellectual property

著者: Chenxi Gu, Xiaoning Du, John Grundy

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22438

要約:
Watermarking has emerged as a promising technique for tracing the authorship of content generated by large language models (LLMs). Among existing approaches, the KGW scheme is particularly attractive due to its versatility, efficiency, and effectiveness in natural language generation. However, KGW's effectiveness degrades significantly under low-entropy settings such as code generation and mathematical reasoning. A crucial step in the KGW method is random vocabulary partitioning, which enables adjustments to token selection based on specific preferences. Our study revealed that the next-token probability distribution plays an critical role in determining how much, or even whether, we can modify token selection and, consequently, the effectiveness of watermarking. We refer to this characteristic, associated with the probability distribution of each token prediction, as \emph{watermark strength.} In cases of random vocabulary partitioning, the lower bound of watermark strength is dictated by the next-token probability distribution. However, we found that, by redesigning the vocabulary partitioning algorithm, we can potentially raise this lower bound. In this paper, we propose SSG (\textbf{S}ort-then-\textbf{S}plit by \textbf{G}roups), a method that partitions the vocabulary into two logit-balanced subsets. This design lifts the lower bound of watermark strength for each token prediction, thereby improving watermark detectability. Experiments on code generation and mathematical reasoning datasets demonstrate the effectiveness of SSG.

#13 Information-Theoretic Authenticated PIR: From PIR-RV To APIR

著者: Pengzhen Ke, Yuxuan Qin, Liang Feng Zhang

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22505

要約:
Private Information Retrieval (PIR) allows clients to retrieve database entries without leaking retrieval indices, yet malicious servers seriously compromise retrieval correctness. Existing Authenticated PIR (APIR) schemes resist selective-failure attacks but rely on computational hardness assumptions. In contrast, information-theoretic PIR with Result Verification (itPIR-RV) achieves integrity without computational assumptions, yet only provides relaxed query privacy with no defense against selective-failure attacks. This paper focuses on unconditionally secure information-theoretic APIR (itAPIR) constructions. We propose the rigorous information-theoretic security definition for itAPIR with statistical privacy against selective-failure attacks and integrity as core properties, formalize the hierarchical relation between itAPIR and itPIR-RV as a relaxed variant with identical integrity but basic query privacy, and prove a conversion theorem that valid itPIR-RV schemes can be directly upgraded to secure itAPIR with no extra overhead. Our work bridges the theoretical gap, simplifies itAPIR design, and enables quantum-resistant PIR in malicious server environments.

#14 ArmSSL: Adversarial Robust Black-Box Watermarking for Self-Supervised Learning Pre-trained Encoders

intellectual property

著者: Yongqi Jiang, Yansong Gao, Boyu Kuang, Chunyi Zhou, Anmin Fu, Liquan Chen

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22550

要約:
Self-supervised learning (SSL) encoders are invaluable intellectual property (IP). However, no existing SSL watermarking for IP protection can concurrently satisfy the following two practical requirements: (1) provide ownership verification capability under black-box suspect model access once the stolen encoders are used in downstream tasks; (2) be robust under adversarial watermark detection or removal, because the watermark samples form a distinguishable out-of-distribution (OOD) cluster. We propose ArmSSL, an SSL watermarking framework that assures black-box verifiability and adversarial robustness while preserving utility. For verification, we introduce paired discrepancy enlargement, enforcing feature-space orthogonality between the clean and its watermark counterpart to produce a reliable verification signal in black-box against the suspect model. For adversarial robustness, ArmSSL integrates latent representation entanglement and distribution alignment to suppress the OOD clustering. The former entangles watermark representations with clean representations (i.e., from non-source-class) to avoid forming a dense cluster of watermark samples, while the latter minimizes the distributional discrepancy between watermark and clean representations, thereby disguising watermark samples as natural in-distribution data. For utility, a reference-guided watermark tuning strategy is designed to allow the watermark to be learned as a small side task without affecting the main task by aligning the watermarked encoder's outputs with those of the original clean encoder on normal data. Extensive experiments across five mainstream SSL frameworks and nine benchmark datasets, along with end-to-end comparisons with SOTAs, demonstrate that ArmSSL achieves superior ownership verification, negligible utility degradation, and strong robustness against various adversarial detection and removal.

#15 Adversarial Co-Evolution of Malware and Detection Models: A Bilevel Optimization Perspective

著者: Olha Jure\v{c}kov\'a, Martin Jure\v{c}ek, Matou\v{s} Koz\'ak, R\'obert L\'orencz

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22569

要約:
Machine learning-based malware detectors are increasingly vulnerable to adversarial examples. Traditional defenses, such as one-shot adversarial training, often fail against adaptive attackers who use reinforcement learning to bypass detection. This paper proposes a robust defense framework based on bilevel optimization, explicitly modeling the strategic interaction between a defender and an attacker as an adversarial co-evolutionary process. We evaluate our approach using the MAB-malware framework against three distinct malware families: Mokes, Strab, and DCRat. Our experimental results demonstrate that while standard classifiers and basic adversarial retraining often remain vulnerable, showing evasion rates as high as 90 %, the proposed bilevel optimization approach consistently achieves near-total immunity, reducing evasion rates to 0 - 1.89 %. Furthermore, the iterative framework significantly increases the attacker's query complexity, raising the average cost of successful evasion by up to two orders of magnitude. These findings suggest that modeling the iterative cycle of attack and defense through bilevel optimization is essential for developing resilient malware detection systems capable of withstanding evolving adversarial threats.

#16 PASS: A Provenanced Access Subaccount System for Blockchain Wallets

著者: Jay Yu, Shunfan Zhou, Hang Yin, Brian Seong

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22602

要約:
Blockchain wallets conventionally follow an ownership model where possession of a private key grants unilateral control. However, this assumption is brittle for emerging settings such as AI agent wallets, organizational custody, and enterprise payroll, where multiple actors must coordinate without exposing secrets or leaking internal activity. We present PASS, a Provenanced Access Subaccount System that replaces role-based or identity-based control with provenance-based control: assets can only be used by subaccounts that can trace custody back to a valid deposit. A simple Inbox-Outbox mechanism ensures all external actions have verifiable lineage, while internal transfers remain private and indistinguishable from ordinary EOAs. We formalize PASS in Lean 4 and prove core invariants, including privacy of internal transfers, asset accessibility, and provenance integrity. We implement a prototype with enclave backends on AWS Nitro Enclaves and dstack Intel TDX, integrate with WalletConnect, and benchmark throughput across wallet operations. These results show that provenance-based wallets are both implementable and efficient. PASS bridges today's gap between strict self-custody and flexible shared access, advancing the design space for practical, privacy-preserving custody.

#17 Detecting Concept Drift in Evolving Malware Families Using Rule-Based Classifier Representations

著者: Tom\'a\v{s} Kaln\'y, Martin Jure\v{c}ek, Mark Stamp

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22629

要約:
This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing extracted rule representations using feature importance, prediction agreement, activation stability, and coverage metrics. These metrics are correlated with both accuracy degradation and data distribution shift as complementary drift indicators. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing in family-vs-benign and family-vs-family settings, and compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration, being the only one where all family pairs produce positive drift-accuracy correlations. The methods are complementary - no single approach dominates across all pairs.

#18 Adversarial Malware Generation in Linux ELF Binaries via Semantic-Preserving Transformations

著者: Luk\'a\v{s} Hrdonka, Martin Jure\v{c}ek

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.22639

要約:
Malware development and detection have undergone significant changes in recent years as modern concepts, such as machine learning, have been used for both adversarial attacks and defense. Despite intensive research on Windows Portable Executable (PE) files, there is minimal work on Linux Executable and Linkable Format (ELF). In this work, we summarize the academic papers submitted in this field and develop a new adversarial malware generator for the ELF format. Using a variety of metrics, we thoroughly evaluated our generator and achieved an Evasion Rate of 67.74 % while changing the confidence of the malware detector by -0.50 in the mean case for the dataset used. In our approach, we chose MalConv as the target classifier. Using this classifier, we found that the most successful modifications used strings typical of benign files as a data source. We conducted a variety of experiments and concluded that the target classifier appears sensitive to strings at any location within the executable file.

#19 PoLO: Proof-of-Learning and Proof-of-Ownership at Once with Chained Watermarking

intellectual property

著者: Haiyu Deng, Yanna Jiang, Guangsheng Yu, Qin Wang, Xu Wang, Baihe Ma, Wei Ni, Ren Ping Liu

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.12296

要約:
Our evaluation shows that PoLO achieves \textbf{99\%} watermark detection accuracy for ownership verification, while preserving data privacy and cutting verification costs to just \textbf{1.5--10\%} of traditional methods. Forging PoLO demands \textbf{1.1--4$\times$} more resources than honest proof generation, with the original proof retaining over \textbf{90\%} detection accuracy even after attacks.

#20 Toward Principled LLM Safety Testing: Solving the Jailbreak Oracle Problem

著者: Shuyi Lin, Anshuman Suri, Alina Oprea, Cheng Tan

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.17299

要約:
As large language models (LLMs) become increasingly deployed in safety-critical applications, the lack of systematic methods to assess their vulnerability to jailbreak attacks presents a critical security gap. We introduce the jailbreak oracle problem: given a model, prompt, and decoding strategy, determine whether a jailbreak response can be generated with likelihood exceeding a specified threshold. This formalization enables a principled study of jailbreak vulnerabilities. Answering the jailbreak oracle problem poses significant computational challenges, as the search space grows exponentially with response length. We present Boa, the first system designed for efficiently solving the jailbreak oracle problem. Boa employs a two-phase search strategy: (1) breadth-first sampling to identify easily accessible jailbreaks, followed by (2) depth-first priority search guided by fine-grained safety scores to systematically explore promising yet low-probability paths. Boa enables rigorous security assessments including systematic defense evaluation, standardized comparison of red team attacks, and model certification under extreme adversarial conditions. Code is available at https://github.com/shuyilinn/BOA/tree/mlsys2026ae

#21 Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

著者: Do-hyeon Yoon, Minsoo Chun, Thomas Allen, Hans M\"uller, Min Wang, Rajesh Sharma

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.03014

要約:
Large language models (LLMs) face significant copyright and intellectual property challenges as the cost of training increases and model reuse becomes prevalent. While watermarking techniques have been proposed to protect model ownership, they may not be robust to continue training and development, posing serious threats to model attribution and copyright protection. This work introduces a simple yet effective approach for robust LLM fingerprinting based on intrinsic model characteristics. We discover that the standard deviation distributions of attention parameter matrices across different layers exhibit distinctive patterns that remain stable even after extensive continued training. These parameter distribution signatures serve as robust fingerprints that can reliably identify model lineage and detect potential copyright infringement. Our experimental validation across multiple model families demonstrates the effectiveness of our method for model authentication. Notably, our investigation uncovers evidence that a recently Pangu Pro MoE model released by Huawei is derived from Qwen-2.5 14B model through upcycling techniques rather than training from scratch, highlighting potential cases of model plagiarism, copyright violation, and information fabrication. These findings underscore the critical importance of developing robust fingerprinting methods for protecting intellectual property in large-scale model development and emphasize that deliberate continued training alone is insufficient to completely obscure model origins.

#22 AgentBound: Securing Execution Boundaries of AI Agents

agent

著者: Christoph B\"uhler, Matteo Biagiola, Luca Di Grazia, Guido Salvaneschi

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.21236

要約:
Large Language Models (LLMs) have evolved into AI agents that interact with external tools and environments to perform complex tasks. The Model Context Protocol (MCP) has become the de facto standard for connecting agents with such resources, but security has lagged behind: thousands of MCP servers execute with unrestricted access to host systems, creating a broad attack surface. In this paper, we introduce AgentBound, the first access control framework for MCP servers. AgentBound combines a declarative policy mechanism, inspired by the Android permission model, with a policy enforcement engine that contains malicious behavior without requiring MCP server modifications. We build a dataset containing the 296 most popular MCP servers, and show that access control policies can be generated automatically from source code with 80.9% accuracy. We also show that AgentBound blocks the majority of security threats in several malicious MCP servers, and that the policy enforcement engine introduces negligible overhead. Our contributions provide developers and project managers with a foundation for securing MCP servers while maintaining productivity, enabling researchers and tool builders to explore new directions for declarative access control and MCP security.

#23 ThreadFuzzer: Fuzzing Framework for Thread Protocol

著者: Ilja Siro\v{s}, Jakob Heirwegh, Dave Singel\'ee, Bart Preneel

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.17283

要約:
With the rapid growth of IoT, secure and efficient mesh networking has become essential. Thread has emerged as a key protocol, widely used in smart-home and commercial systems, and serving as a core transport layer in the Matter standard. This paper presents ThreadFuzzer, the first dedicated fuzzing framework for systematically testing Thread protocol implementations. By manipulating packets at the MLE layer, ThreadFuzzer enables fuzzing of both virtual OpenThread nodes and physical Thread devices. The framework incorporates multiple fuzzing strategies, including Random and Coverage-based fuzzers from CovFuzz, as well as a newly introduced TLV Inserter, designed specifically for TLV-structured MLE messages. These strategies are evaluated on the OpenThread stack using code-coverage and vulnerability-discovery metrics. The evaluation uncovered five previously unknown vulnerabilities in the OpenThread stack, several of which were successfully reproduced on commercial devices that rely on OpenThread. Moreover, ThreadFuzzer was benchmarked against an oracle AFL++ setup using the manually extended OSS-Fuzz harness from OpenThread, demonstrating strong effectiveness. These results demonstrate the practical utility of ThreadFuzzer while highlighting challenges and future directions in the wireless protocol fuzzing research space.

#24 Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models

著者: Ana-Maria Cretu, Klim Kireev, Amro Abdalla, Wisdom Obinna, Raphael Meier, Sarah Adel Bargal, Elissa M. Redmiles, Carmela Troncoso

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.05707

要約:
We evaluate the effectiveness of filtering child images from training datasets of text-to-image models to prevent model misuse to create child sexual abuse material (CSAM). First, we capture the complexity of preventing CSAM generation using a game-based security definition. Second, we show that current detection methods cannot remove all children from a dataset. Third, using an ethical proxy for CSAM (a child wearing glasses), we show that even when only a small percentage of child images are left in the training dataset after filtering, there exist prompting strategies that generate a child wearing glasses using only a few more queries than when the model is trained on the unfiltered data. Fine-tuning the filtered model on child images further reduces the additional query overhead. We also show that re-introducing a concept is possible via fine-tuning even if filtering is perfect. Our results show that current child filtering methods offer limited protection to closed-weight models and no protection to open-weight models, while reducing the generality of the model by hindering the generation of child-related concepts or changing their representation. We conclude by outlining challenges in conducting evaluations that establish robust evidence on the impact of concept filtering defenses for CSAM.

#25 AgentMark: Utility-Preserving Behavioral Watermarking for Agents

intellectual propertyagent

著者: Kaibo Huang, Jin Tan, Yukun Wei, Wanling Li, Zipei Zhang, Hui Tian, Zhongliang Yang, Linna Zhou

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.03294

要約:
LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.

#26 Eidolon: A Post-Quantum Signature Scheme Based on k-Colorability in the Age of Graph Neural Networks

著者: Asmaa Cherkaoui, Ramon Flores, Delaram Kahrobaei, Richard Wilson

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.02689

要約:
We propose Eidolon, a post-quantum signature scheme grounded on the NP-complete k-colorability problem. Our construction generalizes the Goldreich-Micali-Wigderson zero-knowledge protocol to arbitrary k >= 3, applies the Fiat-Shamir transform, and uses Merkle-tree commitments to compress signatures from O(tn) to O(t log n). We generate hard instances by planting a coloring while aiming to preserve the statistical profile of random graphs. We present an empirical security analysis of such a scheme against both classical solvers (ILP, DSatur) and a custom graph neural network (GNN) attacker. Experiments show that for n >= 60, neither approach is able to recover a valid coloring matching the planted solution, suggesting that well-engineered k-coloring instances can resist the considered classical and learning-based cryptanalytic approaches. These experiments indicate that the constructed instances resist the attacks considered in our evaluation.

#27 Legitimate Overrides in Decentralized Protocols

著者: Oghenekaro Elem, Nimrod Talmon

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.12260

要約:
Decentralized protocols claim immutable, rule-based execution, yet many embed emergency mechanisms such as chain-level freezes, protocol pauses, and account quarantines. These overrides are crucial for responding to exploits and systemic failures, but they expose a core tension: when does intervention preserve trust and when is it perceived as illegitimate discretion? With approximately \$10 billion in technical exploit losses potentially addressable by onchain intervention (2016-2026), the design of these mechanisms has high practical stakes, but current approaches remain ad hoc and ideologically charged. We address this gap by developing a Scope $\times$ Authority taxonomy that maps the design space of emergency architectures along two dimensions: the precision of the intervention and the concentration of trigger authority. We formalize the resulting tradeoffs of standing centralization cost, containment speed, and collateral disruption as a stochastic decision support framework, and derive three empirical hypotheses from it. Assessing the framework against 705 documented exploit incidents, we find that containment time varies systematically by authority type, that losses follow a heavy-tailed distribution ($\alpha \approx 1.33$) concentrating risk in rare catastrophic events, and that community sentiment plausibly modulates the effective cost of maintaining intervention capability. Using scope breadth as a practical proxy for blast potential, we also find that narrower interventions (Account/Module) do not underperform broader ones (Protocol/Network) on containment success and are slightly faster at the median, giving partial empirical support to the scope-blast hypothesis. The analysis yields design guidance for emergency governance and reframes the problem as one of engineering tradeoffs rather than ideological debate.

#28 A traffic analysis attack against Introduction Protocol and Onion Services

著者: Nicolas Constantinides

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.23560

要約:
Tor onion services rely on long-lived introduction circuits to support anonymous rendezvous between clients and services. Although Tor incorporates defenses against traffic analysis, the introduction protocol retains deterministic routing structure that can be exploited by an adversary. We present a practical intersection attack against Tor introduction circuits that over repeated interactions can identify each hop from the introduction point toward the onion service while requiring observation at only one relay per stage. The attack repeatedly probes the target service and intersects sets of destination IP addresses observed within narrowly bounded INTRODUCE1-RENDEZVOUS2 intervals, without assuming global visibility or access to packet payloads. Our traffic-analysis technique identifies with certainty the next relay in the path to target at each stage, thereby revealing a gap in Tor's privacy model, which is intended to resist traffic-analysis attacks in which an adversary uses traffic patterns to determine which points in the network to observe or attack. We evaluate the attack's feasibility through live-network experiments using a self-operated onion service and relays. To support data minimization, we implement a Tor-compatible plugin that computes intersections online over pseudonymized data retained only in volatile memory. Our experiments show reliable convergence in practice, with convergence rate influenced by relay consensus weight and time-varying background traffic. We further assess practicality under a partial-global adversary model and discuss the implications of geographic concentration in Tor relay selection weight across cooperating jurisdictions.

#29 Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification

著者: David Condrey

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.00177

要約:
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are increasingly unreliable. We observe that the ordinary typing interface captures rich cognitive signatures, measurable patterns in keystroke timing that reflect the planning, translating, and revising stages of genuine composition. Drawing on large-scale keystroke datasets comprising over 136 million events, we define the Cognitive Load Correlation (CLC) and show it distinguishes genuine composition from mechanical transcription. We present a non-intrusive verification framework that operates within existing writing interfaces, collecting only timing metadata to preserve privacy. Our analytical evaluation estimates 85 to 95 percent discrimination accuracy under stated assumptions, while limiting biometric leakage via evidence quantization. We analyze the adversarial robustness of cognitive signatures, showing they resist timing-forgery attacks that defeat motor-level authentication because the cognitive channel is entangled with semantic content. We conclude that reframing authorship verification as a human-computer interaction problem provides a privacy-preserving alternative to invasive surveillance.

#30 A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification

著者: David Condrey

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.00178

要約:
Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state. These systems face a fundamental dependability challenge: the evidence collection infrastructure must remain available and tamper-resistant even when the attesting party controls the platform. Trusted Execution Environments (TEEs) provide hardware-enforced isolation that can address this challenge, but their integration with continuous process attestation introduces novel resilience requirements not addressed by existing frameworks. We present the first architecture for continuous process attestation evidence collection inside TEEs, providing hardware-backed tamper resistance against trust-inverted adversaries with graduated input assurance from software-channel integrity (Tier 1) through hardware-bound input (Tier 3). We develop a Markov-chain dependability model quantifying Evidence Chain Availability (ECA), Mean Time Between Evidence Gaps (MTBEG), and Recovery Time Objectives (RTO). We introduce a resilient evidence chain protocol maintaining chain integrity across TEE crashes, network partitions, and enclave migration. Our security analysis derives formal bounds under combined threat models including trust inversion and TEE side channels, parameterized by a conjectural side-channel leakage bound esc that requires empirical validation. Evaluation on Intel SGX demonstrates under 25% per-checkpoint CPU overhead (<0.3% of the 30 s checkpoint interval), >99.5% Evidence Chain Availability (ECA) (the fraction of session time with active evidence collection) in Monte Carlo simulation under Poisson failure models, and sealed-state recovery under 200 ms.

#31 Privacy-Preserving Proof of Human Authorship via Zero-Knowledge Process Attestation

privacy

著者: David Condrey

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.00179

要約:
Process attestation verifies human authorship by collecting behavioral biometric evidence, including keystroke dynamics, typing patterns, and editing behavior, during the creative process. However, the very data needed to prove authenticity can reveal intimate details about an author's cognitive state, health conditions, and identity, constituting sensitive biometric data under GDPR Article 9. We resolve this privacy-attestation paradox using zero-knowledge proofs. We present ZK-PoP, a construction that allows a verifier to confirm that (a) sequential work function chains were computed correctly, (b) behavioral feature vectors fall within human population distributions, and (c) content evolution is consistent with incremental human editing, all without learning the underlying behavioral data, exact timing, or intermediate content. Our construction uses Groth16 proofs over arithmetic circuits with Pedersen commitments and Bulletproof range proofs. We prove that ZK-PoP is computationally zero-knowledge, computationally sound, and achieves unlinkability across sessions. Evaluation shows proof generation in under 30 seconds for a 1-hour writing session, with 192-byte proofs verifiable in 8.2 ms, while incurring less than 5% accuracy loss in simulation at practical privacy levels (epsilon >= 1.0) compared to non-private baselines.

#32 The Power of Power Codes: New Classes of Easy Instances for the Linear Equivalence Problem

著者: Michele Battagliola, Anna-Lena Horlemann, Abhinaba Mazumder, Rocco Mora, Paolo Santini, Michael Schaller, Violetta Weger

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.23230

要約:
Given two linear codes, the Linear Equivalence Problem (LEP) asks to find (if it exists) a linear isometry between them; as a special case, we have the Permutation Equivalence Problem (PEP), in which isometries must be permutations. LEP and PEP have recently gained renewed interest as the security foundations for several post-quantum schemes, including LESS. A recent paper has introduced the use of the Schur product to solve PEP, identifying many new easy-to-solve instances. In this paper, we extend this result to LEP. In particular, we generalize the approach and rely on the more general notion of power codes. Combining it with Frobenius automorphisms and Hermitian hulls, we identify many classes of easy LEP instances. To the best of our knowledge, this is the first work exploiting algebraic weaknesses for LEP. Finally we show an improved reduction to PEP whenever the coefficients of the monomial matrix are in a subgroup of the multiplicative group of the finite field.

#33 Privacy Leakage via Output Label Space and Differentially Private Continual Learning

privacy

著者: Marlon Tobaben, Talal Alrawajfeh, Marcus Klasson, Mikko Heikkil\"a, Arno Solin, Antti Honkela

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2411.04680

要約:
Differential privacy (DP) is a formal privacy framework that enables training machine learning (ML) models while protecting individuals' data. As pointed out by prior work, ML models are part of larger systems, which can lead to so-called privacy side-channels even if the model training itself is DP. We identify the output label space of a classification model as such a privacy side-channel and show a concrete privacy attack that exploits it. The side-channel becomes highly relevant in continual learning (CL), where the output label space changes over time. To reason about privacy guarantees in CL, we introduce a formalisation of DP for CL, which also clarifies how our approach differs from existing approaches. We propose and evaluate two methods for eliminating this side-channel: applying an optimal DP mechanism to release the labels in the sensitive data, and using a large public label space. We explore the trade-offs of these methods through adapting pre-trained models. We demonstrate empirically that our models consistently achieve higher accuracy under DP than previous work over both Split-CIFAR-100 and Split-ImageNet-R, with a stronger privacy model.

#34 How Vulnerable Is My Learned Policy? Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies

著者: Akansha Kalra, Basavasagar Patil, Guanhong Tao, Daniel S. Brown

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2502.03698

要約:
Learning from demonstrations is a popular approach to train AI models; however, their vulnerability to adversarial attacks remains underexplored. We present the first systematic study of adversarial attacks, across a range of both classic and recently proposed imitation learning algorithms, including Vanilla Behavior Cloning (Vanilla BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantized Behavior Transformer (VQ-BET). We study the vulnerability of these methods to both white-box, grey-box and black-box adversarial perturbations. Our experiments reveal that most existing methods are highly vulnerable to these attacks, including black-box transfer attacks that transfer across algorithms. To the best of our knowledge, we are the first to study and compare the vulnerabilities of different popular imitation learning algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern imitation learning algorithms, paving the way for future work in addressing such limitations. Videos and code are available at https://sites.google.com/view/uap-attacks-on-bc.

#35 SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios

agent

著者: Junkai Chen, Huihui Huang, Yunbo Lyu, Junwen An, Jieke Shi, Chengran Yang, Ting Zhang, Haoye Tian, Yikun Li, Zhenhao Li, Xin Zhou, Xing Hu, David Lo

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.22097

要約:
Large language model-powered code agents are rapidly transforming software engineering, yet the security risks of their generated code have become a critical concern. Existing benchmarks have provided valuable insights, but they fail to capture scenarios in which vulnerabilities are actually introduced by human developers, making fair comparisons between humans and agents infeasible. We therefore introduce SecureVibeBench, a benchmark of 105 C/C++ secure coding tasks sourced from 41 projects in OSS-Fuzz for code agents. SecureVibeBench has the following features: (i) realistic task settings that require multi-file edits in large repositories, (ii)~aligned contexts based on real-world open-source vulnerabilities with precisely identified vulnerability introduction points, and (iii) comprehensive evaluation that combines functionality testing and security checking with both static and dynamic oracles. We evaluate 5 popular code agents like OpenHands, supported by 5 LLMs (e.g., Claude sonnet 4.5) on SecureVibeBench. Results show that current agents struggle to produce both correct and secure code, as even the best-performing one, produces merely 23.8\% correct and secure solutions on SecureVibeBench. Our code and data are on https://github.com/iCSawyer/SecureVibeBench.

#36 SDN-SYN PoW: Adaptive Ingress-Aware Defense with Non-Interactive PoW Against Volumetric SYN Floods

著者: Wenyang Jia, Jingjing Wang, Xianneng Zou, Kai Lei

公開日: Mon, 27 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.06668

要約:
The stability of Internet services is persistently challenged by large volumetric TCP SYN floods, for which conventional defenses such as SYN Cookies preserve server state but still amplify bandwidth pressure. This paper presents SDN-SYN PoW, an ingress aware defense architecture that integrates non interactive Proof of Work with an SDN control plane for managed edge networks. The controller monitors per ingress SYN pressure and raises PoW difficulty when flooding is detected. If traffic mainly originates from a stable source region, enforcement is refined to the offending source prefix to reduce overhead on benign co located clients; otherwise, ingress wide enforcement is retained under randomized or spoofed sources. We further design a conservative Difficulty Discovery Protocol that reuses TCP retransmissions and commits difficulty updates only after a successful handshake. Experiments on a custom SDN testbed show restored application QoS under concentrated and spoofed floods, 11.7% higher benign client throughput than ingress only enforcement, and below 0.8% transient false escalations under 2% random loss.

cs.CR updates on arXiv.org

📋 論文タイトル一覧