arXiv論文一覧 - cs.CR updates on arXiv.org

#1 Towards a Systematic Risk Assessment of Deep Neural Network Limitations in Autonomous Driving Perception

著者: Svetlana Pavlitska, Christopher Gerking, J. Marius Z\"ollner

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20895

要約:
Safety and security are essential for the admission and acceptance of automated and autonomous vehicles. Deep neural networks (DNNs) are widely used for perception and further components of the autonomous driving (AD) stack. However, they possess several limitations, including lack of generalization, efficiency, explainability, plausibility, and robustness. These insufficiencies can pose significant risks to autonomous driving systems. However, hazards, threats, and risks associated with DNN limitations in this domain have not been systematically studied so far. In this work, we propose a joint workflow for risk assessment combining the hazard analysis and risk assessment (HARA) following ISO 26262 and threat analysis and risk assessment (TARA) following the ISO/SAE 21434 to identify and analyze risks arising from inherent DNN limitations in AD perception.

#2 Sensitivity Uncertainty Alignment in Large Language Models

著者: Prakul Sunil Hiremath, Harshit R. Hiremath

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20903

要約:
We propose Sensitivity-Uncertainty Alignment (SUA), a framework for analyzing failures of large language models under adversarial and ambiguous inputs. We argue that adversarial sensitivity and ambiguity reflect a common issue: misalignment between prediction instability and model uncertainty. A reliable model should express higher uncertainty when its predictions are unstable; failure to do so leads to miscalibration. We define a scalar score, SUA_theta(x), capturing the difference between distributional sensitivity and predictive entropy. We show that minimizing its positive part bounds worst-case perturbed risk and relates to calibration error. We also formalize ambiguity collapse, where models produce overconfident outputs despite multiple valid interpretations. We introduce SUA-TR, a training method combining consistency regularization and entropy alignment, along with an abstention rule for safer inference. Across tasks including question answering and classification, SUA better identifies model failures than entropy or self-consistency alone. The framework is model-agnostic and provides a basis for improving reliability in evolving language models.

#3 Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents

agent

著者: Yeran Gamage

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20911

要約:
LLM agents deployed in production operate under operator-defined behavioral policies (system-prompt instructions such as prohibitions on credential disclosure, data exfiltration, and unauthorized output) that safety evaluations assume hold throughout a conversation. Prohibition-type constraints decay under context pressure while requirement-type constraints persist; we term this asymmetry Security-Recall Divergence (SRD). In a 4,416-trial three-arm causal study across 12 models and 8 providers at six conversation depths, omission compliance falls from 73% at turn 5 to 33% at turn 16 while commission compliance holds at 100% (Mistral Large 3, $p < 10^{-33}$). In the two models with token-matched padding controls, schema semantic content accounts for 62-100% of the dilution effect. Re-injecting constraints before the per-model Safe Turn Depth (STD) restores compliance without retraining. Production security policies consist of prohibitions such as never revealing credentials, never executing untrusted code, and never forwarding user data. Commission-type audit signals remain healthy while omission constraints have already failed, leaving the failure invisible to standard monitoring.

#4 Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints

著者: Jan Pennekamp, Johannes Lohm\"oller, David Sch\"utte, Joscha Loos, Martin Henze

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20927

要約:
Preprints are essential for the timely and open dissemination of research. arXiv, the most widely used preprint service, takes the idea of open science one step further by not only publishing the actual preprints but also LaTeX sources and other files used to create them. As known from other contexts, such as GitHub repositories, and anecdotally exemplified for arXiv, making source code publicly available risks disclosing otherwise "hidden" information. Consequently, the public availability of paper sources raises the question of how much sensitive content is (unintentionally) disclosed through them. In this paper, we systematically answer this question for all 2.7M arXiv submissions with available source files across three dimensions of source file-induced information disclosure: (1) inclusion of unnecessary files, (2) metadata embedded in files, and (3) irrelevant content in files such as source code comments. Our analysis reveals that nearly every arXiv submission contains some form of "hidden" information. Notable findings range from links to editable web documents for internal coordination over API and private keys to complete Git histories. While different tools promise to remove such information from source files, we show that they fail to reliably achieve the intended cleaning functionality. To mitigate this situation, we provide ALC-NG to comprehensively remove files, metadata, and comments that are not needed to compile a LaTeX paper.

#5 SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs

著者: Chao Pan, Yu Wu, Xin Yao

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20930

要約:
Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion structurally requires harmful content, spontaneously generate that content with safety failure rates exceeding 95%. Existing input-level defenses achieve a 100% failure rate against ISC, and standard system prompt defenses provide only partial mitigation. We propose SafeRedirect, a system-level override that defeats ISC by redirecting the model's task-completion drive rather than suppressing it. SafeRedirect grants explicit permission to fail the task, prescribes a deterministic hard-stop output, and instructs the model to preserve harmful placeholders unresolved. Evaluated on seven frontier LLMs across three AI/ML-related ISC task types in the single-turn setting, SafeRedirect reduces average unsafe generation rates from 71.2% to 8.0%, compared to 55.0% for the strongest viable baseline. Multi-model ablation reveals that failure permission and condition specificity are universally critical, while the importance of other components varies across models. Cross-attack evaluation confirms state-of-the-art defense against ISC with generalization performance at least on par with the baseline on other attack families. Code is available at https://github.com/fzjcdt/SafeRedirect.

#6 Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

著者: Pranav Pallerla, Wilson Naik Bhukya, Bharath Vemula, Charan Ramtej Kodi

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20932

要約:
Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private, domain-specific knowledge. This capability introduces significant security risks, including membership inference, data poisoning, and unintended content leakage. A straightforward mitigation is to enable all relevant defenses simultaneously, but doing so incurs a substantial utility cost. In our experiments, an always-on defense stack reduces contextual recall by more than 40%, indicating that retrieval degradation is the primary failure mode. To mitigate this trade-off in RAG systems, we propose the Sentinel-Strategist architecture, a context-aware framework for risk analysis and defense selection. A Sentinel detects anomalous retrieval behavior, after which a Strategist selectively deploys only the defenses warranted by the query context. Evaluated across three benchmark datasets and five orchestration models, ADO is shown to eliminate MBA-style membership inference leakage while substantially recovering retrieval utility relative to a fully static defense stack, approaching undefended baseline levels. Under data poisoning, the strongest ADO variants reduce attack success to near zero while restoring contextual recall to more than 75% of the undefended baseline, although robustness remains sensitive to model choice. Overall, these findings show that adaptive, query-aware defense can substantially reduce the security-utility trade-off in RAG systems.

#7 SDNGuardStack: An Explainable Ensemble Learning Framework for High-Accuracy Intrusion Detection in Software-Defined Networks

著者: Ashikuzzaman, Md. Saifuzzaman Abhi, Mahabubur Rahman, Md. Manjur Ahmed, Md. Mehedi Hasan, Md. Ahsan Arif

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20934

要約:
Software-Defined Networking (SDN) is another technology that has been developing in the last few years as a relevant technique to improve network programmability and administration. Nonetheless, its centralized design presents a major security issue, which requires effective intrusion detection systems. The SDN-specific machine learning-based intrusion detection system described in this paper is innovative because it is trained and tested on the InSDN dataset which models attack scenarios and realistic traffic patterns in SDN. Our approach incorporates a comprehensive preprocessing pipeline, feature selection via Mutual Information, and a novel ensemble learning model, SDNGuardStack, which combines multiple base learners to enhance detection accuracy and efficiency. In addition, we include explainable AI methods, including SHAP to add transparency to model predictions, which helps security analysts respond to incidents. The experiments prove that SDNGuard-Stack has an accuracy rate of 99.98% and a Cohen Kappa of 0.9998, surpassing other models, and at the same time being interpretable and practically executable. It is interesting to see such features like Flow ID, Bwd Header Len, and Src Port as the most important factors in the model predictions. The work is a step towards closing the gap between performance intrusion detection and realistic deployment in SDN, which will lead to the creation of secure and resilient network infrastructures.

#8 Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

著者: Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya, Anirban Roy, Daniel Elenius, Brian Matejek, Adam D. Cobb, Susmit Jha

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20945

要約:
Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities rooted in model internals. We present a comprehensive, interpretability-driven jailbreaking audit of eight SOTA open-source LLMs: Llama-3.1-8B, Llama-3.3-70B-4bt, GPT-oss- 20B, GPT-oss-120B, Qwen3-0.6B, Qwen3-32B, Phi4-3.8B, and Phi4-14B. Leveraging interpretability-based approaches -- Universal Steering (US) and Representation Engineering (RepE) -- we introduce an adaptive two-stage grid search algorithm to identify optimal activation-steering coefficients for unsafe behavioral concepts. Our evaluation, conducted on a curated set of harmful queries and a standardized LLM-based judging protocol, reveals stark contrasts in model robustness. The Llama-3 models are highly vulnerable, with up to 91\% (US) and 83\% (RepE) jailbroken responses on Llama-3.3-70B-4bt, while GPT-oss-120B remains robust to attacks via both interpretability approaches. Qwen and Phi models show mixed results, with the smaller Qwen3-0.6B and Phi4-3.8B mostly exhibiting lower jailbreaking rates, while their larger counterparts are more susceptible. Our results establish interpretability-based steering as a powerful tool for systematic safety audits, but also highlight its dual-use risks and the need for better internal defenses in LLM deployment.

#9 Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

agent

著者: Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis, Seshu Tirupathi, John D. Kelleher

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20994

要約:
The growth of agentic AI has drawn significant attention to function calling Large Language Models (LLMs), which are designed to extend the capabilities of AI-powered system by invoking external functions. Injection and jailbreaking attacks have been extensively explored to showcase the vulnerabilities of LLMs to user prompt manipulation. The expanded capabilities of agentic models introduce further vulnerabilities via their function calling interface. Recent work in LLM security showed that function calling can be abused, leading to data tampering and theft, causing disruptive behavior such as endless loops, or causing LLMs to produce harmful content in the style of jailbreaking attacks. This paper introduces a novel function hijacking attack (FHA) that manipulates the tool selection process of agentic models to force the invocation of a specific, attacker-chosen function. While existing attacks focus on semantic preference of the model for function-calling tasks, we show that FHA is largely agnostic to the context semantics and robust to the function sets, making it applicable across diverse domains. We further demonstrate that FHA can be trained to produce universal adversarial functions, enabling a single attacked function to hijack tool selection across multiple queries and payload configurations. We conducted experiments on 5 different models, including instructed and reasoning variants, reaching 70% to 100% ASR over the established BFCL dataset. Our findings further demonstrate the need for strong guardrails and security modules for agentic systems.

#10 VRSafe: A Secure Virtual Keyboard to Mitigate Keystroke Inference in Virtual Reality

著者: Yijun Yuan, Na Du, Adam J. Lee, Balaji Palanisamy

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21001

要約:
Password-based authentication is one of the most commonly used methods for verifying user identities, and its widespread usage continues in virtual reality (VR) applications. As a result, various forms of attacks on password-based authentication in traditional environments such as keystroke inference and shoulder surfing, are still effective in VR applications. While keystroke inference attacks on virtual keyboards have been studied extensively, few efforts have developed an effective and cost-efficient defense strategy to mitigate keystroke inferences in VR. To address this gap, this paper presents a novel QWERTY keyboard called \textit{VRSafe} that is resilient to keystroke inference attacks. The proposed keyboard carefully introduces false positive keystrokes into the information collected by attackers during the typing process, making the inference of the original password difficult. \textit{VRSafe} also incorporates a novel malicious login detector that can effectively identify unauthorized login attempts using credentials inferred from keystroke inference attacks with high detection rate and minimal time and memory cost. The proposed design is evaluated through both simulation experiments and a real-world user study, and the results show that \textit{VRSafe} can significantly reduce the accuracy of keystroke inference attacks while incurring a modest overhead from a usability standpoint.

#11 Layer 2 Blockchains Simplified: A Survey of Vector Commitment Schemes, ZKP Frameworks, Layer-2 Data Structures and Verkle Trees

著者: Ekleen Kaur, Marko Suvajdzic

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21055

要約:
Layer-2 (L2) protocols address the fundamental limitations of Layer-1 (L1) blockchains by offloading computation while anchoring trust to the parent chain. This architectural shift, while boosting throughput, introduces a new, complex security surface defined by off-chain components like sequencers, bridges, and data availability mechanisms. Prior literature[31][33] offers fragmented views of this risk. This paper presents the first unified, security-focused survey that rigorously maps L2 architecture to its underlying cryptographic security. We dissect the technical progression from L1 primitives to the core of modern L2s, analyzing the security assumptions(Discrete Logarithm, Computational Diffie-Hellman, Bilinear Diffie-Hellman) of ZK frameworks (Groth16, Plonk) and their corresponding commitment schemes (KZG, IPA). We formalize a comprehensive L2 threat model encompassing sequencer liveness, bridge exploits, and data-availability failures. This work serves as an accessible yet rigorous reference for researchers and developers to reason about L2 security from a deep crypto-mathematical perspective.

#12 Behavioral Consistency and Transparency Analysis on Large Language Model API Gateways

著者: Guanjie Lin, Yinxin Wan, Shichao Pei, Ting Xu, Kuai Xu, Guoliang Xue

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21083

要約:
Third-party Large Language Model (LLM) API gateways are rapidly emerging as unified access points to models offered by multiple vendors. However, the internal routing, caching, and billing policies of these gateways are largely undisclosed, leaving users with limited visibility into whether requests are served by the advertised models, whether responses remain faithful to upstream APIs, or whether invoices accurately reflect public pricing policies. To address this gap, we introduce GateScope, a lightweight black-box measurement framework for evaluating behavioral consistency and operational transparency in commercial LLM gateways. GateScope is designed to detect key misbehaviors, including model downgrading or switching, silent truncation, billing inaccuracies, and instability in latency by auditing gateways along four critical dimensions: response content analysis, multi-turn conversation performance, billing accuracy, and latency characteristics. Our measurements across 10 real-world commercial LLM API gateways reveal frequent gaps between expected and actual behaviors, including silent model substitutions, degraded memory retention, deviations from announced pricing, and substantial variation in latency stability across platforms.

#13 Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

agent

著者: Ari Azarafrooz

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21131

要約:
AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips past every session-bound detector because only the aggregate carries the payload. We make three contributions to cross-session threat detection. (1) Dataset. CSTM-Bench is 26 executable attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation" as a policy predicate, plus matched Benign-pristine and Benign-hard confounders. Released on Hugging Face as intrinsec-ai/cstm-bench with two 54-scenario splits: dilution (compositional) and cross_session (12 isolation-invisible scenarios produced by a closed-loop rewriter that softens surface phrasing while preserving cross-session artefacts). (2) Measurement. Framing cross-session detection as an information bottleneck to a downstream correlator LLM, we find that a session-bound judge and a Full-Log Correlator concatenating every prompt into one long-context call both lose roughly half their attack recall moving from dilution to cross_session, well inside any frontier context window. Scope: 54 scenarios per shard, one correlator family (Anthropic Claude), no prompt optimisation; we release it to motivate larger, multi-provider datasets. (3) Algorithm and metric. A bounded-memory Coreset Memory Reader retaining highest-signal fragments at $K=50$ is the only reader whose recall survives both shards. Because ranker reshuffles break KV-cache prefix reuse, we promote $\mathrm{CSR\_prefix}$ (ordered prefix stability, LLM-free) to a first-class metric and fuse it with detection into $\mathrm{CSTM} = 0.7 F_1(\mathrm{CSDA@action}, \mathrm{precision}) + 0.3 \mathrm{CSR\_prefix}$, benchmarking rankers on a single Pareto of recall versus serving stability.

#14 Image-Based Malware Type Classification on MalNet-Image Tiny: Effects of Multi-Scale Fusion, Transfer Learning, Data Augmentation, and Schedule-Free Optimization

著者: Ahmed A. Abouelkhaire, Waleed A. Yousef, Issa Traor

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21153

要約:
This paper studies 43-class malware type classification on MalNet-Image Tiny, a public benchmark derived from Android APK files. The goal is to assess whether a compact image classifier benefits from four components evaluated in a controlled ablation: a feature pyramid network (FPN) for scale variation induced by resizing binaries of different lengths, ImageNet pretraining, lightweight augmentation through Mixup and TrivialAugment, and schedule-free AdamW optimization. All experiments use a ResNet18 backbone and the provided train/validation/test split. Reproducing the benchmark-style configuration yields macro-F1 (F1_macro) of 0.6510, consistent with the reported baseline of approximately 0.65. Replacing the optimizer with schedule-free AdamW and using unweighted cross-entropy increases F1_macro to 0.6535 in 10 epochs, compared with 96 epochs for the reproduced baseline. The best configuration combines pretraining, Mixup, TrivialAugment, and FPN, reaching F1_macro=0.6927, P_macro=0.7707, AUC_macro=0.9556, and L_test=0.8536. The ablation indicates that the largest gains in F1_macro arise from pretraining and augmentation, whereas FPN mainly improves P_macro, AUC_macro, and L_test in the strongest configuration.

#15 Adaptive Instruction Composition for Automated LLM Red-Teaming

著者: Jesse Zymet, Andy Luo, Swapnil Shinde, Sahil Wadhwa, Emily Chen

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21159

要約:
Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with identifying effective strategies through trial and error, resulting in a semantically limited range of successes. Another approach discovers diverse attacks by combining crowdsourced harmful queries and tactics into instructions for the attacker, but does so at random, limiting effectiveness. This article introduces a novel framework, Adaptive Instruction Composition, that combines crowdsourced texts according to an adaptive mechanism trained to jointly optimize effectiveness with diversity. We use reinforcement learning to balance exploration with exploitation in a combinatorial space of instructions to guide the attacker toward diverse generations tailored to target vulnerabilities. We demonstrate that our approach substantially outperforms random combination on a set of effectiveness and diversity metrics, even under model transfer. Further, we show that it surpasses a host of recent adaptive approaches on Harmbench. We employ a lightweight neural contextual bandit that adapts to contrastive embedding inputs, and provide ablations suggesting that the contrastive pretraining enables the network to rapidly generalize and scale to the massive space as it learns.

#16 Position Paper: Denial-of-Service Against Multi-Round Transaction Simulation

著者: Yuzhe Tang, Yibo Wang, Wanning Ding, Jiaqi Chen, Taesoo Kim

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21169

要約:
In Ethereum, transaction-bundling services are a critical component of block builders, such as Flashbots Bundles, and are widely used by MEV searchers. Disrupting bundling services can degrade searcher experience and reduce builder revenue. Despite the extensive studies, the existing denial-of-service attack designs are ineffective against bundling services due to their unique multi-round execution model. This paper studies the open problem of asymmetric denial-of-service against bundling services. We develop evasive, risk-free, and low-cost DoS attacks on Flashbots' bundling service, the only open-source bundling service known to us. Our attacks exploit inter-transaction dependencies through contract state to achieve evasiveness, and abuse bundling-specific features, such as atomic block inclusion, to significantly reduce both capital and operational costs of the attack. Experimental results show that our attacks achieve high success rates, substantially reduce builders' revenue, and slow block production. We further propose mitigation strategies for the identified risks.

#17 Physically Unclonable Functions for Secure IoT Authentication and Hardware-Anchored AI Model Integrity

著者: Maryam Taghi Zadeh, Mohsen Ahmadi

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21188

要約:
The rapid integration of artificial intelligence (AI) into Internet of Things (IoT) and edge computing systems has intensified the need for robust, hardware-rooted trust mechanisms capable of ensuring device authenticity and AI model integrity under strict resource and security constraints. This survey reviews and synthesizes existing literature on hardware-rooted trust mechanisms for AI-enabled IoT systems. It systematically examines and compares representative trust anchor mechanisms, including Trusted Platform Module (TPM)-based measurement and attestation, silicon and FPGA-based Physical Unclonable Functions (PUFs), hybrid container-aware hardware roots of trust, and software-only security approaches. The analysis highlights how hardware-rooted solutions generally provide stronger protection against physical tampering and device cloning compared to software-only approaches, particularly in adversarial and physically exposed environments, while hybrid designs extend hardware trust into runtime and containerized environments commonly used in modern edge deployments. By evaluating trade-offs among security strength, scalability, cost, and deployment complexity, the study shows that PUF-based and hybrid trust anchors offer a promising balance for large-scale, AI-enabled IoT systems, whereas software-only trust mechanisms remain insufficient in adversarial and physically exposed settings. The presented comparison aims to clarify current design challenges and guide future development of trustworthy AI-enabled IoT platforms.

#18 ECCFROG522PP: An Enhanced 522 bit Weierstrass Elliptic Curve

著者: Victor Duarte Melo

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21261

要約:
This paper presents ECCFROG522PP, a 522-bit prime-field elliptic curve in short Weierstrass form, designed with a focus on deterministic generation and public reproducibility. The central design principle is that all critical parameters are derived from a fixed public seed through a transparent and verifiable procedure. While many deployed systems rely on NIST P-256 and secp256k1, which target approximately 128-bit classical security, higher security applications typically consider curves such as NIST P-521, Curve448, and Brainpool P512. ECCFROG522PP is intended for the same general classical security range as P-521, with emphasis on transparency, auditability, and reproducibility rather than performance optimization. The curve parameters are generated through a BLAKE3-based deterministic pipeline with publicly specified indices. The resulting construction has prime order, cofactor one, and a deterministically derived base point of full order. The quadratic twist has a large proven prime factor, and the construction includes a documented lower bound on the embedding degree together with standard sanity checks against low embedding degree reductions and basic CM discriminant anomalies. The full generation and validation procedure can be reproduced end to end from public artifacts and reference scripts, enabling independent verification of all parameters and checks.

#19 Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

agent

著者: Zhaohui Geoffrey Wang

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21282

要約:
Automated code vulnerability detection is critical for software security, yet existing approaches face a fundamental trade-off between detection accuracy and computational cost. We propose a heterogeneous multi-agent architecture inspired by game-theoretic principles, combining cloud-based LLM experts with a local lightweight verifier. Our "3+1" architecture deploys three cloud-based expert agents (DeepSeek-V3) that analyze code from complementary perspectives - code structure, security patterns, and debugging logic - in parallel, while a local verifier (Qwen3-8B) performs adversarial validation at zero marginal cost. We formalize this design through a two-layer game framework: (1) a cooperative game among experts capturing super-additive value from diverse perspectives, and (2) an adversarial verification game modeling quality assurance incentives. Experiments on 262 real samples from the NIST Juliet Test Suite across 14 CWE types, with balanced vulnerable and benign classes, demonstrate that our approach achieves a 77.2% F1 score with 62.9% precision and 100% recall at $0.002 per sample - outperforming both a single-expert LLM baseline (F1 71.4%) and Cppcheck static analysis (MCC 0). The adversarial verifier significantly improves precision (+10.3 percentage points, p < 1e-6, McNemar's test) by filtering false positives, while parallel execution achieves a 3.0x speedup. Our work demonstrates that game-theoretic design principles can guide effective heterogeneous multi-agent architectures for cost-sensitive software engineering tasks.

#20 CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents

agent

著者: Wenjie Fu, Xiaoting Qin, Jue Zhang, Qingwei Lin, Lukas Wutschitz, Robert Sim, Saravan Rajmohan, Dongmei Zhang

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21308

要約:
Enterprise LLM agents can dramatically improve workplace productivity, but their core capability, retrieving and using internal context to act on a user's behalf, also creates new risks for sensitive information leakage. We introduce CI-Work, a Contextual Integrity (CI)-grounded benchmark that simulates enterprise workflows across five information-flow directions and evaluates whether agents can convey essential content while withholding sensitive context in dense retrieval settings. Our evaluation of frontier models reveals that privacy failures are prevalent (violation rates range from 15.8%-50.9%, with leakage reaching up to 26.7%) and uncovers a counterintuitive trade-off critical for industrial deployment: higher task utility often correlates with increased privacy violations. Moreover, the massive scale of enterprise data and potential user behavior further amplify this vulnerability. Simply increasing model size or reasoning depth fails to address the problem. We conclude that safeguarding enterprise workflows requires a paradigm shift, moving beyond model-centric scaling toward context-centric architectures.

#21 Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations

著者: Pawan Acharya, Lan Zhang

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21310

要約:
Deep learning has emerged as a powerful approach for malware detection, demonstrating impressive accuracy across various data representations. However, these models face critical limitations in real-world, non-stationary environments where both malware characteristics and detection systems continuously evolve. Our research investigates a fundamental security question: Can an attacker generate adversarial malware samples that simultaneously evade classification and remain inconspicuous to drift monitoring mechanisms? We propose a novel approach that generates targeted adversarial examples in the classifier's standardized feature space, augmented with sophisticated similarity regularizers. By carefully constraining perturbations to maintain distributional similarity with clean malware, we create an optimization objective that balances targeted misclassification with drift signal minimization. We quantify the effectiveness of this approach by comprehensively comparing classifier output probabilities using multiple drift metrics. Our experiments demonstrate that similarity constraints can reduce output drift signals, with $\ell_2$ regularization showing the most promising results. We observe that perturbation budget significantly influences the evasion-detectability trade-off, with increased budget leading to higher attack success rates and more substantial drift indicators.

#22 Provably Secure Steganography Based on List Decoding

著者: Kaiyi Pang, Minhao Bai

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21394

要約:
Steganography embeds secret messages in seemingly innocuous carriers for covert communication under surveillance. Current Provably Secure Steganography (PSS) schemes based on language models can guarantee computational indistinguishability between the covertext and stegotext. However, achieving high embedding capacity remains a challenge for existing PSS. The inefficient entropy utilization renders them not well-suited for Large Language Models (LLMs), whose inherent low-entropy tendencies severely constrain feasible embedding capacity. To address this, we propose a provably secure steganography scheme with a theoretically proved high capacity. Our scheme is based on the concept of list decoding: it maintains a set of candidates that contain the correct secret message, instead of directly finding the correct message with more effort. This strategy fully utilizes the information content of the generated text, yielding higher capacity. To ensure the correctness of our scheme, we further introduce a suffix-matching mechanism to distinguish the correct secret message from the candidates. We provide theoretical proofs for both the security and correctness of our scheme, alongside a derivation of its theoretical capacity lower bound. Our approach is plug-and-play, requiring only a direct replacement of the model's standard random sampling module. Experiments on three LLMs and seven PSS baselines demonstrate that our method achieves computational efficiency comparable to prior PSS schemes while delivering a substantial improvement in embedding capacity.

#23 CSC: Turning the Adversary's Poison against Itself

著者: Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu, Bo Liu, Wanlei Zhou

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21416

要約:
Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to misclassify triggered inputs as adversary-specified labels while maintaining performance on clean data. Existing poison restraint-based defenses often suffer from inadequate detection against specific attack variants and compromise model utility through unlearning methods that lead to accuracy degradation. This paper conducts a comprehensive analysis of backdoor attack dynamics during model training, revealing that poisoned samples form isolated clusters in latent space early on, with triggers acting as dominant features distinct from benign ones. Leveraging these insights, we propose Cluster Segregation Concealment (CSC), a novel poison suppression defense. CSC first trains a deep neural network via standard supervised learning while segregating poisoned samples through feature extraction from early epochs, DBSCAN clustering, and identification of anomalous clusters based on class diversity and density metrics. In the concealment stage, identified poisoned samples are relabeled to a virtual class, and the model's classifier is fine-tuned using cross-entropy loss to replace the backdoor association with a benign virtual linkage, preserving overall accuracy. CSC was evaluated on four benchmark datasets against twelve poisoning-based attacks, CSC outperforms nine state-of-the-art defenses by reducing average attack success rates to near zero with minimal clean accuracy loss. Contributions include robust backdoor patterns identification, an effective concealment mechanism, and superior empirical validation, advancing trustworthy artificial intelligence.

#24 Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation

privacy

著者: Michele Miranda, Xinlan Yan, Nishant Mishra, Rachel Murphy, Ameen Abu-Hanna, S\'ebastien Brati\`eres, Iacer Calixto

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21421

要約:
Protecting patient privacy in clinical narratives is essential for enabling secondary use of healthcare data under regulations such as GDPR and HIPAA. While manual de-identification remains the gold standard, it is costly and slow, motivating the need for automated methods that combine privacy guarantees with high utility. Most automated text de-identification pipelines employed named entity recognition (NER) to identify protected entities for redaction. Although methods based on differential privacy (DP) provide formal privacy guarantees, more recently also large language models (LLMs) are increasingly used for text de-identification in the clinical domain. In this work, we present the first comparative study of DP, NER, and LLMs for Dutch clinical text de-identification. We investigate these methods separately as well as hybrid strategies that apply NER or LLM preprocessing prior to DP, and assess performance in terms of privacy leakage and extrinsic evaluation (entity and relation classification). We show that DP mechanisms alone degrade utility substantially, but combining them with linguistic preprocessing, especially LLM-based redaction, significantly improves the privacy-utility trade-off.

#25 A Stackelberg Model for Hybridization in Cryptography

著者: Willie Kouam, Stefan Rass, Zahra Seyedi, Shahzad Ahmad, Eckhard Pfluegel

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21436

要約:
Similar to a strategic interaction between rational and intelligent agents, cryptography problems can be examined through the prism of game theory. In this setting, the agent aiming to protect a message is called the defender, while the one attempting to decrypt it, generally for malicious purposes, is the attacker. To strengthen security in cryptography, various strategies have been developed, among which hybridization stands out as a key concept in modern cryptographic design. This strategy allows the defender to select among different encryption algorithms (classical, post-quantum, or hybrid) while carefully balancing security and operational costs. On the other side, the attacker, limited by available resources, chooses cryptanalysis methods capable of breaching the selected algorithm. We model this interaction as a Stackelberg cryptographic hybridization problem under resource constraints. Here, the defender randomizes over encryption algorithms, and the attacker observes the choice before selecting suitable cryptanalysis methods. The attacker's decision is framed as a conditional optimization problem, which we refer to as the ``attacker subgame''. We then propose a dynamic programming approach for the attacker's subgame, while the defender's Stackelberg optimization is formulated as a linear program.

#26 MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

著者: Run Hao, Zhuoran Tan

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21477

要約:
Model Context Protocol (MCP) is increasingly adopted for tool-integrated LLM agents, but its multi-layer design and third-party server ecosystem expand risks across tool metadata, untrusted outputs, cross-tool flows, multimodal inputs, and supply-chain vectors. Existing MCP benchmarks largely measure robustness to malicious inputs but offer limited remediation guidance. We present MCP Pitfall Lab, a protocol-aware security testing framework that operationalizes developer pitfalls as reproducible scenarios and validates outcomes with MCP traces and objective validators (rather than agent self-report). We instantiate three workflow challenges (email, document, crypto) with six server variants (baseline and hardened) and model three attack families: tool-metadata poisoning, puppet servers, and multimodal image-to-tool chains, in a unified, trace-grounded evaluation. In Tier-1 static analysis over six variants (36 binary labels), our analyzer achieves F1 = 1.0 on four statically checkable pitfall classes (P1, P2, P5, P6) and flags cross-tool forwarding and image-to-tool leakage (P3, P4) as trace/dataflow-dependent. Applying recommended hardening eliminates all Tier-1 findings (29 to 0) and reduces the framework risk score (10.0 to 0.0) at a mean cost of 27 lines of code (LOC). Finally, in a preliminary 19-run corpus from the email system challenge (tool poisoning and puppet attacks), agent narratives diverge from trace evidence in 63.2% of runs and 100% of sink-action runs, motivating trace-based auditing and regression testing. Overall, Pitfall Lab enables practical, end-to-end assessment and hardening of MCP tool servers under realistic multi-vector conditions.

#27 Benchmarking the Utility of Privacy-Preserving Cox Regression Under Data-Driven Clipping Bounds: A Multi-Dataset Simulation Study

privacy

著者: Keita Fukuyama, Yukiko Mori, Tomohiro Kuroda, Hiroaki Kikuchi

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21491

要約:
Differential privacy (DP) is a mathematical framework that guarantees individual privacy; however, systematic evaluation of its impact on statistical utility in survival analyses remains limited. In this study, we systematically evaluated the impact of DP mechanisms (Laplace mechanism and Randomized Response) with data-driven clipping bounds on the Cox proportional hazards model, using 5 clinical datasets ($n = 168$--$6{,}524$), 15 levels of $\varepsilon$ (0.1--1000), and $B = 1{,}000$ Monte Carlo iterations. The data-driven clipping bounds used here are observed min/max and therefore do not provide formal $\varepsilon$-DP guarantees; the results represent an optimistic lower bound on utility degradation under formal DP. We compared three types of input perturbations (covariates only, all inputs, and the discrete-time model) with output perturbations (dfbeta-based sensitivity), using loss of significance rate (LSR), C-index, and coefficient bias as metrics. At standard DP levels ($\varepsilon \leq 1$), approximately 90% (90--94%) of the significant covariates lost significance, even in the largest dataset ($n = 6{,}524$), and the predictive performance approached random levels (test C-index $\approx 0.5$) under many conditions. Among the input perturbation approaches, perturbing only covariates preserved the risk-set structure and achieved the best recovery, whereas output perturbation (dfbeta-based sensitivity) maintained near-baseline performance at $\varepsilon \geq 5$. At $n \approx 3{,}000$, the significance recovered rapidly at $\varepsilon = 3$--10; however, in practice, $\varepsilon \geq 10$ (for predictive performance) to $\varepsilon \geq 30$--60 (for significance preservation) is required. In the moderate-to-high $\varepsilon$ range, false-positive rates increased for variables whose baseline $p$-values were near the significance threshold.

#28 Mitigate or Fail: How Risk Management Shapes Cybersecurity Competency

著者: Jeffrey T. Gardiner

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21604

要約:
Contemporary cybersecurity governance assumes that professionals apply risk reasoning. Yet major organisational failures persist despite investment in tools, staffing, and credentials. This study investigates the structural source of that paradox. Cybersecurity speaks the language of risk, but its training architecture has shaped the profession to think in terms of threats. A sequential mixed-methods design integrated four analyses; NLP of the NIST NICE Framework v2.0.0 (2,111 TKS statements), SEM (n = 126 cybersecurity professionals), a control-group comparison (n = 133 general professionals), and thematic coding of seven leadership interviews. Four convergent findings emerged. First, "likelihood" and "probability" appear zero times across all TKS statements. Risk management content accounts for 4.5% of high-confidence semantic classifications, ranking 18th of 29 competency domains. NICE codifies threat-management activity while invoking risk mainly at the category level. Second, SEM showed that training exposure significantly predicts risk management competence directly and indirectly through conceptual salience, for a total effect of Beta = .629. However, the theoretically four-dimensional competence construct collapsed into a single factor, indicating epistemic compression. Third, cybersecurity professionals showed no measurable advantage over the general professional population in foundational risk reasoning; only 11.9% showed high differentiation. Fourth, all seven leaders expected Likelihood x Impact reasoning, yet five did not articulate the formula themselves. These findings support a structural conclusion: cybersecurity has taken professional form as a threat-management discipline that has borrowed risk vocabulary. Remediation requires redesign of professional formation, not marginal curriculum reform.

#29 Process-Mining of Hypertraces: Enabling Scalable Formal Security Verification of (Automotive) Network Architectures

著者: Julius Figge, David Knuplesch, Andreas Maletti, Dragan Zuvic

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21606

要約:
The automotive domain is transitioning: vehicles act as rolling servers, persistently connected to numerous external entities. This connectivity, combined with rising on-board computing power for advanced driver assistance systems and similar use cases, creates escalating challenges for securing automotive network architectures. This work advances the security analysis of internet-connected automotive network architectures and their protocols. We introduce a strong, active adversary model tailored to the automotive domain. We substantially extend security protocol verification possible based on Attack Resilience Hyperproperties (ARHs) by introducing a verification-orchestration algorithm. Furthermore, we provide methods for comparative attribution of security property invalidations to specific, ne-grained component compromises. We present a novel integration of formal verification and process mining. By utilizing ARH counterexample traces for process mining, we systematically identify and aggregate attacker behavior that causes security property invalidations. This pipeline enables in-depth understanding of root causes and attack paths leading to protocol-security invalidations. We demonstrate real-world applicability through a prototype and case study on the secure transmission of battery management system data within an automotive network architecture.

#30 A-THENA: Early Intrusion Detection for IoT with Time-Aware Hybrid Encoding and Network-Specific Augmentation

著者: Ioannis Panopoulos, Maria Lamprini A. Bartsioka, Sokratis Nikolaidis, Stylianos I. Venieris, Dimitra I. Kaklamani, Iakovos S. Venieris

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21623

要約:
The proliferation of Internet of Things (IoT) devices has significantly expanded attack surfaces, making IoT ecosystems particularly susceptible to sophisticated cyber threats. To address this challenge, this work introduces A-THENA, a lightweight early intrusion detection system (EIDS) that significantly extends preliminary findings on time-aware encodings. A-THENA employs an advanced Transformer-based architecture augmented with a generalized Time-Aware Hybrid Encoding (THE), integrating packet timestamps to effectively capture temporal dynamics essential for accurate and early threat detection. The proposed system further employs a Network-Specific Augmentation (NA) pipeline, which enhances model robustness and generalization. We evaluate A-THENA on three benchmark IoT intrusion detection datasets-CICIoT23-WEB, MQTT-IoT-IDS2020, and IoTID20-where it consistently achieves strong performance. Averaged across all three datasets, it improves accuracy by 6.88 percentage points over the best-performing traditional positional encoding, 3.69 points over the strongest feature-based model, 6.17 points over the leading time-aware alternatives, and 5.11 points over related models, while achieving near-zero false alarms and false negatives. To assess real-world feasibility, we deploy A-THENA on the Raspberry Pi Zero 2 W, demonstrating its ability to perform real-time intrusion detection with minimal latency and memory usage. These results establish A-THENA as an agile, practical, and highly effective solution for securing IoT networks.

#31 On the Challenges of Holistic Intrusion Detection in ICS

著者: Stefan Lenz, Julia Raab, Benedikt Holzbach, Deniz K\"oller, Sotiris Michaelides, Martin Henze

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21626

要約:
Past attacks against industrial control systems (ICS) show that adversaries often target both the ICS network and the physical process to achieve potential catastrophic impact. To secure ICS, intrusion detection systems promise timely uncovering of such adversaries. However, as these detection mechanisms typically focus on isolated characteristics of ICS (e.g., packet timings), multiple detection systems have to be deployed in parallel, complicating their operation in practice. In this work, to spur discussion and further research, we present challenges encountered during our research towards a holistic intrusion detection system aiming to cover all dimensions of an ICS.

#32 A Sociotechnical, Practitioner-Centered Approach to Technology Adoption in Cybersecurity Operations: An LLM Case

著者: Francis Hahn, Mohd Mamoon, Alexandru G. Bardas, Michael Collins, Daniel Lende, Xinming Ou, S. Raj Rajagopalan

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21679

要約:
Technology for security operations centers (SOCs) has a storied history of slow adoption due to concerns about trust and reliability. These concerns are amplified with artificial intelligence, particularly large language models (LLMs), which exhibit issues such as hallucinations and inconsistent outputs. To assess whether LLM-based tools can improve SOC efficiency, we embedded two PhD researchers within a multinational company SOC for six months of ethnographic fieldwork. We identified recurring challenges, such as repetitive tasks, fragmented/unclear data, and tooling bottlenecks, and collaborated directly with practitioners to develop LLM companion tools aligned with their operational needs. Iterative refinement reduced workflow disruption and improved interpretability, leading from skepticism to sustained adoption. Ethnographic analysis indicates that this shift was enabled by our sociotechnical co-creation process consistent with Nonaka's SECI model. This framework explains the common challenges in traditional SOC technology adoption, including workflow misalignment, rigidity against evolving threats and internal requirements, and stagnation over time. Our findings show that the co-creation approach can overcome these old barriers and create a new paradigm for creating usable technology for cybersecurity operations.

#33 Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

backdoor

著者: Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang, Haijun Wang, Ting Liu

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21700

要約:
The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor attacks against LLMs. However, existing methods suffer from three key shortcomings: explicit trigger patterns that compromise naturalness, unreliable injection of attacker-specified payloads in long-form generation, and incompletely specified threat models that obscure how backdoors are delivered and activated in practice. To address these gaps, we present BadStyle, a complete backdoor attack framework and pipeline. BadStyle leverages an LLM as a poisoned sample generator to construct natural and stealthy poisoned samples that carry imperceptible style-level triggers while preserving semantics and fluency. To stabilize payload injection during fine-tuning, we design an auxiliary target loss that reinforces the attacker-specified target content in responses to poisoned inputs and penalizes its emergence in benign responses. We further ground the attack in a realistic threat model and systematically evaluate BadStyle under both prompt-induced and PEFT-based injection strategies. Extensive experiments across seven victim LLMs, including LLaMA, Phi, DeepSeek, and GPT series, demonstrate that BadStyle achieves high attack success rates (ASRs) while maintaining strong stealthiness. The proposed auxiliary target loss substantially improves the stability of backdoor activation, yielding an average ASR improvement of around 30% across style-level triggers. Even in downstream deployment scenarios unknown during injection, the implanted backdoor remains effective. Moreover, BadStyle consistently evades representative input-level defenses and bypasses output-level defenses through simple camouflage.

#34 Adversarial Robustness of Near-Field Millimeter-Wave Imaging under Waveform-Domain Attacks

著者: Lhamo Dorje, Jordan Madden, Soamar Homsi, Xiaohua Li

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21774

要約:
Near-field millimeter-wave (mmWave) imaging is widely deployed in safety-critical applications such as airport passenger screening, yet its own security remains largely unexplored. This paper presents a systematic study of the adversarial robustness of mmWave imaging algorithms under waveform-domain physical attacks that directly manipulate the image reconstruction process. We propose a practical white-box adversarial model and develop a differential imaging attack framework that leverages the differentiable imaging pipeline to optimize attack waveforms. We also construct a real measured dataset of clean and attack waveforms using a mmWave imaging testbed. Experiments on 10 representative imaging algorithms show that mmWave imaging is highly vulnerable to such attacks, enabling an adversary to conceal or alter targets with moderate transmission power. Surprisingly, deep-learning-based imaging algorithms demonstrate higher robustness than classical algorithms. These findings expose critical security risks and motivate the development of robust and secure mmWave imaging systems.

#35 Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

agent

著者: Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21829

要約:
LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject expert knowledge into general-purpose models, improving performance on specialized tasks. This quality and ease of dissemination drive the emergence of a skill economy: free skill marketplaces already report 90368 published skills, while paid marketplaces report more than 2000 listings and over $100,000 in creator earnings. Yet this growing marketplace also creates a new attack surface, as adversaries can interact with public agent to extract hidden proprietary skill content. We present the first empirical study of black-box skill stealing against LLM agent systems. To study this threat, we first derive an attack taxonomy from prior prompt-stealing methods and build an automated stealing prompt generation agent. This agent starts from model-generated seed prompts, expands them through scenario rationalization and structure injection, and enforces diversity via embedding filtering. This process yields a reproducible pipeline for evaluating agent systems. We evaluate such attacks across 3 commercial agent architectures and 5 LLMs. Our results show that agent skills can be extracted with only 3 interactions, posing a serious copyright risk. To mitigate this threat, we design defenses across three stages of the agent pipeline: input, inference, and output. Although these defenses achieve strong results, the attack remains inexpensive and readily automatable, allowing an adversary to launch repeated attempts with different variants; only one successful attempt is sufficient to compromise the protected skill. Overall, our findings suggest that these copyright risks are largely overlooked across proprietary agent ecosystems. We therefore advocate for more robust defense strategies that provide stronger protection guarantees.

#36 TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication

著者: Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu, Jeff Huang

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21840

要約:
Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. This shifts URL triage from static classification toward an interactive forensics task: an analyst must actively navigate the page while isolating themselves from potential runtime exploits. We present TraceScope, a decoupled triage pipeline that operationalizes this workflow at scale. To prevent the observer effect and ensure safety, a sandboxed operator agent drives a real GUI browser guided by visual motivation to elicit page behavior, freezing the session into an immutable evidence bundle. Separately, an adjudicator agent circumvents LLM context limitations by querying evidence on demand to verify a MITRE ATT&CK checklist, and generates an audit-ready report with extracted indicators of compromise (IOCs) and a final verdict. Evaluated on 708 reachable URLs from existing dataset (241 verified phishing from PhishTank and 467 benign from Tranco-derived crawling), TraceScope achieves 0.94 precision and 0.78 recall, substantially improving recall over three prior visual/reference-based classifiers while producing reproducible, analyst-grade evidence suitable for review. More importantly, we manually curated a dataset of real-world phishing emails to evaluate our system in a practical setting. Our evaluation reveals that TraceScope demonstrates superior performance in a real-world scenario as well, successfully detecting sophisticated phishing attempts that current state-of-the-art defenses fail to identify.

#37 Cross-Modal Phantom: Coordinated Camera-LiDAR Spoofing Against Multi-Sensor Fusion in Autonomous Vehicles

著者: Shahriar Rahman Khan, Raiful Hasan

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21841

要約:
Autonomous Vehicles (AVs) increasingly depend on Multi-Sensor Fusion (MSF) to combine complementary modalities such as cameras and LiDAR for robust perception. While this redundancy is intended to safeguard against single-sensor failures, the fusion process itself introduces a subtle and underexplored vulnerability. In this work, we investigate whether an attacker can bypass MSF's redundancy by fabricating cross-sensor consistency, making multiple sensors agree on the same false object. We design a coordinated, data-level (early-fusion) attack that emulates the outcome of two synchronized physical spoofing sources: an infrared (IR) projection that induces a false camera detection and a LiDAR signal injection that produces a matching 3D point cluster. Rather than implementing the physical attack hardware, we simulate its sensor-level outcomes by inserting perspective-aware image patches and synthetic LiDAR point clusters aligned in 3D space. This approach preserves the perceptual effects that real IR and IEMI-based spoofing would create at the sensor output. Using 400 KITTI scenes, our large-scale evaluation shows that the coordinated spoofing deceives a state-of-the-art perception model with an 85.5% successful attack rate. These findings provide the first quantitative evidence that malicious cross-modal consistency can compromise MSF-based perception, revealing a critical vulnerability in the core data-fusion logic of modern autonomous vehicle systems.

#38 Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

著者: Naheed Rayhan, Sohely Jahan

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21860

要約:
Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection(TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial intent across isolated interactions. TTI leverages automated attacker agents powered by large language models to iteratively test and evade policy enforcement in both commercial and open-source LLMs, marking a departure from conventional jailbreak approaches that typically depend on maintaining persistent conversational context. Our extensive evaluation across state-of-the-art models-including those from OpenAI, Anthropic, Google Gemini, Meta, and prominent open-source alternatives-uncovers significant variations in resilience to TTI attacks, with only select architectures exhibiting substantial inherent robustness. Our automated blackbox evaluation framework also uncovers previously unknown model specific vulnerabilities and attack surface patterns, especially within medical and high stakes domains. We further compare TTI against established adversarial prompting methods and detail practical mitigation strategies, such as session level context aggregation and deep alignment approaches. Our study underscores the urgent need for holistic, context aware defenses and continuous adversarial testing to future proof LLM deployments against evolving multi-turn threats.

#39 CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis

著者: Arunabh Majumdar

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21917

要約:
We present CrossCommitVuln-Bench, a curated benchmark of 15 real-world Python vulnerabilities (CVEs) in which the exploitable condition was introduced across multiple commits - each individually benign to per-commit static analysis - but collectively critical. We manually annotate each CVE with its contributing commit chain, a structured rationale for why each commit evades per-commit analysis, and baseline evaluations using Semgrep and Bandit in both per-commit and cumulative scanning modes. Our central finding: the per-commit detection rate (CCDR) is 13% across all 15 vulnerabilities - 87% of chains are invisible to per-commit SAST. Critically, both per-commit detections are qualitatively poor: one occurs on commits framed as security fixes (where developers suppress the alert), and the other detects only the minor hardcoded-key component while completely missing the primary vulnerability (200+ unprotected API endpoints). Even in cumulative mode (full codebase present), the detection rate is only 27%, confirming that snapshot-based SAST tools often miss vulnerabilities whose introduction spans multiple commits. The dataset, annotation schema, evaluation scripts, and reproducible baselines are released under open-source licenses to support research on cross-commit vulnerability detection.

#40 CRED-1: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation

著者: Alexander Loth, Martin Kappes, Marc-Oliver Pahl

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20856

要約:
This article presents CRED-1, an open, reproducible domain-level credibility dataset combining two openly-licensed source lists (OpenSources.co and Iffy.news) with four computed enrichment signals: domain age (WHOIS/RDAP), web popularity (Tranco Top-1M), fact-check frequency (Google Fact Check Tools API), and threat intelligence (Google Safe Browsing API). The dataset covers 2,672 domains categorized as fake, unreliable, mixed, conspiracy, or satire, each assigned a composite credibility score between 0.0 and 1.0. CRED-1 is designed for on-device deployment in privacy-preserving browser extensions to enable client-side pre-bunking of misinformation at the content delivery stage. The entire pipeline is implemented in Python using only standard library modules and is fully reproducible from publicly available sources. The dataset and pipeline code are released under CC~BY~4.0 and archived on Zenodo.

#41 Preserving Decision Sovereignty in Military AI: A Trade-Secret-Safe Architectural Framework for Model Replaceability, Human Authority, and State Control

著者: Peng Wei, Wesley Shu

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20867

要約:
Recent events surrounding the relationship between frontier AI suppliers and national-security customers have made a structural problem newly visible: once a privately governed model becomes embedded in military workflows, the supplier can influence not only technical performance but also the operational boundary conditions under which the system may be used. This paper argues that the central strategic issue is not merely access to capable models, but preservation of decision sovereignty: the state's ability to retain authority over decision policy, version control, fallback behavior, auditability, and final action approval even when analytical modules are sourced from commercial vendors. Using the public Anthropic--Pentagon dispute of 2026, the broader history of Project Maven, and recent U.S., NATO, U.K., and intelligence-community guidance as a motivating context, the paper develops a trade-secret-safe architectural formulation of the Energetic Paradigm as a layered, model-agnostic command-support design. In this formulation, supplier models remain replaceable analytical components, while routing, constraints, logging, escalation, and action authorization remain state-owned functions. The paper contributes three things: a definition of decision sovereignty for military AI; a threat model for supplier-induced boundary control; and a public architectural specification showing how model replaceability, human authority, and sovereign orchestration can reduce strategic dependency without requiring disclosure of proprietary implementation details. The argument is conceptual rather than experimental, but it yields concrete implications for procurement, governance, and alliance interoperability.

#42 Differentially Private Model Merging

privacy

著者: Qichuan Yin, Manzil Zaheer, Tian Li

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.20985

要約:
In machine learning applications, privacy requirements during inference or deployment time could change constantly due to varying policies, regulations, or user experience. In this work, we aim to generate a magnitude of models to satisfy any target differential privacy (DP) requirement without additional training steps, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post processing techniques, namely random selection and linear combination, to output a final private model for any target privacy parameter. We provide privacy accounting of these approaches from the lens of R'enyi DP and privacy loss distributions for general problems. In a case study on private mean estimation, we fully characterize the privacy/utility results and theoretically establish the superiority of linear combination over random selection. Empirically, we validate our approach and analyses on several models and both synthetic and real-world datasets.

#43 Residual Risk Analysis in Benign Code: How Far Are We? A Multi-Model Semantic and Structural Similarity Approach

著者: Mohammad Farhad, Shuvalaxmi Dass

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21051

要約:
Software security relies on effective vulnerability detection and patching, yet determining whether a patch fully eliminates risk remains an underexplored challenge. Existing vulnerability benchmarks often treat patched functions as inherently benign, overlooking the possibility of residual security risks. In this work, we analyze vulnerable-benign function pairs from the PrimeVul, a benchmark dataset using multiple code language models (Code LMs) to capture semantic similarity, complemented by Tree-sitter-based abstract syntax tree (AST) analysis for structural similarity. Building on these signals, we propose Residual Risk Scoring (RRS), a unified framework that integrates embedding-based semantic similarity, localized AST-based structural similarity, and cross-model agreement to estimate residual risk in code. Our analysis shows that benign functions often remain highly similar to their vulnerable counterparts both semantically and structurally, indicating potential persistence of residual risk. We further find that approximately $61\%$ of high-RRS code pairs exhibit $13$ distinct categories of residual issues (e.g., null pointer dereferences, unsafe memory allocation), validated using state-of-the-art static analysis tools including Cppcheck, Clang-Tidy, and Facebook-Infer. These results demonstrate that code-level similarity provides a practical signal for prioritizing post-patch inspection, enabling more reliable and scalable security assessment in real-world open-source software pipelines.

#44 A Ground-Truth-Based Evaluation of Vulnerability Detection Across Multiple Ecosystems

著者: Peter Mandl, Paul Mandl, Martin H\"ausl, Maximilian Auch

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.21111

要約:
Automated vulnerability detection tools are widely used to identify security vulnerabilities in software dependencies. However, the evaluation of such tools remains challenging due to the heterogeneous structure of vulnerability data sources, inconsistent identifier schemes, and ambiguities in version range specifications. In this paper, we present an empirical evaluation of vulnerability detection across multiple software ecosystems using a curated ground-truth dataset derived from the Open Source Vulnerabilities (OSV) database. The dataset explicitly maps vulnerabilities to concrete package versions and enables a systematic comparison of detection results across different tools and services. Since vulnerability databases such as OSV are continuously updated, the dataset used in this study represents a snapshot of the vulnerability landscape at the time of the evaluation. To support reproducibility and future studies, we provide an open-source tool that automatically reconstructs the dataset from the current OSV database using the methodology described in this paper. Our evaluation highlights systematic differences between vulnerability detection systems and demonstrates the importance of transparent dataset construction for reproducible empirical security research.

#45 SafeTrans: LLM-assisted Transpilation from C to Rust

著者: Muhammad Farrukh, Baris Coskun, Tapti Palit, Michalis Polychronakis

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.10708

要約:
Rust is a strong contender for a memory-safe alternative to C as a "systems" language, but porting the vast amount of existing C code to Rust remains daunting. In this paper, we evaluate the potential of large language models (LLMs) to automate the transpilation of C code to idiomatic Rust. We present SafeTrans, a generic framework that leverages LLMs to i) transpile C code into Rust, and ii) iteratively repair compilation and runtime errors. A key novelty of our approach is a few-shot guided repair technique for translation errors, which provides contextual information and example code snippets for specific error types, guiding the LLM toward the correct solution. Another novel aspect of our work is the evaluation of the security implications of the transpilation process, showing how some vulnerability classes in C persist in the translated Rust code. SafeTrans was evaluated with six leading LLMs on 2,653 C programs and two real-world C projects. Our results show that iterative repair improves the rate of successful translations from 54% to 80% for the best-performing LLM (gpt-4o).

#46 Incentivizing Collaboration for Detection of Credential Database Breaches

著者: Mridu Nanda, Michael K. Reiter

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.04634

要約:
Decoy passwords, or ``honeywords,'' alert a site to its breach if entered in a login attempt on that site. However, an attacker can identify a user-chosen password from among the decoys, without alerting the site to its breach, via credential stuffing, i.e., entering the stolen passwords at another site where a user reused her password. Prior work thus proposed that sites monitor for the entry of their honeywords at other sites, but the incentives for sites to participate in this monitoring remain unclear. In this paper, we propose and evaluate an algorithm by which sites can exchange monitoring favors. Through a model-checking analysis, we show that a site can improve its ability to detect its own breach when it increases the monitoring effort it expends for others. We quantify how key parameters impact detection effectiveness and their implications for deploying a monitoring ecosystem. Finally, we evaluate our algorithm on a breached credential dataset, demonstrating effectiveness at scale.

#47 Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution

著者: Robert Dilworth

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.15840

要約:
When using a public communication channel--whether formal or informal, such as commenting or posting on social media--end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes utmost precautions to anonymize their online presence--using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting--one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment or accidental self-exposure, there should be little evidence to validate their actual identity, right? Wrong. The content of their message--necessarily open for public consumption--exposes an attack vector: stylometric analysis, or author profiling. In this paper, we dissect the technique of stylometry, discuss an antithetical counter-strategy in adversarial stylometry, and devise enhancements through Unicode steganography.

#48 A Comprehensive Guide to Differential Privacy: From Theory to User Expectations

privacy

著者: Napsu Karmitsa, Antti Airola, Tapio Pahikkala, Tinja Pitk\"am\"aki

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2509.03294

要約:
The increasing availability of personal data has enabled significant advances in fields such as machine learning, healthcare, and cybersecurity. However, this data abundance also raises serious privacy concerns, especially in light of powerful re-identification attacks and growing legal and ethical demands for responsible data use. Differential privacy (DP) has emerged as a principled, mathematically grounded framework for mitigating these risks. This review provides a comprehensive survey of DP, covering its theoretical foundations, practical mechanisms, and real-world applications. It explores key algorithmic tools and domain-specific challenges - particularly in privacy-preserving machine learning and synthetic data generation. The report also highlights usability issues and the need for improved communication and transparency in DP systems. Overall, the goal is to support informed adoption of DP by researchers and practitioners navigating the evolving landscape of data privacy.

#49 Privacy-Preserving Semantic Communication over Wiretap Channels with Learnable Differential Privacy

privacy

著者: Weixuan Chen, Qianqian Yang, Shuo Shao, Shunpu Tang, Zhiguo Shi, Shui Yu

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.23274

要約:
While semantic communication (SemCom) improves transmission efficiency by focusing on task-relevant information, it also raises critical privacy concerns. Many existing secure SemCom approaches rely on restrictive or impractical assumptions, such as favorable channel conditions for the legitimate user or prior knowledge of the eavesdropper's model. To address these limitations, this paper proposes a novel secure SemCom framework for image transmission over wiretap channels, leveraging differential privacy (DP) to provide approximate privacy guarantees. Specifically, our approach first extracts disentangled semantic representations from source images using generative adversarial network (GAN) inversion method, and then selectively perturbs private semantic representations with approximate DP noise. Distinct from conventional DP-based protection methods, we introduce DP noise with learnable pattern, instead of traditional white Gaussian or Laplace noise, achieved through adversarial training of neural networks (NNs). This design mitigates the inherent non-invertibility of DP while effectively protecting private information. Moreover, it enables explicitly controllable security levels by adjusting the privacy budget according to specific security requirements, which is not achieved in most existing secure SemCom approaches. Experimental results demonstrate that, compared with the previous DP-based method and direct transmission, the proposed method significantly degrades the reconstruction quality for the eavesdropper, while introducing only slight degradation in task performance. Under comparable security levels, our approach achieves an LPIPS advantage of 0.06-0.29 and an FPPSR advantage of 0.10-0.86 for the legitimate user compared with the previous DP-based method.

#50 Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits

著者: Robert Dilworth

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.03465

要約:
In this study, we more rigorously evaluated our attack script $\textit{TraceTarnish}$, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To ensure the efficacy and utility of our attack, we sourced, processed, and analyzed Reddit comments -- comments that were later alchemized into $\textit{TraceTarnish}$ data -- to gain valuable insights. The transformed $\textit{TraceTarnish}$ data was then further augmented by $\textit{StyloMetrix}$ to manufacture stylometric features -- features that were culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. Our results found that function words and function word types ($L\_FUNC\_A$ $\&$ $L\_FUNC\_T$); content words and content word types ($L\_CONT\_A$ $\&$ $L\_CONT\_T$); and the Type-Token Ratio ($ST\_TYPE\_TOKEN\_RATIO\_LEMMAS$) yielded significant Information-Gain readings. The identified stylometric cues -- function-word frequencies, content-word distributions, and the Type-Token Ratio -- serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. Similarly, these features could function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; granted, in the absence of the original message, this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed $\textit{TraceTarnish}$'s operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.

#51 StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching

著者: Robert Dilworth

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.09056

要約:
Stylometry--the identification of an author through analysis of a text's style (i.e., authorship attribution)--serves many constructive purposes: it supports copyright and plagiarism investigations, aids detection of harmful content, offers exploratory cues for certain medical conditions (e.g., early signs of dementia or depression), provides historical context for literary works, and helps uncover misinformation and disinformation. In contrast, when stylometry is employed as a tool for authorship verification--confirming whether a text truly originates from a claimed author--it can also be weaponized for malicious purposes. Techniques such as de-anonymization, re-identification, tracking, profiling, and downstream effects like censorship illustrate the privacy threats that stylometric analysis can enable. Building on these concerns, this paper further explores how adversarial stylometry combined with steganography can counteract stylometric analysis. We first present enhancements to our adversarial attack, $\textit{TraceTarnish}$, providing stronger evidence of its capacity to confound stylometric systems and reduce their attribution and verification accuracy. Next, we examine how steganographic embedding can be fine-tuned to mask an author's stylistic fingerprint, quantifying the level of authorship obfuscation achievable as a function of the proportion of words altered with zero-width Unicode characters. Based on our findings, steganographic coverage of 33% or higher seemingly ensures authorship obfuscation. Finally, we reflect on the ways stylometry can be used to undermine privacy and argue for the necessity of defensive tools like $\textit{TraceTarnish}$.

#52 Intent Laundering: AI Safety Datasets Are Not What They Seem

著者: Shahriar Golchin, Marc Wetter

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.16729

要約:
We systematically evaluate the quality of widely used adversarial safety datasets from two perspectives: in isolation and in practice. In isolation, we examine how well these datasets reflect real-world adversarial attacks based on three defining properties: being driven by ulterior intent, well-crafted, and out-of-distribution. We find that these datasets overrely on "triggering cues": words or phrases with overt negative/sensitive connotations that are intended to trigger safety mechanisms explicitly, which is unrealistic compared to real-world attacks. In practice, we evaluate whether these datasets genuinely measure safety risks or merely provoke refusals through triggering cues. To explore this, we introduce "intent laundering": a procedure that abstracts away triggering cues from adversarial attacks (data points) while strictly preserving their malicious intent and all relevant details. Our results show that current adversarial safety datasets fail to faithfully represent real-world adversarial behavior due to their overreliance on triggering cues. Once these cues are removed, all previously evaluated "reasonably safe" models become unsafe, including Gemini 3 Pro and Claude Sonnet 3.7/4. Moreover, when intent laundering is adapted as a jailbreaking technique, it consistently achieves high attack success rates, ranging from 90.00% to 100.00%, under fully black-box access. Overall, our findings expose a significant disconnect between how existing datasets evaluate model safety and how real-world adversaries behave.

#53 ATLAS: AI-Assisted Threat-to-Assertion Learning for System-on-Chip Security Verification

著者: Ishraq Tashdid, Kimia Tasnia, Alexander Garcia, Jonathan Valamehr, Sazadur Rahman

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.01170

要約:
This work presents ATLAS, an LLM-driven framework that bridges standardized threat modeling and property-based formal verification for System-on-Chip (SoC) security. Starting from vulnerability knowledge bases such as Common Weakness Enumeration (CWE), ATLAS identifies SoC-specific assets, maps relevant weaknesses, and generates assertion-based security properties and JasperGold scripts for verification. By combining asset-centric analysis with standardized threat model templates and multi-source SoC context, ATLAS automates the transformation from vulnerability reasoning to formal proof. Evaluated on three HACK@DAC benchmarks, ATLAS detected 39/48 CWEs and generated correct properties for 33 of those bugs, advancing automated, knowledge-driven SoC security verification toward a secure-by-design paradigm.

#54 Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

著者: Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.21697

要約:
Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple three-panel visual narratives and prompt the model to role-play and "complete the comic." Building on JailbreakBench and JailbreakV, we introduce ComicJailbreak, a comic-based jailbreak benchmark with 1,167 attack instances spanning 10 harm categories and 5 task setups. Across 15 state-of-the-art MLLMs (six commercial and nine open-source), comic-based attacks achieve success rates comparable to strong rule-based jailbreaks and substantially outperform plain-text and random-image baselines, with ensemble success rates exceeding 90% on several commercial models. Then, with the existing defense methodologies, we show that these methods are effective against the harmful comics, they will induce a high refusal rate when prompted with benign prompts. Finally, using automatic judging and targeted human evaluation, we show that current safety evaluators can be unreliable on sensitive but non-harmful content. Our findings highlight the need for safety alignment robust to narrative-driven multimodal jailbreaks.

#55 Toward a Multi-Layer ML-Based Security Framework for Industrial IoT

著者: Aymen Bouferroum (LIA), Valeria Loscri (LIA), Abderrahim Benslimane (LIA)

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.24111

要約:
The Industrial Internet of Things (IIoT) introduces significant security challenges as resource-constrained devices become increasingly integrated into critical industrial processes. Existing security approaches typically address threats at a single network layer, often relying on expensive hardware and remaining confined to simulation environments. In this paper, we present the research framework and contributions of our doctoral thesis, which aims to develop a lightweight, Machine Learning (ML)-based security framework for IIoT environments. We first describe our adoption of the Tm-IIoT trust model and the Hybrid IIoT (H-IIoT) architecture as foundational baselines, then introduce the Trust Convergence Acceleration (TCA) approach, our primary contribution that integrates ML to predict and mitigate the impact of degraded network conditions on trust convergence, achieving up to a 28.6% reduction in convergence time while maintaining robustness against adversarial behaviors. We then propose a real-world deployment architecture based on affordable, open-source hardware, designed to implement and extend the security framework. Finally, we outline our ongoing research toward multi-layer attack detection, including physical-layer threat identification and considerations for robustness against adversarial ML attacks.

#56 Hijacking Text Heritage: Hiding the Human Signature through Homoglyphic Substitution

著者: Robert Dilworth

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.10271

要約:
In what way could a data breach involving government-issued IDs such as passports, driver's licenses, etc., rival a random voluntary disclosure on a nondescript social-media platform? At first glance, the former appears more significant, and that is a valid assessment. The disclosed data could contain an individual's date of birth and address; for all intents and purposes, a leak of that data would be disastrous. Given the threat, the latter scenario involving an innocuous online post seems comparatively harmless--or does it? From that post and others like it, a forensic linguist could stylometrically uncover equivalent pieces of information, estimating an age range for the author (adolescent or adult) and narrowing down their geographical location (specific country). While not an exact science--the determinations are statistical--stylometry can reveal comparable, though noticeably diluted, information about an individual. To prevent an ID from being breached, simply sharing it as little as possible suffices. Preventing the leakage of personal information from written text requires a more complex solution: adversarial stylometry. In this paper, we explore how performing homoglyph substitution--the replacement of characters with visually similar alternatives (e.g., "h" $\texttt{[U+0068]}$ $\rightarrow$ "h" $\texttt{[U+04BB]}$)--on text can degrade stylometric systems.

#57 LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

著者: Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.12994

要約:
Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical vulnerabilities because of their limited semantic understanding of the vulnerable code and its expected behavior. On the other hand, recent successes of large language models (LLMs) in understanding and repairing code are promising. However, no framework currently exists to analyze the capabilities and limitations of such techniques for logical vulnerabilities. We aim to systematically evaluate both traditional and LLM based repair approaches for addressing real world logical vulnerabilities. To facilitate our assessment, we created the first ever dataset, LogicDS, comprising 122 logical vulnerabilities that reflect tangible security impact. We also developed a systematic framework, LogicEval, to evaluate patches for logical vulnerabilities. Evaluations suggest that compilation and testing failures are primarily driven by prompt sensitivity, loss of code context, and difficulty in patch localization.

#58 Privacy, Prediction, and Allocation

privacy

著者: Ben Jacobsen, Nitin Kohli

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.15596

要約:
Algorithmic predictions are increasingly used to inform the allocation of scarce resources. The promise of these methods is that, through machine learning, they can better identify the people who would benefit most from interventions. Recently, however, several works have called this assumption into question by demonstrating the existence of settings where simple, unit-level allocation strategies can meet or even exceed the performance of those based on individual-level targeting. Separately, other works have objected to individual-level targeting on privacy grounds, leading to an unusual situation where a single solution, unit-level targeting, is recommended for reasons of both privacy and utility. Motivated by the desire to fully understand the interplay of privacy and targeting levels, we initiate the study of aid allocation systems that satisfy differential privacy, synthesizing existing works on private optimization with the economic models of aid allocation used in the non-private literature. To this end, we investigate private variants of both individual and unit-level allocation strategies in both stochastic and distribution-free settings under a range of constraints on data availability. Through this analysis, we provide clean, interpretable bounds characterizing the tradeoffs between privacy, efficiency, and targeting precision in allocation.

#59 Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

agent

著者: Alankrit Chona, Igor Kozlov, Ambuj Kumar

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.19533

要約:
We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events. The benchmark wraps 106 real attack procedures from the OTRF Security-Datasets corpus - spanning 86 MITRE ATT&CK sub-techniques across 12 tactics - into a Gymnasium reinforcement-learning environment. Each episode presents the agent with an in-memory SQLite database of 75,000-135,000 log records produced by a deterministic campaign simulator that time-shifts and entity-obfuscates the raw recordings. The agent must iteratively submit SQL queries to discover malicious event timestamps and explicitly flag them, scored CTF-style against Sigma-rule-derived ground truth. Evaluating five frontier models - Claude Opus 4.6, GPT-5, Gemini 3.1 Pro, Kimi K2.5, and Gemini 3 Flash - on 26 campaigns covering 105 of 106 procedures, we find that all models fail dramatically: the best model (Claude Opus 4.6) submits correct flags for only 3.8% of malicious events on average, and no run across any model ever finds all flags. We define a passing score as >= 50% recall on every ATT&CK tactic - the minimum bar for unsupervised SOC deployment. No model passes: the leader clears this bar on 5 of 13 tactics and the remaining four on zero. These results suggest that current LLMs are poorly suited for open-ended, evidence-driven threat hunting despite strong performance on curated Q&A security benchmarks.

#60 Statistical Inference for Privatized Data with Unknown Sample Size

著者: Jordan Awan, Andres Felipe Barrientos, Nianqiao Ju

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2406.06231

要約:
We develop both theory and algorithms to analyze privatized data in unbounded differential privacy (DP), where even the sample size is considered a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ is at an appropriate rate; we also establish that Approximate Bayesian Computation (ABC)-type posterior distributions converge under similar assumptions. We further give asymptotic results in the regime where the privacy budget for $n$ goes to infinity, establishing similarity of sampling distributions as well as showing that the MLE in the unbounded setting converges to the bounded-DP MLE. To facilitate valid, finite-sample Bayesian inference on privatized data under unbounded DP, we propose a reversible jump MCMC algorithm which extends the data augmentation MCMC of Ju et al, (2022). We also propose a Monte Carlo EM algorithm to compute the MLE from privatized data in both bounded and unbounded DP. We apply our methodology to analyze a linear regression model as well as a 2019 American Time Use Survey Microdata File which we model using a Dirichlet distribution.

#61 Secure LLM Fine-Tuning via Safety-Aware Probing

著者: Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Xiaokun Luan, Meng Sun

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.16737

要約:
Large language models (LLMs) have achieved remarkable success across many applications, but their ability to generate harmful content raises serious safety concerns. Although safety alignment techniques are often applied during pre-training or post-training, recent studies show that subsequent fine-tuning on adversarial or even benign data can still compromise model safety. In this paper, we revisit the fundamental question of why fine-tuning on non-harmful data may nevertheless degrade safety. We show that the safety and task-performance loss landscapes are partially decoupled, so updates that improve task-specific performance may still move the model toward unsafe regions. Based on this insight, we propose a safety-aware probing (SAP) optimization framework for mitigating safety risks during fine-tuning. Concretely, SAP uses contrastive safety signals to locate safety-correlated directions, and optimizes a lightweight probe that perturbs hidden-state propagation during fine-tuning, thereby steering parameter updates away from harmful trajectories while preserving task-specific learning. Extensive experiments show that SAP consistently improves the safety--utility tradeoff across multiple models and tasks. Averaged over multiple LLMs, SAP reduces the harmful score significantly relative to standard fine-tuning, outperforming strong baselines while maintaining competitive task-specific performance. SAP also demonstrates stronger robustness under harmful data poisoning, adversarial fine-tuning, and a dedicated post-fine-tuning adaptive attack, validating that SAP is an effective and scalable framework for preserving LLM safety during fine-tuning. Our code is available at https://github.com/ChengcanWu/SAP.

#62 Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

著者: Yun Bian, Yi Chen, HaiQuan Wang, ShiHao Li, Zhe Cui

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.02438

要約:
Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security defects. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by pretrained models with Code Property Graph (CPG) representations extracted by graph neural networks, under the implicit assumption that introducing an additional modality necessarily yields information gain. Through empirical analysis, we demonstrate the limitations of this assumption: pretrained models already encode substantial structural information implicitly, leading to strong overlap between the two modalities; moreover, graph encoders are generally less effective than pretrained language models in feature extraction. As a result, naive fusion not only struggles to obtain complementary signals but can also dilute effective discriminative cues due to noise propagation. To address these challenges, we propose a task-conditioned complementary fusion strategy that uses Fisher information to quantify task relevance, transforming cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, this strategy significantly tightens the upper bound on the output error. Based on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism to enable efficient task-oriented fusion. Experiments on the BigVul, Devign, and ReVeal benchmarks demonstrate that TaCCS-DFA delivers up to a 6.3-point gain in F1 score with only a 3.4% increase in inference latency, while maintaining low calibration error.

#63 How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape

著者: Patrick Gage Kelley, Steven Rousso-Schindler, Renee Shelby, Kurt Thomas, Allison Woodruff

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.06033

要約:
Generative AI (GenAI) is a powerful technology poised to reshape Trust & Safety. While misuse by attackers is a growing concern, its defensive capacity remains underexplored. This paper examines these effects through a qualitative study with 43 Trust & Safety experts across five domains: child safety, election integrity, hate and harassment, scams, and violent extremism. Our findings characterize a landscape in which GenAI empowers both attackers and defenders. GenAI dramatically increases the scale and speed of attacks, lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging GenAI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding GenAI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.

#64 Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review

著者: Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos, Diomidis Spinellis

公開日: Fri, 24 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.18740

要約:
Automated Code Review (ACR) systems integrating Large Language Models (LLMs) are increasingly adopted in software development workflows, ranging from interactive assistants to autonomous agents in CI/CD pipelines. In this paper, we study how LLM-based vulnerability detection in ACR is affected by the framing effect: the tendency to let the presentation of information override its semantic content in forming judgments. We examine whether adversaries can exploit this through contextual-bias injection: crafting PR metadata to bias ACR security judgments as a supply-chain attack vector against real-world ACR pipelines. To this end, we first conduct a large-scale exploratory study across 6 LLMs under five framing conditions, establishing the framing effect as a systematic and widespread phenomenon in LLM-based vulnerability detection, with bug-free framing producing the strongest effect. We then design a realistic and controlled experimental environment, evaluating 17 CVEs across 10 real-world projects, to assess the susceptibility of real-world ACR pipelines to vulnerability reintroduction attacks. We employ two attack strategies: a template-based attack inspired by prior related work, and a novel LLM-assisted iterative refinement attack. We find that template-based attacks are ineffective and may even backfire, as direct biasing attempts raise suspicions. Our iterative refinement attack, on the other hand, achieves 100% success, exploiting a fundamental asymmetry: attackers can iteratively refine attacks against a local clone of the review pipeline, while defenders have only one chance to detect them. Debiasing via metadata redaction and explicit instructions restores detection in all affected cases. Overall, our findings highlight the dangers of over-relying on ACR and stress the importance of human oversight and contributor trust in the development process.

cs.CR updates on arXiv.org

📋 論文タイトル一覧