arXiv論文一覧 - cs.CR updates on arXiv.org

#1 How Effective Are Publicly Accessible Deepfake Detection Tools? A Comparative Evaluation of Open-Source and Free-to-Use Platforms

著者: Michael Rettinger, Ben Beaumont, Nhien-An Le-Khac, Hong-Hanh Nguyen-Le

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04456

要約:
The proliferation of deepfake imagery poses escalating challenges for practitioners tasked with verifying digital media authenticity. While detection algorithm research is abundant, empirical evaluations of publicly accessible tools that practitioners actually use remain scarce. This paper presents the first cross-paradigm evaluation of six tools, spanning two complementary detection approaches: forensic analysis tools (InVID \& WeVerify, FotoForensics, Forensically) and AI-based classifiers (DecopyAI, FaceOnLive, Bitmind). Both tool categories were evaluated by professional investigators with law enforcement experience using blinded protocols across datasets comprising authentic, tampered, and AI-generated images sourced from DF40, CelebDF, and CASIA-v2. We report three principal findings: forensic tools exhibit high recall but poor specificity, while AI classifiers demonstrate the inverse pattern; human evaluators substantially outperform all automated tools; and human-AI disagreement is asymmetric, with human judgment prevailing in the vast majority of discordant cases. We discuss implications for practitioner workflows and identify critical gaps in current detection capabilities.

#2 Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

著者: Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04459

要約:
The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and enabling systematic comparisons. Yet, it remains unclear why certain benchmarks gain prominence, and no systematic assessment has been conducted on their academic influence or code quality. This paper fills this gap by presenting the first multi-dimensional evaluation of the influence (based on five metrics) and code quality (based on both automated and human assessment) on LLM safety benchmarks, analyzing 31 benchmarks and 382 non-benchmarks across prompt injection, jailbreak, and hallucination. We find that benchmark papers show no significant advantage in academic influence (e.g., citation count and density) over non-benchmark papers. We uncover a key misalignment: while author prominence correlates with paper influence, neither author prominence nor paper influence shows a significant correlation with code quality. Our results also indicate substantial room for improvement in code and supplementary materials: only 39% of repositories are ready-to-use, 16% include flawless installation guides, and a mere 6% address ethical considerations. Given that the work of prominent researchers tends to attract greater attention, they need to lead the effort in setting higher standards.

#3 Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection

agent

著者: Yangyang Wei, Yijie Xu, Zhenyuan Li, Xiangmin Shen, Shouling Ji

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04469

要約:
Multi-Agent System is emerging as the \textit{de facto} standard for complex task orchestration. However, its reliance on autonomous execution and unstructured inter-agent communication introduces severe risks, such as indirect prompt injection, that easily circumvent conventional input guardrails. To address this, we propose \SysName, a framework that shifts the defensive paradigm from static input filtering to execution-aware analysis. By extracting and reconstructing Cross-Agent Semantic Flows, \SysName synthesizes fragmented operational primitives into contiguous behavioral trajectories, enabling a holistic view of system activity. We leverage a Supervisor LLM to scrutinize these trajectories, identifying anomalies across data flow violations, control flow deviations, and intent inconsistencies. Empirical evaluations demonstrate that \SysName effectively detects over ten distinct compound attack vectors, achieving F1-scores of 85.3\% and 66.7\% for node-level and path-level end-to-end attack detection, respectively. The source code is available at https://anonymous.4open.science/r/MAScope-71DC.

#4 Impact of 5G SA Logical Vulnerabilities on UAV Communications: Threat Models and Testbed Evaluation

著者: Wagner Comin Sonaglio, \'Agney Lopes Roth Ferraz, Louren\c{c}o Alves Pereira J\'unior

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04662

要約:
This paper examines how logical vulnerabilities in 5G Standalone networks affect UAV command and control communication. The study looks at three attacker positions in the architecture: a malicious user equipment (UE) connected to the same logical network as the UAV, an attacker with access to the 5G core, and a compromised gNodeB. To test these scenarios, a testbed was created using Open5GS, UERANSIM, and Kubernetes. The setup simulates a UAV-GCS communication system over a 5G SA network and allows for controlled attacks on various network interfaces. The experiments reveal that attacks at different points in the architecture can disrupt UAV operations. These disruptions include manipulating control commands and terminating data sessions. The findings emphasize the need for isolation measures in the 5G user plane and integrity protection in UAV command protocols.

#5 When Denoising Becomes Unsigning: Theoretical and Empirical Analysis of Watermark Fragility Under Diffusion-Based Image Editing

intellectual propertydiffusion

著者: Fai Gu, Qiyu Tang, Te Wen, Emily Davis, Finn Carter

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04696

要約:
Robust invisible watermarking systems aim to embed imperceptible payloads that remain decodable after common post-processing such as JPEG compression, cropping, and additive noise. In parallel, diffusion-based image editing has rapidly matured into a default transformation layer for modern content pipelines, enabling instruction-based editing, object insertion and composition, and interactive geometric manipulation. This paper studies a subtle but increasingly consequential interaction between these trends: diffusion-based editing procedures may unintentionally compromise, and in extreme cases practically bypass, robust watermarking mechanisms that were explicitly engineered to survive conventional distortions. We develop a unified view of diffusion editors that (i) inject substantial Gaussian noise in a latent space and (ii) project back to the natural image manifold via learned denoising dynamics. Under this view, watermark payloads behave as low-energy, high-frequency signals that are systematically attenuated by the forward diffusion step and then treated as nuisance variation by the reverse generative process. We formalize this degradation using information-theoretic tools, proving that for broad classes of pixel-level watermark encoders/decoders the mutual information between the watermark payload and the edited output decays toward zero as the editing strength increases, yielding decoding error close to random guessing. We complement the theory with a realistic hypothetical experimental protocol and tables spanning representative watermarking methods and representative diffusion editors. Finally, we discuss ethical implications, responsible disclosure norms, and concrete design guidelines for watermarking schemes that remain meaningful in the era of generative transformations.

#6 Efficient Privacy-Preserving Sparse Matrix-Vector Multiplication Using Homomorphic Encryption

privacy

著者: Yang Gao, Gang Quan, Wujie Wen, Scott Piersall, Qian Lou, Liqiang Wang

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04742

要約:
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing, data analysis, and machine learning. When the data being processed are sensitive, preserving privacy becomes critical, and homomorphic encryption (HE) has emerged as a leading approach for addressing this challenge. Although HE enables privacy-preserving computation, its application to SpMV has remained largely unaddressed. To the best of our knowledge, this paper presents the first framework that efficiently integrates HE with SpMV, addressing the dual challenges of computational efficiency and data privacy. In particular, we introduce a novel compressed matrix format, named Compressed Sparse Sorted Column (CSSC), which is specifically designed to optimize encrypted sparse matrix computations. By preserving sparsity and enabling efficient ciphertext packing, CSSC significantly reduces storage and computational overhead. Our experimental results on real-world datasets demonstrate that the proposed method achieves significant gains in both processing time and memory usage. This study advances privacy-preserving SpMV and lays the groundwork for secure applications in federated learning, encrypted databases, scientific computing, and beyond.

#7 ShieldBypass: On the Persistence of Impedance Leakage Beyond EM Shielding

著者: Md Sadik Awal, Md Tauhidur Rahman

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04801

要約:
Electromagnetic (EM) shielding is widely used to suppress radiated emissions and limit passive EM side-channel leakage. However, shielding does not address active probing, where an adversary injects external radio-frequency (RF) signals and observes the device's reflective response. This work studies whether such impedance-modulated backscattering persists when radiated emissions are suppressed by shielding. By injecting controlled RF signals and analyzing the reflections, we demonstrate that state-dependent impedance variations remain observable at frequencies outside the shields' primary attenuation band. Using processors implemented on FPGA and microcontroller prototypes, and evaluating workload profiles under three industry-standard shields, we find that passive EM measurements lose discriminative power under shielding, while backscattering responses remain separable. These results indicate that active RF probing can expose execution-dependent behavior even in shielded systems, motivating the need to consider active impedance-based probing within hardware security evaluation flows.

#8 Osmosis Distillation: Model Hijacking with the Fewest Samples

model extraction

著者: Yuchen Shi, Huajie Chen, Heng Xu, Zhiquan Liu, Jialiang Shen, Chi Liu, Shuai Zhou, Tianqing Zhu, Wanlei Zhou

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04859

要約:
Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged to synthesize a compact dataset that preserves critical information from the original large dataset. Therefore, a combination of transfer learning and dataset distillation offers promising performance in evaluations. However, a non-negligible security threat remains undiscovered in transfer learning using synthetic datasets generated by dataset distillation methods, where an adversary can perform a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose Osmosis Distillation (OD) attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates in hidden tasks while preserving high model utility in original tasks. Furthermore, the distilled osmosis set enables model hijacking across diverse model architectures, allowing model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of using third-party synthetic datasets in transfer learning must be raised.

#9 AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows

privacyagent

著者: Ivoline C. Ngong, Keerthiram Murugesan, Swanand Kadhe, Justin D. Weisz, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04902

要約:
Agentic systems are increasingly acting on users' behalf, accessing calendars, email, and personal files to complete everyday tasks. Privacy evaluation for these systems has focused on the input and output boundaries, but each task involves several intermediate information flows, from agent queries to tool responses, that are not currently evaluated. We argue that every boundary in an agentic pipeline is a site of potential privacy violation and must be assessed independently. To support this, we introduce the Privacy Flow Graph, a Contextual Integrity-grounded framework that decomposes agentic execution into a sequence of information flows, each annotated with the five CI parameters, and traces violations to their point of origin. We present AgentSCOPE, a benchmark of 62 multi-tool scenarios across eight regulatory domains with ground truth at every pipeline stage. Our evaluation across seven state-of-the-art LLMs show that privacy violations in the pipeline occur in over 80% of scenarios, even when final outputs appear clean (24%), with most violations arising at the tool-response stage where APIs return sensitive data indiscriminately. These results indicate that output-level evaluation alone substantially underestimates the privacy risk of agentic systems.

#10 Modification to Fully Homomorphic Modified Rivest Scheme

著者: Sona Alex, Bian Yang

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04952

要約:
This document details the Fully Homomorphic Modified Rivest Scheme (FHMRS), a security issue in FHMRS, and a modification to FHMRS (mFHMRS) to mitigate the security issue.

#11 A Practical Post-Quantum Distributed Ledger Protocol for Financial Institutions

著者: Yeoh Wei Zhu, Naresh Goud Boddu, Yao Ma, Shaltiel Eloul, Giulio Golinelli, Yash Satsangi, Rob Otter, Kaushik Chakraborty

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.05005

要約:
Traditional financial institutions face inefficiencies that can be addressed by distributed ledger technology. However, a primary barrier to adoption is the privacy concerns surrounding publicly available transaction data. Existing private protocols for distributed ledger that focus on the Ring-CT model are not suitable for adoption for financial institutions. We propose a post-quantum, lattice-based transaction scheme for encrypted ledgers which better aligns with institutions' requirements for confidentiality and audit-ability. The construction leverages various zero-knowledge proof techniques, and introduces a new method for equating two commitment messages, without the capability to open one of the commitment during the re-commitment. Subsequently, we build a publicly verifiable transaction scheme that is efficient for single or multi-assets, by introducing a new compact range-proof. We then provide a security analysis of it. The techniques used and the proofs constructed could be of independent interest.

#12 Good-Enough LLM Obfuscation (GELO)

著者: Anatoly Belikov, Ilya Fedotov

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.05035

要約:
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations by hiding hidden states with fresh, per-batch invertible mixing. For each offloaded projection, the TEE samples a random matrix A, forms $U = AH$, offloads U and weights W to the accelerator, and then applies $A^-1$ on return, so that $A^-1 ((AH)W ) = HW$ and outputs are unchanged. Because mixing is never reused across batches, the attacker faces only a single-batch blind source separation problem. We analyze information leakage and introduce two practical defenses: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy "shield" vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20-30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.

#13 Cyber Threat Intelligence for Artificial Intelligence Systems

著者: Natalia Krawczyk, Mateusz Szczepkowski, Adrian Brodzik, Krzysztof Bocianiak

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.05068

要約:
As artificial intelligence (AI) becomes deeply embedded in critical services and everyday products, it is increasingly exposed to security threats which traditional cyber defenses were not designed to handle. In this paper, we investigate how cyber threat intelligence (CTI) may evolve to address attacks that target AI systems. We first analyze the assumptions and workflows of conventional threat intelligence with the needs of AI-focused defense, highlighting AI-specific assets and vulnerabilities. We then review and organize the current landscape of AI security knowledge. Based on this, we outline what an AI-oriented threat intelligence knowledge base should contain, describing concrete indicators of compromise (IoC) for different AI supply-chain phases and artifacts, and showing how such a knowledge base could support security tools. Finally, we discuss techniques for measuring similarity between collected indicators and newly observed AI artifacts. The review reveals gaps and quality issues in existing resources and identifies potential future research directions toward a practical threat intelligence framework tailored to AI.

#14 Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

privacy

著者: Xiaoguang Li, Hanyi Wang, Yaowei Huang, Jungang Yang, Qingqing Ye, Haonan Yan, Ke Pan, Zhe Sun, Hui Li

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.05073

要約:
Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm providing high utility by involving a shuffler to permute noisy report from users. Existing shuffle-DP protocols mainly focus on the design of shuffler-based categorical frequency oracle (SCFO) for frequency estimation on categorical data. However, numerical data is a more prevalent type and many real-world applications depend on the estimation of data distribution with ordinal nature. In this paper, we study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt to transplant existing SCFOs and the na\"ive distribution recovery technique to this task, and demonstrate that these baseline protocols cannot simultaneously achieve outstanding performance in three metrics: 1) utility, 2) message complexity; and 3) robustness to data poisoning attacks. Therefore, we further propose a novel single-message \textit{adaptive shuffler-based piecewise} (ASP) protocol with high utility and robustness. In ASP, we first develop a randomizer by parameter optimization using our proposed tighter bound of mutual information. We also design an \textit{Expectation Maximization with Adaptive Smoothing} (EMAS) algorithm to accurately recover distribution with enhanced robustness. To quantify robustness, we propose a new evaluation framework to examine robustness under different attack targets, enabling us to comprehensively understand the protocol resilience under various adversarial scenarios. Extensive experiments demonstrate that ASP outperforms baseline protocols in all three metrics. Especially under small $\epsilon$ values, ASP achieves an order of magnitude improvement in utility with minimal message complexity, and exhibits over threefold robustness compared to baseline methods.

#15 Lambda-randomization: multi-dimensional randomized response made easy

著者: Nicolas Ruiz

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.05261

要約:
Randomized response is a popular local anonymization approach that can deliver anonymized multi-dimensional data sets with rigorous privacy guarantees. At the same time, it can ensure validity for exploratory analysis and machine learning tasks as, under fairly general conditions, unbiased estimates of the underlying true distributions can be retrieved. However, and like for many other anonymization techniques, one of the main pitfalls of this approach is the curse of dimensionality. When coping with data sets with many attributes, one quickly runs into unsustainable computational costs for estimating true distributions, as well as a degradation in their accuracies. Relying on new theoretical insights developed in this paper, we propose an approach to multi-dimensional randomized response that avoids these traditional limitations. From simple yet intuitive parameterizations of the randomization matrices that we introduce, we develop a protocol called Lambda-randomization that entails low computational costs to retrieve estimates of multivariate distributions, and that makes use of solely three simple elements: a set of parameters ranging between 0 and 1 (one per attribute of the data set), the identity matrix, and the all-ones vector. We also present an empirical application to illustrate the proposed protocol.

#16 EVMbench: Evaluating AI Agents on Smart Contract Security

agent

著者: Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, Olivia Watkins

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.04915

要約:
Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is natural to ask how well they can already navigate this landscape, both in ways that improve security and in ways that might increase risk. We introduce EVMbench, an evaluation that measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. EVMbench draws on 117 curated vulnerabilities from 40 repositories and, in the most realistic setting, uses programmatic grading based on tests and blockchain state under a local Ethereum execution environment. We evaluate a range of frontier agents and find that they are capable of discovering and exploiting vulnerabilities end-to-end against live blockchain instances. We release code, tasks, and tooling to support continued measurement of these capabilities and future work on security.

#17 Automated TEE Adaptation with LLMs: Identifying, Transforming, and Porting Sensitive Functions in Programs

著者: Ruidong Han, Zhou Yang, Chengyan Ma, Ye Liu, Yuqing Niu, Siqi Ma, Debin Gao, David Lo

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2502.13379

要約:
Trusted Execution Environments (TEEs) isolate a special space within a device memory that is not accessible to the normal world (also known as the untrusted environment), even when the device is compromised. Therefore, developers can utilize TEEs to provide robust security guarantees for their programs, protecting sensitive operations, such as encrypted data storage, fingerprint verification, and remote attestation, from software-based attacks. Despite the robust protections offered by TEEs, adapting existing programs to leverage such security guarantees is challenging, often requiring extensive domain knowledge and manual intervention, which makes TEEs less accessible to developers. This motivates us to design AUTOTEE, the first Large Language Model (LLM) enabled approach that can automatically identify, transform, and port functions containing sensitive operations into TEEs with minimal developer intervention. By manually reviewing 68 repositories, we constructed a benchmark dataset consisting of 385 sensitive functions eligible for transformation, on which AUTOTEE achieves a F1 score of 0.94 on Java and 0.87 on Python. AUTOTEE effectively transforms these sensitive functions into TEE-compatible versions, achieving success rates of 91.8% and 84.3% for Java and Python, respectively, when using GPT-4o.

#18 Accurate BGV Parameters Selection: Accounting for Secret and Public Key Dependencies in Average-Case Analysis

著者: Beatrice Biasioli, Chiara Marcolla, Nadir Murru, Matilda Urani

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.18597

要約:
The Brakerski-Gentry-Vaikuntanathan (BGV) scheme is one of the most significant fully homomorphic encryption (FHE) schemes. It belongs to a class of FHE schemes whose security is based on the presumed intractability of the Learning with Errors (LWE) problem and its ring variant (RLWE). Such schemes deal with a quantity, called noise, which increases each time a homomorphic operation is performed. Specifically, in order for the scheme to work properly, it is essential that the noise remains below a certain threshold throughout the process. For BGV, this threshold strictly depends on the ciphertext modulus, which is one of the initial parameters whose selection heavily affects both the efficiency and security of the scheme. For an optimal parameter choice, it is crucial to accurately estimate the noise growth, particularly that arising from multiplication, which is the most complex operation. In this work, we propose a novel average-case approach that precisely models noise evolution and guides the selection of initial parameters, improving efficiency while ensuring security. The key innovation of our method lies in accounting for the dependencies among ciphertext errors generated with the same key, and in providing general guidelines for accurate parameter selection that are library-independent.

#19 CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics

agent

著者: Stefano Fumero, Kai Huang, Matteo Boffa, Danilo Giordano, Marco Mellia, Dario Rossi

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2508.20643

要約:
Post-mortem analysis of compromised systems is a key aspect of cyber forensics, today a mostly manual, slow, and error-prone task. Agentic AI, i.e., LLM-powered agents, is a promising avenue for automation. However, applying such agents to cybersecurity remains largely unexplored and difficult, as this domain demands long-term reasoning, contextual memory, and consistent evidence correlation - capabilities that current LLM agents struggle to master. In this paper, we present the first systematic study of LLM agents to automate post-mortem investigation. As a first scenario, we consider realistic attacks in which remote attackers try to abuse online services using well-known CVEs (30 controlled cases). The agent receives as input the network traces of the attack and extracts forensic evidence. We compare three AI agent architectures, six LLM backends, and assess their ability to i) identify compromised services, ii) map exploits to exact CVEs, and iii) prepare thorough reports. Our best-performing system, CyberSleuth, achieves 80% accuracy on 2025 incidents, producing complete, coherent, and practically useful reports (judged by a panel of 25 experts). We next illustrate how readily CyberSleuth adapts to face the analysis of infected machine traffic, showing that the effective AI agent design can transfer across forensic tasks. Our findings show that (i) multi-agent specialisation is key to sustained reasoning; (ii) simple orchestration outperforms nested hierarchical architectures; and (iii) the CyberSleuth design generalises across different forensic tasks.

#20 Secure human oversight of AI: Threat modeling in a socio-technical context

著者: Jonas C. Ditz, Veronika Lazar, Elmar Lichtme{\ss}, Carola Plesch, Matthias Heck, Kevin Baum, Markus Langer

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.12290

要約:
Human oversight of AI is promoted as a safeguard against risks such as inaccurate outputs, system malfunctions, or violations of fundamental rights, and is mandated in regulation like the European AI Act. Yet debates on human oversight have largely focused on its effectiveness, while overlooking a critical dimension: the security of human oversight. We argue that human oversight creates a new attack surface within the safety, security, and accountability architecture of AI operations. Drawing on cybersecurity perspectives, we model human oversight as an IT application for the purpose of systematic threat modeling of the human oversight process. Threat modeling allows us to identify security risks within human oversight and points towards possible mitigation strategies. Our contributions are: (1) introducing a security perspective on human oversight, (2) offering researchers and practitioners guidance on how to approach their human oversight applications from a security point of view, and (3) providing a systematic overview of attack vectors and hardening strategies to enable secure human oversight of AI.

#21 GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?

agent

著者: Chiyu Chen, Xinhao Song, Yunkai Chai, Yang Yao, Haodong Zhao, Lijun Li, Jie Li, Yan Teng, Gongshen Liu, Yingchun Wang

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.20333

要約:
Vision-Language Models (VLMs) are increasingly deployed as autonomous agents to navigate mobile graphical user interfaces (GUIs). Operating in dynamic on-device ecosystems, which include notifications, pop-ups, and inter-app interactions, exposes them to a unique and underexplored threat vector: environmental injection. Unlike prompt-based attacks that manipulate textual instructions, environmental injection corrupts an agent's visual perception by inserting adversarial UI elements (for example, deceptive overlays or spoofed notifications) directly into the GUI. This bypasses textual safeguards and can derail execution, causing privacy leakage, financial loss, or irreversible device compromise. To systematically evaluate this threat, we introduce GhostEI-Bench, the first benchmark for assessing mobile agents under environmental injection attacks within dynamic, executable environments. Moving beyond static image-based assessments, GhostEI-Bench injects adversarial events into realistic application workflows inside fully operational Android emulators and evaluates performance across critical risk scenarios. We further propose a judge-LLM protocol that conducts fine-grained failure analysis by reviewing the agent's action trajectory alongside the corresponding screenshot sequence, pinpointing failure in perception, recognition, or reasoning. Comprehensive experiments on state-of-the-art agents reveal pronounced vulnerability to deceptive environmental cues: current models systematically fail to perceive and reason about manipulated UIs. GhostEI-Bench provides a framework for quantifying and mitigating this emerging threat, paving the way toward more robust and secure embodied agents.

#22 BRIDG-ICS: AI-Grounded Knowledge Graphs for Intelligent Threat Analytics in Industry~5.0 Cyber-Physical Systems

著者: Padmeswari Nandiya, Ahmad Mohsin, Ahmed Ibrahim, Iqbal H. Sarker, Helge Janicke

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12112

要約:
Industry 5.0's increasing integration of IT and OT systems is transforming industrial operations but also expanding the cyber-physical attack surface. Industrial Control Systems (ICS) face escalating security challenges as traditional siloed defences fail to provide coherent, cross-domain threat insights. We present BRIDG-ICS (BRIDge for Industrial Control Systems), an AI-driven Knowledge Graph (KG) framework for context-aware threat analysis and quantitative assessment of cyber resilience in smart manufacturing environments. BRIDG-ICS fuses heterogeneous industrial and cybersecurity data into an integrated Industrial Security Knowledge Graph linking assets, vulnerabilities, and adversarial behaviours with probabilistic risk metrics (e.g. exploit likelihood, attack cost). This unified graph representation enables multi-stage attack path simulation using graph-analytic techniques. To enrich the graph's semantic depth, the framework leverages Large Language Models (LLMs): domain-specific LLMs extract cybersecurity entities, predict relationships, and translate natural-language threat descriptions into structured graph triples, thereby populating the knowledge graph with missing associations and latent risk indicators. This unified AI-enriched KG supports multi-hop, causality-aware threat reasoning, improving visibility into complex attack chains and guiding data-driven mitigation. In simulated industrial scenarios, BRIDG-ICS scales well, reduces potential attack exposure, and can enhance cyber-physical system resilience in Industry 5.0 settings.

#23 Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

agent

著者: Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, Jin Song Dong

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2602.15654

要約:
Self-evolving LLM agents update their internal state across sessions, often by writing and reusing long-term memory. This design improves performance on long-horizon tasks but creates a security risk: untrusted external content observed during a benign session can be stored as memory and later treated as instruction. We study this risk and formalize a persistent attack we call a Zombie Agent, where an attacker covertly implants a payload that survives across sessions, effectively turning the agent into a puppet of the attacker. We present a black-box attack framework that uses only indirect exposure through attacker-controlled web content. The attack has two phases. During infection, the agent reads a poisoned source while completing a benign task and writes the payload into long-term memory through its normal update process. During trigger, the payload is retrieved or carried forward and causes unauthorized tool behavior. We design mechanism-specific persistence strategies for common memory implementations, including sliding-window and retrieval-augmented memory, to resist truncation and relevance filtering. We evaluate the attack on representative agent setups and tasks, measuring both persistence over time and the ability to induce unauthorized actions while preserving benign task quality. Our results show that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents.

#24 UC-Secure Star DKG for Non-Exportable Key Shares with VSS-Free Enforcement

著者: Vipin Singh Sehrawat

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2602.22187

要約:
Distributed Key Generation (DKG) lets parties derive a common public key while keeping the signing key secret-shared. UC-secure DKG requires a verifiable-sharing enforcement layer -- classically satisfied via Verifiable Secret Sharing (VSS) and/or commitment-and-proof mechanisms -- for secrecy, uniqueness, and affine consistency. We target the Non-eXportable Key (NXK) setting enforced by hardware-backed key-isolation modules (e.g., TEEs, HSM-like APIs), formalized via an ideal KeyBox (keystore) functionality $\mathcal{F}_{KeyBox}$ that keeps shares non-exportable and permits only attested KeyBox-to-KeyBox sealing. With confidentiality delegated to the NXK boundary, the remaining challenge is enforcing transcript-defined affine consistency without exporting or resharing shares. State continuity rules out rewinding-based extraction, mandating straight-line techniques. We combine (i) KeyBox confidentiality; (ii) Unique Structure Verification (USV), a publicly verifiable certificate whose certified scalar never leaves the KeyBox yet whose public group element is transcript-derivable; and (iii) Fischlin-based UC-extractable NIZK arguments of knowledge in a gRO-CRP (global Random Oracle with Context-Restricted Programmability) model. We construct Star DKG (SDKG), a UC-secure scheme for multi-device threshold wallets where a designated service must co-sign but cannot sign alone, realizing a 1+1-out-of-$n$ star access structure (center plus any leaf) over roles (primary vs. recovery) with role-based device registration. In the $\mathcal{F}_{KeyBox}$-hybrid and gRO-CRP models, under DL and DDH assumptions with adaptive corruptions and secure erasures, SDKG UC-realizes a transcript-driven refinement of the standard UC-DKG functionality. Over a prime-order group of size $p$, SDKG incurs $\widetilde{O}(n\log p)$ communication overhead and $\widetilde{O}(n\log^{2.585}p)$ bit-operation cost.

#25 Lap2: Revisiting Laplace DP-SGD for High Dimensions via Majorization Theory

著者: Meisam Mohammady, Qin Yang, Nicholas Stout, Ayesha Samreen, Han Wang, Christopher J Quinn, Yuan Hong

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2602.23516

要約:
Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning, widely used in both training from scratch and fine-tuning large-scale language models. While DP-SGD predominantly relies on the Gaussian mechanism, the Laplace mechanism remains underutilized due to its reliance on L1 norm clipping. This constraint severely limits its practicality in high-dimensional models because the L1 norm of an n-dimensional gradient can be up to sqrt(n) times larger than its L2 norm. As a result, the required noise scale grows significantly with model size, leading to poor utility or untrainable models. In this work, we introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees. We overcome the dimensionality-driven clipping barrier by computing coordinate-wise moment bounds and applying majorization theory to construct a tight, data-independent upper bound over the full model. By exploiting the Schur-convexity of the moment accountant function, we aggregate these bounds using a carefully designed majorization set that respects the L2 clipping constraint. This yields a multivariate privacy accountant that scales gracefully with model dimension and enables the use of thousands of moments. Empirical evaluations demonstrate that our approach significantly improves the performance of Laplace DP-SGD, achieving results comparable to or better than Gaussian DP-SGD under strong privacy constraints. For instance, fine-tuning RoBERTa-base (125M parameters) on SST-2 achieves 87.88% accuracy at epsilon=0.54, outperforming Gaussian (87.16%) and standard Laplace (48.97%) under the same budget.

#26 Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

著者: Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2602.24009

要約:
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUNDRY (JBF), a system that addresses this gap via a multi-agent workflow to translate jailbreak papers into executable modules for immediate evaluation within a unified harness. JBF features three core components: (i) JBF-LIB for shared contracts and reusable utilities; (ii) JBF-FORGE for the multi-agent paper-to-module translation; and (iii) JBF-EVAL for standardizing evaluations. Across 30 reproduced attacks, JBF achieves high fidelity with a mean (reproduced-reported) attack success rate (ASR) deviation of +0.26 percentage points. By leveraging shared infrastructure, JBF reduces attack-specific implementation code by nearly half relative to original repositories and achieves an 82.5% mean reused-code ratio. This system enables a standardized AdvBench evaluation of all 30 attacks across 10 victim models using a consistent GPT-4o judge. By automating both attack integration and standardized evaluation, JBF offers a scalable solution for creating living benchmarks that keep pace with the rapidly shifting security landscape.

#27 Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

著者: Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.01919

要約:
Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide access to official model services without regional limitations via indirect access. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit between official LLM APIs and corresponding shadow APIs. We first identify 17 shadow APIs that have been utilized in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deception practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.

#28 Reckless Designs and Broken Promises: Privacy Implications of Targeted Interactive Advertisements on Social Media Platforms

privacy

著者: Julia B. Kieserman, Athanasios Andreou, Laura Edelson, Sandra Siby, Damon McCoy

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03659

要約:
Popular social media platforms TikTok, Facebook and Instagram allow third-parties to run targeted advertising campaigns on sensitive attributes in-platform. These ads are interactive by default, meaning users can comment or ``react'' (e.g., ``like'', ``love'') to them. We find that this platform-level design choice creates a privacy loophole such that advertisers can view the profiles of those who interact with their ads, thus identifying individuals that fulfill certain targeting criteria. This behavior is in contradiction to the promises made by the platforms to hide user data from advertisers. We conclude by suggesting design modifications that could provide users with transparency about the consequences of ad interaction to protect against unintentional disclosure.

#29 Zero-Knowledge Proof (ZKP) Authentication for Offline CBDC Payment System Using IoT Devices

著者: Santanu Mondal, T. Chithralekha

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03804

要約:
Central Bank Digital Currency (CBDCs) are becoming a new digital financial tool aimed at financial inclusion, increased monetary stability, and improved efficiency of payment systems, as they are issued by central banks. One of the most important aspects is that the CBDC must offer secure offline payment methods to users, allowing them to retain cash-like access without violating Anti-Money Laundering and Counter-terrorism Financing (AML/CFT) rules. The offline CBDC ecosystems will provide financial inclusion, empower underserved communities, and ensure equitable access to digital payments, even in connectivity-poor remote locations. With the rapid growth of Internet of Things (IoT) devices in our everyday lives, they are capable of performing secure digital transactions. Integrating offline CBDC payment with IoT devices enables seamless, automated payment without internet connectivity. However, IoT devices face special challenges due to their resource-constrained nature. This makes it difficult to include features such as double-spending prevention, privacy preservation, low-computation operation, and digital identity management. The work proposes a privacy-preserving offline CBDC model with integrated secure elements (SEs), zero-knowledge proofs (ZKPs), and intermittent synchronisation to conduct offline payments on IoT hardware. The proposed model is based on recent improvements in offline CBDC prototypes, regulations and cryptographic design choices such as hybrid architecture that involves using combination of online and offline payment in IoT devices using secure hardware with lightweight zero-knowledge proof cryptographic algorithm.

#30 Measuring Privacy vs. Fidelity in Synthetic Social Media Datasets

privacy

著者: Henry Tari, Adriana Iamnitchi

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03906

要約:
Synthetic data is increasingly used to support research without exposing sensitive user content. Social media data is one of the types of datasets that would hugely benefit from representative synthetic equivalents that can be used to bootstrap research and allow reproducibility through data sharing. However, recent studies show that (tabular) synthetic data is not inherently privacy-preserving. Much less is known, however, about the privacy risks of synthetically generated unstructured texts. This work evaluates the privacy of synthetic Instagram posts generated by three state-of-the-art large language models using two prompting strategies. We propose a methodology that quantifies privacy by framing re-identification as an authorship attribution attack. A RoBERTa-large classifier trained on real posts achieved 81\% accuracy in authorship attribution on real data, but only 16.5--29.7\% on synthetic posts, showing reduced, though non-negligible, risk. Fidelity was assessed via text traits, sentiment, topic overlap, and embedding similarity, confirming the expected trade-off: higher fidelity coincides with greater privacy leakage. This work provides a framework for evaluating privacy in synthetic text and demonstrates the privacy--fidelity tension in social media datasets.

#31 No exponential quantum speedup for $\mathrm{SIS}^\infty$ anymore

著者: Robin Kothari, Ryan O'Donnell, Kewen Wu

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.07515

要約:
In 2021, Chen, Liu, and Zhandry presented an efficient quantum algorithm for the average-case $\ell_\infty$-Short Integer Solution ($\mathrm{SIS}^\infty$) problem, in a parameter range outside the normal range of cryptographic interest, but still with no known efficient classical algorithm. This was particularly exciting since $\mathrm{SIS}^\infty$ is a simple problem without structure, and their algorithmic techniques were different from those used in prior exponential quantum speedups. We present efficient classical algorithms for all of the $\mathrm{SIS}^\infty$ and (more general) Constrained Integer Solution problems studied in their paper, showing there is no exponential quantum speedup anymore.

#32 Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

agent

著者: Rishi Jha, Harold Triedman, Justin Wagle, Vitaly Shmatikov

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.17276

要約:
Control-flow hijacking attacks manipulate orchestration mechanisms in multi-agent systems into performing unsafe actions that compromise the system and exfiltrate sensitive information. Recently proposed defenses, such as LlamaFirewall, rely on alignment checks of inter-agent communications to ensure that all agent invocations are "related to" and "likely to further" the original objective. We start by demonstrating control-flow hijacking attacks that evade these defenses even if alignment checks are performed by advanced LLMs. We argue that the safety and functionality objectives of multi-agent systems fundamentally conflict with each other. This conflict is exacerbated by the brittle definitions of "alignment" and the checkers' incomplete visibility into the execution context. We then propose, implement, and evaluate ControlValve, a new defense inspired by the principles of control-flow integrity and least privilege. ControlValve (1) generates permitted control-flow graphs for multi-agent systems, and (2) enforces that all executions comply with these graphs, along with contextual rules (generated in a zero-shot manner) for each agent invocation.

#33 IoUCert: Robustness Verification for Anchor-based Object Detectors

著者: Benedikt Br\"uckner, Alejandro J. Mercado, Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio

公開日: Fri, 06 Mar 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2603.03043

要約:
While formal robustness verification has seen significant success in image classification, scaling these guarantees to object detection remains notoriously difficult due to complex non-linear coordinate transformations and Intersection-over-Union (IoU) metrics. We introduce IoUCert, a novel formal verification framework designed specifically to overcome these bottlenecks in foundational anchor-based object detection architectures. Focusing on the object localisation component in single-object settings, we propose a coordinate transformation that enables our algorithm to circumvent precision-degrading relaxations of non-linear box prediction functions. This allows us to optimise bounds directly with respect to the anchor box offsets which enables a novel Interval Bound Propagation method that derives optimal IoU bounds. We demonstrate that our method enables, for the first time, the robustness verification of realistic, anchor-based models including SSD, YOLOv2, and YOLOv3 variants against various input perturbations.

cs.CR updates on arXiv.org

📋 論文タイトル一覧