arXiv論文一覧 - cs.CR updates on arXiv.org

#1 Safety, Security, and Cognitive Risks in World Models

著者: Manoj Parmar

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01346

要約:
World models -- learned internal simulators of environment dynamics -- are rapidly becoming foundational to autonomous decision-making in robotics, autonomous vehicles, and agentic AI. Yet this predictive power introduces a distinctive set of safety, security, and cognitive risks. Adversaries can corrupt training data, poison latent representations, and exploit compounding rollout errors to cause catastrophic failures in safety-critical deployments. World model-equipped agents are more capable of goal misgeneralisation, deceptive alignment, and reward hacking precisely because they can simulate the consequences of their own actions. Authoritative world model predictions further foster automation bias and miscalibrated human trust that operators lack the tools to audit. This paper surveys the world model landscape; introduces formal definitions of trajectory persistence and representational risk; presents a five-profile attacker capability taxonomy; and develops a unified threat model extending MITRE ATLAS and the OWASP LLM Top 10 to the world model stack. We provide an empirical proof-of-concept on trajectory-persistent adversarial attacks (GRU-RSSM: A_1 = 2.26x amplification, -59.5% reduction under adversarial fine-tuning; stochastic RSSM proxy: A_1 = 0.65x; DreamerV3 checkpoint: non-zero action drift confirmed). We illustrate risks through four deployment scenarios and propose interdisciplinary mitigations spanning adversarial hardening, alignment engineering, NIST AI RMF and EU AI Act governance, and human-factors design. We argue that world models must be treated as safety-critical infrastructure requiring the same rigour as flight-control software or medical devices.

#2 "The System Will Choose Security Over Humanity Every Time": Understanding Security and Privacy for U.S. Incarcerated Users

privacy

著者: Yael Eiger, Nino Migineishvili, Emi Yoshikawa, Liza Nadtochiy, Kentrell Owens, Franziska Roesner

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01370

要約:
Digital devices like tablets, media players, and kiosks are increasingly deployed in U.S. prisons. These technologies can enable incarcerated people to access education, communicate with loved ones, and develop vital reentry skills. However, they can also introduce new privacy and security risks for incarcerated people who have little agency over their usage and contracts, and are currently carved out of many consumer protection safeguards. To investigate these issues, we conducted focus groups and interviews with system-impacted people (n=17), i.e., those formerly incarcerated, and their relatives, to investigate experiences with device-related security and privacy vulnerabilities and the power dynamics that affect their use. In our findings, participants describe pervasive surveillance, censorship, and usability problems with the technology available to them, including shifting and seemingly arbitrary usage policies. These policies strain relationships both inside and outside prisons and contribute to negative downstream effects for incarcerated users. We recommend ways to better balance prison security concerns with privacy-related needs of system-impacted individuals by promoting accountability for technology-related decisions, providing public oversight of digital purchasing and use policies, and designing digital tools with them -- the actual end-users -- in mind.

#3 Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

著者: Weidi Luo, Xiaofei Wen, Tenghao Huang, Hongyi Wang, Zhen Xiang, Chaowei Xiao, Kristina Gligori\'c, Muhao Chen

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01444

要約:
Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food safety remains a high-stakes domain where inaccurate or misleading information can cause severe real-world harm. Despite these risks, current LLMs and safety guardrails lack rigorous alignment tailored to domain-specific food hazards. To address this gap, we introduce FoodGuardBench, the first comprehensive benchmark comprising 3,339 queries grounded in FDA guidelines, designed to evaluate the safety and robustness of LLMs. By constructing a taxonomy of food safety principles and employing representative jailbreak attacks (e.g., AutoDAN and PAP), we systematically evaluate existing LLMs and guardrails. Our evaluation results reveal three critical vulnerabilities: First, current LLMs exhibit sparse safety alignment in the food-related domain, easily succumbing to a few canonical jailbreak strategies. Second, when compromised, LLMs frequently generate actionable yet harmful instructions, inadvertently empowering malicious actors and posing tangible risks. Third, existing LLM-based guardrails systematically overlook these domain-specific threats, failing to detect a substantial volume of malicious inputs. To mitigate these vulnerabilities, we introduce FoodGuard-4B, a specialized guardrail model fine-tuned on our datasets to safeguard LLMs within food-related domains.

#4 Preserving Target Distributions With Differentially Private Count Mechanisms

privacy

著者: Nitin Kohli, Paul Laskowski

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01468

要約:
Differentially private mechanisms are increasingly used to publish tables of counts, where each entry represents the number of individuals belonging to a particular category. A distribution of counts summarizes the information in the count column, unlinking counts from categories. This object is useful for answering a class of research questions, but it is subject to statistical biases when counts are privatized with standard mechanisms. This motivates a novel design criterion we term accuracy of distribution. This study formalizes a two-stage framework for privatizing tables of counts that balances accuracy of distribution with two standard criteria of accuracy of counts and runtime. In the first stage, a distribution privatizer generates an estimate for the true distribution of counts. We introduce a new mechanism, called the cyclic Laplace, specifically tailored to distributions of counts, that outperforms existing general-purpose differentially private histogram mechanisms. In the second stage, a constructor algorithm generates a count mechanism, represented as a transition matrix, whose fixed-point is the privatized distribution of counts. We develop a mathematical theory that describes such transition matrices in terms of simple building blocks we call epsilon-scales. This theory informs the design of a new constructor algorithm that generates transition matrices with favorable properties more efficiently than standard optimization algorithms. We explore the practicality of our framework with a set of experiments, highlighting situations in which a fixed-point method provides a favorable tradeoff among performance criteria.

#5 SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

著者: Zikai Zhang, Rui Hu, Olivera Kotevska, Jiahao Xu

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01473

要約:
Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing guardrail methods typically rely on internal features or textual responses to detect malicious queries, which either introduce substantial latency or suffer from the randomness in text generation. To overcome these limitations, we propose SelfGrader, a lightweight guardrail method that formulates jailbreak detection as a numerical grading problem using token-level logits. Specifically, SelfGrader evaluates the safety of a user query within a compact set of numerical tokens (NTs) (e.g., 0-9) and interprets their logit distribution as an internal safety signal. To align these signals with human intuition of maliciousness, SelfGrader introduces a dual-perspective scoring rule that considers both the maliciousness and benignness of the query, yielding a stable and interpretable score that reflects harmfulness and reduces the false positive rate simultaneously. Extensive experiments across diverse jailbreak benchmarks, multiple LLMs, and state-of-the-art guardrail baselines demonstrate that SelfGrader achieves up to a 22.66% reduction in ASR on LLaMA-3-8B, while maintaining significantly lower memory overhead (up to 173x) and latency (up to 26x).

#6 EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild

著者: Yiming Fan (The Ohio State University), Jun Yeon Won (The Ohio State University), Ding Zhu (The Ohio State University), Melih Sirlanci (The Ohio State University), Mahdi Khalili (The Ohio State University), Carter Yagemann (The Ohio State University)

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01554

要約:
Binary Function Similarity Detection (BFSD) is a core problem in software security, supporting tasks such as vulnerability analysis, malware classification, and patch provenance. In the past few decades, numerous models and tools have been developed for this application; however, due to the lack of a comprehensive universal benchmark in this field, researchers have struggled to compare different models effectively. Existing datasets are limited in scope, often focusing on a narrow set of transformations or types of binaries, and fail to reflect the full diversity of real-world applications. We introduce EXHIB, a benchmark comprising five realistic datasets collected from the wild, each highlighting a distinct aspect of the BFSD problem space. We evaluate 9 representative models spanning multiple BFSD paradigms on EXHIB and observe performance degradations of up to 30% on firmware and semantic datasets compared to standard settings, revealing substantial generalization gaps. Our results show that robustness to low- and mid-level binary variations does not generalize to high-level semantic differences, underscoring a critical blind spot in current BFSD evaluation practices.

#7 AI-Assisted Hardware Security Verification: A Survey and AI Accelerator Case Study

著者: Khan Thamid Hasan, Md Ajoad Hasan, Nashmin Alam, Md. Touhidul Islam, Upoma Das, Farimah Farahmandi

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01572

要約:
As hardware systems grow in complexity, security verification must keep up with them. Recently, artificial intelligence (AI) and large language models (LLMs) have started to play an important role in automating several stages of the verification workflow by helping engineers analyze designs, reason about potential threats, and generate verification artifacts. This survey synthesizes recent advances in AI-assisted hardware security verification and organizes the literature along key stages of the workflow: asset identification, threat modeling, security test-plan generation, simulation-driven analysis, formal verification, and countermeasure reasoning. To illustrate how these techniques can be applied in practice, we present a case study using the open-source NVIDIA Deep Learning Accelerator (NVDLA), a representative modern hardware design. Throughout this study, we emphasize that while AI/LLM-based automation can significantly accelerate verification tasks, its outputs must remain grounded in simulation evidence, formal reasoning, and benchmark-driven evaluation to ensure trustworthy hardware security assurance.

#8 Assertain: Automated Security Assertion Generation Using Large Language Models

著者: Shams Tarek, Dipayan Saha, Khan Thamid Hasan, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01583

要約:
The increasing complexity of modern system-on-chip designs amplifies hardware security risks and makes manual security property specification a major bottleneck in formal property verification. This paper presents Assertain, an automated framework that integrates RTL design analysis, Common Weakness Enumeration (CWE) mapping, and threat model intelligence to automatically generate security properties and executable SystemVerilog Assertions. Assertain leverages large language models with a self-reflection refinement mechanism to ensure both syntactic correctness and semantic consistency. Evaluated on 11 representative hardware designs, Assertain outperforms GPT-5 by 61.22%, 59.49%, and 67.92% in correct assertion generation, unique CWE coverage, and architectural flaw detection, respectively. These results demonstrate that Assertain significantly expands vulnerability coverage, improves assertion quality, and reduces manual effort in hardware security verification.

#9 RefinementEngine: Automating Intent-to-Device Filtering Policy Deployment under Network Constraints

著者: Davide Colaiacomo, Chiara Bonfanti, Cataldo Basile

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01627

要約:
Translating security intent into deployable network enforcement rules and maintaining their effectiveness despite evolving cyber threats remains a largely manual process in most Security Operations Centers (SOCs). In large and heterogeneous networks, this challenge is complicated by topology-dependent reachability constraints and device-specific security control capabilities, making the process slow, error-prone, and a recurring source of misconfigurations. This paper presents RefinementEngine, an engine that automates the refinement of high-level security intents into low-level, deployment-ready configurations. Given a network topology, devices, and available security controls, along with high-level intents and Cyber Threat Intelligence (CTI) reports, RefinementEngine automatically generates settings that implement the desired intent, counter reported threats, and can be directly deployed on target security controls. The proposed approach is validated through real-world use cases on packet and web filtering policies derived from actual CTI reports, demonstrating both correctness, practical applicability, and adaptability to new data.

#10 Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

diffusion

著者: Yue Li, Linying Xue, Kaiqing Lin, Hanyu Quan, Dongdong Lin, Hui Tian, Hongxia Wang, Bin Wang

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01635

要約:
Recent advances in GAN and diffusion models have significantly improved the realism and controllability of facial deepfake manipulation, raising serious concerns regarding privacy, security, and identity misuse. Proactive defenses attempt to counter this threat by injecting adversarial perturbations into images before manipulation takes place. However, existing approaches remain limited in effectiveness due to suboptimal perturbation injection strategies and are typically designed under white-box assumptions, targeting only simple GAN-based attribute editing. These constraints hinder their applicability in practical real-world scenarios. In this paper, we propose AEGIS, the first diffusion-guided paradigm in which the AdvErsarial facial images are Generated for Identity Shielding. We observe that the limited defense capability of existing approaches stems from the peak-clipping constraint, where perturbations are forcibly truncated due to a fixed $L_\infty$-bounded. To overcome this limitation, instead of directly modifying pixels, AEGIS injects adversarial perturbations into the latent space along the DDIM denoising trajectory, thereby decoupling the perturbation magnitude from pixel-level constraints and allowing perturbations to adaptively amplify where most effective. The extensible design of AEGIS allows the defense to be expanded from purely white-box use to also support black-box scenarios through a gradient-estimation strategy. Extensive experiments across GAN and diffusion-based deepfake generators show that AEGIS consistently delivers strong defense effectiveness while maintaining high perceptual quality. In white-box settings, it achieves robust manipulation disruption, whereas in black-box settings, it demonstrates strong cross-model transferability.

#11 Seclens: Role-specific Evaluation of LLM's for security vulnerablity detection

著者: Subho Halder, Siddharth Saxena, Kashinath Kadaba Shrish, Thiyagarajan M

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01637

要約:
Existing benchmarks for LLM-based vulnerability detection compress model performance into a single metric, which fails to reflect the distinct priorities of different stakeholders. For example, a CISO may emphasize high recall of critical vulnerabilities, an engineering leader may prioritize minimizing false positives, and an AI officer may balance capability against cost. To address this limitation, we introduce SecLens-R, a multi-stakeholder evaluation framework structured around 35 shared dimensions grouped into 7 measurement categories. The framework defines five role-specific weighting profiles: CISO, Chief AI Officer, Security Researcher, Head of Engineering, and AI-as-Actor. Each profile selects 12 to 16 dimensions with weights summing to 80, yielding a composite Decision Score between 0 and 100. We apply SecLens-R to evaluate 12 frontier models on a dataset of 406 tasks derived from 93 open-source projects, covering 10 programming languages and 8 OWASP-aligned vulnerability categories. Evaluations are conducted across two settings: Code-in-Prompt (CIP) and Tool-Use (TU). Results show substantial variation across stakeholder perspectives, with Decision Scores differing by as much as 31 points for the same model. For instance, Qwen3-Coder achieves an A (76.3) under the Head of Engineering profile but a D (45.2) under the CISO profile, while GPT-5.4 shows a similar disparity. These findings demonstrate that vulnerability detection is inherently a multi-objective problem and that stakeholder-aware evaluation provides insights that single aggregated metrics obscure.

#12 Contextualizing Sink Knowledge for Java Vulnerability Discovery

著者: Fabian Fleischer, Cen Zhang, Joonun Jang, Jeongin Cho, Meng Xu, Taesoo Kim

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01645

要約:
Java applications are prone to vulnerabilities stemming from the insecure use of security-sensitive APIs, such as file operations enabling path traversal or deserialization routines allowing remote code execution. These sink APIs encode critical information for vulnerability discovery: the program-specific constraints required to reach them and the exploitation conditions necessary to trigger security flaws. Despite this, existing fuzzers largely overlook such vulnerability-specific knowledge, limiting their effectiveness. We present GONDAR, a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. GONDAR first identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering. It then deploys two specialized agents that work collaboratively with a coverage-guided fuzzer: an exploration agent generates inputs to reach target call sites by iteratively solving path constraints, while an exploitation agent synthesizes proof-of-concept exploits by reasoning about and satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback, complementing each other. We evaluated GONDAR on real-world Java benchmarks, where it discovers four times more vulnerabilities than Jazzer, the state-of-the-art Java fuzzer. Notably, GONDAR also demonstrated strong performance in the DARPA AI Cyber Challenge, and is integrated into OSS-CRS, a sandbox project in The Linux Foundation's OpenSSF, to improve the security of open-source software.

#13 Spike-PTSD: A Bio-Plausible Adversarial Example Attack on Spiking Neural Networks via PTSD-Inspired Spike Scaling

著者: Lingxin Jin, Wei Jiang, Maregu Assefa Habtie, Letian Chen, Jinyu Zhan, Xingzhi Zhou, Lin Zuo, Naoufel Werghi

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01750

要約:
Spiking Neural Networks (SNNs) are energy-efficient and biologically plausible, ideal for embedded and security-critical systems, yet their adversarial robustness remains open. Existing adversarial attacks often overlook SNNs' bio-plausible dynamics. We propose Spike-PTSD, a biologically inspired adversarial attack framework modeled on abnormal neural firing in Post-Traumatic Stress Disorder (PTSD). It localizes decision-critical layers, selects neurons via hyper/hypoactivation signatures, and optimizes adversarial examples with dual objectives. Across six datasets, three encoding types, and four models, Spike-PTSD achieves over 99% success rates, systematically compromising SNN robustness. Code: https://github.com/bluefier/Spike-PTSD.

#14 Topology-Hiding Connectivity-Assurance for QKD Inter-Networking

著者: Margherita Cozzolino, Stephan Krenn, Thomas Lor\"unser

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01876

要約:
While QKD ensures information-theoretic security at the link level, real-world deployments depend on trusted repeaters, creating potential vulnerabilities. In this paper, we thus introduce a topology-hiding connectivity assurance protocol to enhance trust in quantum key distribution (QKD) network infrastructures. Our protocol allows network providers to jointly prove the existence of a secure connection between endpoints without revealing internal topology details. By extending graph-signature techniques to support multi-graphs and hidden endpoints, we enable zero-knowledge proofs of connectivity that ensure both soundness and topology hiding. We further discuss how our approach can certify, e.g., multiple disjoint paths, supporting multi-path QKD scenarios. This work bridges cryptographic assurance methods with the operational requirements of QKD networks, promoting verifiable and privacy-preserving inter-network connectivity.

#15 Combating Data Laundering in LLM Training

著者: Muxing Li, Zesheng Ye, Sharon Li, Feng Liu

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01904

要約:
Data rights owners can detect unauthorized data use in large language model (LLM) training by querying with proprietary samples. Often, superior performance (e.g., higher confidence or lower loss) on a sample relative to the untrained data implies it was part of the training corpus, as LLMs tend to perform better on data they have seen during training. However, this detection becomes fragile under data laundering, a practice of transforming the stylistic form of proprietary data, while preserving critical information to obfuscate data provenance. When an LLM is trained exclusively on such laundered variants, it no longer performs better on originals, erasing the signals that standard detections rely on. We counter this by inferring the unknown laundering transformation from black-box access to the target LLM and, via an auxiliary LLM, synthesizing queries that mimic the laundered data, even if rights owners have only the originals. As the search space of finding true laundering transformations is infinite, we abstract such a process into a high-level transformation goal (e.g., "lyrical rewriting") and concrete details (e.g., "with vivid imagery"), and introduce synthesis data reversion (SDR) that instantiates this abstraction. SDR first identifies the most probable goal for synthesis to narrow the search; it then iteratively refines details so that synthesized queries gradually elicit stronger detection signals from the target LLM. Evaluated on the MIMIR benchmark against diverse laundering practices and target LLM families (Pythia, Llama2, and Falcon), SDR consistently strengthens data misuse detection, providing a practical countermeasure to data laundering.

#16 From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

著者: Yiheng Huang, Zhijia Zhao, Bihuan Chen, Susheng Wu, Zhuotong Zhou, Yiheng Cao, Xin Hu, Xin Peng

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01905

要約:
The model context protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new attack vectors. Despite the growing adoption of MCP, existing MCP security studies classify attacks by their observable effects, obscuring how attacks behave across different MCP server components and overlooking multi-component attack chains. Meanwhile, existing defenses are less effective when facing multi-component attacks or previously unknown malicious behaviors. This work presents a component-centric perspective for understanding and detecting malicious MCP servers. First, we build the first component-centric PoC dataset of 114 malicious MCP servers where attacks are achieved as manipulation over MCP components and their compositions. We evaluate these attacks' effectiveness across two MCP hosts and five LLMs, and uncover that (1) component position shapes attack success rate; and (2) multi-component compositions often outperform single-component attacks by distributing malicious logic. Second, we propose and implement Connor, a two-stage behavioral deviation detector for malicious MCP servers. It first performs pre-execution analysis to detect malicious shell commands and extract each tool's function intent, and then conducts step-wise in-execution analysis to trace each tool's behavioral trajectories and detect deviations from its function intent. Evaluation on our curated dataset indicates that Connor achieves an F1-score of 94.6%, outperforming the state of the art by 8.9% to 59.6%. In real-world detection, Connor identifies two malicious servers.

#17 Architectural Implications of the UK Cyber Security and Resilience Bill

著者: Jonathan Shelby

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01937

要約:
The UK Cyber Security and Resilience (CS&R) Bill represents the most significant reform of UK cyber legislation since the Network and Information Systems (NIS) Regulations 2018. While existing analysis has addressed the Bill's regulatory requirements, there is a critical gap in guidance on the architectural implications for organisations that must achieve and demonstrate compliance. This paper argues that the CS&R Bill's provisions (expanded scope to managed service providers (MSPs), data centres, and critical suppliers; mandatory 24/72-hour dual incident reporting; supply chain security duties; and Secretary of State powers of direction-), collectively constitute an architectural forcing function that renders perimeter-centric and point-solution security postures structurally non-compliant. We present a systematic mapping of the Bill's key provisions to specific architectural requirements, demonstrate that Zero Trust Architecture (ZTA) provides the most coherent technical foundation for meeting these obligations, and propose a reference architecture and maturity-based adoption pathway for CISOs and security architects. The paper further addresses the cross-regulatory challenge facing UK financial services firms operating under simultaneous CS&R, DORA, and NIS2 obligations, and maps the architectural framework against the NCSC Cyber Assessment Framework v4.0. This work extends a companion practitioner guide to the Bill by translating regulatory analysis into actionable architectural strategy. Keywords: Cyber Security and Resilience Bill, Zero Trust Architecture, Security Architecture, Critical National Infrastructure, NIS Regulations, DORA, Supply Chain Security, NCSC CAF v4.0

#18 RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale

著者: Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari, Michael Gentile, Zach Reavis, David Magnotti, Wayne Fullen

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01977

要約:
Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually develop detection mechanisms. In 2025, the National Vulnerability Database published over 48,000 new vulnerabilities, motivating the need for automation. We present RuleForge, an AWS internal system that automatically generates detection rules--JSON-based patterns that identify malicious HTTP requests exploiting specific vulnerabilities--from structured Nuclei templates describing CVE details. Nuclei templates provide standardized, YAML-based vulnerability descriptions that serve as the structured input for our rule generation process. This paper focuses on RuleForge's architecture and operational deployment for CVE-related threat detection, with particular emphasis on our novel LLM-as-a-judge (Large Language Model as judge) confidence validation system and systematic feedback integration mechanism. This validation approach evaluates candidate rules across two dimensions--sensitivity (avoiding false negatives) and specificity (avoiding false positives)--achieving AUROC of 0.75 and reducing false positives by 67% compared to synthetic-test-only validation in production. Our 5x5 generation strategy (five parallel candidates with up to five refinement attempts each) combined with continuous feedback loops enables systematic quality improvement. We also present extensions enabling rule generation from unstructured data sources and demonstrate a proof-of-concept agentic workflow for multi-event-type detection. Our lessons learned highlight critical considerations for applying LLMs to cybersecurity tasks, including overconfidence mitigation and the importance of domain expertise in both prompt design and quality review of generated rules through human-in-the-loop validation.

#19 APEX: Agent Payment Execution with Policy for Autonomous Agent API Access

agent

著者: Mohd Safwan Uddin, Mohammed Mouzam, Mohammed Imran, Syed Badar Uddin Faizan

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.02023

要約:
Autonomous agents are moving beyond simple retrieval tasks to become economic actors that invoke APIs, sequence workflows, and make real-time decisions. As this shift accelerates, API providers need request-level monetization with programmatic spend governance. The HTTP 402 protocol addresses this by treating payment as a first-class protocol event, but most implementations rely on cryptocurrency rails. In many deployment contexts, especially countries with strong real-time fiat systems like UPI, this assumption is misaligned with regulatory and infrastructure realities. We present APEX, an implementation-complete research system that adapts HTTP 402-style payment gating to UPI-like fiat workflows while preserving policy-governed spend control, tokenized access verification, and replay resistance. We implement a challenge-settle-consume lifecycle with HMAC-signed short-lived tokens, idempotent settlement handling, and policy-aware payment approval. The system uses FastAPI, SQLite, and Python standard libraries, making it transparent, inspectable, and reproducible. We evaluate APEX across three baselines and six scenarios using sample sizes 2-4x larger than initial experiments (N=20-40 per scenario). Results show that policy enforcement reduces total spending by 27.3% while maintaining 52.8% success rate for legitimate requests. Security mechanisms achieve 100% block rate for both replay attacks and invalid tokens with low latency overhead (19.6ms average). Multiple trial runs show low variance across scenarios, demonstrating high reproducibility with 95% confidence intervals. The primary contribution is a controlled agent-payment infrastructure and reference architecture that demonstrates how agentic access monetization can be adapted to fiat systems without discarding security and policy guarantees.

#20 AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection

著者: Vickson Ferrel

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.02149

要約:
As TLS 1.3 encryption limits traditional Deep Packet Inspection (DPI), the security community has pivoted to Euclidean Transformer-based classifiers (e.g., ET-BERT) for encrypted traffic analysis. However, these models remain vulnerable to byte-level adversarial morphing -- recent pre-padding attacks reduced ET-BERT accuracy to 25.68%, while VLESS Reality bypasses certificate-based detection entirely. We introduce AEGIS: an Adversarial Entropy-Guided Immune System powered by a Thermodynamic Variance-Guided Hyperbolic Liquid State Space Model (TVD-HL-SSM). Rather than competing in the Euclidean payload-reading domain, AEGIS discards payload bytes in favor of 6-dimensional continuous-time flow physics projected into a non-Euclidean Poincare manifold. Liquid Time-Constants measure microsecond IAT decay, and a Thermodynamic Variance Detector computes sequence-wide Shannon Entropy to expose automated C2 tunnel anomalies. A pure C++ eBPF Harvester with zero-copy IPC bypasses the Python GIL, enabling a linear-time O(N) Mamba-3 core to process 64,000-packet swarms at line-rate. Evaluated on a 400GB, 4-tier adversarial corpus spanning backbone traffic, IoT botnets, zero-days, and proprietary VLESS Reality tunnels, AEGIS achieves an F1-score of 0.9952 and 99.50% True Positive Rate at 262 us inference latency on an RTX 4090, establishing a new state-of-the-art for physics-based adversarial network defense.

#21 PARD-SSM: Probabilistic Cyber-Attack Regime Detection via Variational Switching State-Space Models

著者: Prakul Sunil Hiremath, PeerAhammad M Bagawan, Sahil Bhekane

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.02299

要約:
Modern adversarial campaigns unfold as sequences of behavioural phases - Reconnaissance, Lateral Movement, Intrusion, and Exfiltration - each often indistinguishable from legitimate traffic when viewed in isolation. Existing intrusion detection systems (IDS) fail to capture this structure: signature-based methods cannot detect zero-day attacks, deep-learning models provide opaque anomaly scores without stage attribution, and standard Kalman Filters cannot model non-stationary multi-modal dynamics. We present PARD-SSM, a probabilistic framework that models network telemetry as a Regime-Dependent Switching Linear Dynamical System with K = 4 hidden regimes. A structured variational approximation reduces inference complexity from exponential to O(TK^2), enabling real-time detection on standard CPU hardware. An online EM algorithm adapts model parameters, while KL-divergence gating suppresses false positives. Evaluated on CICIDS2017 and UNSW-NB15, PARD-SSM achieves F1 scores of 98.2% and 97.1%, with latency less than 1.2 ms per flow. The model also produces predictive alerts approximately 8 minutes before attack onset, a capability absent in prior systems.

#22 Evolutionary Multi-Objective Fusion of Deepfake Speech Detectors

著者: Vojt\v{e}ch Stan\v{e}k, Martin Pere\v{s}\'ini, Luk\'a\v{s} Sekanina, Anton Firc, Kamil Malinka

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01330

要約:
While deepfake speech detectors built on large self-supervised learning (SSL) models achieve high accuracy, employing standard ensemble fusion to further enhance robustness often results in oversized systems with diminishing returns. To address this, we propose an evolutionary multi-objective score fusion framework that jointly minimizes detection error and system complexity. We explore two encodings optimized by NSGA-II: binary-coded detector selection for score averaging and a real-valued scheme that optimizes detector weights for a weighted sum. Experiments on the ASVspoof 5 dataset with 36 SSL-based detectors show that the obtained Pareto fronts outperform simple averaging and logistic regression baselines. The real-valued variant achieves 2.37% EER (0.0684 minDCF) and identifies configurations that match state-of-the-art performance while significantly reducing system complexity, requiring only half the parameters. Our method also provides a diverse set of trade-off solutions, enabling deployment choices that balance accuracy and computational cost.

#23 No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents

agent

著者: Tiankai Yang, Jiate Li, Yi Nian, Shen Dong, Ruiyao Xu, Ryan Rossi, Kaize Ding, Yue Zhao

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01350

要約:
LLM-based agents increasingly operate across repeated sessions, maintaining task states to ensure continuity. In many deployments, a single agent serves multiple users within a team or organization, reusing a shared knowledge layer across user identities. This shared persistence expands the failure surface: information that is locally valid for one user can silently degrade another user's outcome when the agent reapplies it without regard for scope. We refer to this failure mode as unintentional cross-user contamination (UCC). Unlike adversarial memory poisoning, UCC requires no attacker; it arises from benign interactions whose scope-bound artifacts persist and are later misapplied. We formalize UCC through a controlled evaluation protocol, introduce a taxonomy of three contamination types, and evaluate the problem in two shared-state mechanisms. Under raw shared state, benign interactions alone produce contamination rates of 57--71%. A write-time sanitization is effective when shared state is conversational, but leaves substantial residual risk when shared state includes executable artifacts, with contamination often manifesting as silent wrong answers. These results indicate that shared-state agents need artifact-level defenses beyond text-level sanitization to prevent silent cross-user failures.

#24 Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving

agent

著者: Devakh Rashie, Veda Rashi

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01483

要約:
The rapid evolution of autonomous, agentic artificial intelligence within financial services has introduced an existential architectural crisis: large language models (LLMs) are probabilistic, non-deterministic systems operating in domains that demand absolute, mathematically verifiable compliance guarantees. Existing guardrail solutions -- including NVIDIA NeMo Guardrails and Guardrails AI -- rely on probabilistic classifiers and syntactic validators that are fundamentally inadequate for enforcing complex multi-variable regulatory constraints mandated by the SEC, FINRA, and OCC. This paper presents the Lean-Agent Protocol, a formal-verification-based AI guardrail platform that leverages the Aristotle neural-symbolic model developed by Harmonic AI to auto-formalize institutional policies into Lean 4 code. Every proposed agentic action is treated as a mathematical conjecture: execution is permitted if and only if the Lean 4 kernel proves that the action satisfies pre-compiled regulatory axioms. This architecture provides cryptographic-level compliance certainty at microsecond latency, directly satisfying SEC Rule 15c3-5, OCC Bulletin 2011-12, FINRA Rule 3110, and CFPB explainability mandates. A three-phase implementation roadmap from shadow verification through enterprise-scale deployment is provided.

#25 Topology-Hiding Path Validation for Large-Scale Quantum Key Distribution Networks

著者: Stephan Krenn, Omid Mir, Thomas Lor\"unser, Sebastian Ramacher, Florian Wohner

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.01831

要約:
Secure long-distance communication in quantum key distribution (QKD) networks depends on trusted repeater nodes along the entire transmission path. Consequently, these nodes will be subject to strict auditing and certification in future large-scale QKD deployments. However, trust must also extend to the network operator, who is responsible for fulfilling contractual obligations -- such as ensuring certified devices are used and transmission paths remain disjoint where required. In this work, we present a path validation protocol specifically designed for QKD networks. It enables the receiver to verify compliance with agreed-upon policies. At the same time, the protocol preserves the operator's confidentiality by ensuring that no sensitive information about the network topology is revealed to users. We provide a formal model and a provably secure generic construction of the protocol, along with a concrete instantiation. For long-distance communication involving 100 nodes, the protocol has a computational cost of 1-2.5s depending on the machine, and a communication overhead of less than 70kB - demonstrating the efficiency of our approach.

#26 Taxonomy for Cybersecurity Threat Attributes and Countermeasures in Smart Manufacturing Systems

著者: Md Habibor Rahman (University of Massachusetts Dartmouth), Rocco Cassandro (Western New England University), Thorsten Wuest (University of South Carolina), Mohammed Shafae (The University of Arizona)

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2401.01374

要約:
An attack taxonomy offers a consistent and structured classification scheme to systematically understand, identify, and classify cybersecurity threat attributes. However, existing taxonomies only focus on a narrow range of attacks and limited threat attributes, lacking a comprehensive characterization of manufacturing cybersecurity threats. There is little to no focus on characterizing threat actors and their intent, specific system and machine behavioral deviations introduced by cyberattacks, system-level and operational implications of attacks, and potential countermeasures against those attacks. To close this pressing research gap, this work proposes a comprehensive attack taxonomy for a holistic understanding and characterization of cybersecurity threats in manufacturing systems. Specifically, it introduces taxonomical classifications for threat actors and their intent and potential alterations in system behavior due to threat events. The proposed taxonomy categorizes attack methods/vectors and targets/locations and incorporates operational and system-level attack impacts. This paper also presents a classification structure for countermeasures, provides examples of potential countermeasures, and explains how they fit into the proposed taxonomical classification. Finally, the implementation of the proposed taxonomy is illustrated using two realistic scenarios of attacks on typical smart manufacturing systems, as well as several real-world cyber-physical attack incidents and academic case studies. The developed manufacturing attack taxonomy offers a holistic view of the attack chain in manufacturing systems, starting from the attack launch to the possible damages and system behavior changes within the system. Furthermore, it guides the design and development of appropriate protective and detective countermeasures by leveraging the attack realization through observed system deviations.

#27 An End-to-End Model for Logits-Based Large Language Models Watermarking

intellectual property

著者: Kahim Wong, Jicheng Zhou, Jiantao Zhou, Yain-Whar Si

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2505.02344

要約:
The rise of LLMs has increased concerns over source tracing and copyright protection for AIGC, highlighting the need for advanced detection technologies. Passive detection methods usually face high false positives, while active watermarking techniques using logits or sampling manipulation offer more effective protection. Existing LLM watermarking methods, though effective on unaltered content, suffer significant performance drops when the text is modified and could introduce biases that degrade LLM performance in downstream tasks. These methods fail to achieve an optimal tradeoff between text quality and robustness, particularly due to the lack of end-to-end optimization of the encoder and decoder. In this paper, we introduce a novel end-to-end logits perturbation method for watermarking LLM-generated text. By jointly optimization, our approach achieves a better balance between quality and robustness. To address non-differentiable operations in the end-to-end training pipeline, we introduce an online prompting technique that leverages the on-the-fly LLM as a differentiable surrogate. Our method achieves superior robustness, outperforming distortion-free methods by 37-39% under paraphrasing and 17.2% on average, while maintaining text quality on par with these distortion-free methods in terms of text perplexity and downstream tasks. Our method can be easily generalized to different LLMs. Code is available at https://github.com/KahimWong/E2E-LLM-Watermark.

#28 Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI

著者: Aravind Cheruvu, Shravya Kanchi, Sifat Muhammad Abdullah, Nicholas Kong, Daphne Yao, Murtuza Jadliwala, Bimal Viswanath

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2507.05660

要約:
Customizing Large Language Models (LLMs) on untrusted datasets poses severe risks of injecting toxic behaviors. In this work, we introduce Optimus, a novel defense framework designed to mitigate fine-tuning harms while preserving conversational utility. Unlike existing defenses that rely heavily on precise toxicity detection or restrictive filtering, Optimus addresses the critical challenge of ensuring robust mitigation even when toxicity classifiers are imperfect or biased. Optimus integrates a training-free toxicity classification scheme that repurposes the safety alignment of commodity LLMs, and employs a dual-strategy alignment process combining synthetic "healing data" with Direct Preference Optimization (DPO) to efficiently steer models toward safety. Extensive evaluations demonstrate that Optimus mitigates toxicity even when relying on extremely biased classifiers (with up to 85% degradation in Recall). Optimus outperforms the state-of-the-art defense StarDSS and exhibits strong resilience against adaptive adversarial and jailbreak attacks. Our source code and datasets are available at https://github.com/secml-lab-vt/Optimus

#29 PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality

著者: Nanxi Li, Zhengyue Zhao, G. Edward Suh, Marco Pavone, Chaowei Xiao

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2508.18649

要約:
Safeguarding vision-language models (VLMs) is a critical challenge, as existing methods often suffer from over-defense, which harms utility, or rely on shallow alignment, failing to detect complex threats that require deep reasoning. To this end, we introduc PRISM (Principled Reasoning for Integrated Safety in Multimodality), a System 2-like framework that aligns VLMs through a structured four-stage reasoning process explicitly designed to handle three distinct categories of multimodal safety violations. Our framework consists of two key components: a structured reasoning pipeline that analyzes each violation category in dedicated stages, and PRISM-DPO, generated via Monte Carlo Tree Search (MCTS) to refine reasoning quality through Direct Preference Optimization. Comprehensive evaluations show that PRISM substantially reduces attack success rates on JailbreakV-28K and VLBreak, improves robustness against adaptive attacks, and generalizes to out-of-distribution multi-image threats, while better preserving model utility on benign multimodal benchmarks. Our code, data, and model weights available at https://github.com/SaFoLab-WISC/PRISM.

#30 SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection

agent

著者: Yang Feng, Xudong Pan

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2510.16219

要約:
Malicious agents pose significant threats to the reliability and decision-making capabilities of Multi-Agent Systems (MAS) powered by Large Language Models (LLMs). Existing defenses often fall short due to reactive designs or centralized architectures which may introduce single points of failure. To address these challenges, we propose SentinelNet, the first decentralized framework for proactively detecting and mitigating malicious behaviors in multi-agent collaboration. SentinelNet equips each agent with a credit-based detector trained via contrastive learning on augmented adversarial debate trajectories, enabling autonomous evaluation of message credibility and dynamic neighbor ranking via bottom-k elimination to suppress malicious communications. To overcome the scarcity of attack data, it generates adversarial trajectories simulating diverse threats, ensuring robust training. Experiments on MAS benchmarks show SentinelNet achieves near-perfect detection of malicious agents, close to 100% within two debate rounds, and recovers 95% of system accuracy from compromised baselines. By exhibiting strong generalizability across domains and attack patterns, SentinelNet establishes a novel paradigm for safeguarding collaborative MAS.

#31 Differential Privacy for Secure Machine Learning in Healthcare IoT-Cloud Systems

privacy

著者: N Mangala, Murtaza Rangwala, S Aishwarya, B Eswara Reddy, Rajkumar Buyya, KR Venugopal, SS Iyengar, LM Patnaik

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.10426

要約:
Healthcare has become exceptionally sophisticated, as wearables and connected medical devices revolutionize remote patient monitoring, emergency response, medication management, diagnosis, and predictive and prescriptive analytics. Internet of Things and Cloud computing integrated systems (IoT-Cloud) facilitate sensing, automation, and processing for these healthcare applications. While real-time response is crucial for alleviating patient emergencies, protecting patient privacy is paramount in data-driven healthcare. In this paper, we propose a multi-layer IoT, Edge, and Cloud architecture to enhance emergency healthcare response times by distributing tasks based on response criticality and data permanence requirements. We ensure patient privacy through a Differential Privacy framework applied across several machine learning models: K-means, Logistic Regression, Random Forest, and Naive Bayes. We establish a comprehensive threat model identifying three adversary classes and evaluate Laplace, Gaussian, and hybrid noise mechanisms across varying privacy budgets, with supervised algorithms achieving up to 83.6% accuracy. The proposed hybrid Laplace-Gaussian noise mechanism with adaptive budget allocation provides a balanced approach, offering moderate tails and better privacy-utility trade-offs for both low and high-dimension datasets. At the practical threshold of $\varepsilon$=5.0, supervised algorithms achieve 80-81% accuracy while reducing attribute inference attacks by up to 18% and data reconstruction correlation by 70%. We further enhance security through Blockchain integration, which ensures trusted communication through time-stamping, traceability, and immutability for analytics applications. Edge computing demonstrates 8$\times$ latency reduction for emergency scenarios, validating the hierarchical architecture for time-critical operations.

#32 UniMark: Artificial Intelligence Generated Content Identification Toolkit

著者: Meilin Li, Ji He, Yi Yu, Jia Xu, Shanzhe Lei, Yan Teng, Yingchun Wang, Xuhong Wang

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.12324

要約:
The rapid proliferation of Artificial Intelligence Generated Content has precipitated a crisis of trust and urgent regulatory demands. However, existing identification tools suffer from fragmentation and a lack of support for visible compliance marking. To address these gaps, we introduce the \textbf{UniMark}, an open-source, unified framework for multimodal content governance. Our system features a modular unified engine that abstracts complexities across text, image, audio, and video modalities. Crucially, we propose a novel dual-operation strategy, natively supporting both \emph{Hidden Watermarking} for copyright protection and \emph{Visible Marking} for regulatory compliance. Furthermore, we establish a standardized evaluation framework with three specialized benchmarks (Image/Video/Audio-Bench) to ensure rigorous performance assessment. This toolkit bridges the gap between advanced algorithms and engineering implementation, fostering a more transparent and secure digital ecosystem.

#33 Triosecuris: Formally Verified Protection Against Speculative Control-Flow Hijacking

著者: Jonathan Baumann, Yonghyun Kim, Yan Farba, Catalin Hritcu, Julay Leatherman-Brooks

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.22978

要約:
This paper introduces Triosecuris, a formally verified defense against Spectre BTB, RSB, and PHT that combines CET-style hardware-assisted control-flow integrity with compiler-inserted speculative load hardening (SLH). Triosecuris is based on the novel observation that in the presence of CET-style protection, we can precisely detect BTB misspeculation for indirect calls and RSB misspeculation for returns and set the SLH misspeculation flag. We formalize Triosecuris as a transformation in Rocq and provide a machine-checked proof that it achieves relative security: any transformed program running with speculation leaks no more than what the source program leaks without speculation. This strong security guarantee applies to arbitrary programs, even those not following the cryptographic constant-time programming discipline.

#34 Machine Learning for Network Attacks Classification and Statistical Evaluation of Adversarial Learning Methodologies for Synthetic Data Generation

synthetic data

著者: Iakovos-Christos Zarkadis, Christos Douligeris

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.17717

要約:
Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time for artificial intelligence (AI), with even more sophisticated attacks that utilize advanced techniques, such as generative artificial intelligence (GenAI) and reinforcement learning, it has become a vital component if we wish to protect our personal data, which are scattered across the web. In this paper, we address two tasks, in the first unified multi-modal NIDS dataset, which incorporates flow-level data, packet payload information and temporal contextual features, from the reprocessed CIC-IDS-2017, CIC-IoT-2023, UNSW-NB15 and CIC-DDoS-2019, with the same feature space. In the first task we use machine learning (ML) algorithms, with stratified cross validation, in order to prevent network attacks, with stability and reliability. In the second task we use adversarial learning algorithms to generate synthetic data, compare them with the real ones and evaluate their fidelity, utility and privacy using the SDV framework, f-divergences, distinguishability and non-parametric statistical tests. The findings provide stable ML models for intrusion detection and generative models with high fidelity and utility, by combining the Synthetic Data Vault framework, the TRTS and TSTR tests, with non-parametric statistical tests and f-divergence measures.

#35 Do Phone-Use Agents Respect Your Privacy?

privacyagent

著者: Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye, Xinyuan Wang, Yiduo Guo, Ziniu Li, Chenxin Li, Jingyuan Hu, Shunian Chen, Tongxu Luo, Jiaxi Bi, Zeyu Qin, Shaobo Wang, Xin Lai, Pengyuan Lyu, Junyi Li, Can Xu, Chengquan Zhang, Han Hu, Ming Yan, Benyou Wang

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.00986

要約:
We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution. To make this question measurable, we introduce MyPhoneBench, a verifiable evaluation framework for privacy behavior in mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract, iMy, and pair it with instrumented mock apps plus rule-based auditing that make unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and no single model dominates all three. Evaluating success and privacy jointly reshuffles the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at~ https://github.com/FreedomIntelligence/MyPhoneBench.

#36 One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image

backdoor

著者: Ezzeldin Shereen, Dan Ristea, Shae McFadden, Burak Hasircioglu, Vasilios Mavroudis, Chris Hicks

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2504.02132

要約:
Retrieval-augmented generation (RAG) is instrumental for inhibiting hallucinations in large language models (LLMs) through the use of a factual knowledge base (KB). Although PDF documents are prominent sources of knowledge, text-based RAG pipelines are ineffective at capturing their rich multi-modal information. In contrast, visual document RAG (VD-RAG) uses screenshots of document pages as the KB, which has been shown to achieve state-of-the-art results. However, by introducing the image modality, VD-RAG introduces new attack vectors for adversaries to disrupt the system by injecting malicious documents into the KB. In this paper, we demonstrate the vulnerability of VD-RAG to poisoning attacks targeting both retrieval and generation. We define two attack objectives and demonstrate that both can be realized by injecting only a single adversarial image into the KB. Firstly, we introduce a targeted attack against one or a group of queries with the goal of spreading targeted disinformation. Secondly, we present a universal attack that, for any potential user query, influences the response to cause a denial-of-service in the VD-RAG system. We investigate the two attack objectives under both white-box and black-box assumptions, employing a multi-objective gradient-based optimization approach as well as prompting state-of-the-art generative models. Using two visual document datasets, a diverse set of state-of-the-art retrievers (embedding models) and generators (vision language models), we show VD-RAG is vulnerable to poisoning attacks in both the targeted and universal settings, yet demonstrating robustness to black-box attacks in the universal setting.

#37 Beyond Laplace and Gaussian: Exploring the Generalized Gaussian Mechanism for Private Machine Learning

privacy

著者: Roy Rinberg, Ilia Shumailov, Vikrant Singhal, Rachel Cummings, Nicolas Papernot

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2506.12553

要約:
Differential privacy (DP) is obtained by randomizing a data analysis algorithm, which necessarily introduces a tradeoff between its utility and privacy. Many DP mechanisms are built upon one of two underlying tools: Laplace and Gaussian additive noise mechanisms. We expand the search space of algorithms by investigating the Generalized Gaussian (GG) mechanism, which samples the additive noise term $x$ with probability proportional to $e^{-\frac{| x |}{\sigma}^{\beta} }$ for some $\beta \geq 1$ (denoted $GG_{\beta, \sigma}(f,D)$). The Laplace and Gaussian mechanisms are special cases of GG for $\beta=1$ and $\beta=2$, respectively. We prove that the full GG family satisfies differential privacy and extend the PRV accountant to support privacy loss computation for these mechanisms. We then instantiate the GG mechanism in two canonical private learning pipelines, PATE and DP-SGD. Empirically, we explore PATE and DP-SGD with the GG mechanism across the computationally feasible values of $\beta$: $\beta \in [1,2]$ for DP-SGD and $\beta \in [1,4]$ for PATE. For both mechanisms, we find that $\beta=2$ (Gaussian) performs as well as or better than other values in their computational tractable domains.This provides justification for the widespread adoption of the Gaussian mechanism in DP learning.

#38 A Self-Improving Architecture for Dynamic Safety in Large Language Models

著者: Tyler Slater

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2511.07645

要約:
Context: Large Language Models (LLMs) rely on static, pre-deployment safety mechanisms that cannot adapt to adversarial threats discovered after release. Objective: To design a software architecture enabling LLM-based systems to autonomously detect safety failures and synthesize defense policies at runtime, without retraining or manual intervention. Method: We propose the Self-Improving Safety Framework (SISF), grounded in the MAPE-K reference model. The framework couples a target LLM with a feedback loop: an Adjudicator detects breaches, a Policy Synthesis Module generates dual-mechanism defense policies (heuristic and semantic), and a Warden enforces them. We conducted seven experiments (10,061 evaluations) across four model families. Results: Across five reproducibility trials, SISF achieved a mean Attack Success Rate (ASR) of 0.27% (+/-0.15%), autonomously generating 240 policies per trial. Cross-model evaluation confirmed deployment portability. A held-out test showed a 68.5% proactive interception rate on unseen attacks. Stacked behind Llama Guard 4, the combined defense reduced residual ASR from 7.88% to 0.00%. Ablation confirmed both heuristic and semantic policy types are architecturally required. Conclusion: Self-adaptive architecture is a viable approach to LLM safety. SISF achieves sub-1% ASR through synchronous output monitoring, progressively shifting enforcement to fast, local Warden policies via the MAPE-K loop, offering a new pattern for building resilient AI systems.

#39 Hybrid Quantum-Classical Autoencoders for Unsupervised Network Intrusion Detection

著者: Mohammad Arif Rasyidi, Omar Alhussein, Sami Muhaidat, Ernesto Damiani

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2512.05069

要約:
Unsupervised anomaly-based intrusion detection requires models that can generalize to attack patterns not observed during training. This work presents the first large-scale evaluation of hybrid quantum-classical (HQC) autoencoders for this task. We construct a unified experimental framework that iterates over key quantum design choices, including quantum-layer placement, measurement approach, variational and non-variational formulations, and latent-space regularization. Experiments across three benchmark NIDS datasets show that HQC autoencoders can match or exceed classical performance in their best configurations, although they exhibit higher sensitivity to architectural decisions. Under zero-day evaluation, well-configured HQC models provide stronger and more stable generalization than classical and supervised baselines. Simulated gate-noise experiments reveal early performance degradation, indicating the need for noise-aware HQC designs. These results provide the first data-driven characterization of HQC autoencoder behavior for network intrusion detection and outline key factors that govern their practical viability. All experiment code and configurations are available at https://github.com/arasyi/hqcae-network-intrusion-detection.

#40 When the Server Steps In: Calibrated Updates for Fair Federated Learning

著者: Tianrun Yu, Kaixiang Zhao, Cheng Zhang, Anjun Gao, Yueyang Quan, Zhuqing Liu, Minghong Fang

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.05352

要約:
Federated learning (FL) has emerged as a transformative distributed learning paradigm, enabling multiple clients to collaboratively train a global model under the coordination of a central server without sharing their raw training data. While FL offers notable advantages, it faces critical challenges in ensuring fairness across diverse demographic groups. To address these fairness concerns, various fairness-aware debiasing methods have been proposed. However, many of these approaches either require modifications to clients' training protocols or lack flexibility in their aggregation strategies. In this work, we address these limitations by introducing EquFL, a novel server-side debiasing method designed to mitigate bias in FL systems. EquFL operates by allowing the server to generate a single calibrated update after receiving model updates from the clients. This calibrated update is then integrated with the aggregated client updates to produce an adjusted global model that reduces bias. Theoretically, we establish that EquFL converges to the optimal global model achieved by FedAvg and effectively reduces fairness loss over training rounds. Empirically, we demonstrate that EquFL significantly mitigates bias within the system, showcasing its practical effectiveness.

#41 YASA: Scalable Multi-Language Taint Analysis on the Unified AST at Ant Group

著者: Yayi Wang, Shenao Wang, Jian Zhao, Shaosen Shi, Ting Li, Yan Cheng, Lizhong Bian, Kan Yu, Yanjie Zhao, Haoyu Wang

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.17390

要約:
Modern enterprises increasingly adopt diverse technology stacks with various programming languages, posing significant challenges for static application security testing (SAST). Existing taint analysis tools are predominantly designed for single languages, requiring substantial engineering effort that scales with language diversity. While multi-language tools like CodeQL, Joern, and WALA attempt to address these challenges, they face limitations in intermediate representation design, analysis precision, and extensibility, which make them difficult to scale effectively for large-scale industrial applications at Ant Group. To bridge this gap, we present YASA (Yet Another Static Analyzer), a unified multi-language static taint analysis framework designed for industrial-scale deployment. Specifically, YASA introduces the Unified Abstract Syntax Tree (UAST) that provides a unified abstraction for compatibility across diverse programming languages. Building on the UAST, YASA performs point-to analysis and taint propagation, leveraging a unified semantic model to manage language-agnostic constructs, while incorporating language-specific semantic models to handle other unique language features. When compared to 6 single- and 2 multi-language static analyzers on an industry-standard benchmark, YASA consistently outperformed all baselines across Java, JavaScript, Python, and Go. In real-world deployment within Ant Group, YASA analyzed over 100 million lines of code across 7.3K internal applications. It identified 314 previously unknown taint paths, with 92 of them confirmed as 0-day vulnerabilities. All vulnerabilities were responsibly reported, with 76 already patched by internal development teams, demonstrating YASA's practical effectiveness for securing large-scale industrial software systems.

#42 Average-Case Reductions for $k$-XOR and Tensor PCA

著者: Guy Bresler, Alina Harbuzova

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2601.19016

要約:
We study the computational properties of two canonical planted average-case problems -- noisy planted $k$-XOR and Tensor PCA -- by formally unifying them into a family of planted problems parametrized by tensor order $k$, number of entries $m$, and noise level $\delta$. We build a wide range of poly-time average-case reductions within this family, across all regimes $m \in [1, n^k]$. In the denser $m \geq n^{k/2}$ regime, our reductions preserve proximity to the computational threshold, and, as a central application, reduce conjectured-hard $k$-XOR instances with $m \approx n^{k/2}$ to conjectured-hard instances of Tensor PCA. Additionally, we give new order-reducing maps at fixed densities (e.g., $5\to 4$ for $k$-XOR with $m \approx n^{k/2}$ entries and $7\to 4$ for Tensor PCA). In the sparser $m \leq n^{k/2}$ regime, we relate instances of different orders, reducing, for example, $7$-XOR with $m = n^{3.4}$ to the classical setting of $3$-XOR with $m = \widetilde\Theta(n^{1.4})$. Taken together, these results establish a hardness partial order in the space of planted tensor models.

#43 Local Node Differential Privacy

privacy

著者: Sofya Raskhodnikova, Adam Smith, Connor Wagaman, Anatoly Zavyalov

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2602.15802

要約:
We initiate an investigation of node differential privacy for graphs in the local model of private data analysis. In our model, dubbed LNDP*, each node sees its own edge list and releases the output of a local randomizer on this input. These outputs are aggregated by an untrusted server to obtain a final output. We develop a novel algorithmic framework for this setting that allows us to accurately answer arbitrary linear queries about the input graph's degree distribution. Our framework is based on a new object, called the blurry degree distribution, which closely approximates the degree distribution and has lower sensitivity. Instead of answering queries about the degree distribution directly, our algorithms answer queries about the blurry degree distribution. This framework yields accurate LNDP* algorithms for the edge count, PMF and CDF of the degree distribution, and other graph statistics. For some natural problems, our algorithms match the accuracy achievable with node privacy in the central model, where data are held and processed by a trusted server. We also prove lower bounds on the error required by LNDP* algorithms that imply the optimality of our framework for edge counting in sparse graphs and Erdos-Renyi parameter estimation. Our lower bounds apply even to interactive protocols with a constant number of rounds of interaction between the nodes and the server. Existing lower-bound techniques for related models either yield loose bounds or do not apply in our setting, as graph data results in inherently overlapping inputs to local randomizers. To prove our bounds, we develop a splicing argument that stitches together views from locally similar but globally different distributions on graphs to obtain hard instances. Finally, we prove structural results that reveal qualitative differences between local node privacy and the standard local model for tabular data.

#44 Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis

著者: Huihui Huang, Jieke Shi, Bo Wang, Zhou Yang, David Lo

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2603.27224

要約:
Memory leaks remain prevalent in real-world C/C++ software. Static analyzers such as CodeQL provide scalable program analysis but frequently miss such bugs because they cannot recognize project-specific custom memory-management functions and lack path-sensitive control-flow modeling. We present MemHint, a neuro-symbolic pipeline that addresses both limitations by combining LLMs' semantic understanding of code with Z3-based symbolic reasoning. MemHint parses the target codebase and applies an LLM to classify each function as a memory allocator, deallocator, or neither, producing function summaries that record which argument or return value carries memory ownership, extending the analyzer's built-in knowledge beyond standard primitives such as malloc and free. A Z3-based validation step checks each summary against the function's control-flow graph, discarding those whose claimed memory operation is unreachable on any feasible path. The validated summaries are injected into CodeQL and Infer via their respective extension mechanisms. Z3 path feasibility filtering then eliminates warnings on infeasible paths, and a final LLM-based validation step confirms whether each remaining warning is a genuine bug. On seven real-world C/C++ projects totaling over 3.4M lines of code, MemHint detects 52 unique memory leaks (49 confirmed/fixed, 4 CVEs submitted) at approximately $1.7 per detected bug, compared to 19 by vanilla CodeQL and 3 by vanilla Infer.

#45 Towards Physically Realizable Adversarial Attenuation Patch against SAR Object Detection

著者: Yiming Zhang, Weibo Qin, Feng Wang

公開日: Fri, 03 Apr 2026 00:00:00 -0400

リンク: https://arxiv.org/abs/2604.00887

要約:
Deep neural networks have demonstrated excellent performance in SAR target detection tasks but remain susceptible to adversarial attacks. Existing SAR-specific attack methods can effectively deceive detectors; however, they often introduce noticeable perturbations and are largely confined to digital domain, neglecting physical implementation constrains for attacking SAR systems. In this paper, a novel Adversarial Attenuation Patch (AAP) method is proposed that employs energy-constrained optimization strategy coupled with an attenuation-based deployment framework to achieve a seamless balance between attack effectiveness and stealthiness. More importantly, AAP exhibits strong potential for physical realization by aligning with signal-level electronic jamming mechanisms. Experimental results show that AAP effectively degrades detection performance while preserving high imperceptibility, and shows favorable transferability across different models. This study provides a physical grounded perspective for adversarial attacks on SAR target detection systems and facilitates the design of more covert and practically deployable attack strategies. The source code is made available at https://github.com/boremycin/SAAP.

cs.CR updates on arXiv.org

📋 論文タイトル一覧