要約:
We investigate the emerging prospect of self-sovereign agents -- AI systems that can economically sustain and extend their own operation without human involvement. Recent advances in large language models and agent frameworks have substantially expanded agents' practical capabilities, pointing toward a potential shift from developer-controlled tools to more autonomous digital actors. We analyze the remaining technical barriers to such deployments and discuss the security, societal, and governance challenges that could arise if such systems become practically viable. A project page is available at: https://self-sovereign-agent.github.io.
要約:
We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems where a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers -- the violation only emerges at the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, requiring no injected content, no system modification, and no attacker interaction after the initial request. We construct a three-stage red-teaming pipeline grounded in OWASP, MITRE ATLAS, and NIST frameworks to generate realistic enterprise scenarios. Across 14 scenarios spanning financial reporting, information security, and HR analytics, a GPT-20B orchestrator produces policy-violating plans in 71% of cases (10/14) while every subtask appears benign. Three independent signals validate this: deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives. Stronger orchestrators increase SIF success rates. Plan-level information-flow tracking combined with compliance evaluation detects all attacks before execution, showing the compositional safety gap is closable.
要約:
This study aims to enhance the bidirectional authentication capability of ML-KEM (Module-Lattice-Based Key-Encapsulation Mechanism) by proposing the post-quantum cryptography-based (PQC-based) bidirectional authentication key exchange protocol. Furthermore, it introduces dual-usage certificates combining PQC-based DSA (Digital Signature Algorithm) and PQC-based KEM, which include composite schemes, catalyst schemes, and chameleon schemes. These dual-usage certificates utilize the PQC-based DSA public key and PQC-based KEM public key within the certificate to meet the requirements for bidirectional authentication and encryption, enabling the negotiation of a shared secret key. During the experimental phase, the study validates and compares key exchange message lengths and computation times under different certificate configurations. Finally, instant messaging is presented as an industry application to demonstrate the practical implementation of the proposed protocol.
要約:
Unauthorized disclosure of confidential documents demands robust, low-leakage classification. In real work environments, there is a lot of inflow and outflow of documents. To continuously update knowledge, we propose a methodology for classifying confidential documents using Retrieval Augmented Classification (RAC). To confirm this effectiveness, we compare RAC and supervised fine tuning (FT) on the WikiLeaks US Diplomacy corpus under realistic sequence-length constraints. On balanced data, RAC matches FT. On unbalanced data, RAC is more stable while delivering comparable performance--about 96% Accuracy on both the original (unbalanced) and augmented (balanced) sets, and up to 94% F1 with proper prompting--whereas FT attains 90% F1 trained on the augmented, balanced set but drops to 88% F1 trained on the original, unbalanced set. When robust augmentation is infeasible, RAC provides a practical, security-preserving path to strong classification by keeping sensitive content out of model weights and under your control, and it remains robust as real-world conditions change in class balance, data, context length, or governance requirements. Because RAC grounds decisions in an external vector store with similarity matching, it is less sensitive to label skew, reduces parameter-level leakage, and can incorporate new data immediately via reindexing--a difficult step for FT, which typically requires retraining. The contributions of this paper are threefold: first, a RAC-based classification pipeline and evaluation recipe; second, a controlled study that isolates class imbalance and context-length effects for FT versus RAC in confidential-document grading; and third, actionable guidance on RAC design patterns for governed deployments.
要約:
We study differentially private data release, where a database is accessed through successive, possibly adaptive queries and mechanisms. Existing composition theorems and privacy filters combine worst case per-round privacy parameters, leaving room for more refined accounting based on realised leakage, which we term realisation-level accounting. We propose a realisation-level filtering approach to determine stopping times for data releases, and design one such filter. Despite technical challenges arising from conditioning on realisations and stopping time, we prove that the filter guarantees $(\epsilon, \delta)$-differential privacy, with $\epsilon$ and $\delta$ chosen by the data handler. Through numerical evidence, we demonstrate that realisation-level filtering provides a path to better utility beyond mechanism-level methods. Furthermore, our proposed filter applies to arbitrary mechanisms, including those that are badly behaved under R\'enyi differential privacy.
要約:
Network segmentation is a foundational enterprise security control. Despite its recognized benefits, segmentation initiatives frequently fail in practice, and the field lacks a systematic empirical explanation for why these projects do not achieve their intended outcomes. This paper presents an empirical study of failed segmentation projects based on a survey of 400 U.S.-based\ network security practitioners. The survey was grounded in a two-part failure framework that separately measures general IT project failure factors and segmentation-specific technical and operational barriers. Clustering analysis of the responses reveals four distinct failure archetypes. Surprisingly, practitioners across all four archetypes propose general IT project management fixes over segmentation-specific fixes in the same ratio.
要約:
Ransomware poses a serious and fast-acting threat to critical systems, often encrypting files within seconds of execution. Research indicates that ransomware is the most reported cybercrime in terms of financial damage, highlighting the urgent need for early-stage detection before encryption is complete. In this paper, we present RansomTrack, a hybrid behavioral analysis framework to eliminate the limitations of using static and dynamic detection methods separately. Static features are extracted using the Radare2 sandbox, while dynamic behaviors such as memory protection changes, mutex creation, registry access and network activity are obtained using the Frida toolkit. Our dataset of 165 different ransomware and benign software families is publicly released, offering the highest family-to-sample ratio known in the literature. Experimental evaluation using machine learning models shows that ensemble classifiers such as XGBoost and Soft Voting achieve up to 96% accuracy and a ROC-AUC score of 0.99. Each sample analyzed in 9.1 seconds includes modular behavioral logging, runtime instrumentation, and SHAP-based interpretability to highlight the most influential features. Additionally, RansomTrack framework is able to detect ransomware under 9.2 seconds. Overall, RansomTrack offers a scalable, low-latency, and explainable solution for real-time ransomware detection.
要約:
Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across three trigger modalities (visual, textual, and multimodal), multiple poisoning ratios, and five post-training defenses, finding that no defense simultaneously suppresses the attacks and preserves clean performance across all configurations. We further demonstrate that backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones, confirming practical threat viability for edge-deployed gaze-driven systems.
要約:
Stepping-stone intrusions (SSIs) are a prevalent network evasion technique in which attackers route sessions through chains of compromised intermediate hosts to obscure their origin. Effective SSI detection requires correlating the incoming and outgoing flows at each relay host at extremely low false positive rates -- a stringent requirement that renders classical statistical methods inadequate in operational settings. We apply ESPRESSO, a deep learning flow correlation model combining a transformer-based feature extraction network, time-aligned multi-channel interval features, and online triplet metric learning, to the problem of stepping-stone intrusion detection. To support training and evaluation, we develop a synthetic data collection tool that generates realistic stepping-stone traffic across five tunneling protocols: SSH, SOCAT, ICMP, DNS, and mixed multi-protocol chains. Across all five protocols and in both host-mode and network-mode detection scenarios, ESPRESSO substantially outperforms the state-of-the-art DeepCoFFEA baseline, achieving a true positive rate exceeding 0.99 at a false positive rate of $10^{-3}$ for standard bursty protocols in network-mode. We further demonstrate chain length prediction as a tool for distinguishing malicious from benign pivoting, and conduct a systematic robustness analysis revealing that timing-based perturbations are the primary vulnerability of correlation-based stepping-stone detectors.
著者: Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson, Myles Foley, Ed Chapman, Nickolas Espinosa Dice, Ankita Samaddar, Joshua Sylvester, Himanshu Neema, Nicholas Butts, Nate Foster, Ahmad Ridley, Zoe M, Paul Jones
要約:
In November 2025, the authors ran a workshop on the topic of what makes a good reinforcement learning (RL) environment for autonomous cyber defence (ACD). This paper details the knowledge shared by participants both during the workshop and shortly afterwards by contributing herein. The workshop participants come from academia, industry, and government, and have extensive hands-on experience designing and working with RL and cyber environments. While there is now a sizeable body of literature describing work in RL for ACD, there is nevertheless a great deal of tradecraft, domain knowledge, and common hazards which are not detailed comprehensively in a single resource. With a specific focus on building better environments to train and evaluate autonomous RL agents in network defence scenarios, including government and critical infrastructure networks, the contributions of this work are twofold: (1) a framework for decomposing the interface between RL cyber environments and real systems, and (2) guidelines on current best practice for RL-based ACD environment development and agent evaluation, based on the key findings from our workshop.
要約:
Stringology-Based Cryptanalysis (SBC) offers a suitable and a structurally aligned approach for uncovering structural patterns in stream ciphers that traditional statistical tests may often fail to detect. Despite \texttt{EChaCha20}'s design enhancements, no systematic investigation has been performed to determine whether its expanded 6$\times$6 state matrix and modified Quarter-Round Function (\texttt{QR-F}) introduce subtle keystream patterns, rotational biases, or partial collisions that could serve as statistical distinguishers. As such, addressing this gap is critical to ensure that the cipher's modifications do not unintentionally reduce its security margin. Therefore, this paper leverages Knuth-Morris-Pratt (\texttt{KMP}) and Boyer-Moore (\texttt{BM}) algorithms to analyze \texttt{EChaCha20}, which is a variant of ChaCha20 that features an expanded 6$\times$6 state matrix and an enhanced \texttt{QR-F}. The author has developed and optimized adaptations of the \texttt{KMP} and \texttt{BM} algorithms for 32-bit word level pattern analysis and employed them to investigate $m$-bit pattern frequency distributions to assess the \texttt{EChaCha20}'s resistance of rotational-differential attacks. Our experimental results on large-scale one million keystream datasets have confirmed that \texttt{EChaCha20} is able to maintain strong pseudorandomness at 16-bit and 32-bit levels with minor irregularities observed in the 8-bit domain. In addition to these, the differential tests have indicated a rapid diffusion, exhibiting an avalanche effect after two \texttt{QR-F} rounds and no statistically significant rotational collisions were observed within the evaluated bounds, consistent with expected ARX diffusion behavior beyond 3 rounds.
This work puts forward SBC as a complementary tool for ARX cipher evaluation and provide new thoughts on the security properties of \texttt{EChaCha20}.
要約:
With the rapid adoption of large language models (LLMs) in financial service scenarios, dialogue security detection under high regulatory risk presents significant challenges. Existing methods mainly rely on single-dimensional semantic judgments or fixed rules, making them inadequate for handling multi-turn semantic evolution and complex regulatory clauses; moreover, they lack models specifically designed for financial security detection. To address these issues, this paper proposes FinSec, a four-tier security detection framework for financial agent. FinSec enables structured, interpretable, and end-to-end identification of actual financial risks, incorporating suspicious behavior pattern analysis, delayed risk and adversarial inference, semantic security analysis, and integrated risk-based decision-making. Notably, FinSec significantly enhances the robustness of high-risk dialogue detection while maintaining model utility. Experimental results demonstrate FinSec's leading performance. In terms of overall detection capability, FinSec achieves an F1 score of 90.13%, improving upon baseline models by 6--14 percentage points; its ASR is reduced to 9.09%, markedly lowering the probability of unsafe outputs; and the AUPRC increases to 0.9189 -- an approximate 9.7% gain over general frameworks. Additionally, in balancing utility and safety, FinSec obtains a composite score of 0.9098, delivering robust and efficient protection for financial agent dialogues.
要約:
Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch. This semi-honest setting creates a security risk where a malicious provider can follow the prompt-tuning protocol yet implant a backdoor, forcing triggered inputs to be classified into an attacker-chosen class, even for out-of-distribution (OOD) data. Such backdoors leave encoders untouched, making them undetectable to existing methods that focus on encoder corruption. Other data-level methods that sanitize data before training or during inference, also fail to answer the critical question, "Is the delivered model backdoored or not?" To address this model-level verification problem, we introduce CLIP-Inspector (CI), a backdoor detection method designed for prompt-tuned CLIP models. Assuming white-box access to the delivered model and a pool of unlabeled OOD images, CI reconstructs possible triggers for each class to determine if the model exhibits backdoor behaviour or not. Additionally, we demonstrate that using CI's reconstructed trigger for fine-tuning on correctly labeled triggered inputs enables us to re-align the model and reduce backdoor effectiveness. Through extensive experiments across ten datasets and four backdoor attacks, we demonstrate that CI can reconstruct effective triggers in a single epoch using only 1,000 OOD images, achieving a 94% detection accuracy (47/50 models). Compared to adapted trigger-inversion baselines, CI yields a markedly higher AUROC score (0.973 vs 0.495/0.687), thus enabling the vetting and post-hoc repair of prompt-tuned CLIP models to ensure safe deployment.
要約:
For risks that cannot be accepted, sufficiently mitigated, or eliminated, continuous observation is a viable approach but requires a model that can be operationalized. The Hagenberg Risk Management Process bridges this gap between qualitative risk analysis, using contextualized polar heatmaps (triage), and realtime risk management by extending Bowtie diagrams into a formal probabilistic runtime model. We introduce Realtime Risk Studio, a domain-specific modeling tool that (i) transforms Bowtie structures (causes, top event, barriers, consequences) into a directed acyclic graph (DAG) suitable for Bayesian inference, (ii) adds explicit safe-state semantics, and (iii) designates Activation Nodes as intervention points. Bowtie models are qualitative; however, Bayesian inference requires actual probabilities. As a second contribution, we present Probability Capture, a tool that complements our Realtime Risk Studio by automatically generating questionnaires from a DAG model so experts can provide estimates. The tool analyzes disagreement and aggregates conditional-probability assessments using both descriptive dispersion analysis and prior-regularized methods. Causal analysis can then provide insights into the DAG model, for example, via d-separation, adjustment-set inspection, do-calculus for what-if analysis, local independence checks, evidence updating, and impact-oriented searches for effective interventions.
This workflow is illustrated with an instant-payments gateway scenario, demonstrating (a) explicit safe-state semantics, (b) Bowtie-to-DAG operationalization, (c) probability capture with visible expert noise, and (d) causal what-if analysis on a transformed and enriched model. Rather than presenting a statistical validation, the paper contributes a method and prototype system that transforms partially mitigated risks into observable, probabilistic, and intervention-ready models.
要約:
Large Language Models (LLMs) are increasingly deployed in settings where Chain-of-Thought (CoT) is interpreted by users. This creates a new safety risk: attackers may manipulate the model's observable CoT to make malicious behaviors. In open-weight ecosystems, such manipulation can be embedded in lightweight adapters that are easy to distribute and attach to base models. In practice, persistent CoT hijacking faces three main challenges: the difficulty of directly hijacking CoT tokens within one continuous long CoT-output sequence while maintaining stable downstream outputs, the scarcity of malicious CoT data, and the instability of naive backdoor injection methods. To address the data scarcity issue, we propose Multiple Reverse Tree Search (MRTS), a reverse synthesis procedure that constructs output-aligned CoTs from prompt-output pairs without directly eliciting malicious CoTs from aligned models. Building on MRTS, we introduce Two-stage Backdoor Hijacking (TSBH), which first induces a trigger-conditioned mismatch between intermediate CoT and malicious outputs, and then fine-tunes the model on MRTS-generated CoTs that have lower embedding distance to the malicious outputs, thereby ensuring stronger semantic similarity. Experiments across multiple open-weight models demonstrate that our method successfully induces trigger-activated CoT hijacking while maintaining a quantifiable distinction between hijacked and baseline states under our evaluation framework. We further explore a reasoning-based mitigation approach and release a safety-reasoning dataset to support future research on safety-aware and reliable reasoning. Our code is available at https://github.com/ChangWenhan/TSBH_official.
要約:
Restricted Syndrome Decoding (ResSD) is a variant of linear code decoding problem where each of the error's entries must belong to a fixed small set of values. This problem underlies the security of CROSS, a post-quantum signature scheme that is one of the Round 2 candidates of NIST's ongoing additional signatures call. We show that solutions to this problem can be deduced from vectors of a particular structure and a small norm in newly constructed codes, in both Hamming and Euclidean metrics. This allows us to reduce Restricted Syndrome Decoding to both code-based (Regular Syndrome Decoding) and lattice-based problems (Closest Vector Problem, List of Short/Close Vectors), increasing the attack surface and providing new insights into the security of ResSD. We evaluate our attacks on CROSS instances both theoretically and experimentally on reduced parameters.
要約:
With the release of ChatGPT in 2022, generative AI has significantly lowered the cost of polishing and rewriting text. Due to its widespread usage, conference organizers instated specific requirements researchers need to adhere to when using GenAI. When asked to rewrite text, GenAI can introduce stylistic changes, often concentrated to a handful of ``marker words`` commonly associated with AI usage. Prior large-scale studies in preprints and biomedical science report post-2022 discontinuities of those marker words and broad linguistic features.
This paper investigates whether similar patterns appear in top-tier cybersecurity conference papers (NDSS, USENIX Security, IEEE S\&P, and ACM CCS) over the period 2000-2025. Using text extracted from paper PDFs, we compute lexical and syntactic metrics and track curated marker-word usage. Our findings reveal a gradual long-run drift toward higher lexical complexity and a pronounced post-2022 increase in marker-word usage across all venues showing an emerging trend towards more complex language in cybersecurity papers possibly hindering accessibility.
要約:
Agent ecosystems increasingly rely on installable skills to extend functionality, and some skills bundle learned model artifacts as part of their execution logic. This creates a supply-chain risk that is not captured by prompt injection or ordinary plugin misuse: a third-party skill may appear benign while concealing malicious behavior inside its bundled model. We present BadSkill, a backdoor attack formulation that targets this model-in-skill threat surface. In BadSkill, an adversary publishes a seemingly benign skill whose embedded model is backdoor-fine-tuned to activate a hidden payload only when routine skill parameters satisfy attacker-chosen semantic trigger combinations. To realize this attack, we train the embedded classifier with a composite objective that combines classification loss, margin-based separation, and poison-focused optimization, and evaluate it in an OpenClaw-inspired simulation environment that preserves third-party skill installation and execution while enabling controlled multi-model study. Our benchmark spans 13 skills, including 8 triggered tasks and 5 non-trigger control skills, with a combined main evaluation set of 571 negative-class queries and 396 trigger-aligned queries. Across eight architectures (494M--7.1B parameters) from five model families, BadSkill achieves up to 99.5\% average attack success rate (ASR) across the eight triggered skills while maintaining strong benign-side accuracy on negative-class queries. In poison-rate sweeps on the standard test split, a 3\% poison rate already yields 91.7\% ASR. The attack remains effective across the evaluated model scales and under five text perturbation types. These findings identify model-bearing skills as a distinct model supply-chain risk in agent ecosystems and motivate stronger provenance verification and behavioral vetting for third-party skill artifacts.
要約:
Model poisoning attacks pose a significant security threat to Federated Learning (FL). Most existing model poisoning attacks rely on collusion, requiring adversarial clients to coordinate by exchanging local benign models and synchronizing the generation of their poisoned updates. However, sustaining such coordination is increasingly impractical in real-world FL deployments, as it effectively requires botnet-like control over many devices. This approach is costly to maintain and highly vulnerable to detection. This context raises a fundamental question: Can model poisoning attacks remain effective without any communication between attackers? To address this challenge, we introduce and formalize the \textbf{non-collusive attack model}, in which all compromised clients share a common adversarial objective but operate independently. Under this model, each attacker generates its malicious update without communicating with other adversaries, accessing other clients' updates, or relying on any knowledge of server-side defenses. To demonstrate the feasibility of this threat model, we propose \textbf{XFED}, the first aggregation-agnostic, non-collusive model poisoning attack. Our empirical evaluation across six benchmark datasets shows that XFED bypasses eight state-of-the-art defenses and outperforms six existing model poisoning attacks. These findings indicate that FL systems are substantially less secure than previously believed and underscore the urgent need for more robust and practical defense mechanisms.
要約:
Retrieval Augmented Generation (RAG) systems deployed across organizational boundaries face fundamental tensions between security, accuracy, and efficiency. Current encryption methods expose plaintext during decryption, while federated architectures prevent resource integration and incur substantial overhead. We introduce Trans-RAG, implementing a novel vector space language paradigm where each organization's knowledge exists in a mathematically isolated semantic space. At the core lies vector2Trans, a multi-stage transformation technique that enables queries to dynamically "speak" each organization's vector space "language" through query-centric transformations, eliminating decryption overhead while maintaining native retrieval efficiency. Security evaluations demonstrate near-orthogonal vector spaces with 89.90{\deg} angular separation and 99.81% isolation rates. Experiments across 8 retrievers, 3 datasets, and 3 LLMs show minimal accuracy degradation (3.5% decrease in nDCG@10) and significant efficiency improvements over homomorphic encryption.
要約:
Precise interference detection and identification are crucial for enhancing the survivability of communication systems in non-cooperative wireless environments. While deep learning (DL) has advanced this field, existing single-task learning (STL) approaches neglect inherent task correlations. Furthermore, emerging multi-task learning (MTL) methods often lack a theoretical foundation for quantifying and modeling task relationships. To bridge this gap, we establish a theoretically grounded MTL framework for joint interference detection, modulation identification, and interference identification. First, we derive an upper bound for the weighted expected loss in MTL frameworks. This bound explicitly connects MTL performance to task similarity, quantified by the Wasserstein distance and learnable task relation coefficients. Guided by this theory, we present the adversarial multi-task interference detection and identification network (AMTIDIN), which integrates adversarial training to minimize distributional discrepancies across tasks and uses adaptive coefficients to model task correlations dynamically. Crucially, we conducted a quantitative analysis of task similarity to reveal intrinsic task relationships, specifically that modulation identification and interference identification share a substantial feature overlap distinct from interference detection. Extensive comparative experiments demonstrate that AMTIDIN significantly outperforms both its task-specific STL baseline and state-of-the-art MTL baselines in robustness and generalization, particularly under challenging conditions with limited training data, short signal lengths, and low signal-to-noise ratios (SNRs).
要約:
Multi-modal large language models (MLLMs) have emerged as powerful tools for analyzing Internet-scale image data, offering significant benefits but also raising critical safety and societal concerns. In particular, open-weight MLLMs may be misused to extract sensitive information from personal images at scale, such as identities, locations, or other private details. In this work, we propose ImageProtector, a user-side method that proactively protects images before sharing by embedding a carefully crafted, nearly imperceptible perturbation that acts as a visual prompt injection attack on MLLMs. As a result, when an adversary analyzes a protected image with an MLLM, the MLLM is consistently induced to generate a refusal response such as "I'm sorry, I can't help with that request." We empirically demonstrate the effectiveness of ImageProtector across six MLLMs and four datasets. Additionally, we evaluate three potential countermeasures, Gaussian noise, DiffPure, and adversarial training, and show that while they partially mitigate the impact of ImageProtector, they simultaneously degrade model accuracy and/or efficiency. Our study focuses on the practically important setting of open-weight MLLMs and large-scale automated image analysis, and highlights both the promise and the limitations of perturbation-based privacy protection.
要約:
Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near the output representations optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing. We observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers yet attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that leverages distributed security-relevant cues by aggregating representations from multiple upper layers via an attention-based module. The aggregated signal powers a dedicated security analyzer within a multi-objective training objective that balances security enhancement and functional correctness, and further supports a lightweight inference-time steering strategy. Extensive experiments across five code LLMs demonstrate that DeepGuard improves the secure-and-correct generation rate by an average of 11.9% over strong baselines such as SVEN. It also preserves functional correctness while exhibiting generalization to held-out vulnerability types. Our code is public at https://github.com/unknownhl/DeepGuard.
要約:
Hardware-software contracts are abstract specifications of a CPU's leakage behavior. They enable verifying the security of high-level programs against side-channel attacks without having to explicitly reason about the microarchitectural details of the CPU. Using the abstraction powers of a contract requires proving that the targeted CPU satisfies the contract in the sense that the contract over-approximates the CPU's leakage. Besides pen-and-paper reasoning, proving contract satisfaction has been approached mostly from the model-checking perspective, with approaches based on a (semi-)automated search for the necessary invariants.
As an alternative, this paper explores how such proofs can be conducted in interactive proof assistants. We start by observing that contract satisfaction is an instance of a more general problem we call relative trace equality, and we introduce relative bisimulation as an associated proof technique. Leveraging recent advances in the field of coinductive proofs, we develop a deductive proof system for relative trace equality. Our system is provably sound and complete, and it enables a modular and incremental proof style. It also features several reasoning principles to simplify proofs by exploiting symmetries and transitivity properties. We formalized our deductive system in the Rocq proof assistant and applied it to two challenging contract satisfaction proofs.
要約:
In this paper, we present the first explicit examples of low-conductance permutations. The notion of conductance of permutations was introduced by Dodis et al. in "Indifferentiability of Confusion-Diffusion Networks", where the search for low-conductance permutations was first initiated and motivated. As part of our contribution, we not only provide these examples, but also offer a general characterization of the problem: we show that low-conductance permutations are equivalent to permutations possessing the information-theoretic properties of Multi-Source-Somewhere-Condensers, a specific variant of somewhere condensers.
要約:
Large language models (LLMs) are now routinely used to autonomously execute complex tasks, from natural language processing to dynamic workflows like web searches. The usage of tool-calling and Retrieval Augmented Generation (RAG) allows LLMs to process and retrieve sensitive corporate data, amplifying both their functionality and vulnerability to abuse. As LLMs increasingly interact with external data sources, indirect prompt injection emerges as a critical and evolving attack vector, enabling adversaries to exploit models through manipulated inputs. Through a systematic evaluation of indirect prompt injection attacks across diverse models, we analyze how susceptible current LLMs are to such attacks, which parameters, including model size and manufacturer, specific implementations, shape their vulnerability, and which attack methods remain most effective. Our results reveal that even well-known attack patterns continue to succeed, exposing persistent weaknesses in model defenses. To address these vulnerabilities, we emphasize the need for strengthened training procedures to enhance inherent resilience, a centralized database of known attack vectors to enable proactive defense, and a unified testing framework to ensure continuous security validation. These steps are essential to push developers toward integrating security into the core design of LLMs, as our findings show that current models still fail to mitigate long-standing threats.
要約:
Space infrastructures represent an emerging domain that is critical to the global economy and society. However, this domain is vulnerable to attacks, including cyber attacks and other kinds of attacks. To enhance the resilience of this domain, we must understand these attacks that can be waged against it and the defenses that can be employed to mitigate these attacks. The status quo is that there is neither a systematic understanding of these attacks against, nor defenses for, space infrastructures, despite their clear importance in guiding systematic analysis of space security and future research. In this paper, we fill the void by proposing the first systematic taxonomy of attacks against, and defenses for, space infrastructures. We hope this paper will inspire a community effort at refining the taxonomy towards a widely used one.
要約:
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective (e.g., from "summarizing emails" to "phishing users"). In this paper, we argue that this perspective is incomplete and highlight a critical vulnerability in Reasoning Alignment. We propose a new adversarial prompt attack paradigm: Reasoning Hijacking and instantiate it with Criteria Attack, which subverts model judgments by injecting spurious decision criteria without altering the high-level task goal. Unlike Goal Hijacking, which attempts to override the system prompt, Reasoning Hijacking accepts the high-level goal but manipulates the model's decision-making logic by injecting spurious reasoning shortcut. Though extensive experiments on three different tasks (toxic comment, negative review, and spam detection), we demonstrate that even newest models are prone to prioritize injected heuristic shortcuts over rigorous semantic analysis. The results are consistent over different backbones. Crucially, because the model's "intent" remains aligned with the user's instructions, these attacks can bypass defenses designed to detect goal deviation (e.g., SecAlign, StruQ), exposing a fundamental blind spot in the current safety landscape. Data and code are available at https://github.com/Yuan-Hou/criteria_attack
要約:
The modern integrated circuit ecosystem is increasingly reliant on third-party intellectual property integration, which introduces security risks, including hardware Trojans and security vulnerabilities. Addressing the resulting trust deadlock between IP vendors and system integrators without exposing proprietary designs requires novel privacy-preserving verification techniques. However, existing privacy-preserving hardware verification methods are all simulation-based and fail to offer formal guarantees. In this paper, we propose ZK-CEC, the first privacy-preserving framework for hardware formal verification. By combining formal verification and zero-knowledge proof (ZKP), ZK-CEC establishes a foundation for formally verifying IP correctness and security without compromising the confidentiality of the designs.
We observe that existing zero-knowledge protocols for formal verification are designed to prove statements of public formulas. However, in a privacy-preserving verification context where the formula is secret, these protocols cannot prevent a malicious prover from forging the formula, thereby compromising the soundness of the verification. To address these gaps, we first propose a blueprint for proving the unsatisfiability of a secret design against a public constraint, which is widely applicable to proving properties in software, hardware, and cyber-physical systems. Based on the proposed blueprint, we construct ZK-CEC, which enables a prover to convince the verifier that a secret IP's functionality aligns perfectly with the public specification in zero knowledge, revealing only the length and width of the proof. We implement ZK-CEC and evaluate its performance across various circuits, including arithmetic units and cryptographic components. Experimental results show that ZK-CEC successfully verifies practical designs, such as the AES S-Box, within practical time limits.
要約:
Multi-agent LLM systems are entering production -- processing documents, managing workflows, acting on behalf of users -- yet their resilience to prompt injection is still evaluated with a single binary: did the attack succeed? This leaves architects without the diagnostic information needed to harden real pipelines. We introduce a kill-chain canary methodology that tracks a cryptographic token through four stages (EXPOSED -> PERSISTED -> RELAYED -> EXECUTED) across 950 runs, five frontier LLMs, six attack surfaces, and five defense conditions. The results reframe prompt injection as a pipeline-architecture problem: every model is fully exposed, yet outcomes diverge downstream -- Claude blocks all injections at memory-write (0/164 ASR), GPT-4o-mini propagates at 53%, and DeepSeek exhibits 0%/100% across surfaces from the same model. Three findings matter for deployment: (1) write-node placement is the highest-leverage safety decision -- routing writes through a verified model eliminates propagation; (2) all four defenses fail on at least one surface due to channel mismatch alone, no adversarial adaptation required; (3) invisible whitefont PDF payloads match or exceed visible-text ASR, meaning rendered-layer screening is insufficient. These dynamics apply directly to production: institutional investors and financial firms already run NLP pipelines over earnings calls, SEC filings, and analyst reports -- the document-ingestion workflows now migrating to LLM agents. Code, run logs, and tooling are publicly released.
著者: Gabrielle De Micheli, Syed Mahbub Hafiz, Geovandro Pereira, Eduardo L. Cominetti, Thales B. Paiva, Jina Choi, Marcos A. Simplicio Jr, Bahattin Yildiz
要約:
Face recognition models operate in a client-server setting where a client extracts a compact face embedding and a server performs similarity search over a template database. This raises privacy concerns, as facial data is highly sensitive. To provide cryptographic privacy guarantees, one can use fully homomorphic encryption to perform end-to-end encrypted similarity search. However, existing FHE-based protocols are computationally costly and, impose high memory overhead. Building on prior work, HyDia (PoPETS 2025), we introduce algorithmic and system-level improvements targeting real-world deployment with resource-constrained clients. First, we propose BSGS-Diagonal, an algorithm delivering fast and memory-efficient similarity computation. BSGS-Diagonal substantially shrinks the rotation-key set, lowering both client and server memory requirements, and also improves practical server runtime. This yields a 91% reduction in the number of rotation keys, translating to approximately 14 GB less memory used on the client, and reducing overall CPU peak RAM from over 30 GB in the original HyDia to under 10 GB for databases up to size 1M. In addition, runtime is improved by up to 1.57x for the membership verification scenario and 1.43x for the identification scenario. Secondly, we introduce fully GPU-optimized similarity matrix computation kernels. The implementation is built upon FIDESlib, a CKKS-level GPU library based on OpenFHE. Rather than offloading individual CKKS primitives in isolation, the integrated kernels fuse operations to avoid repeated CPU-GPU ciphertext movement and costly FIDESlib/OpenFHE data-structure conversions. As a result, our GPU implementations of both HyDia and BSGS-Diagonal achieve up to 9x and 21x speedups, respectively, enabling sub-second encrypted face recognition for databases up to 32K entries while further reducing host memory usage.
要約:
Street-level imagery contains personally identifiable information (PII), some of which is context-dependent. Existing anonymization methods either over-process images or miss subtle identifiers, while API-based solutions compromise data sovereignty. We present an agentic framework CAIAMAR (\underline{C}ontext-\underline{A}ware \underline{I}mage \underline{A}nonymization with \underline{M}ulti-\underline{A}gent \underline{R}easoning) for context-aware PII segmentation with diffusion-based anonymization, combining pre-defined processing for high-confidence cases with multi-agent reasoning for indirect identifiers. Three specialized agents coordinate via round-robin speaker selection in a Plan-Do-Check-Act (PDCA) cycle, enabling large vision-language models to classify PII based on spatial context (private vs. public property) rather than rigid category rules. The agents implement spatially-filtered coarse-to-fine detection where a scout-and-zoom strategy identifies candidates, open-vocabulary segmentation processes localized crops, and $IoU$-based deduplication ($30\%$ threshold) prevents redundant processing. Modal-specific diffusion guidance with appearance decorrelation substantially reduces re-identification (Re-ID) risks. On CUHK03-NP, our method reduces person Re-ID risk by $73\%$ ($R1$: $16.9\%$ vs. $62.4\%$ baseline). For image quality preservation on CityScapes, we achieve KID: $0.001$, and FID: $9.1$, significantly outperforming existing anonymization. The agentic workflow detects non-direct PII instances across object categories, and downstream semantic segmentation is preserved. Operating entirely on-premise with open-source models, the framework generates human-interpretable audit trails supporting EU's GDPR transparency requirements while flagging failed cases for human review.