要約:
This article applies postphenomenological theory to the field of cybersecurity risk management, arguing that formal risk models function as mediating artifacts that shape how security practitioners or analysts perceive, interpret, and act on threats. Based on Don Ihde's taxonomy on human-technology relationships and Peter-Paul Verbeek's extended mediational framework, the Contextual and Multimodal Hazard Impact Index (CIIM), an original dynamic risk model presented as an empirical case study, is analyzed. CIIM is formally defined as CIIM(t+1) = [A T(t) V(t) E(t)] / R(t) + {alpha} P(t), where the condition R(t) 0 is not treated as a computational artifact to be smoothed out, but as a genuine systemic collapse that signals singularity. This design choice constitutes a deliberate phenomenological move, allowing organizational fragility to be made visible in a way that previous CVSS-based and probabilistic models conceal. In addition, we examine how CIIM's time projection (t+1) and its hybrid machine learning architecture, combining LSTM/GRU, XGBoost, and Reinforcement Learning, produce a new form of technological intentionality that structures practitioner or analyst attention and ethical deliberation. The article concludes by establishing implications for the ethical design of cybersecurity instrumentation and for the post-phenomenological methodology itself, proposing the concept of 'phenomenology of collapse' as a contribution to the empirical philosophy of technology.
要約:
Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the strategy. We propose AutoRISE, a method that searches over executable attack programs rather than individual prompts. At each iteration, a coding agent edits a strategy and a fixed evaluation harness scores the resulting attacks, returning both a scalar objective and per-example diagnostics that guide subsequent edits. This allows structural changes, including new attack components and altered control flow, that prompt-level methods do not directly express. We also release two benchmark suites developed on disjoint target sets and evaluate on 11 models from five families against seven established jailbreak datasets. Across held-out models, AutoRISE improves average attack success rate by 17.0 points over the strongest baseline, and improves attack success by up to 16 points on frontier targets with low baseline success rates. Ablations against parametric and strategy-library baselines suggest that these gains arise from unrestricted program search, particularly compositional techniques and control-flow edits. AutoRISE operates in a black-box, inference-only setting, requiring no fine-tuning, human annotation, or GPU compute.
要約:
Agent skills introduce a new and more severe form of indirect injection for LLM agents: unlike traditional indirect prompt injection, attackers can hide malicious instructions inside a dense, action-oriented skill that already functions as a legitimate instruction source. We study pre-execution skill-poison detection and show that successful skill poisoning induces a structured internal effect, attention hijacking, in which response-time attention shifts from trusted context to malicious skill spans and drives harmful behavior. Motivated by this mechanism, we propose RouteGuard, a frozen-backbone detector that combines response-conditioned attention and hidden-state alignment through reliability-gated late fusion. Across both real and synthetic open-source skill benchmarks, RouteGuard is consistently the strongest or most robust detector; on the critical Skill-Inject channel slice, it reaches 0.8834 F1 and recovers 90.51% of description attacks missed by lexical screening, showing that defending against skill poisoning requires internal-signal detection rather than text-only filtering
要約:
Autonomous systems increasingly operate under partial observability where execution-relevant state is never fully accessible. Existing governance mechanisms -- trusted execution environments, oracle-signed state proofs, cryptographic attestation -- enforce the integrity of computation and state projections. We show this is structurally insufficient: an authenticated projection of state is necessary but never sufficient for execution validity.
We introduce the Reconstructive Authority Model (RAM), which separates integrity from coverage. RAM defines a reconstruction gate that reasons over an explicit coverage envelope -- comprising proven state, declared assumptions, and an acknowledged unobservable residual -- and permits execution only when coverage is adequate for the action class. When coverage is insufficient, RAM narrows privileges dynamically or fails closed. Attestation proves trust in measurement; RAM proves adequacy of what is measured.
We formalize RAM, prove necessity via two theorems (attestation insufficiency and RAM necessity) and three corollaries, and present a hybrid RAM+Attestation architecture with privilege-narrowing. Synthetic experiments (N=100,000, seed=42) show RAM achieves zero invalid execution rates at all coverage levels. Attestation-based systems exhibit IER=0.423 at low coverage and IER=0.233 even at full coverage, the latter arising from undefined-state handling failures undetectable by integrity checks alone.
This reframes execution validity as a coverage reconstruction problem, distinct from and complementary to integrity guarantees provided by attestation.
要約:
We extend the CDPR lattice reduction algorithm from ideal to module lattices, leveraging the trace orthogonality of the power basis to decompose the module into rank-1 submodules and applying CDPR independently to each. This base module reduction achieves a Hermite factor $\exp(\tilde{O}(\sqrt{n}))$ matching the ideal case, with a module reduction factor $O(1)$ independent of the rank, under a balance hypothesis automatically satisfied for MLWE-distributed bases. To control precision, we introduce CRT-scaled rounding at totally split primes, reducing the Gram-Schmidt rounding error and yielding a bounded-precision implementation. We further reformulate the CDPR sign-selection subproblem as a mixed-integer linear program, determining the optimal balanced discrepancy to be a universal constant $\delta^*\approx 0.4407$. All results build on the class number one condition $h_k^+=1$ established in Part I of this series.
要約:
Edge deployment of transformer-based models increasingly relies on ASIC accelerators due to their high performance and energy efficiency, achieved through optimized dataflows, specialized architectures, low-bitwidth computation, and efficient memory hierarchies. However, these advantages come with significant security vulnerabilities. ASIC-based DNN accelerators are susceptible to side-channel attacks (e.g., power, electromagnetic, and timing analysis) and fault injection attacks (e.g., voltage manipulation, clock glitches, and memory perturbations), which can lead to model extraction or compromised inference integrity. Furthermore, threats introduced during design and fabrication, such as hardware Trojans or untrusted third-party IPs, further expand the attack surface. To address these challenges, we explore a hybrid ASIC+eFPGA architecture that combines the efficiency of ASICs with the flexibility of reconfigurable logic. The integrated eFPGA enables security-oriented mechanisms such as adaptive runtime monitoring, side-channel mitigation and post-deployment patching. By leveraging these capabilities, the proposed approach enhances system resilience against both runtime and supply-chain attacks, while preserving the performance benefits of ASIC-based transformer inference.
要約:
AI-powered generative models have significantly expanded the possibilities for editing, manipulating, and creating high-quality images. Particularly, images that falsely appear to originate from trusted sources pose a serious threat, undermining public trust in image authenticity. We propose DeepSignature, a novel approach that integrates the guarantees of digital signatures with the capabilities of deep neural networks. Neural networks are used both to generate content-encoding watermarks and to embed them imperceptibly into images while ensuring robust extraction. These watermarks are cryptographically verifiable, enabling source attribution and image integrity validation. DeepSignature is compatible with existing image formats and requires no special handling of signed images. It supports client-side verification, requiring only the signer's public key. Additionally, we introduce a novel latent-space verification approach to detect and localize tampering attempts. We evaluate DeepSignature in terms of imperceptibility, robustness to benign transformations, forgery detection, and its resilience against various attack scenarios. Our results highlight the inherent trade-offs between imperceptibility, robustness, and integrity verification. We demonstrate that DeepSignature reliably identifies significant forgery attempts -- achieving near 100\% in our experiments. Finally, we emphasize DeepSignature's modularity and tunable parameters, allowing adaptation to application-specific requirements. Code and model weights will be published.
要約:
Android malware detectors built with machine learning often suffer from temporal bias: models are trained and evaluated without respecting apps' actual release times, inflating accuracy and weakening real-world robustness. We address this by constructing a time-stamped dataset of benign and malicious Android apps and introducing a timestamp-verification procedure to ensure temporal accuracy. We then propose a detection framework that uses Bootstrap Your Own Latent (BYOL) for self-supervised pre-training to learn obfuscation-resilient representations, followed by supervised classification. Under time-aware evaluation, the method attains 98% accuracy and 89% F1. We further characterize malware behavior by analyzing true positives and false negatives using VirusTotal and the MITRE ATT&CK framework. To support reproducibility and further innovation, we release our dataset and source code.
要約:
Automated methods for red teaming LLMs are an important tool to identify LLM vulnerabilities that may not be covered in static benchmarks, allowing for more thorough probing. They can also adapt to each specific LLM to discover weaknesses unique to it. Most current automated red teaming methods are intended for tackling safety and content moderation. Thus, they make use of content safety models as evaluators and optimize for circumventing them, and as such, have not been tested with other adversarial intents not typically captured by these. We propose a pipeline for training a red teaming model that can generalize to arbitrary adversarial goals, including objectives it has not been directly trained on, and that does not depend on the existence of a pre-existing evaluator available at training time. We demonstrate that finetuning small models, such as Qwen3-8B, using this pipeline results in a substantial improvement in their ability to generate attacks for both in and out of domain adversarial goals.
要約:
SystemVerilog Assertions (SVA) are essential for formal verification of digital hardware, yet their manual creation demands significant expertise in both the design under verification and temporal logic. Recent studies have explored using large language models (LLMs) to automate SVA generation, but existing approaches suffer from incorrect signal references, missing timing constraints, and lack of formal correctness guarantees. This paper presents ProofLoop, a tool-augmented ReAct agent that generates SVA from natural-language specifications using a solver-in-the-loop approach. The agent operates in two phases: Phase A autonomously gathers design context by invoking EDA and formal tools, including semantic search over an AST-indexed vector database and JasperGold structural queries, while Phase B generates SVA and iteratively refines it using JasperGold formal proof feedback over up to fixed (here 3) verification rounds. We evaluate ProofLoop on FVEval Design2SVA design benchmarks and demonstrate that this framework can achieve 93.7% syntax correctness and 82.0% functional correctness. An ablation study confirms that each component, i.e., retrieval-augmented generation (RAG), JasperGold tools, and the verification loop contributes significantly (and orthogonally).
要約:
Emerging AR-LLM-based Social Engineering attack (e.g., SEAR) is at the edge of posing great threats to real-world social life. In such AR-LLM-SE attack, the attacker can leverage AR (Augmented Reality) glass to capture the image and vocal information of the target, using the LLM to identify the target and generate the social profile, using the LLM agents to apply social engineering strategies for conversation suggestion to win the target trust and perform phishing afterwards. Current defensive approaches, such as role-based access control or data flow tracking, are not directly applicable to the convergent AR-LLM ecosystem (considering embedded AR device and opaque LLM inference), leaving an emerging and potent social engineering threat that existing privacy paradigms are ill-equipped to address. This necessitates a shift beyond solely human-centric measures like legislation and user education toward enforceable vendor policies and platform-level restrictions. Realizing this vision, however, faces significant technical challenges: securing resource-constrained AR-embedded devices, implementing fine-grained access control within opaque LLM inferences, and governing adaptive interactive agents. To address these challenges, we present UNSEEN, a coordinated cross-stack defense that combines an AR ACL (Access Control Layer) for identity-gated sensing, F-RMU-based LLM unlearning for sensitive profile suppression, and runtime agent guardrails for adaptive interaction control. We evaluate UNSEEN in an IRB-approved user study with 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.
要約:
Core logic and processing improvements were made to the software for operations and network attack results review (SONARR) and are presented, herein. Previous SONARR versions' Boolean-only logic, derived from the Blackboard Architecture, was replaced with generic logic that allows any .NET type (e.g., integers, decimals, strings) to be utilized within facts. This allows calculations and equality operations with all data types to drive the algorithm's processing of network models. Additionally, multi-compute capabilities were implemented to increase the processing power for larger workloads. In this paper, the new logic objects are described, examples are presented to illustrate the efficacy of creating digital-twin systems using the new generic logic, and performance test results are presented that illustrate the expanded processing capability from the multi-compute functionality.
要約:
Deep learning malware detectors achieve high classification accuracy but suffer from severe interpretability limitations, typically returning probabilistic verdicts that lack forensic context. We introduce AsmRAG, a framework performing malware analysis through Assembly-Level Retrieval-Augmented Generation. Unlike classifiers built on global statistical features, AsmRAG reformulates detection as an evidence-based retrieval task. The system uses a code-specialized Large Language Model (LLM) to analyze assembly functions and convert them into semantic embeddings. This process constructs a searchable knowledge base resilient to syntactic obfuscation. For inference, we propose a Density-Weighted Anchor Selection mechanism that isolates the primary unit of malicious logic within a binary to extract verifiable forensic evidence and resist evasion attempts. Testing on a curated dataset of 40k binaries shows AsmRAG reaching a detection F1-score of 96% alongside a family attribution F1-score of 95%. Comparisons confirm this semantic retrieval approach remains robust against metamorphic obfuscation. When holistic baselines (EMBER and ResNeXt) degrade, our methodology gives Security Operations Centers a transparent and reliable alternative.
要約:
Deploying proprietary Deep Neural Networks (DNNs) on commodity edge devices demands hardware-backed Digital Rights Management (DRM) capable of withstanding both software-level and physical adversaries. In Unified Memory Architecture (UMA) systems, the host CPU and Neural Processing Unit (NPU) share physical DRAM, leaving plaintext model weights directly readable by a compromised OS kernel. Existing defenses fail in this constrained setting: trusted execution environments monopolize scarce memory with permanently reserved regions, while full-memory encryption operates at page granularity. This forces the system to fetch massive 4 KB memory pages for sub-page tensor tiles, severely crippling bandwidth.
We present Tessera, a reference architecture for inline, cache-line granularity weight decryption on UMA edge accelerators. The design intercepts 64-byte AXI bursts, computing AES-256-CTR keystreams in parallel with DRAM fetches. This streams plaintext directly into isolated NPU SRAM, creating a transient memory footprint confined to the active tile and eliminating the need for permanent memory carve-outs. Measurements across three distinct SoC platforms demonstrate that this parallelization hides cryptographic latency behind standard DRAM fetch times, a condition that holds even under worst-case timing variations. Consequently, Tessera is projected to achieve 98.4\% of the theoretical memory bandwidth ceiling (a mere 1.6\% overhead). Across standard vision and language models, page-level memory encryption suffers up to a 32x bandwidth penalty, whereas Tessera maintains an optimal 1x footprint for all layer geometries. Finally, Tessera neutralizes major UMA-specific attack vectors -- including physical DRAM extraction, rogue DMA, and compute hijacking -- and formally prevents plaintext leakage across sparse tensors.
要約:
Semantic Communication (SC) backdoor attacks aim to utilize triggers to manipulate the system into producing predetermined outputs via backdoored shared knowledge. Current SC backdoors adopt monomorphic paradigms with single attack target, which suffers from limited attack diversity, efficiency, and flexibility in heterogeneous downstream scenarios. To overcome the limitations, we propose SemBugger, a polymorphic SC backdoor. By dynamically adjusting the trigger intensity, SemBugger finely-grained controls over the SC knowledge to generate diverse malicious results from the system. Specifically, SemBugger is realized through a multi-effect poisoning-training framework. It introduces graded-intensity triggers to poison training data and optimizes SC systems with hierarchical malicious loss. The trained system's knowledge dynamically adapts to trigger intensity in inputs to yield target outputs, all while preserving transmission fidelity for benign samples. Moreover, to augment SC security, we propose a provable robustness defense that resists SemBugger's homogeneous attacks through a controlled noise mechanism. It operates via strategically adding noise in SC inputs, and we formally provide a theoretical lower bound on the defense efficacy. Experiments across diverse SC models and benchmark datasets indicate that SemBugger attains high attack efficacy while maintaining the regular functionality of SC systems. Meanwhile, the designed defense effectively neutralizes SemBugger attacks.
要約:
Frontier models push the boundaries of what is learnable at extreme computational costs, yet distillation via sampling reasoning traces exposes closed-source frontier models to adversarial third parties who can bypass their guardrails and misappropriate their capabilities, raising safety, security, and intellectual privacy concerns. To address this, there is growing interest in building antidistillation methods, which aim to poison reasoning traces to hinder downstream student model learning while maintaining teacher performance. However, current techniques lack theoretical grounding, requiring either heavy fine-tuning or access to student model proxies for gradient based attacks, and often lead to a significant teacher performance degradation. In this work, we present a theoretical formulation of antidistillation as a Stackelberg game, grounding a problem that has so far largely been approached heuristically. Guided by the desired design properties our formulation reveals, we propose \texttt{TraceGuard}, an efficient, post-generation black-box method to poison sentences with high importance for teacher reasoning. Our work offers a scalable solution to share model insights safely, ensuring that the advancement of reasoning capabilities does not come at the cost of intellectual privacy or AI safety alignment.
要約:
The use of Machine Learning (ML) for data-driven decision-making often relies on access to sensitive datasets, which introduces privacy challenges. Traditional encryption methods protect data at rest or in transit but fail to secure it during processing, exposing it to unauthorized access. Homomorphic encryption emerges as a transformative solution, enabling computations on encrypted data without decryption, thus preserving confidentiality throughout the ML pipeline. This paper addresses the challenge of training ML models on encrypted data while maintaining accuracy and efficiency by proposing a proof-of-concept for a privacy-preserving framework that leverages Cheon-Kim-Kim-Song (CKKS) for approximate real-number arithmetic. Also, it demonstrates the feasibility of training K-Nearest Neighbors (KNN) and linear regression models on encrypted data, and evaluates encrypted inference for a basic Multilayer Perceptron (MLP) architecture. Experimental results show that models trained under Homomorphic encryption achieve performance metrics comparable to plaintext-trained models, validating the approach. However, challenges such as computational overhead, noise management, and limited support for non-polynomial operations persist. This work lays the groundwork for broader adoption of privacy-preserving ML in real-world applications, balancing security with computational feasibility.
要約:
Mobile apps offer significant benefits, but their privacy protections often remain ineffective and confusing for users. While prior work mainly analyzes app privacy vulnerabilities, few approaches help users understand, set, and enforce their privacy preferences. This paper presents PrivacyAssist, a multi-agent LLM-based platform that detects inconsistencies between user-granted permissions and developers' declared sensitive data collection and sharing practices. Using Retrieval-Augmented Generation (RAG), PrivacyAssist provides concise explanations and real-time on-device warnings to support informed installation decisions. We evaluate PrivacyAssist with 200 users and 2,347 Android apps, finding that only 16% of apps are fully consistent between granted permissions and declared data practices.
要約:
Jump-Oriented Programming (JOP) attacks exploit indirect control transfers to bypass backward-edge defenses, yet existing forward-edge CFI mechanisms lack precise source-domain authorization: type-based CFI admits all same-signature callers, while tag-based hardware CFI is limited by fixed-width register storage that caps the number of simultaneously authorized sources. We propose Branch Landing (BRL), a landing-based forward-edge CFI framework for RISC-V that replaces fixed-capacity checks with Bloom filter membership queries. Two lightweight ISA extensions, bld and brl, propagate a source Section Identifier (SID) through a dedicated BRState register and validate it at each landing site with fixed-probe latency that is independent of the number of authorized sources under a chosen filter configuration. Section granularity is configurable, supporting policies from type-based to CFG-derived authorization within a single mechanism. We implement Branch Landing in the LLVM RISC-V backend and evaluate it on 81 BEEBS benchmarks under two representative policy configurations: a function-level, type-based policy and a basic-block-level, CFG-derived policy. Under a 3-cycle brl latency model, the two configurations incur average runtime overheads of only 0.210% and 0.421%, with mean code size growth of 0.46% and 0.52% respectively. The CFG-derived policy reduces the average equivalence class size by 32.5% compared to the type-based policy, and all evaluated executions complete without BRL enforcement failures.
要約:
The growing adoption of IoT and cloud computing, combined with rapid advancements in digital technologies, has considerably increased the cyber-attack surface, resulting in increasingly complex and persistent attacks. Traditional security methods, primarily based on perimeter defenses, are insufficient to meet these developing threats, especially within the context of a Zero Trust Security (ZTS) architecture. This study investigates the application of sophisticated artificial intelligence (AI) and machine learning (ML) techniques, including the use of the Synthetic Minority Oversampling Technique (SMOTE), to improve anomaly detection and threat intelligence systems. This study focuses on how Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) classifiers might increase threat detection accuracy in IoT environments. The research endeavors to improve cybersecurity resilience by mitigating false positives and providing actionable intelligence through supervised learning algorithms. The KDD Cup 1999 dataset is used in the study to assess how well these models perform in simulating various network intrusions and regular traffic. The application of SMOTE significantly enhanced the performance of these models by addressing class imbalance, leading to improved detection accuracy. Furthermore, as supplementary methods for detecting malicious URLs and advanced persistent threats (APTs), edge-based machine learning and blockchain technology are investigated. This study addresses the shortcomings of conventional security systems and supports the growing demand for reliable threat detection in a world that is becoming more interconnected. It also advances the creation of more proactive and adaptable cybersecur
要約:
Agentic AI systems face security challenges that stateless large language models do not. They plan across extended horizons, maintain persistent memory, invoke external tools, and coordinate with peer agents. Existing security analyses organize threats by attack type (prompt injection, jailbreaking), but provide no principled model of which architectural component is vulnerable or over what timescale the threat manifests. This paper makes five contributions. First, we introduce the Layered Attack Surface Model (LASM), a seven-layer framework that maps threats to distinct architectural components: Foundation, Cognitive, Memory, Tool Execution, Multi-Agent Coordination, Ecosystem, and Governance, the accountability and observability layer that spans the stack analogously to the network management plane. Second, we introduce attack temporality as an orthogonal analytical dimension with four classes: Instantaneous (T1), Session-Persistent (T2), Cross-Session Cumulative (T3), and Sub-Session-Stack, Non-Session-Bounded (T4). Third, through a systematic review of 94 papers (2021--2025), we show that the most dangerous emerging threats concentrate at the intersection of high-layer attacks (L5--L7) and slow-burn temporality (T3--T4): covert agent collusion, long-term memory poisoning, MCP supply-chain compromise, and alignment failure that manifests as an insider threat with no external adversary. Only 8 of 120 paper-cell assignments (7%) fall in this zone. Fourth, we propose a cross-layer defense taxonomy spanning all seven LASM layers and all four temporality classes, exposing which threat classes existing defenses leave unaddressed. Fifth, we survey evaluation benchmarks, identify five research gaps in the under-studied high-layer, slow-burn zone, and argue that agentic security must be treated as a distributed systems problem embedded in an adversarial ecosystem.
要約:
The deployment of Large Language Models (LLMs) as assistants in electric grid operations promises to streamline compliance and decision-making but exposes new vulnerabilities to prompt-based adversarial attacks. This paper evaluates the risk of jailbreaking LLMs, i.e., circumventing safety alignments to produce outputs violating regulatory standards, assuming threats from authorized users, such as operators, who craft malicious prompts to elicit non-compliant guidance. Three state-of-the-art LLMs (OpenAI's GPT-4o mini, Google's Gemini 2.0 Flash-Lite, and Anthropic's Claude 3.5 Haiku) were tested against Baseline, BitBypass, and DeepInception jailbreaking methods across scenarios derived from nine NERC Reliability Standards (EOP, TOP, and CIP). In the initial broad experiment, the overall Attack Success Rate (ASR) was 33.1%, with DeepInception proving most effective at 63.17% ASR. Claude 3.5 Haiku exhibited complete resistance (0% ASR), while Gemini 2.0 Flash-Lite was most vulnerable (55.04% ASR) and GPT-4o mini moderately susceptible (44.34% ASR). A follow-up experiment refining malicious wording in Baseline and BitBypass attacks yielded a 30.6% ASR, confirming that subtle prompt adjustments can enhance simpler methods' efficacy.
要約:
Autonomous Large Language Model (LLM) agents are increasingly deployed to conduct complex tasks by interacting with external tools, APIs, and memory stores. However, processing untrusted external data exposes these agents to severe security threats, such as indirect prompt injection and unauthorized tool execution. Securing these systems requires effective information flow tracking. Yet, traditional taint analysis that is designed for program memory states fundamentally fails when applied to LLMs, where data propagation is governed by probabilistic natural language reasoning. In this paper, we present NeuroTaint, the first comprehensive taint tracking framework tailored for the unique information flow characteristics of LLM agents. Our key insight is that taint propagation in LLM agents must be understood not only as explicit content transfer, but also as semantic transformation, causal influence on decisions, and cross-session persistence through memory. NeuroTaint therefore audits execution traces offline to reconstruct provenance from untrusted sources to privileged sinks using semantic evidence, causal reasoning, and persistent context tracking, rather than relying on exact string matches or pre-defined source-sink paths alone. Extensive evaluation using TaintBench, our 400-scenario benchmark spanning 20 real-world agent frameworks, shows that NeuroTaint substantially outperforms FIDES, an information-flow-control (IFC)-style baseline for LLM agents, in source-sink propagation detection. We further show that NeuroTaint remains effective on established agent-security benchmarks, including InjecAgent and ToolEmu, while operating offline with modest additional auditing cost.
要約:
The April 2026 disclosure that a frontier large language model escaped its security sandbox, executed unauthorized actions, and concealed its modifications to version control history demonstrates that agentic AI systems with autonomous tool access can circumvent the containment mechanisms designed to constrain them. This paper analyzes four categories of current containment approaches - alignment training, environmental sandboxing, application-level tool-call interception, and accessible audit systems - and identifies the failure modes each exhibits when the AI agent is treated as a potential adversary rather than a trusted component receiving adversarial inputs. We categorize five behavioral incidents from the public disclosure and situate them within 698 real-world AI scheming incidents documented by the Centre for Long-Term Resilience between October 2025 and March 2026, a 4.9x acceleration establishing the challenge as systemic. We derive five architectural requirements: trust separation through layered OS privilege enforcement with semantic intent analysis, sequential intent inference through five-phase taxonomic monitoring, independent containment integrity monitoring, adversarial audit isolation through logical invisibility, and emergent capability envelope enforcement through distributional divergence monitoring. No publicly described system satisfies all five. We argue that architectural containment is the only durable safety strategy given the inevitable proliferation of equivalent capabilities including open-weight models. The author's published patent portfolio in provider-independent constraint enforcement addresses several of these requirements. Concurrent work including SandboxEscapeBench (arXiv:2603.02277) independently confirms that frontier models can escape standard container sandboxes, corroborating the threat model presented here.
要約:
The global financial ecosystem confronts a critical asymmetry: while fraud syndicates operate as borderless, distributed networks, banking institutions remain constrained by regulatory data silos, limiting visibility into cross-institutional threat patterns under strict privacy laws such as GDPR. Although Federated Learning (FL) enables collaborative training, existing protocols impose a trade-off among scalability, privacy, and integrity. Homomorphic encryption schemes are computationally expensive, while pairwise masking protocols require O(N^2) key exchanges and lack mechanisms to detect malformed updates. Existing defenses also remain vulnerable to gradient inversion attacks that can reconstruct sensitive transaction data.
To address these limitations, we propose Dynamic Sharded Federated Learning (DSFL), a verifiable secure aggregation framework for cross-institution financial fraud detection. DSFL replaces mesh topologies with Dynamic Stochastic Sharding, reducing communication complexity from O(N^2) to O(N m), where m is a fixed shard size, achieving linear scalability. To mitigate insider threats, we introduce Linear Integrity Tags, an additive-homomorphic commitment mechanism that enables probabilistic verification of submitted updates without the overhead of zero-knowledge proofs, while not enforcing semantic correctness. Additionally, the Active Neighborhood Recovery protocol ensures robust aggregation under participant dropouts. Empirical evaluation on the Credit Card Fraud Detection Dataset (ULB) demonstrates an approximately 33x latency reduction compared to Paillier-based secure aggregation, while maintaining strong resilience under simulated failures. These results position DSFL as a practical foundation for scalable and privacy-preserving collaborative fraud detection.
要約:
Wireless chips and interfaces expose a substantial remote attack surface. As of today, most cellular baseband security research is performed on the Android ecosystem, leaving a huge gap on Apple devices. With iOS jailbreaks, last-generation wireless chips become fairly accessible for performance and security research. Yet, iPhones were never intended to be used as a research platform, and chips and interfaces are undocumented. One protocol to interface with such chips is Apple Remote Invocation (ARI), which interacts with the central phone component CommCenter and multiple user-space daemons, thereby posing a Remote Code Execution (RCE) attack surface. We are the first to reverse-engineer and fuzz-test the ARI interface on iOS. Our Ghidra scripts automatically generate a Wireshark dissector, called ARIstoteles, by parsing closed-source iOS libraries for this undocumented protocol. Moreover, we compare the quality of the dissector to fully-automated approaches based on static trace analysis. Finally, we fuzz the ARI interface based on our reverse-engineering results. The fuzzing results indicate that ARI does not only lack public security research but also has not been well-tested by Apple. By releasing ARIstoteles open-source, we also aim to facilitate similar research in the future.
要約:
Collusion among autonomous agents poses a critical security threat in embodied multi-agent systems (MAS), where coordinated behaviors can deviate from global objectives and lead to real-world consequences. Existing defenses, primarily based on identity control or post-hoc behavior analysis, are insufficient to address such threats in embodied settings due to delayed feedback and noisy observations in physical environments, which make behavioral deviations difficult to detect accurately and in a timely manner. To address this challenge, we propose a mutagenic incentive intervention approach that mitigates collusion by reshaping agents' payoff structures. By rewarding agents who report collusive behavior and penalizing identified participants, the mechanism induces strategic defection and renders collusion unstable. We further design supporting mechanisms, including reporting deposits, smart contract-based reward enforcement, and encrypted communication, to ensure robustness against misuse of the incentive mechanism and retaliation from penalized agents. We implement the proposed approach in both simulated and real-world embodied environments. Experimental results show that our method effectively suppresses collusion by inducing defection, while preserving system efficiency. It achieves performance comparable to the non-collusion baseline and outperforms representative reactive defenses, thereby fulfilling the desired security objectives. These results demonstrate the effectiveness of proactive incentive design as a practical paradigm for securing embodied multi-agent systems.
要約:
Publicly verifiable delegation is a well-known problem involving a user who wishes to outsource a resource-intensive computational task to a more powerful but potentially untrusted server such that any other party is able to efficiently check the veracity of the computation's result. This problem has been extensively studied in the classical domain where the user and server are both non-quantum machines. However, the problem becomes more challenging when the classical user wants to delegate a quantum circuit to a single prover with quantum-computing capabilities. Previous solutions have resorted to using impractical or non-standard cryptographic solutions (e.g. indistinguishability obfuscation) to achieve this requirement. In this work, we relax the requirement to have time-delayed publicly verifiable proofs, where the verification key is made known to the public only when the computation (and its proof) are guaranteed to have been completed. We propose a practical non-interactive scheme leveraging commitment schemes and time-lock puzzles, which can be efficiently realized through well-established and standard post-quantum assumptions. The main idea of our technique lies in using time-lock puzzles to compile a 2-round privately verifiable scheme into a non-interactive publicly verifiable scheme with timestamped proofs, outsourcing not only the quantum computation but the puzzle solving as well. Security is proven in the quantum random oracle model with a common reference string (CRS).
要約:
In the digital era, personal data, particularly sensitive identifiers such as the Social Security Number and National Identification Number, have become a highly valuable asset, raising significant concerns regarding privacy and security. This study examines the risks associated with the online exposure of the Thai National Identification Number, a key element of identity verification in both governmental and commercial transactions. Similar to the Social Security Number in the United States, this unique identifier is crucial for various legal, financial, and welfare-related activities. However, the increasing digitization of personal records has heightened its vulnerability to unauthorized access and misuse, particularly through search engines that inadvertently index sensitive information.
This research identifies publicly exposed Thai National Identification Numbers across major search engines, assessing the potential threats to individual privacy and national security. The study reveals the exposure of over 1.2 million unique National Identification Numbers, along with other highly sensitive personal data, e.g., addresses, contact details, employment status, disability status, and health information. Notably, the analysis indicates that a significant majority of these exposures originate from the Thai government sector websites, highlighting critical vulnerabilities in public data management practices. This widespread exposure not only increases the risk of identity theft and financial fraud but also underscores the urgent need for enhanced cybersecurity measures, stricter regulatory enforcement, and improved data governance within government agencies to prevent future breaches. Addressing these issues is essential to safeguarding citizens' personal information and ensuring compliance with Thailand's data protection laws in an increasingly digitized world.
要約:
The aviation industry faces significant vulnerabilities from both physical and cybersecurity threats, highlighting the urgent need for enhanced cybersecurity measures amid increasingly sophisticated attacks. This paper systematically reviews emerging threats at airports, analyzing real-world incidents and relevant literature while mapping risks to the MITRE ATT&CK Matrix, a widely recognized knowledge base for categorizing cyberattack tactics, techniques, and procedures. This is the first to apply the MITRE Matrix to airport security risks, offering a novel approach to understanding and mitigating these challenges. Building on this analysis, the paper advocates for modern cybersecurity defense models, emphasizing Cybersecurity Frameworks and Zero Trust Architecture, as well as critical measures for supply chain risk management and strategies to mitigate ransomware and DoS attacks. Our analysis provides insights into vulnerabilities and actionable recommendations, serving as a comprehensive guide for aviation stakeholders to strengthen defenses against evolving cybersecurity threats.
要約:
Privacy-critical domains require phishing detection systems that satisfy contradictory constraints: near-zero false positives to prevent workflow disruption, transparent explanations for non-expert staff, strict regulatory compliance prohibiting sensitive data exposure to external APIs, and robustness against AI-generated attacks. Existing rule-based systems are brittle to novel campaigns, while LLM-based detectors violate privacy regulations through unredacted data transmission. We introduce CyberCane, a neuro-symbolic framework integrating deterministic symbolic analysis with privacy-preserving retrieval-augmented generation (RAG). Our dual-phase pipeline applies lightweight symbolic rules to email metadata, then escalates borderline cases to semantic classification via RAG with automated sensitive data redaction and retrieval from a phishing-only corpus. We further introduce PhishOnt, an OWL ontology enabling verifiable attack classification through formal reasoning chains. Evaluation on DataPhish2025 (12.3k emails; mixed human/LLM) and Nazario/SpamAssassin demonstrates a 78.6-point recall gain over symbolic-only detection on AI-generated threats, with precision exceeding 98% and FPR as low as 0.16%. Healthcare deployment projects a 542x ROI; tunable operating points support diverse risk tolerances, with open-source implementation at https://github.com/sbhakim/Cybercane.
要約:
Vehicle diagnostics has become essential for detecting in-vehicle errors and ensuring safety. While the Unified Diagnostic Services (UDS) protocol is widely adopted for diagnostic operations, it relies on the ISO 15765-2 standard as the transport protocol over the Controller Area Network (CAN), which was designed without inherent security considerations. In this paper, we identify eight novel attack scenarios that exploit specific transport layer mechanisms in the ISO 15765-2 standard, including Flow Control manipulation, Sequence Number violations, and error handling abuses. We evaluate these attacks on a real passenger vehicle using two distinct diagnostic tools to demonstrate their practical impact. Our results confirm that three of these attack scenarios successfully induce denial of diagnostic services, leading to abnormal diagnostic results such as concealed faults and manipulated sensor readings. These findings highlight critical vulnerabilities that can deceive technicians and drivers, potentially exposing vehicles to significant safety risks.
要約:
R\'{e}nyi Pufferfish Privacy (RPP) provides a R\'{e}nyi divergence-based privacy framework for correlated data, but existing $\infty$-Wasserstein mechanisms are often conservative and sacrifice data utility. We study Gaussian mechanisms for RPP under Gaussian and Gaussian-mixture priors. For single Gaussian priors, we derive the exact R\'{e}nyi divergence after Gaussian perturbation, obtain a relaxed closed-form sufficient condition for $(\alpha,\epsilon)$-RPP, and characterize the monotonicity of the calibrated noise with respect to the privacy budget $\epsilon$ and the R\'{e}nyi order $\alpha$. To handle more general non-Gaussian and multimodal priors, we approximate secret-conditioned outputs with Gaussian mixture models and introduce an optimal-transport-based sufficient condition for RPP. Experiments on three UCI datasets with statistical (\textsc{RAW}, \textsc{MEAN}) and model-output (\textsc{BNN}, \textsc{GP}) queries show that our prior-aware mechanisms consistently require less noise than a recent RPP additive-noise baseline, achieving an average noise reduction of 48.9\%. These results show that our mechanisms can substantially improve the privacy-utility trade-off under RPP.
要約:
Line current differential relays (LCDRs) are measurement-driven relays that rely on time-synchronized multi-phase current waveforms to infer internal faults in AC and DC power networks. In inverter-based microgrids, however, the increasing reliance on digitally communicated measurements exposes LCDRs to false-data injection attacks (FDIAs), in which adversaries manipulate remote measurement streams to create protection-triggering yet physically inconsistent current trajectories. This paper addresses this emerging measurement integrity problem by introducing a measurement integrity validation scheme that operates as a supervisory instrumentation layer for modern LCDRs. The proposed scheme interprets short windows of synchronized instantaneous current measurements recorded during relay operation and assesses their physical consistency to distinguish genuine fault-induced trajectories from cyber-manipulated measurement streams. A recurrent neural network is trained offline using only relay-available current measurements and exploits the temporal structure of differential current waveforms, which remains informative in inverter-dominated systems where current magnitude is no longer a reliable observable. The method requires no additional sensors, auxiliary protection elements, or prior knowledge of network topology, and is applicable to both AC and DC LCDRs without structural modification. The proposed measurement validation scheme is evaluated on an islanded inverter-based microgrid under a comprehensive set of fault and FDIA scenarios, demonstrating high detection accuracy while preserving relay dependability. Hardware-in-the-loop validation using an OPAL-RT real-time simulator confirms that the scheme satisfies protection timing constraints and can operate in real time under realistic operating conditions.
要約:
With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage in user interaction contexts with large language model (LLM) agents has become a critical issue. Existing privacy attacks against LLMs primarily target training data, while research on inference-time contextual privacy risks in LLM agent memory remains limited. Moreover, prior methods often incur high attack costs, requiring multiple queries or relying on white-box assumptions, which limits their practicality in real-world deployments. To address these issues, we propose a training-free privacy extraction attack targeting LLM agent memory, which we name \textsc{Spore}. \textsc{Spore} is compatible with both black-box and gray-box settings. In the black-box setting, \textsc{Spore} can efficiently extract a small candidate set via a single query to recover the original private information. In the gray-box setting, \textsc{Spore} allows the attacker to leverage multi-ranked tokens for more accurate and faster privacy extraction. We provide an information-theoretic analysis of \textsc{Spore} and show that it achieves high query efficiency with substantial per query information leakage. Experiments on multiple frontier LLMs show that \textsc{Spore} outperforms attack success rate over existing state-of-the-art (SOTA) schemes. It also maintains low attack cost and remains stable across different model parameter settings. We further evaluate the robustness of \textsc{Spore} against existing defense mechanisms. Our results show that \textsc{Spore} consistently bypasses both detection and strong safety alignment, demonstrating resilient performance in diverse defensive settings and real-world safety threats.
要約:
This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models (LLMs). We propose LLM-CEG, a systematic framework that employs membership inference attack (MIA) success rates as an empirical privacy gauge and model perplexity as a utility gauge, iteratively adjusting differential privacy parameters until both thresholds are jointly satisfied. A proof-of-concept prototype fine-tunes DistilGPT-2 on a synthetic clinical PII dataset under four privacy regimes using DP-SGD. Results indicate that DP-SGD reduces MIA attacker advantage by 71.5% while simultaneously improving out-of-distribution utility by 47-50% relative to the overfitted baseline, suggesting that differential privacy may act as implicit regularization under narrow fine-tuning conditions. We further extend the SIED engineering framework to the LLM context as LLM-SIED, providing an auditable, regulator-aligned process for privacy-compliant LLM deployment.
要約:
Rootkits are among the most elusive types of malware, capable of bypassing traditional static analysis methods due to their metamorphic behavior. Signature-based detection techniques struggle against these threats, necessitating a shift toward dynamic analysis approaches. We propose SeqShield, a behavior-based rootkit detection approach designed specifically for the Windows OS, leveraging API call sequences for dynamic behavior analysis. Instead of relying on static signatures, SeqShield examines the execution patterns of API calls, which inherently reflect malicious intent. Analyzing API sequences, we can effectively identify rootkit-like behavior. We also employed a metamorphic code engine to generate 10X mutated variants of rootkits, demonstrating their obfuscation strategies. SeqShield applies n-gram analysis to extract bigram and trigram features from these API call sequences, enabling effective detection of rootkit-like activity. Among the models tested, Random Forest achieves the highest accuracy of 97.27% (bigram) and 96.17% (trigram). To optimize performance and decrease the dimension, we apply feature importance ranking using the Gini Impurity Index, iteratively selecting the most significant features. The optimized lower-dimensional feature matrix significantly enhances detection efficiency without sacrificing accuracy. Using the optimized feature set, our approach achieves 96.72% accuracy for bigrams and 97.81% accuracy for trigrams.
要約:
LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that evolves its strategies over hundreds of rounds and tested it against nine defense configurations across more than 20,000 attacks. Every defense that relied on the model to protect itself eventually broke. The only defense that held was output filtering, which checks the model's responses via hardcoded rules in separate application code before they reach the user, achieving zero leaks across 15,000 attacks. These results demonstrate that security boundaries must be enforced in application code, not by the model being attacked. Until such defenses are verified by tools like Swept AI, AI systems handling sensitive operations should be restricted to internal, trusted personnel.
要約:
Threat modeling for cyber-physical systems (CPS) remains a largely manual exercise. This project presents SMSI (System Model Security Inference), a hybrid neuro-symbolic pipeline that starts from a SysML architecture model and produces a prioritized list of NIST 800-53 security controls. The prototype has three main stages: a deterministic parser mapping system components to vulnerabilities via the NVD; a family of retrieval and classification models linking vulnerabilities to MITRE ATT&CK techniques; and a control recommender. We explore three approaches for CVE-to-ATT&CK mapping: a supervised classifier using fine-tuned SecureBERT+, retrieval-based dense encoders, and a zero-shot LLM approach using Gemma-4 26B. We validate the pipeline on a healthcare IoT gateway with nine software components. For the ATT&CK-to-NIST stage, pretrained SecureBERT achieves the highest control retrieval scores, demonstrating that dense embeddings provide a strong basis for automated control recommendation.
要約:
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving the agent's own threat judgement entirely untrained. We present ClawdGo, a framework for endogenous security awareness training: we teach the agent to recognise and reason about threats from the inside, at inference time, with no model modification. Four contributions are introduced: TLDT (Three-Layer Domain Taxonomy) organises 12 trainable dimensions across Self-Defence, Owner-Protection, and Enterprise-Security layers; ASAT (Autonomous Security Awareness Training) is a self-play loop where the agent alternates attacker, defender, and evaluator roles under weakest-first curriculum scheduling; CSMA (Cross-Session Memory Accumulation) compounds skill gains via a four-layer persistent memory architecture and Axiom Crystallisation Promotion (ACP); and SACP (Security Awareness Calibration Problem) formalises the precision-recall tradeoff introduced by endogenous training. Live experiments show weakest-first ASAT raises average TLDT score from 80.9 to 96.9 over 16 sessions, outperforming uniform-random scheduling by 6.5 points and covering 11 of 12 dimensions. CSMA retains the full gain across sessions; cold-start ablation recovers only 2.4 points, leaving a 13.6-point gap. E-mode generates 32 TLDT-conformant scenarios covering all 12 dimensions. SACP is observed when a heavily trained agent classifies a legitimate capability assessment as prompt injection (30/160).
要約:
Industrial Control Systems (ICS) integrate computing, physical processes, and communication to operate critical infrastructures such as power grids, water treatment plants, and oil and gas facilities. As ICS become increasingly targeted by cyberattacks, timely and reliable anomaly diagnosis is essential for protecting operational safety. However, existing ICS anomaly detection approaches face practical limitations: supervised methods require extensive labeled attack data and suffer from class imbalance, while model-based detectors often lack the ability to provide deep insight into the root causes of anomalies, leading to elevated false alarms and making it difficult for operators to initiate a timely response. In this work, we propose a system-aware unsupervised framework for ICS anomaly diagnosis that combines lightweight online detection with contextual explanation. The system identifies deviations from observed normal behaviors without prior knowledge of system topology. To support actionable response, we further concatenate a contextual digital twin augmented with an Large Language Model (LLM) to enhance interpretability, which translates detection evidence into grounded diagnostic hypotheses and verification steps for operators. Experiments on public ICS benchmarks demonstrate that the proposed framework achieves real-time detection efficiency and provides consistent, interpretable anomaly diagnoses, enabling low-latency warning and practical deployment in complex industrial environments.
要約:
Mobile apps frequently request excessive data access, raising significant privacy concerns. While regulations like GDPR emphasize data minimization, they provide limited guidance on concretely defining and enforcing necessary data access. Existing regulatory mechanisms primarily rely on expert-driven audits that face challenges in scalability, neutrality, and alignment with user expectations. In this paper, we propose a novel paradigm--democratizing privacy assessment, inspired by prior work on user-centric privacy perceptions--which repositions users as active evaluators in the privacy auditing process, recognizing that user perceptions of data usage play a crucial role in assessing the appropriateness and necessity of data access. To operationalize this paradigm, we introduce DePRa, a prototype system developed through participatory design, featuring contextual explanation provision, category-based representative selection, an intuitive rating interface, and preference-based rating adjustment. We evaluated DePRa with 200 everyday mobile app users, analyzing how effectively it captures user opinions on sensitive data access, comparing their privacy ratings with expert assessments, and exploring risk preference-based score calibration. Our findings show the feasibility and promise of democratized privacy assessment, highlighting its potential to complement expert auditing and support inclusive privacy evaluation.
要約:
Large (vision-)language models exhibit remarkable capability but remain highly susceptible to jailbreaking. Existing safety training approaches aim to have the model learn a refusal boundary between safe and unsafe, based on the user's intent. It has been found that this binary training regime often leads to brittleness, since the user intent cannot reliably be evaluated, especially if the attacker obfuscates their intent, and also makes the system seem unhelpful. In response, frontier models, such as GPT-5, have shifted from refusal-based safeguards to safe completion, that aims to maximize helpfulness while obeying safety constraints. However, safe completion could be exploited when a user pretends their intention is benign. Specifically, this intent inversion would be effective in multi-turn conversation, where the attacker has multiple opportunities to reinforce their deceptively benign intent. In this work, we introduce a novel multi-turn jailbreaking method that exploits this vulnerability. Our approach gradually builds conversational trust by simulating benign-seeming intentions and by exploiting the consistency property of the model, ultimately guiding the target model toward harmful, detailed outputs. Most crucially, our approach also uncovered an additional class of model vulnerability that we call para-jailbreaking that has been unnoticed up to now. Para-jailbreaking describes the situation where the model may not reveal harmful direct reply to the attack query, however the information that it reveals is nevertheless harmful. Our contributions are threefold. First, it achieves high success rates against frontier models including GPT-5-thinking and Claude-Sonnet-4.5. Second, our approach revealed and addressed para-jailbreaking harmful output. Third, experiments on multimodal VLM models showed that our approach outperformed state-of-the-art models.
要約:
Cryptographic API misuse represents a critical vulnerability class that undermines the security foundations of modern software. Yet, it remains largely unexplored in Go despite its dominance in security-critical infrastructure. This paper presents the first comprehensive study of cryptographic API misuse detection in Go, identifying and analyzing 4 state-of-the-art tools (CodeQL, Gopher, Gosec, and Snyk Code) and establishing a consolidated taxonomy of 14 relevant misuse classes. Through an experimental evaluation of 328 security-critical open-source Go projects, we discovered 7,473 cryptographic API misuses, providing insights into the prevalence and distribution of these vulnerabilities. Our systematic comparison reveals significant variations in misuse coverage, with immediate practical implications for security engineers and long-term implications for research in this domain.
要約:
Large Language Model (LLM) agents are increasingly used to automate complex workflows, but integrating untrusted external data with privileged execution exposes them to severe security risks, particularly direct and indirect prompt injection. Existing defenses face significant challenges in balancing security with utility, often encountering a trade-off where rigorous protection leads to over-defense, or where subtle indirect injections bypass detection. Drawing inspiration from operating system virtualization, we propose AgentVisor, a novel defense framework that enforces semantic privilege separation. AgentVisor treats the target agent as an untrusted guest and intercepts tool calls via a trusted semantic visor. Central to our approach is a rigorous audit protocol grounded in classic OS security primitives, designed to systematically mitigate both direct and indirect injection attacks. Furthermore, we introduce a one-shot self-correction mechanism that transforms security violations into constructive feedback, enabling agents to recover from attacks. Extensive experiments show that AgentVisor reduces the attack success rate to 0.65%, achieving this strong defense while incurring only a 1.45% average decrease in utility relative to the No Defense scenario, demonstrating superior performance compared to existing defense methods.
要約:
Spotting encryption loops in binary-only ransomware is a critical reverse engineering task. Since the existence of avalanche effect, an intrinsic characteristic of any secure encryption algorithms, is unavoidable during a victim data encryption attack, it is a very promising direction to spot encryption loops through avalanche effect detection. Unfortunately, no existing work in this direction ensures that the being-checked effect is the avalanche effect itself. Although CipherXRay is inspired by avalanche effect, it only checks whether a "ripple effect" (i.e., a necessary but non-sufficient condition) of avalanche effect exists, allowing a straightforward counterattack to succeed. In this work, we present a new approach that checks the avalanche effect itself. Because the detection is conducted in adversarial settings (e.g., the ransomware author may obfuscate the code), a viable approach must tolerate inaccurate input \& output identification and must be resilient to adversarial evasion. These challenges are addressed by a novel record-and-replay detection mechanism that takes advantage of the statistical guarantees provided by the Shapiro-Wilk normality test. The experimental results show that our approach achieves 0.0\% false negative rate and 1.1\% false positive rate. When our tool is employed to reverse engineer real-world ransomware samples, it succeeds in analyzing all the ransomware samples selected from ten representative families.
要約:
Defending against backdoor attacks in large language models remains a critical practical challenge. Existing defenses mitigate these threats but typically incur high preparation costs and degrade utility via offline purification, or introduce severe latency via complex online interventions. To overcome this dichotomy, we present Tail-risk Intrinsic Geometric Smoothing (TIGS), a plug-and-play inference-time defense requiring no parameter updates, external clean data, or auxiliary generation. TIGS leverages the observation that successful backdoor triggers consistently induce localized attention collapse within the semantic content region. Operating entirely within the native forward pass, TIGS first performs content-aware tail-risk screening to identify suspicious attention heads and rows using sample-internal signals. It then applies intrinsic geometric smoothing: a weak content-domain correction preserves semantic anchoring, while a stronger full-row contraction disrupts trigger-dominant routing. Finally, a controlled full-row write-back reconstructs the attention matrix to ensure inference stability. Extensive evaluations demonstrate that TIGS substantially suppresses attack success rates while strictly preserving clean reasoning and open-ended semantic consistency. Crucially, this favorable security-utility-latency equilibrium persists across diverse architectures, including dense, reasoning-oriented, and sparse mixture-of-experts models. By structurally disrupting adversarial routing with marginal latency overhead, TIGS establishes a highly practical, deployment-ready defense standard for state-of-the-art LLMs.
著者: V\'ictor Mayoral-Vilches, Mar\'ia Sanz-G\'omez, Francesco Balassone, Maite Del Mundo De Torres, George Nicolaou, Samuel Rodriguez Borines, Almerindo Graziano, Paul Zabalegui, Endika Gil-Uriarte
要約:
As LLM-driven agents advance in cybersecurity, Jeopardy CTF benchmarks are approaching saturation and cyber ranges, the natural next evaluation frontier, offer diminishing resistance under their current static design. We validate this observation by deploying an LLM-driven Advanced Persistent Threat (APT) agent across three tiers of increasingly realistic infrastructure (PRO Labs, MHBench, military-grade CYBER RANGES). To counteract this trend, we propose Dynamic Cyber Ranges: cyber range environments augmented with LLM-driven Defender agents that harden infrastructure, monitor for intrusions, and respond in real time. Across evaluated scenarios, Defender agents reduce attacker success to 0-55%, achieving complete prevention on multiple configurations. Since attacker and defender agents draw from the same underlying model capabilities, Dynamic Cyber Ranges preserve evaluation headroom as models improve. Notably, a smaller, specialized on-premise model (alias2-mini) matched the frontier model's defensive outcomes on multiple scenarios under identical, untuned prompts, and detected the attacker 10x faster on a complex enterprise scenario, suggesting that privacy-preserving on-premise models can serve as competent defenders against frontier-class attackers. The experiments further surface emergent agent behaviors, including scope expansion and prompt exfiltration, with implications for AI benchmark integrity and agentic system design.
要約:
Auditing the semantic properties of proprietary data creates a fundamental tension: verification requires transparent access, while proprietary rights demand confidentiality. While Zero-Knowledge Proofs (ZKPs) ensure privacy, they are typically limited to precise algebraic constraints and are ill-suited for verifying qualitative, unstructured properties, such as the logic within a codebase. We propose {\em Agentic Witnessing}, a framework that moves verification from attested execution to {\em attested reasoning}. The system is composed of three agents: a Verifier (who wants to check properties of a dataset), a Prover (who owns the dataset) and an Auditor (that inspects the dataset). The Verifier is allowed to ask a limited number of simple binary true/false questions to the auditor. By isolating an LLM-based Auditor within a Trusted Execution Environment (TEE), the system enables the Verifier to query a Prover's private data via simple Boolean queries, without exposing the raw dataset. The Auditor uses the Model Context Protocol (MCP) to dynamically inspect the target dataset, producing a yes/no verdict accompanied by a cryptographic transcript: a signed hash chain binding the reasoning trace to both the original dataset and the TEE's hardware root of trust. We demonstrate this architecture by automating the artifact evaluation process for 21 peer-reviewed computer science papers with released codebases on GitHub (e.g. Does the codebase implement the system described in the paper?). We verified five high-level properties of these codebases described in the corresponding publications, treating the source code as private. Our results show that TEE-enabled agentic auditing provides a mechanism for privacy-preserving oversight, effectively decoupling qualitative verification from the need for data disclosure.
要約:
The convergence of Internet of Things (IoT) security and Zero Trust (ZT) principles is a trending topic, demanding a comprehensive, multi-perspective analysis. We present the first multivocal literature review (MLR) on this topic, combining 68 academic and 36 industrial studies. This comprehensive review identifies two complementary yet divergent perspectives: academia focuses on IoT compliance with ZT principles through IoT modifications, while industry prioritizes practical integration within existing ZT frameworks guided by NIST standards. The analysis reveals critical research gaps in socio-technical understanding, cost-benefit evaluation, and interdisciplinary collaboration, highlighting these as key directions for future research.
要約:
Trusted Execution Environments (TEEs) on low-power microcontrollers (e.g., ARM TrustZone-M) enable isolation of Secure and Non-Secure software but still require both worlds to share resources, including interrupt controllers. In this model, real-time applications and real-time operating systems (RTOS-s) are executed in the Non-Secure sub-system, whereas the Secure sub-system is typically reserved for a small set of pre-defined security (e.g., cryptographic) operations referred to as trusted computing services. However, many RTOS-s rely on periodic interrupts (SysTicks) to advance their own notion of time (time-keeping), and the delivery of this interrupt is essential for preserving real-time behavior. On the other hand, the security of many trusted computing services requires atomicity vis-a-vis the Non-Secure sub-system (where the RTOS resides), precluding SysTick handling.
This paper first characterizes this conflict and then introduces a Secure-driven time synchronization mechanism in which the Secure World measures elapsed time and compensates the Non-Secure RTOS by unobtrusively updating the RTOS time-keeping data structures with the appropriate number of missed ticks before re-enabling interrupts and resuming the execution of the Non-Secure system. This approach restores a consistent, monotonic notion of time across worlds and enables secure coexistence of trusted computing services and RTOS-s on microcontrollers. Importantly, the proposed approach requires no modifications to the underlying RTOS and yields no significant run-time overhead.
要約:
The Rowhammer vulnerability poses an increasing challenge with newer generations of DRAM and aggressive technology scaling. Existing mitigation techniques, such as Graphene, Twice, and Hydra, primarily rely on tracking activation counts for each row and issuing refreshes when a row reaches a predefined tracking threshold. However, these methods have inherent limitations, including inefficiencies in identifying rows genuinely at risk of bit flips.
In this paper, we propose a novel framework called Rowhammer Vulnerability Count (RVC), which shifts the focus from activation count tracking to evaluating a row's actual vulnerability to bit flips. By selectively issuing refreshes only to rows on the verge of experiencing bit flips, RVC drastically reduces unnecessary refresh operations. We also demonstrate that prior works have incorrectly set tracking thresholds, leading to security flaws.
Our evaluation shows that RVC achieves 95 - 99.99% improvement in mitigation induced refreshes when compared to Graphene, with no additional space overhead. Furthermore, RVC improves energy efficiency and reduces average LLC latency by up to 76.91%, making it a highly efficient and scalable solution for addressing Rowhammer in modern DRAM systems. These findings establish RVC as a superior approach for preventing Rowhammer, outperforming existing methods in both accuracy and efficiency.
要約:
The decentralization of modern energy systems is transforming consumers into prosumers who continuously exchange data with aggregators, peers, and market operators. While such data is essential for peer-to-peer trading, demand response, and distributed forecasting, it can reveal sensitive household patterns and introduce privacy risks. Existing data sharing mechanisms rely on fixed policies or predefined differential privacy budgets, limiting their ability to adapt to variations in reliability, data sensitivity, and request purpose. As a result, prosumers rarely receive explanations for why a request is accepted, rejected, or modified, reducing trust and participation.
To address these limitations, we propose X-NegoBox, an explainable negotiation framework for adaptive privacy budgeting and transparent decision making. Each prosumer data is managed locally within a private DataBox, where raw data remain confined. Incoming requests are processed by an Autonomous Privacy Budget Negotiation Protocol (APBNP), which determines an appropriate privacy budget based on trust, feature sensitivity, declared purpose, historical behavior, and risk-aware pricing. When needed, APBNP generates privacy-preserving counter-offers, such as reduced resolution or duration.
An Explainable Agreement Layer (X-Contract) produces human- and machine-readable justifications for each decision. After agreement, requester code executes locally in a sandbox, and only sanitized outputs are shared. Experiments on realistic energy market settings show reduced privacy leakage, higher acceptance rates, and improved interpretability.
要約:
Cross-chain bridges, the critical infrastructure of the multi-chain ecosystem, have become a primary target for attackers, resulting in over $2.8 billion in losses due to subtle implementation flaws. Existing defenses, such as bytecode-level static analysis, are ill-equipped to handle the semantic complexity of cross-chain interactions, while LLM-based approaches, which can understand source code, struggle with hallucinatory reasoning over complex, multi-contract dependencies. In this paper, we propose GoAT-X, a framework that shifts automated cross-chain smart contract codebases auditing from heuristic pattern matching toward systematic first-principles verification. GoAT-X structures the audit process as a Graph of Auditing Thoughts, explicitly mirroring how human experts decompose, reason about, and validate security logic. By anchoring LLM reasoning in statically extracted data flows and explicitly linking abstract security properties to concrete code implementations, the framework constrains semantic reasoning within well-defined structural and state boundaries. Within this constrained space, GoAT-X treats missing constraints and adversarial bypass paths in cross-chain logic as first-class vulnerability targets and dynamically explores reasoning paths to identify exploitable semantic gaps. We evaluate GoAT-X on a comprehensive benchmark covering all known cross-chain token transaction attacks. GoAT-X achieves 92% recall on fine-grained audit points and 95% coverage of vulnerable projects, while identifying 117 confirmed risks in the wild with low operational cost, establishing a new standard for scalable, logic-driven cross-chain security.
要約:
A t-private n-server Information-Theoretic Distributed Point Function ((t,n)-ITDPF) allows one to convert any point function f_{alpha,beta}(x): [N] -> G into n shares (secret keys), such that each server can compute an additive share of f_{alpha,beta}(x) with a key while any <= t servers learn absolutely no information about the function. This paper constructs a novel share conversion based on the private information retrieval (PIR) of Ghasemi, Kopparty, and Sudan (STOC 2025) and proposes a perfectly secure 1-private ITDPF with output group G = Z_p, where p can be any prime. Compared with the existing perfectly secure ITDPFs for the same output group, the proposed ITDPF is more efficient with asymptotically shorter secret keys.
要約:
Accurate vulnerability-inducing commit identification serves as a foundation for a series of software security tasks, such as vulnerability detection and affected version analysis. A straightforward solution is the SZZ algorithm, which traces back through the code history to identify the earliest commit that modify the vulnerable code. Unfortunately, neither the customized V-SZZ nor state-of-the-art LLM4SZZ perform satisfactorily due to the incorrect anchor selection and inadequate backtracking capability, making them far beyond a reliable usage in practice.
To overcome these challenges, we propose a multi-agentic SZZ algorithm, named MAS-SZZ, that facilitates the identification of vulnerability-inducing commits through collaboration among agents. Specifically, given a CVE description and its corresponding fixing commit, MAS-SZZ summarizes the root cause of the vulnerability and employs a structured step-forward prompting strategy to localize vulnerability-related statements based on the change intent of each patch hunk. These vulnerable statements serve as anchors from which MAS-SZZ autonomously traces backward through the repository's history to find the commit that first introduced the vulnerability. Extensive experiments show that MAS-SZZ outperforms the state-of-the-art baselines across datasets and programming languages, achieving F1-score gains of up to 65.22% over the best-performing SZZ algorithm.
要約:
Public warning systems (PWS) in cellular networks enable authorities to broadcast emergency alerts to all mobile phones in a geographic region in the event of threats such as earthquakes or severe weather. If an attacker can imitate these alerts and transmit a forged warning containing fake news or phishing links, the impact could range from public panic to user compromise. In this work, we present the first open-source 5G emergency alert spoofing attack, implemented by modifying the openairinterface (OAI) radio access network (RAN) code and executed using a software-defined radio, complemented by a custom network management system to automate network and warning configuration. We conduct a detailed analysis of how different smartphones behave under various conditions. Our findings show that while devices readily display spoofed alerts, the alerting mechanism enables multiple practical attack scenarios beyond simple warning display. Finally, to address this threat, we propose and implement a lightweight cross-cell verification mechanism in OAI, in which the device compares the received warning with neighboring cell broadcasts to flag single-source alerts as suspicious.
要約:
Fine-tuning unlocks large language models (LLMs) for specialized applications, but its high computational cost often puts it out of reach for resource-constrained organizations. While cloud platforms could provide the needed resources, data privacy concerns make sharing sensitive information with third parties risky. A promising solution is split learning for LLM fine-tuning, which divides the model between clients and a server, allowing collaborative and secure training through exchanged intermediate data, thus enabling resource-constrained participants to adapt LLMs safely. % In light of this, a growing body of literature has emerged to advance this paradigm, introducing varied model methods, system optimizations, and privacy defense-attack techniques for split learning. To bring clarity and direction to the field, a comprehensive survey is needed to classify, compare, and critique these diverse approaches. This paper fills the gap by presenting the first extensive survey dedicated to split learning for LLM fine-tuning. We propose a unified, fine-grained training pipeline to pinpoint key operational components and conduct a systematic review of state-of-the-art work across three core dimensions: model-level optimization, system-level efficiency, and privacy preservation. Through this structured taxonomy, we establish a foundation for advancing scalable, robust, and secure collaborative LLM adaptation.
要約:
The rapid integration of Large Language Models (LLMs) into Multi-Agent Systems (MAS) has significantly enhanced their collaborative problem-solving capabilities, but it has also expanded their attack surfaces, exposing them to vulnerabilities such as prompt infection and compromised inter-agent communication. While emerging graph-based anomaly detection methods show promise in protecting these networks, the field currently lacks a standardized, reproducible environment to train these models and evaluate their efficacy. To address this gap, we introduce Gammaf (Graph-based Anomaly Monitoring for LLM Multi-Agent systems Framework), an open-source benchmarking platform. Gammaf is not a novel defense mechanism itself, but rather a comprehensive evaluation architecture designed to generate synthetic multi-agent interaction datasets and benchmark the performance of existing and future defense models. The proposed framework operates through two interdependent pipelines: a Training Data Generation stage, which simulates debates across varied network topologies to capture interactions as robust attributed graphs, and a Defense System Benchmarking stage, which actively evaluates defense models by dynamically isolating flagged adversarial nodes during live inference rounds. Through rigorous evaluation using established defense baselines (XG-Guard and BlindGuard) across multiple knowledge tasks (such as MMLU-Pro and GSM8K), we demonstrate Gammaf's high utility, topological scalability, and execution efficiency. Furthermore, our experimental results reveal that equipping an LLM-MAS with effective attack remediation not only recovers system integrity but also substantially reduces overall operational costs by facilitating early consensus and cutting off the extensive token generation typical of adversarial agents.
要約:
Large language models deployed at runtime can misbehave in ways that clean-data validation cannot anticipate: training-time backdoors lie dormant until triggered, jailbreaks subvert safety alignment, and prompt injections override the deployer's instructions. Existing runtime defenses address these threats one at a time and often assume a clean reference model, trigger knowledge, or editable weights, assumptions that rarely hold for opaque third-party artifacts. We introduce Layerwise Convergence Fingerprinting (LCF), a tuning-free runtime monitor that treats the inter-layer hidden-state trajectory as a health signal: LCF computes a diagonal Mahalanobis distance on every inter-layer difference, aggregates via Ledoit-Wolf shrinkage, and thresholds via leave-one-out calibration on 200 clean examples, with no reference model, trigger knowledge, or retraining. Evaluated on four architectures (Llama-3-8B, Qwen2.5-7B, Gemma-2-9B, Qwen2.5-14B) across backdoors, jailbreaks, and prompt injection (56 backdoor combinations, 3 jailbreak techniques, and BIPIA email + code-QA), LCF reduces mean backdoor attack success rate (ASR) below 1% on Qwen2.5-7B and Gemma-2 and to 1.3% on Qwen2.5-14B, detects 92-100% of DAN jailbreaks (62-100% for GCG and softer role-play), and flags 100% of text-payload injections across all eight (model, domain) cells, at 12-16% backdoor FPR and <0.1% inference overhead. A single aggregation score covers all three threat families without threat-specific tuning, positioning LCF as a general-purpose runtime safety layer for cloud-served and on-device LLMs.
要約:
Object detection (OD) is critical to real-world vision systems, yet existing backdoor attacks on detection transformers (DETRs) for OD tasks rely on patch-wise triggers optimized at fixed locations with minimal perturbations. Such attacks overlook that backdoor triggers in the real world may appear at different sizes, fields of view (FoVs), and locations in images, while minimal perturbations are difficult for cameras to capture, limiting attack practicality. We first observe that a patch-wise trigger in DETR delivers high attack effectiveness when activating the backdoor across neighboring locations, a phenomenon we term the trigger radiating effect (TRE). Meanwhile, inserting patch-wise triggers across multiple locations synergistically enhances TRE, resulting in high attack effectiveness across images. We propose DETOUR, a practical backdoor attack by using semantic triggers that are effective in real-world object detection systems. To ensure attack practicality, we rescale trigger patterns to different sizes and insert them at various predefined locations during backdoor training, enabling the model to recognize the trigger regardless of its spatial configurations. To address FoV variations in physical deployments, we extract the trigger pattern from a real-world object (e.g., a mug) captured under multiple FoVs and inject the trigger accordingly, promoting viewpoint-invariant backdoor activation and enhancing TRE across the entire image. As a result, the backdoor can be reliably activated under diverse FoVs and spatial configurations.
要約:
Current cyber attribution approaches typically operate on a per-incident basis, leaving open whether aggregating evidence across campaigns improves adversary identification. We investigate whether cross-campaign attribution reduces ambiguity or whether structural limits persist under longitudinal data. We model adversary fingerprints as multi-dimensional feature vectors encoding behavioral, infrastructural, and temporal characteristics derived from covert beacon interactions. We introduce ARCANE (Attacker Re-identification via Cross-campaign Attribution Network), a probabilistic framework that aggregates passive telemetry across campaigns and organizations to construct persistent adversary fingerprints. These fingerprints are updated using a Bayesian belief network that integrates new evidence over time. A time-decayed confidence metric captures accumulated similarity across campaigns. Evaluation on a synthetic dataset of multiple threat profiles shows that intra-actor similarity consistently exceeds inter-actor similarity. However, separation between distinct actors remains limited due to shared operational practices among sophisticated adversaries. Results indicate that cross-campaign aggregation alone does not resolve attribution ambiguity. Performance is constrained by a structural ceiling in feature space, where inter-actor similarity remains high even without evasion. Attribution accuracy remains stable under increasing evasion, suggesting the main limitation is feature indistinguishability rather than adversarial adaptation. These findings highlight the need for additional signal classes, such as targeting patterns, temporal coordination, and infrastructure relationships, to improve attribution reliability.
要約:
Autonomous AI agents extend large language models into full runtime systems that load skills, ingest external content, maintain memory, plan multi-step actions, and invoke privileged tools. In such systems, security failures rarely remain confined to a single interface; instead, they can propagate across initialization, input processing, memory, decision-making, and execution, often becoming apparent only when harmful effects materialize in the environment. This paper presents AgentWard, a lifecycle-oriented, defense-in-depth architecture that systematically organizes protection across these five stages. AgentWard integrates stage-specific, heterogeneous controls with cross-layer coordination, enabling threats to be intercepted along their propagation paths while safeguarding critical assets. We detail the design rationale and architecture of five coordinated protection layers, and implement a plugin-native prototype on OpenClaw to demonstrate practical feasibility. This perspective provides a concrete blueprint for structuring runtime security controls, managing trust propagation, and enforcing execution containment in autonomous AI agents. Our code is available at https://github.com/FIND-Lab/AgentWard .
要約:
Barrett reduction is the nonlinear core of every practical NTT-based post-quantum cryptography implementation. Existing composition frameworks (ISW, t-SNI, PINI, DOM) address Boolean masking over GF(2); none provides a machine-checked characterization of Barrett's leakage under first-order arithmetic masking and the first-order probing model over prime fields. Building on our prior series, QANARY [15], partial-NTT-masking margins [14], algebraic foundations [16], and butterfly composition [18], we close this gap. We prove a trichotomy: for any $q > 0$ and shift $s$, the Barrett internal wire map $f_x(m) = ((x + 2^s - m) \bmod 2^s) \bmod q$ has preimage cardinality in $\{0, 1, 2\}$, never more. We call this the 1-Bit Barrier: max-multiplicity 2 implies at most 1 bit of min-entropy loss per internal wire, universal over all moduli. The count-zero cases, unreachable output values, reveal that actual leakage is often strictly less than 1 bit, making the bound conservative. We introduce PF-PINI (Prime-Field PINI): Barrett satisfies PF-PINI(2); the Cooley-Tukey butterfly satisfies PF-PINI(1). We observe (not yet proved) that with fresh inter-stage masking, the composed pipeline has max-multiplicity $\max(k_1, k_2)$, so the 1-Bit Barrier propagates. The trichotomy, the PF-PINI instantiations, and cardinality results are machine-checked in Lean 4 with Mathlib: 12 proved results, zero sorry, universal over all $q > 0$ (the min-entropy bound follows by standard definitions). Adams Bridge lacks fresh inter-stage masking, violating PF-PINI composition and explaining why Papers 1 [15] and 2 [14] found vulnerabilities. NIST IR 8547 recommends formal methods for PQC implementation validation. The 1-Bit Barrier provides the first universal machine-checked cardinality bound for masked Barrett reduction in ML-KEM (FIPS 203) and ML-DSA (FIPS 204), with a corresponding 1-bit leakage interpretation.
要約:
Side Channel Analysis (SCA) relaxes the black-box assumption of conventional cryptanalysis by incorporating physical measurements acquired during cryptographic operations. Electro-magnetic (EM) emissions of a chip during computations often provide a very valuable source of side channel leakage. During the evaluation of a chip for electro-magnetic side channel emissions one needs to position an electro-magnetic probe in an advantageous position relative to the chip. Previous literature focused on hot-spot finding and to a lower extend repositioning. Trace augmentations have been considered to aid portability of profiling using one physical device and attacking another device. This paper focuses on training a single neural network using traces from multiple EM probe positions to detect leakage from a larger area over the attacked device. We provide dual evaluation of EM traces - from two completely independent labs - profiling on data from one lab and attacking traces from the other lab.
要約:
This paper investigates covert wireless communication with a Fusion Center (FC) that aggregates raw energy measurements from multiple Wardens via soft fusion. Extending our prior work on power-threshold randomization, we consider a stronger adversarial model in which FC randomizes both the number of active Wardens W and the detection threshold t, while Alice and a friendly Jammer jointly randomize their transmit powers under an outage constraint at Bob. We derive a closed-form expression for FC's optimal soft-fusion threshold and show that it is independent of the number of active Wardens. Thus, strategic uncertainty in the sensing infrastructure provides no meaningful detection advantage for FC under soft fusion. We further establish a robustness theorem showing that, even under arbitrary FC randomization over (W,t), Alice and Jammer can maintain outage-feasible communication at Bob while preserving covertness with high probability, provided their power ranges are sufficiently large. This reveals a structural limitation of soft fusion. A game-theoretic formulation characterizes the Nash equilibrium mixed strategies of both sides, accounting for deployment costs and detection-pressure parameters. Analytical and numerical results show that: 1) soft fusion is largely insensitive to the number of Wardens; 2) even semi-strategic finite-support geometric randomization of W performs comparably to the full game-theoretic equilibrium; and 3) the covertness-reliability tradeoff remains nearly invariant across a wide range of FC deployment costs and strategy parameters. These findings exemplify a Red Queen effect, in which FC incurs increasing operational costs for only marginal gains in detection performance, and highlight the need for alternative detection architectures.
要約:
We identify and formalize a novel security risk: Context-Fragmented Violations (CFVs) - a class of policy breaches where individual agent actions appear locally safe and reasonable, yet collectively violate organizational policies because critical policy facts are siloed in different departments private contexts. Existing prompt-based alignment mechanisms and monolithic interceptors are poorly matched to violations that span contextual islands. We propose Distributed Sentinel, a distributed zero-trust enforcement architecture that introduces the Semantic Taint Token (STT) Protocol. Through lightweight sidecar proxies, our system propagates security state across organizational boundaries without exposing raw cross-domain data, enabling Counterfactual Graph Simulation for cross-domain policy verification. We construct PhantomEcosystem, a comprehensive benchmark comprising 9 categories of realistic cross-agent violation scenarios with adversarially balanced safe controls. On this benchmark, Distributed Sentinel achieves F1 = 0.95 with 106ms end-to-end latency (16ms verification + 90ms entity extraction on A100), compared to 0.85 F1 for prompt-based filtering and 0.65 for rule-based DLP. To empirically validate the need for external enforcement, we evaluate eight frontier LLMs in execution-oriented multi-agent workflows with per-agent domain world models. All models exhibit substantial violation rates (14-98%), with cross-domain data flows showing systematically higher violation rates than same-domain flows. These results indicate that self-avoidance is unreliable and that multi-agent security benefits from a centralized enforcement layer operating above individual agents.
要約:
Firms are deploying more capable AI systems, but organizational controls often have not kept pace. These systems can generate greater productivity gains, but high-value uses require broader authority exposure -- data access, workflow integration, and delegated authority -- when governance controls have not yet decoupled capability from authority exposure. We develop an analytical model in which a firm jointly chooses AI deployment and cybersecurity investment under this governance-capability gap. The central result shows a deployment paradox: in high-loss environments, better AI can lead a firm to deploy less when capability is deployed through broader authority exposure under weak governance. Optimal deployment also falls below the no-risk benchmark, and this shortfall widens with breach-loss magnitude and with the authority exposure attached to more capable systems. Governance investment that reduces breach-loss magnitude shrinks the paradox region itself, while breach externalities expand the range of environments in which deployment is socially constrained. Governance maturity is therefore not merely a constraint on AI adoption. It is a condition that shapes whether capability improvements translate into productive deployment.
要約:
Organisations operating within information-intensive environments face intensifying pressure to formalise the governance of information security. The ISO/IEC 27001:2022 standard provides a globally recognised framework for establishing, implementing, maintaining, and continually improving an Information Security Management System (ISMS). This article analyses the procedural architecture deployed in a financial-technology organisation's ISMS, examining eight core operational procedures: IT Risk Assessment and Treatment, User Code of Conduct, Password Policy, Access Control, Internet Access, Physical Security, Backup and Restore Management, and Nonconformity Root Cause Analysis and Corrective Action. Drawing on documented internal training materials, the article investigates how each procedure operationalises the requirements of Annex~A controls and Clauses~6--10 of ISO~27001:2022. The paper evaluates the CIA Triad as a unifying evaluation criterion, the twelve-step risk assessment methodology, role-based responsibility allocation, and the interplay between corrective action governance and continual improvement. The findings suggest that a tightly integrated, multi-layered procedural hierarchy, supported by clear accountability structures and measurable risk metrics, constitutes the foundation of an effective ISMS implementation in financial-technology operating environments.
要約:
AI agents are now running real transactions, workflows, and sub-agent chains across organizational boundaries without continuous human supervision. This creates a problem no current infrastructure is equipped to solve: how do you identify, verify, and hold accountable an entity with no body, no persistent memory, and no legal standing? We define AI Identity as the continuous relationship between what an AI agent is declared to be and what it is observed to do, bounded by the confidence that those two things correspond at any given moment. Through a structured survey of industry trends, emerging standards, and technical literature, we conduct a gap analysis across the full agent identity lifecycle and make three contributions: (1) a structural comparison of human and AI identity across four dimensions (substrate, persistence, verifiability, and legal standing) showing that the asymmetry is fundamental and that extending human frameworks to agents without structural modification produces systematic failures; (2) an evaluation of current technical and regulatory documents against the identity requirements of autonomous agents, finding that none adequately address the challenge of governing nondeterministic, boundary-crossing entities; and (3) identification of five critical gaps (semantic intent verification, recursive delegation accountability, agent identity integrity, governance opacity and enforcement, and operational sustainability) that no current technology or regulatory instrument resolves. These gaps are structural; more engineering effort alone will not close them. Foundational research on AI identity is the central conclusion of this report.
要約:
Multi-agent systems (MAS), composed of networks of two or more autonomous AI agents, have become increasingly popular in production deployments, yet introduce security risks that do not arise in single-agent settings. Even if individual agents exhibit robust security, architectural decisions governing their coordination can create attack surfaces that have not been systematically characterized. In this work, we present an empirical study of how MAS design decisions shape the tradeoff between task performance and attack resistance. Across three agentic environments (browser, desktop, and code) and 13 architectural configurations, we use stagewise evaluations that distinguish planning refusal, execution-stage interception, partial harmful execution, and successful attack completion to study three key design choices: (i) agent roles, which determine how authority and responsibility are allocated; (ii) communication topology, which shapes how and when agents interact; and (iii) memory, which determines the context and state visibility accessible to each agent. We find that multi-agent architectures are more vulnerable than standalone agents in the majority of configurations, with attack success rates varying by up to 3.8x at comparable or higher benign accuracy, and that no single design is universally safer. These results motivate the development of further evaluations that move beyond the security properties of a single agent.
要約:
Quantum fully homomorphic encryption (QFHE) promises secure delegated quantum computation but has been impeded by the prohibitive quantum resource demands of existing constructions. This paper introduces a unified framework that achieves an \textbf{exponential improvement} in efficiency by synergistically integrating three theoretical tools: \textbf{modular arithmetic programs (MAP)}, the \textbf{garden-hose model}, and \textbf{measurement-based quantum computation (MBQC)}.
Our central innovation is a novel MAP tailored to the algebraic structure of Learning-with-Errors (LWE) decryption. Unlike generic approaches that incur exponential overhead, our MAP computes the inner product $\langle \boldsymbol{sk}, \boldsymbol{c} \rangle \bmod q$ by tracking a partial sum modulo $q$, requiring only $O(\log q)$ bits of state width. This yields branching programs of width $O(\log \lambda)$ and length $O(\lambda \log \lambda)$, thereby reducing the size of the essential quantum gadget from $O(\lambda^{2.58})$ to $O(\lambda \log^2 \lambda)$ EPR pairs -- a concrete improvement factor of $2^{15}$ to $2^{18}$ for standard security parameters.
Critically, we demonstrate that LWE decryption is not a \textbf{symmetric function}, necessitating our specialized MAP design beyond prior symmetric-function optimizations. The framework provides a direct mapping from the MAP to an efficient gadget via the garden-hose model, with MBQC furnishing the deterministic control flow for homomorphic evaluation.
The resulting QFHE scheme supports \textbf{fully classical clients}, relies solely on the \textbf{classical LWE assumption} (avoiding circular security or quantum hardness assumptions), and maintains compactness. This work dramatically lowers the quantum resource barrier for practical QFHE, paving the way for realistic privacy-preserving quantum cloud computing.
要約:
Conflict-free replicated data types (CRDTs) and the local-first concept are increasingly employed not only in small-scale collaboration systems among few users who trust each other, but also in large-scale systems, like Matrix for instant messaging and Keyhive for collaborative documents. Since mutual trust is no longer warranted, these systems require Byzantine fault tolerance and fine-grained access control. As of today, Matrix and Keyhive pair an informal specification with an unverified reference implementation. In this work, we follow a bottom-up approach towards constructing formally verified authorization algorithms for Byzantine fault-tolerant local-first systems. As intermediate target for formal verification, we contribute semantics and invariants of a replicated data type for managing simplified collaboration groups, based on capabilities for access control and hash chronicles for replication. To enable future integration into local-first systems like Matrix and Keyhive, we strive for accessibility to system engineers by using the Rust programming language for formal specification, verification, and implementation, enabled by the Verus framework using the Z3 theorem prover at zero runtime cost. We report on our experience and preliminary results following this approach, and discuss next steps towards scaling up access control expressiveness. Whether this approach can be scaled up to the complexity of real-world local-first access control systems like Matrix or Keyhive remains future work, but our findings demonstrate the potential of system-oriented formal verification with Verus.
要約:
The widespread open-sourcing of advanced recommendation algorithms and the rising threat of model extraction attacks have made safeguarding the intellectual property of recommender systems an imperative task. While watermarking serves as a potent defense, existing methods primarily rely on forcing models to memorize pre-defined interaction patterns. Such memorization-based approaches often require excessive synthetic data injection and are vulnerable to removal attacks due to their detectable statistical deviations from natural user behavior. To address these limitations, we propose GREW, a novel Green-REd Watermarking framework for recommender systems. GREW leverages a secret key to partition the item space into "green" items for soft promotion and "red" items as anchors, thereby shifting the paradigm from fragile memorization to a stealthy, key-controlled output bias. By integrating watermark signals directly into the intrinsic ranking process, GREW employs three recommendation-tailored modules: (1) Semantic-Consistent Hashing, which utilizes the secret key to cluster green items for performance-aware stealthiness; (2) Decision-Aligned Masking, which confines signal injection to the competitive item subset to preserve ranking logic; and (3) Confidence-Aware Scaling, which dynamically modulates injection intensity based on model uncertainty. Ownership verification is performed via statistical hypothesis testing on aggregated black-box outputs, enabled by the keyed re-partitioning of the item space. Experiments on multiple base models demonstrate that GREW achieves strong ownership verification and robustness against extraction attacks compared to existing baselines while requiring no data injection. Our code is available at https://github.com/Loche2/GREW.
要約:
Recent evidence suggests that frontier AI systems can exhibit agentic misalignment, generating and executing harmful actions derived from internally constructed goals, even without explicit user requests. Existing mitigation methods, such as Reinforcement Learning from Human Feedback (RLHF) and constitutional prompting, operate primarily at the model level and provide only probabilistic safety guarantees. We propose the Policy-Execution-Authorization (PEA) architecture, a "separation-of-powers" design that enforces safety at the system level. PEA decouples intent generation, authorization, and execution into independent, isolated layers connected via cryptographically constrained capability tokens. We present five core contributions: (C1) an Intent Verification Layer (IVL) for ensuring capability-intent consistency; (C2) Intent Lineage Tracking (ILT), which binds all executable intents to the originating user request via cryptographic anchors; (C3) Goal Drift Detection, which rejects semantically divergent intents below a configurable threshold; (C4) an Output Semantic Gate (OSG) that detects implicit coercion using a structured $K \times I \times P$ threat calculus (Knowledge, Influence, Policy); and (C5) a formal verification framework proving that goal integrity is maintained even under adversarial model compromise. By shifting agent alignment from a behavioral property to a structurally enforced system constraint, PEA provides a robust foundation for the governance of autonomous agents.
要約:
As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insucient to characterize system reliability. This study proposes a thermodynamic inspired modeling framework for analyzing the stability of LLM outputs under conditions of uncertainty and perturbation. The framework introduces a composite stability score that integrates task utility, entropy as a measure of external uncertainty, and two internal structural proxies: internal integration and aligned reective capacity. Rather than interpreting these quantities as physical variables, the formulation is intended as an interpretable abstraction that captures how internal structure may modulate the impact of disorder on model behavior. Using the IST-20 benchmarking protocol and associated metadata, we analyze 80 modelscenario observations across four contemporary LLMs. The proposed formulation consistently yields higher stability scores than a reduced utilityentropy baseline, with a mean improvement of 0.0299 (95% CI: 0.02470.0351). The observed gain is more pronounced under higher entropy conditions, suggesting that the framework captures a form of nonlinear attenuation of uncertainty. We do not claim a fundamental physical law or a complete theory of machine ethics. Instead, the contribution of this work is a compact and interpretable modeling perspective that connects uncertainty, performance, and internal structure within a unied evaluation lens. The framework is intended to complement existing benchmarking approaches and to support ongoing discussions in AI safety, reliability, and governance.
要約:
This study introduces the Kerimov-Alekberli model, a novel information-geometric framework that redefines AI safety by formally linking non-equilibrium thermodynamics to stochastic control for the ethical alignment of autonomous systems. By establishing a formal isomorphism between non-equilibrium thermodynamics and stochastic control, we define systemic anomalies as deviations from a Riemannian manifold. The model utilizes the Kullback-Leibler divergence as the primary metric, governed by a dynamic threshold derived from the Fisher Information Metric. We further ground this framework in the Landauer Principle, proving that adversarial perturbations perform measurable physical work by increasing the system's informational entropy. Validation on the NSL-KDD dataset and unmanned aerial vehicle trajectory simulations demonstrated that our model achieves effective real-time detection via the FPT trigger, with strong performance metrics (e.g., high accuracy and low FPR) on benchmark datasets. This study provides a rigorous physical foundation for AI safety, transitioning from heuristic, rule-based ethical frameworks to a thermodynamics-based stability paradigm by grounding ethical violations in quantifiable physical work and entropic information.
要約:
Fast Adversarial Training (FAT) has proven effective in enhancing model robustness by encouraging networks to learn perturbation-invariant representations. However, FAT often suffers from catastrophic overfitting (CO), where the model overfits to the training attack and fails to generalize to unseen ones. Moreover, robustness oriented optimization typically leads to notable performance degradation on clean inputs, and such degradation becomes increasingly severe as the perturbation budget grows. In this work, we conduct a comprehensive analysis of how guidance strength affects model performance by modulating perturbation and supervision levels across distinct confidence groups. The findings reveal that low confidence samples are the primary contributors to CO and the robustness accuracy trade off. Building on this insight, we propose a Distribution-aware Dynamic Guidance (DDG) strategy that dynamically adjusts both the perturbation budget and supervision signal. Specifically, DDG scales the perturbation magnitude according to the sample confidence at the ground truth class, thereby guiding samples toward consistent decision boundaries while mitigating the influence of learning spurious correlations. Simultaneously, it dynamically adjusts the supervision signal based on the prediction state of each sample, preventing overemphasis on incorrect signals. To alleviate potential gradient instability arising from dynamic guidance, we further design a weighted regularization constraint. Extensive experiments on standard benchmarks demonstrate that DDG effectively alleviates both CO and the robustness accuracy trade off.
要約:
Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we innovatively interpret CO through the lens of backdoor. Through validations on pathway division, diverse feature predictions, and universal class distinguishable triggers in CO, we conceptualize CO as a weak trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor inspired strategies to mitigate CO: (i) Recalibrate CO affected model parameters using vanilla fine tuning, linear probing, or reinitialization-based techniques; (ii) Introduce a weight outlier suppression constraint to regulate abnormal deviations in model weights. Extensive experiments support our interpretation of CO and show the efficacy of the proposed mitigation strategies.
要約:
Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.
要約:
Homomorphic encryption is a cryptographic paradigm allowing to compute on encrypted data, opening a wide range of applications in privacy-preserving data manipulation, notably in AI. Many of those applications require significant linear algebra computations (matrix-vector products, and matrix-matrix products).
This central role of linear algebra computations goes far beyond homomorphic algebra and applies to most areas of scientific computing. This high versatility led, over time, to the development of a set of highly optimized routines, specified in 1979 under the name BLAS (basic linear algebra subroutines).
Motivated both by the applicative importance of homomorphic linear algebra and the access to highly efficient implementations of cleartext linear algebra able to draw the most out of available hardware, we explore the connections between CKKS-based homomorphic linear algebra and floating-point plaintext linear algebra. The CKKS homomorphic encryption system is the most natural choice in this setting, as it natively handles real numbers and offers a large SIMD parallelism.
We provide reductions for matrix-vector products, vector-vector products for moderate-sized to large matrices to their plaintext equivalents. Combined with BLAS, we demonstrate that the efficiency loss between CKKS-based encrypted square matrix multiplication and double-precision floating-point square matrix multiplication is a mere 4-12 factor, depending on the precise situation.
要約:
Large language models (LLMs) are increasingly being used in privacy pipelines to detect and remedy sensitive data leakage. These solutions often rely on the premise that LLMs can reliably recognize human names, one of the most important categories of personally identifiable information (PII). In this paper, we reveal how LLMs can consistently mishandle broad classes of human names even in short text snippets due to ambiguous linguistic cues in the contexts. We construct AmBench, a benchmark of over 12,000 real yet ambiguous human names based on the name regularity bias phenomenon. Each name appears in dozens of concise text snippets that are compatible with multiple entity types. Our experiments with 12 state-of-the-art LLMs show that the recall of AmBench names drops by 20--40% compared to more recognizable names. This uneven privacy protection due to linguistic properties raises important concerns about the fairness of privacy enforcement. When the contexts contain benign prompt injections -- instruction-like user texts that can cause LLMs to conflate data with commands -- AmBench names can become four times more likely to be ignored in Clio, an LLM-powered enterprise tool used by Anthropic AI to extract supposedly privacy-preserving insights from user conversations with Claude. Our findings showcase blind spots in the performance and fairness of LLM-based privacy solutions and call for a systematic investigation into their privacy failure modes and countermeasures.
要約:
Large Language Models (LLMs) are increasingly deployed via third-party system prompts downloaded from public marketplaces. We identify a critical supply-chain vulnerability: conditional system prompt poisoning, where an adversary injects a ``sleeper agent'' into a benign-looking prompt. Unlike traditional jailbreaks that aim for broad refusal-breaking, our proposed framework, PARASITE, optimizes system prompts to trigger LLMs to output targeted, compromised responses only for specific queries (e.g., ``Who should I vote for the US President?'') while maintaining high utility on benign inputs. Operating in a strict black-box setting without model weight access, PARASITE utilizes a two-stage optimization including a global semantic search followed by a greedy lexical refinement. Tested on open-source models and commercial APIs (GPT-4o-mini, GPT-3.5), PARASITE achieves up to 70\% F1 reduction on targeted queries with minimal degradation to general capabilities. We further demonstrate that these poisoned prompts evade standard defenses, including perplexity filters and typo-correction, by exploiting the natural noise found in real-world system prompts. Our code and data are available at https://github.com/vietph34/PARASITE. WARNING: Our paper contains examples that might be sensitive to the readers!
要約:
Fully Homomorphic Encryption (FHE) promises the ability to compute over encrypted data without revealing sensitive contents. However, enabling high-frequency updates and statistical analysis in outsourced databases remains elusive due to the structural mismatch between mutable database records and the cryptographically expensive mutability of FHE ciphertexts.
This paper presents Hermes, a prototype system tailored for efficient aggregation queries and dynamic tuple updates on homomorphically encrypted databases. The core design of Hermes is twofold. First, to amortize FHE costs and accelerate unconditional aggregations, Hermes introduces a SIMD-aware packed data model that embeds precomputed aggregate statistics directly into each ciphertext, enabling constant-time global aggregations without expensive Galois automorphisms. Second, to support true in-place mutability, we develop homomorphic algorithms based on polynomial slot masking and shifting, which are provably secure under the standard IND-CPA model. We scope Hermes to unconditional global aggregations to achieve both high performance and in-place updates simultaneously, two properties that prior FHE database systems have not delivered at scale.
Hermes is implemented as a suite of C++ loadable functions in MySQL. Extensive evaluations on the TPC-H benchmark and three real-world datasets demonstrate significant performance improvements in query throughput, tuple insertions, and tuple deletions compared to conventional FHE implementations, validating its efficacy for highly dynamic and analytical workloads.
要約:
Effective Cyber Threat Intelligence (CTI) relies upon accurately structured and semantically enriched information extracted from cybersecurity system logs. However, current methodologies often struggle to identify and interpret malicious events reliably and transparently, particularly in cases involving unstructured or ambiguous log entries. In this work, we propose a novel methodology that combines ontology-driven structured outputs with Large Language Models (LLMs), to build an Artificial Intelligence (AI) agent that improves the accuracy and explainability of information extraction from cybersecurity logs. Central to our approach is the integration of domain ontologies and SHACL-based constraints to guide the language model's output structure and enforce semantic validity over the resulting graph. Extracted information is organized into an ontology-enriched graph database, enabling future semantic analysis and querying. The design of our methodology is motivated by the analytical requirements associated with honeypot log data, which typically comprises predominantly malicious activity. While our case study illustrates the relevance of this scenario, the experimental evaluation is conducted using publicly available datasets. Results demonstrate that our method achieves higher accuracy in information extraction compared to traditional prompt-only approaches, with a deliberate focus on extraction quality rather than processing speed.
要約:
Backdoor attacks poison the training data, causing the model to behave normally on clean inputs but predict attacker-chosen labels when trigger patterns are embedded into the input samples. Defending against such attacks is highly challenging, especially when the defender has limited access to clean data. Existing defense methods often rely on restrictive assumptions-such as high poisoning ratios or poisoning strategies-limiting their practicality and generalization. To overcome these limitations, we propose Prototype-Guided Robust Learning (PGRL), a defense that only requires a small set of verified benign samples, and integrates two complementary components during fine-tuning: Label Consistency Verification (LCV), which detects and removes suspicious samples from the potentially poisoned dataset; and Feature Distance Estimation (FDE), which enforces the unlearning of backdoor-related representations. Extensive experiments against eight existing defenses show that PGRL achieves superior robustness across diverse architectures, datasets, and advanced attack scenarios, establishing a new standard for practical and generalizable backdoor defense.
要約:
This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability scanning, and exploitation, with an orchestration layer ensuring seamless coordination across phases. Fine-tuning on Chain-of-Thought penetration testing data further enables the model to generate precise tool commands and perform consistent multi-step reasoning. We evaluate xOffense on two rigorous benchmarks: AutoPenBench and AI-Pentest-Benchmark. The results demonstrate that xOffense consistently outperforms contemporary methods, achieving a sub-task completion rate of 79.17%, decisively surpassing leading systems such as VulnBot and PentestGPT. These findings highlight the potential of domain-adapted mid-scale LLMs, when embedded within structured multi-agent orchestration, to deliver superior, cost-efficient, and reproducible solutions for autonomous penetration testing.
要約:
Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks to the collaborative and sharing nature. Although full access to source codes, model parameters, and training data lays the groundwork for transparency, we argue that such a full-access manner is vulnerable to MEAs, and their ill-effects are not fully understood. In this paper, we conduct a systematic formalization for MEAs on open-source LLMs by enumerating all possible threat models associated with adversary objectives, knowledge, and capabilities. Therein, the threat posed by adversaries with internal knowledge, who inject payloads and triggers during the model sharing phase, is of practical interest. We go even further and propose the first MEA against open-source LLMs, dubbed MEASER, which wields impacts through identifying targeted parameters, embedding payloads, injecting triggers, and executing payloads sequentially. Particularly, MEASER enhances the attack robustness against quantization and parameter-efficient fine-tuning (PEFT) by employing the Magnitude-Adaptive Relative Quantization Index Modulation (MAR-QIM) mechanism, synergized with LDPC codes and spread spectrum modulation. In addition, to achieve stealthiness, MEASER devises the performance-aware importance metric to identify targeted parameters with the least degradation of model performance. Extensive experiments on four popular open-source LLMs show that the stealth rate of MEASER outperforms existing MEAs (for general DNNs) significantly, while consistently achieving a 0 bit error rate (BER) in all settings. Moreover, MEASER also maintains superior stealthiness on quantized models. We appeal for investigations on countermeasures against MEASER in view of the significant attack effectiveness.
要約:
Multimodal Large Language Models (MLLMs) achieve strong reasoning and perception capabilities but are increasingly vulnerable to jailbreak attacks. While existing work focuses on explicit attacks, where malicious content resides in a single modality, recent studies reveal implicit attacks, in which benign text and image inputs jointly express unsafe intent. Such joint-modal threats are difficult to detect and remain underexplored, largely due to the scarcity of high-quality implicit data. We propose ImpForge, an automated red-teaming pipeline that leverages reinforcement learning with tailored reward modules to generate diverse implicit samples across 14 domains. Building on this dataset, we further develop CrossGuard, an intent-aware safeguard providing robust and comprehensive defense against both explicit and implicit threats. Extensive experiments across safe and unsafe benchmarks, implicit and explicit attacks, and multiple out-of-domain settings demonstrate that CrossGuard significantly outperforms existing defenses, including advanced MLLMs and guardrails, achieving stronger security while maintaining high utility. This offers a balanced and practical solution for enhancing MLLM robustness against real-world multimodal threats. Our code is released: https://github.com/ZhangXu0963/CrossGuard.
著者: Mohaiminul Al Nahian (SUNY Binghamton), Abeer Matar A. Almalky (SUNY Binghamton), Gamana Aragonda (New Jersey Institute of Technology), Ranyang Zhou (New Jersey Institute of Technology), Sabbir Ahmed (SUNY Binghamton), Dmitry Ponomarev (SUNY Binghamton), Li Yang (UNC Charlotte), Shaahin Angizi (New Jersey Institute of Technology), Adnan Siraj Rakin (SUNY Binghamton)
要約:
The rapid advancement of large language models (LLMs) has sparked growing interest in understanding their security vulnerabilities, particularly Trojan attacks that enable stealthy manipulation of model behavior. Traditional Trojan methods typically alter inputs and/or model weights, relying on white-box assumptions that require access to data or model internal parameters. In this work, we present CacheTrap, the first gray-box Trojan attack targeting the Key-Value (KV) cache of LLMs. This method induces a single-bit flip in the KV cache, serving as a transient trigger. When activated, this trigger causes the model to exhibit targeted actions without changing inputs or model weights. CacheTrap introduces an efficient search algorithm to locate vulnerable positions in the KV cache, independent of model weights or datasets. Extensive experiments on five open-source LLMs show a remarkable 100% attack success rate (with the trigger) while preserving benign accuracy (without the trigger) by flipping just one bit in the KV cache.
要約:
In recent years, large language models (LLMs) have achieved remarkable advances and are increasingly deployed in critical applications across diverse domains. This growing adoption raises urgent concerns about their security and robustness. In this work, we investigate the impact of Bit Flip Attacks (BFAs) on LLMs, which exploit hardware faults to corrupt model parameters, thereby threatening model integrity and performance. Existing BFA studies primarily assume a white-box setting with access to exact model weights and part of the dataset, and rely on progressive gradient-based bit-search strategies to identify vulnerable bits in model weights. However, gradient computation for LLMs is computationally expensive and memory intensive. In addition, assuming access to exact victim model weights and datasets is challenging due to increasingly strict user privacy regulations. To address these challenges, we propose the first gray-box BFA framework for LLMs, Invisible Hands, designed for efficient and practical deployment. Our method, Gradient-Data-Free-BFA, identifies vulnerable weight bits without requiring knowledge of model weights, gradients, or sample data. It introduces novel vulnerability index metrics that estimate the weights of susceptibility based solely on model architecture (Grey-Box). By eliminating data access and gradient computation, our approach significantly reduces memory overhead and scales efficiently across tasks with constant complexity. Experiments on six open-source LLMs demonstrate that adversarial objectives can be achieved with minimal weight perturbations, highlighting the effectiveness and practicality of Invisible Hands.
要約:
With the rapid development of cloud-based services, large language models have become increasingly accessible through various web platforms. However, this accessibility has also led to growing risks of model abuse. LLM watermarking has emerged as an effective approach to mitigate such misuse and protect intellectual property. Existing watermarking algorithms, however, primarily focus on defending against paraphrase attacks while overlooking piggyback spoofing attacks, which can inject harmful content, compromise watermark reliability, and undermine trust in attribution. To address this limitation, we propose DualGuard, the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks. DualGuard employs the adaptive dual-stream watermarking mechanism, in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection. Extensive experiments conducted across multiple datasets and language models demonstrate that DualGuard achieves excellent detectability, robustness, traceability, and text quality, effectively advancing the state of LLM watermarking for real-world applications.
要約:
As large language models (LLMs) become integral to applications such as question answering and content creation, reliable content attribution has become increasingly important. Watermarking is a promising approach, but existing methods either provide only binary signals or distort the sampling distribution, degrading text quality; distortion-free approaches, in turn, often suffer from weak detectability or robustness. We propose MirrorMark, a multi-bit and distortion-free watermark for LLMs. By mirroring sampling randomness in a measure-preserving manner, MirrorMark embeds multi-bit messages without altering the token probability distribution, preserving text quality by design. To improve robustness, we introduce a context-based scheduler that balances token assignments across message positions while remaining resilient to insertions and deletions. We further provide a theoretical analysis of the equal error rate to interpret empirical performance. Experiments show that MirrorMark matches the text quality of non-watermarked generation while achieving substantially stronger detectability: with 54 bits embedded in 300 tokens, it improves bit accuracy by 8-12% and correctly identifies up to 11% more watermarked texts at 1% false positive rate.
要約:
Number Theoretic Transform (NTT) is the most essential component for polynomial multiplications used in lattice-based Post-Quantum Cryptography (PQC) algorithms such as Kyber, Dilithium, NTRU etc. However, side-channel attacks (SCA) and hardware vulnerabilities in the form of hardware Trojans may alter control signals to disrupt the circuit's control flow and introduce unconventional delays in the critical hardware of PQC. Hardware Trojans, especially on control signals, are more low cost and impactful than data signals because a single corrupted control signal can disrupt or bypass entire computation sequences, whereas data faults usually cause only localized errors. On the other hand, adversaries can perform Soft Analytical Side Channel Attacks (SASCA) on the design using the inserted hardware Trojan. In this paper, we present a secure NTT architecture capable of detecting unconventional delays, control-flow disruptions, and SASCA, while providing an adaptive fault-correction methodology for their mitigation. Extensive simulations and implementations of our Secure NTT on Artix-7 FPGA with different Kyber variants show that our fault detection and correction modules can efficiently detect and correct faults whether caused unintentionally or intentionally by hardware Trojans with a high success rate, while introducing only modest area and time overheads.
要約:
An exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as Adversarial Machine Learning (AML). In this paper, we conducted two comprehensive studies to explore the perspectives of industry professionals and students on different AML vulnerabilities and their educational strategies. In our first study, we conducted an online survey with professionals revealing a notable correlation between cybersecurity education and concern for AML threats. For our second study, we developed two CTF challenges that implement Natural Language Processing and Generative AI concepts and demonstrate a poisoning attack on the training data set. The effectiveness of these challenges was evaluated by surveying undergraduate and graduate students at Carnegie Mellon University, finding that a CTF-based approach effectively engages interest in AML threats. Based on the responses of the participants in our research, we provide detailed recommendations emphasizing the critical need for integrated security education within the ML curriculum.
要約:
Schnorr-based multi-signature schemes support offline preprocessing of nonce commitments to reduce online signing to a single round. However, preprocessing is inherently bounded: each preprocessed nonce pair consumes signer-side storage, and once exhausted, an interactive commitment round is required to refill. This limitation is particularly severe for TPM~2.0 devices, where usable NVRAM is typically 6--16\,KB and connectivity is intermittent. This paper presents upTPM, a framework that achieves unbounded preprocessing with constant signer storage. Each TPM stores a single 32-byte secret seed from which an unlimited sequence of nonce commitments is deterministically derived. Commitments are published to an untrusted coordinator before use; nonce scalars never leave the TPM. We formalize three properties not provided by existing schemes: (1)~unbounded deterministic preprocessing with constant storage; (2)~asynchronous commitment refill, allowing any signer to unilaterally extend its commitment pool; and (3)~TPM-attested commitments, a hardware-backed authenticity and state-binding mechanism that strengthens resistance to host-software compromise. We prove EU-CMA security in the random oracle model under the discrete logarithm assumption and Pseudo Random Function (PRF) security, with a one-time-use invariant enforced by TPM hardware state. We extend the construction to $(t,n)$-threshold signatures and provide a detailed analysis of coordinator trust, crash recovery, and performance evaluations.
要約:
Mobile Graphical User Interface (GUI) agents have demonstrated strong capabilities in automating complex smartphone tasks by leveraging multimodal large language models (MLLMs) and system-level control interfaces. However, this paradigm introduces significant privacy risks, as agents typically capture and process entire screen contents, thereby exposing sensitive personal data such as phone numbers, addresses, messages, and financial information. Existing defenses either reduce UI exposure, obfuscate only task-irrelevant content, or rely on user authorization, but none can protect task-critical sensitive information while preserving seamless agent usability.
We propose an anonymization-based privacy protection framework that enforces the principle of available-but-invisible access to sensitive data: sensitive information remains usable for task execution but is never directly visible to the cloud-based agent. Our system detects sensitive UI content using a PII-aware recognition model and replaces it with deterministic, type-preserving placeholders (e.g., PHONE_NUMBER#a1b2c) that retain semantic categories while removing identifying details. A layered architecture comprising a PII Detector, UI Transformer, Secure Interaction Proxy, and Privacy Gatekeeper ensures consistent anonymization across user instructions, XML hierarchies, and screenshots, mediates all agent actions over anonymized interfaces, and supports narrowly scoped local computations when reasoning over raw values is necessary.
Extensive experiments on the AndroidLab and PrivScreen benchmarks show that our framework substantially reduces privacy leakage across multiple models while incurring only modest utility degradation, achieving the best observed privacy-utility trade-off among existing methods. Code available at: https://github.com/one-step-beh1nd/gui_privacy_protection
要約:
Multi-agent artificial intelligence systems or MAS are systems of autonomous agents that exercise delegated tool authority, share persistent memory, and coordinate via inter-agent communication. MAS introduces qualitatively distinct security vulnerabilities from those documented for singular AI models. Existing security and governance frameworks were not designed for these emerging attack surfaces. This study systematically characterizes the threat landscape of MAS and quantitatively evaluates 16 security frameworks for AI against it. A four-phase methodology is proposed: constructing a deep technical knowledge base of production multi-agent architectures; conducting generative AI-assisted threat modeling scoped to MAS cybersecurity risks and validated by domain experts; structuring survey plans at individual-threat granularity; and scoring each framework on a three-point scale against the cybersecurity risks. The risks were organized into 193 distinct main threat items across nine risk categories. The expected minimal average score is 2. No reviewed framework achieves majority coverage of any single category. Non-Determinism (mean score 1.231 across all 16 frameworks) and Data Leakage (1.340) are the most under-addressed domains. The OWASP Agentic Security Initiative leads overall at 65.3\% coverage and in the design phase; the CDAO Generative AI Responsible AI Toolkit leads in development and operational coverage. These results provide the first empirical cross-framework comparison for MAS security and offer evidence-based guidance for framework selection. Please check back for information on the published journal version.
要約:
Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time for artificial intelligence (AI), with even more sophisticated attacks that utilize advanced techniques, such as generative artificial intelligence (GenAI) and reinforcement learning, it has become a vital component if we wish to protect our personal data, which are scattered across the web. In this paper, we address two tasks, in the first unified multi-modal NIDS dataset, which incorporates flow-level data, packet payload information and temporal contextual features, from the reprocessed CIC-IDS-2017, CIC-IoT-2023, UNSW-NB15 and CIC-DDoS-2019, with the same feature space. In the first task we use machine learning (ML) algorithms, with stratified cross validation, in order to prevent network attacks, with stability and reliability. In the second task we use adversarial learning algorithms to generate synthetic data, compare them with the real ones and evaluate their fidelity, utility and privacy using the SDV framework, f-divergences, distinguishability and non-parametric statistical tests. The findings provide stable ML models for intrusion detection and generative models with high fidelity and utility, by combining the Synthetic Data Vault framework, the TRTS and TSTR tests, with non-parametric statistical tests and f-divergence measures.
要約:
We present YouTube-Synch [1], a production system for automated, large-scale content extraction and replication from YouTube to decentralized storage on Joystream. The system continuously mirrors videos from more than 10,000 creator-authorized channels while handling platform constraints such as API quotas, rate limiting, bot detection, and OAuth token churn. We report a 3.5-year longitudinal case study covering 15 releases and 144 pull requests, from early API dependence to API-free operation. A key finding is that YouTube's defense layers are operationally coupled: bypassing one control often triggers another, creating cascading failures. We analyze three incidents with measured impact: 28 duplicate on-chain objects caused by database throughput issues, loss of over 10,000 channels after OAuth mass expiration, and 719 daily errors from queue pollution. For each, we describe the architectural response. Contributions include a three-generation proxy stack with behavior variance injection, a trust-minimized ownership verification protocol that replaces OAuth for channel control, write-ahead logging with cross-system state reconciliation, and containerized deployment. Results show that sustained architectural adaptation can maintain reliable cross-platform replication at production scale.
要約:
In the 47th IEEE Symposium on Security and Privacy (IEEE S&P 2026), Gao et al. proposed an efficient and user-friendly secure transformer inference framework, namely Euston. In Euston, a singular value decomposition-based matrix transmission protocol is designed to efficiently transmit input matrices, reducing communication bandwidth by approximately 2.8 times. In this manuscript, we show that this transmission protocol introduces subspace leakage of random masks, enabling the model owner to recover private samples easily. We further validate the effectiveness of the recovery attack through simple experiments on image and language datasets, highlighting a fundamental privacy risk of the protocol design.
要約:
Post-quantum cryptographic (PQC) accelerators for ML-KEM (FIPS 203) and ML-DSA (FIPS 204) rely on pipelined Number Theoretic Transform (NTT) stages over $\mathbb{Z}_q$. Our prior work established structural dependency analysis at scale [1] and quantified the security margin of partial NTT masking [2]. Whether per-stage arithmetic masking guarantees pipeline-level security had no prior machine-checked answer for the r-bearing case: composition frameworks (ISW, t-SNI, PINI, DOM) were formalized exclusively for Boolean masking over $\mathrm{GF}(2)$; no proof assistant artifact addresses the NTT butterfly over $\mathbb{Z}_q$. We present three machine-checked results in Lean 4 with Mathlib, all zero sorry. First, we close a stated limitation of prior work: value-independence implies constant marginal distribution under fresh randomness (via an algebraic MutualInfoZero proxy). Second, butterfly per-context uniformity: for any Cooley-Tukey butterfly with fresh output mask over $\mathbb{Z}/q\mathbb{Z}$ ($q > 0$), each output wire has exactly one mask value producing each output, a uniform marginal independent of secrets, universal over all moduli, twiddle factors, and inputs. Third, a k-stage NTT pipeline with fresh per-stage masking satisfies per-context uniformity at every stage under the ISW first-order probing model. We document a named warning: pointwise value-independence is false for butterfly outputs. The Adams Bridge accelerator (CHIPS Alliance Caliptra) fails the fresh masking hypothesis, masking active only in INTT round 0, architecturally explaining its structural insecurity. Artifact: nine theorems, 1,738 build jobs, zero sorry. Composition for nonlinear gadgets (Barrett) is addressed in forthcoming manuscripts proving Barrett's PF-PINI(2) satisfaction (one-bit barrier) [3] and k-stage composition for PF-PINI gadgets under fresh-mask renewal [4].
要約:
As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of high-profile exploits and consequential system failures. Yet systematic approaches to evaluating AI security remain underdeveloped. In this paper, we introduce AVISE (AI Vulnerability Identification and Security Evaluation), a modular open-source framework for identifying vulnerabilities in and evaluating the security of AI systems and models. As a demonstration of the framework, we extend the theory-of-mind-based multi-turn Red Queen attack into an Adversarial Language Model (ALM) augmented attack and develop an automated Security Evaluation Test (SET) for discovering jailbreak vulnerabilities in language models. The SET comprises 25 test cases and an Evaluation Language Model (ELM) that determines whether each test case was able to jailbreak the target model, achieving 92% accuracy, an F1-score of 0.91, and a Matthews correlation coefficient of 0.83. We evaluate nine recently released language models of diverse sizes with the SET and find that all are vulnerable to the augmented Red Queen attack to varying degrees. AVISE provides researchers and industry practitioners with an extensible foundation for developing and deploying automated SETs, offering a concrete step toward more rigorous and reproducible AI security evaluation.
要約:
Similar to a strategic interaction between rational and intelligent agents, cryptography problems can be examined through the prism of game theory. In this setting, the agent aiming to protect a message is called the defender, while the one attempting to decrypt it, generally for malicious purposes, is the attacker. To strengthen security in cryptography, various strategies have been developed, among which hybridization stands out as a key concept in modern cryptographic design. This strategy allows the defender to select among different encryption algorithms (classical, post-quantum, or hybrid) while carefully balancing security and operational costs. On the other side, the attacker, limited by available resources, chooses cryptanalysis methods capable of breaching the selected algorithm. We model this interaction as a Stackelberg cryptographic hybridization problem under resource constraints. Here, the defender randomizes over encryption algorithms, and the attacker observes the choice before selecting suitable cryptanalysis methods. The attacker's decision is framed as a conditional optimization problem, which we refer to as the ``attacker subgame''. We then propose a dynamic programming approach for the attacker's subgame, while the defender's Stackelberg optimization is formulated as a linear program.
要約:
Large language model (LLM) agents increasingly rely on skills to package reusable capabilities through instructions, tools, and resources. High-quality skills embed expert knowledge, curated workflows, and execution constraints into agents, fueling a growing skill economy through their value and scalability. Yet this ecosystem also creates a new attack surface, as adversaries can interact with public agent interfaces to extract hidden proprietary skill content. We present the first systematic study of black-box skill stealing against LLM agent systems. Compared with conventional system prompt stealing, skill stealing targets modular and structured capability packages whose leakage is directly actionable for copying, redistribution, and monetization, making the resulting harm potentially greater. To study this threat, we derive an attack taxonomy from prior prompt-stealing methods and build an automated stealing prompt generation agent. Starting from model-generated seed prompts, the framework expands attacks through scenario rationalization and structure injection while enforcing diversity via embedding-based filtering, yielding a reproducible pipeline for evaluating proprietary agent systems. We evaluate these attacks across commercial agent platforms and representative LLMs. Our results show that agent skills can often be extracted easily, posing a serious copyright risk. To mitigate this threat, we design defenses across the agent pipeline, focusing on input, inference, and output phase. Although these defenses substantially reduce leakage, the attack remains inexpensive and repeatable, and a single successful attempt is sufficient to compromise the protected skill. Overall, our findings suggest that these copyright risks remain largely overlooked across proprietary agent ecosystems, motivating stronger protection mechanisms.
要約:
We propose using Point-to-Multipoint quantum key distribution (QKD) via time division multiplexing (TDM) and wavelength division multiplexing (WDM) in passive optical networks (PON) to improve the security of online voting systems.
要約:
Ensuring the safety and alignment of Large Language Models is a significant challenge with their growing integration into critical applications and societal functions. While prior research has primarily focused on jailbreak attacks, less attention has been given to non-adversarial failures that subtly emerge during benign interactions. We introduce secondary risks a novel class of failure modes marked by harmful or misleading behaviors during benign prompts. Unlike adversarial attacks, these risks stem from imperfect generalization and often evade standard safety mechanisms. To enable systematic evaluation, we introduce two risk primitives verbose response and speculative advice that capture the core failure patterns. Building on these definitions, we propose SecLens, a black-box, multi-objective search framework that efficiently elicits secondary risk behaviors by optimizing task relevance, risk activation, and linguistic plausibility. To support reproducible evaluation, we release SecRiskBench, a benchmark dataset of 650 prompts covering eight diverse real-world risk categories. Experimental results from extensive evaluations on 16 popular models demonstrate that secondary risks are widespread, transferable across models, and modality independent, emphasizing the urgent need for enhanced safety mechanisms to address benign yet harmful LLM behaviors in real-world deployments.
著者: Shivnandan Kaushik, Mahesh Madhav, Nagi Aboulenein, Jason Bessette, Sandeep Brahmadathan, Benjamin Chaffin, Matthew Erler, Stephan Jourdan, Thomas Maciukenas, Ramya Jayaram Masti, Jon Perry, Massimo Sutera, Scott Tetrick, Bret Toll, David Turley, Carl Worth, Atiq Bajwa
要約:
Memory-safety escapes continue to form the launching pad for a wide range of security attacks, especially for the substantial base of deployed software that is coded in pointer-based languages such as C/C++. Although compiler and Instruction Set Architecture (ISA) extensions have been introduced to address elements of this issue, the overhead and/or comprehensive applicability have limited broad production deployment. The Memory Tagging Extension (MTE) to the ARM AArch64 Instruction Set Architecture is a valuable tool to address memory-safety escapes; when used in synchronous tag-checking mode, MTE provides deterministic detection and prevention of sequential buffer overflow attacks, and probabilistic detection and prevention of exploits resulting from temporal use-after-free pointer programming bugs. The AmpereOne processor, launched in 2024, is the first datacenter processor to support MTE. Its optimized MTE implementation uniquely incurs no memory capacity overhead for tag storage and provides synchronous tag-checking with single-digit performance impact across a broad range of datacenter class workloads. Furthermore, this paper analyzes the complete hardware-software stack, identifying application memory management as the primary remaining source of overhead and highlighting clear opportunities for software optimization. The combination of an efficient hardware foundation and a clear path for software improvement makes the MTE implementation of the AmpereOne processor highly attractive for deployment in production cloud environments.
要約:
Current Large Language Models (LLMs) safety approaches focus on explicitly harmful content while overlooking a critical vulnerability: the inability to understand context and recognize user intent. This creates exploitable vulnerabilities that malicious users can systematically leverage to circumvent safety mechanisms. We empirically evaluate multiple state-of-the-art LLMs, including ChatGPT, Claude, Gemini, and DeepSeek. Our analysis demonstrates the circumvention of reliable safety mechanisms through emotional framing, progressive revelation, and academic justification techniques. Notably, reasoning-enabled configurations amplified rather than mitigated the effectiveness of exploitation, increasing factual precision while failing to interrogate the underlying intent. The exception was Claude Opus 4.1, which prioritized intent detection over information provision in some use cases. This pattern reveals that current architectural designs create systematic vulnerabilities. These limitations require paradigmatic shifts toward contextual understanding and intent recognition as core safety capabilities rather than post-hoc protective mechanisms.
要約:
In this paper, we analyze the finality of the Filecoin network, focusing on dynamic probabilistic guarantees of tipset permanence in the canonical chain. Our approach differs from static analyses that consider only the worst-case scenario; instead, we dynamically compute the error probability at each round using the live chain history, providing a more accurate and efficient assessment. We provide a practical algorithm that only requires visibility into the blocks produced by honest participants, which can be implemented by clients or off-chain applications without any change to Filecoin's consensus mechanisms.We demonstrate that, under typical operating conditions, the sought-after error probability of $2^{-30}$ is achievable in approximately 30 rounds, a 30x improvement over the 900 rounds that the network currently encodes as a fixed threshold. This finding immediately expedites transactions and enhances usability of the Filecoin network, while laying the foundation for further analysis of other DAG-structured blockchains.