要約:
Text-to-image models are commercially valuable assets often distributed under restrictive licenses, but such licenses are enforceable only when violations can be detected. Existing methods require pre-deployment watermarking or internal model access, which are unavailable in commercial API deployments. We present Compositional Semantic Fingerprinting (CSF), the first black-box method for attributing fine-tuned text-to-image models to protected lineages using only query access. CSF treats models as semantic category generators and probes them with compositional underspecified prompts that remain rare under fine-tuning. This gives IP owners an asymmetric advantage: new prompt compositions can be generated after deployment, while attackers must anticipate and suppress a much broader space of fingerprints. Across 6 model families (FLUX, Kandinsky, SD1.5/2.1/3.0/XL) and 13 fine-tuned variants, our Bayesian attribution framework enables controlled-risk lineage decisions, with all variants satisfying the dominance criterion.
要約:
State-Space Models (SSMs) -- structured SSMs (S4, S4D, DSS, S5), selective SSMs (Mamba, Mamba-2), and hybrid architectures (Jamba) -- are deployed in safety-critical long-context applications: genomic analysis, clinical time-series forecasting, and cybersecurity log processing. Their linear-time scaling is compelling, yet the security properties of their compressed-state recurrent architectures remain unstudied.
We present the first systematic treatment of SSM safety, security, and cognitive risks. Seven contributions: (1) Formal threat framework -- SSM Attack Surface (five layers), State Integrity Violation (StIV), Cross-Context Amplification Ratio $\mathcal{X}_\mathcal{S}$, and a Spectral Sensitivity Proposition grounded in the $H_\infty$ norm. (2) Three novel attack classes: spectral adversarial attacks (transfer-function gain exploitation), delayed-trigger stateful backdoors (activate thousands of steps after injection), and state capacity saturation (entropy flooding forces silent forgetting). (3) 14 MITRE ATLAS technique extensions across the full tactic chain. (4) Six-profile attacker taxonomy with kill chains for genomics, clinical, and cybersecurity domains. (5) Four cognitive risk hypotheses grounded in state-compression mechanics. (6) Governance-aligned mitigations mapped to CREST, NIST AI 600-1, and EU AI Act. (7) Empirical evaluation: targeted genomic injection achieves $\mathrm{StIV}=0.519$ vs. $0.086$ random ($6.0\times$, $p<0.001$); PGD state injection achieves $156\times$ output perturbation over random; SSD-structured extraction confirmed at $O(N^2)$ vs. $O(N^3)$ query complexity ($N\times$ speedup). Validation on pretrained checkpoints is detailed in the Appendix.
要約:
Cashback reward programs now serve as central instruments in the competitive landscape of cards, digital wallets, and payment platforms. Despite their financial significance, the business logic governing these programs is seldom treated as a security critical surface. In this paper, we study a class of reward abuse attacks that arise from flaws in how reward systems accrue, redeem, and adjust incentives when underlying transactions are reversed through refunds. Using controlled, small scale experiments on six issuer accounts we legitimately hold, we document a spectrum of real world behaviors in production systems. At one extreme, a debit based cashback program (Issuer A) never adjusts rewards when refunded transactions post, enabling a deterministic double dip cashback reward abuse attack. A credit card program (Issuer B) exhibits an analogous reward integrity violation through a statement cycle timing gap that allows reward redemption before the merchant return window closes. At an intermediate tier, a credit card issuer (Issuer F) creates negative reward entries on refunds at statement close but makes rewards redeemable immediately upon settlement, creating a timing asymmetry that allows users to extract reward value before clawback occurs. At the robust end, three credit card issuers (C, D, and E) implement indefinite negative balance enforcement with proportional clawback. We formalize reward engines as state machines, introduce two integrity invariants (Reward Integrity and Refund Reward Consistency), develop a taxonomy of vulnerability classes mapped to CWE and OWASP, and present defensive pseudo algorithms with a semi formal correctness argument that close the identified loopholes. The primary vulnerability (Issuer A) was reported through a private bug bounty program and has been acknowledged by the vendor; good faith disclosure efforts for Issuer B are detailed in Section 8.
要約:
The deployment of Large Language Models in agentic, multi-turn conversational settings has introduced a class of privacy vulnerabilities that existing protection mechanisms are not designed to address. Current approaches to Personally Identifiable Information (PII) masking operate on a per-turn basis, scanning each user message in isolation and replacing detected entities with typed placeholders before forwarding sanitized text to the model. While effective against direct identifier leakage within a single message, these methods are fundamentally stateless and fail to account for the compounding privacy risk that emerges when PII fragments accumulate across conversation turns. A user who separately discloses their name, employer, location, and medical condition across several messages has revealed a fully re-identifiable profile - yet no individual message would trigger a per-turn masker. We formalize this phenomenon as Cumulative PII Exposure (CPE) and propose CAMP (Cumulative Agentic Masking and Pruning), a cross-turn privacy protection framework for multi-turn LLM conversations. CAMP maintains a session-level PII registry, constructs a co-occurrence graph to model combination risk between entity types, computes a CPE score after each turn, and triggers retroactive masking of conversation history when the score crosses a configurable threshold. We evaluate CAMP on four synthetic multi-turn scenarios spanning healthcare, hiring, finance, and general conversation, demonstrating that per-turn baselines expose re-identifiable profiles that CAMP successfully neutralizes while preserving full conversational utility.
要約:
As autonomous AI agents increasingly call other agents to complete tasks on behalf of a human principal, a structural accountability gap has emerged: the calling agent accepts the terms of service of the callee without any protocol-level mechanism to prove that it understood those terms or that it subsequently honoured them. Authentication protocols such as OAuth and mutual TLS establish who may call which capability. They do not address under what conditions a permitted call may be made, and those conditions change as the callee's policies evolve. In this paper we formalise the distinction between proof of acceptance (a timestamped acknowledgement) and proof of adherence (a per-action reasoning record citing the specific clause evaluated). We propose three primitives (PolicyDocument, ConsentRecord, and AdherenceEvent) that together constitute a versioned, append-only consent model for agent-to-agent communication. The model is instantiated as a non-breaking extension to two widely used agent protocols: the Agent2Agent (A2A) protocol and the Model Context Protocol (MCP). A TLA+ specification of the consent lifecycle, together with a reference Python implementation of the chain integrity and adherence trail validators, is available in the accompanying repository.
要約:
The communication protocols and data transfer mechanisms employed by IoT devices in smart buildings and corresponding digital twin systems predominantly rely on centralized architectures. Such centralized systems are vulnerable to single points of failure, where a malfunction can disrupt operational processes. This study introduces a blockchain-based decentralized protocol to enhance the cyber resilience of IoT data transfer for digital twins and enable decentralized automation of building operations. The framework incorporates public and private blockchain technologies alongside two case studies showcasing prototypes of each system. These prototypes were validated within a real-world building environment using smart home appliances and two digital twin platforms, with their performance evaluated based on cost, scalability, data security, and privacy. The findings reveal that the Hyperledger Fabric-based system excels in terms of scalability, speed, and cost-effectiveness, while both frameworks offer advantages over traditional centralized protocols in system cyber resilience, data security, and privacy.
要約:
Safety guardrails have become an active area of research in AI safety, aimed at ensuring the appropriate behavior of large language models (LLMs). However, existing research lacks consideration of nuances across linguistic and cultural contexts, resulting in a gap between reported performance and in-the-wild effectiveness. To address this issue, this paper proposes an approach to optimize guardrail models for a designated linguistic context by leveraging a curated dataset tailored to local linguistic characteristics, targeting the Taiwan linguistic context as a representative example of localized deployment challenges. The proposed approach yields TWGuard, a linguistic context-optimized guardrail model that achieves a huge gain (+0.289 in F1) compared to the foundation model and significantly outperforms the strongest baseline in practical use (-0.037 in false positive rate, a 94.9\% reduction). Together, this work lays a foundation for regional communities to establish AI safety standards grounded in their own linguistic contexts, rather than accepting boundaries imposed by dominant languages. The inadequacy of the latter is reconfirmed by our findings.
要約:
Research on large language model (LLM) security is shifting from "will the model leak training data" to a more consequential question: can an agent with persistent, long-term memory be continuously shaped, cross-session poisoned, accessed without authorization, and propagated across shared organizational state? Recent surveys cover memory architectures and agent mechanisms, but fewer center the epistemic and governance properties of persistent, writable memory as the reason memory is an independent security problem.
This survey addresses that gap. Drawing on cognitive neuroscience and the philosophy of memory, we characterize agent memory as malleable, rewritable, and socially propagating, and develop a memory-lifecycle framework organized around six phases -- Write, Store, Retrieve, Execute, Share, Forget/Rollback -- cross-tabulated against four security objectives: integrity, confidentiality, availability, governance. We organize the literature on memory poisoning, extraction, retrieval corruption, control-flow hijacking, cross-agent propagation, rollback, and governance, and situate representative architectures as determinants of which phases are explicitly governable.
Three findings stand out: the literature concentrates on write- and retrieve-time integrity attacks, while confidentiality, availability, store/forget, and benign-persistence failures remain sparsely studied; no published architecture covers all nine governance primitives we identify; and using LLMs themselves for memory security remains sparse yet essential.
We unify these under mnemonic sovereignty -- verifiable, recoverable governance over what may be written, who may read, when updates are authorized, and which states may be forgotten -- arguing future secure agents will be differentiated not only by recall capacity, but by memory governance quality.
要約:
Light clients are essential for scalable blockchain systems because they verify data availability without downloading full blocks. In data availability sampling based systems, sampled cells are retrieved from a peer-to-peer network and verified against cryptographic commitments. A common deployment pattern associates each sampled cell with an independent Kate-Zaverucha-Goldberg (KZG) proof, creating substantial cumulative bandwidth, storage, and verification overhead. This paper studies polynomial multiproofs (PMP) as a mechanism for reducing these costs in blockchain light clients. We present a design in which multiple sampled cell evaluations are verified using a single aggregated proof over a shared evaluation micro-domain and describe the corresponding changes to proof generation, dissemination, retrieval, and verification in a peer-to-peer light-client stack. We instantiate and evaluate the design in Avail, a modular data availability layer for blockchains, as a case study. The results show lower proof bytes, lower verifier CPU and memory usage, and deployment-level infrastructure cost reductions of up to 45% relative to a per-cell baseline, while also clarifying the trade-offs introduced by grouped retrieval.
要約:
Large language models (LLMs) are increasingly deployed in high-stakes domains, yet a unified treatment of their overlapping safety challenges remains lacking. We present SafeLM, a framework that jointly addresses four pillars of LLM safety: privacy, security, misinformation, and adversarial robustness. SafeLM combines federated training with gradient smartification and Paillier encryption for privacy, integrates defenses against training and inference-time attacks, employs contrastive grounding with calibrated decoding to reduce hallucinations, and introduces alignment-aware binarized aggregation to enhance robustness while maintaining bounded reconstruction quality. Across benchmarks on factuality, toxicity, and membership inference, SafeLM achieves 98.0% harmful content detection accuracy, reduces communication by 96.9%, and lowers gradient inversion PSNR from 31.7 dB to 15.1 dB. Ablations show that each component contributes independently, whereas their integration yields a strong privacy utility efficiency trade-off for deploying trustworthy LLMs.
要約:
Prior work shows that fine-tuning aligned models on benign data degrades safety in text and vision modalities, and that proximity to harmful content in representation space predicts which samples cause the most damage. However, existing analyses operate within a single, undifferentiated embedding space -- leaving open whether distinct input properties drive the vulnerability differently. Audio introduces a structurally richer problem: a benign sample can neighbor harmful content not only through what is said but through how it sounds, even when its words are entirely innocuous. We present the first systematic study of benign fine-tuning safety in Audio LLMs, evaluating three state-of-the-art models with a proximity-based filtering framework that selects benign audio by embedding-space distance to harmful content. By decomposing proximity into semantic, acoustic, and mixed axes using external reference encoders alongside each model's own internal encoder, we show that benign fine-tuning elevates Jailbreak Success Rate (JSR) from single digits to as high as 87.12%. Crucially, the dominant vulnerability axis and the relative risk of audio versus text fine-tuning are both architecture-conditioned -- determined by how each model's encoder and projector transform audio into the LLM's input space. We propose two defenses: filtering training data to maximize distance from harmful embeddings, and a textual system prompt at inference, both reducing JSR to near-zero without architectural modification. Our mechanistic analysis on two architectures reveals that fine-tuning selectively suppresses the late-layer refusal circuit while the frozen encoder preserves representations, and that even the suppression pattern is architecture-conditioned, mirroring the behavioral asymmetries across modalities. Safety degradation from benign fine-tuning is a qualitatively distinct risk in Audio LLMs.
要約:
The modern cryptographic primitives are known to generate large volumes of sequential data like keystreams, ciphertext blocks, and hash outputs. Traditional cryptgraphic evaluation methods rely primarily on statistical randomness tests and algebraic cryptanalysis techniques. This paper introduces the concept of Stringology-Based Cryptology (SBC), which applies classical string processing and pattern matching techniques to analyze structural properties of cryptographic outputs. By interpreting cryptographic outputs as symbolic sequences, stringology algorithms can be used to detect pattern recurrence, substring distributions, and structural correlations. In addition, the paper demonstrate how pattern frequency analysis and substring recurrence metrics can be applied to evaluate keystream outputs generated by stream ciphers. Experimental results illustrate that SBC analysis provides complementary insights into structural characteristics of cryptographic sequences and may support future research in structural cryptanalysis and cryptographic evaluation
要約:
Large language models write production code, and yet they routinely introduce well-known vulnerabilities. We show that this is not a knowledge deficit: the same models that generate insecure code, correctly identify and explain the vulnerability when asked directly, this is a gap we call the Format-Reliability Gap. Mechanistic analysis reveals the cause: security representations are encoded from the earliest layers but remain computationally inert until the final layer, where format-compliance demands compete with them. Because the failure is localized to a single layer, per-vulnerability steering vectors reduce insecure generation by up to 74% with negligible overhead. The mechanism and the fix generalize across five models, three architecture families, and six vulnerability types, suggesting insecure code generation is an interpretability problem, not a training artifact.
要約:
As Cyber-Physical Systems (CPS) become increasingly pervasive and autonomous, ensuring the resilience of their embedded logic is critical to maintaining safety and integrity. Among the most stealthy and damaging threats are non-invasive fault injection attacks, where hardware-level disturbances propagate into software execution and compromise control logic. In this paper, we investigate the susceptibility of Unmanned Aerial Vehicle (UAV) autopilot fail-safe mechanisms to voltage glitch fault injection. We introduce a dual evaluation approach: software-based fault simulation using ARMORY and hardware-based experiments with a voltage glitching platform (Chip-Whisperer), applying controlled and timely faults to an STM32 microcontroller running UAV-Autopilot fail-safe logic. Our targeted analysis of specific fail-safe modes uncovers timing-sensitive vulnerabilities that can suppress or alter safety responses, such as disabling emergency failsafe activation at critical moments, potentially enabling UAV hijacking. Furthermore, we validate software-based fault injection results against real hardware behavior, demonstrating how simulated attacks translate into tangible risks for CPS security and reliability.
要約:
Ransomware detection systems increasingly rely on behavior-based machine learning to address evolving attack strategies. However, emerging privacy compliance, data governance, and responsible AI deployment demand not only accurate detection but also the ability to efficiently remove the influence of specific training samples without retraining the models from scratch. In this study, we present a privacy-aware machine unlearning evaluation framework for reinforcement learning (RL)-based ransomware detection built on Sharded, Isolated, Sliced, and Aggregated (SISA) training. The framework enables efficient data deletion by retraining only the affected model shards rather than the entire detector, reducing the retraining cost while preserving detection performance. We conduct a controlled comparative study using value-based RL agents, including Deep Q-Network (DQN) and Double Deep Q-Network (DDQN), under identical experimental settings with a cost-sensitive reward design and 5-fold cross-validation on Windows 11 ransomware dataset. Detection confidence is evaluated using a continuous Q-score margin, enabling ROC-AUC analysis beyond binary predictions. For unlearning, the dataset is partitioned into five shards with majority-vote aggregation, and a fast-unlearning path is evaluated by deleting 5% of the samples from a single shard and retraining only that shard. Results show that SISA-based unlearning incurs negligible utility degradation (<= 0.05 percent F1 drop) while substantially reducing retraining time relative to full SISA retraining. DDQN exhibits slightly improved stability and lower utility loss than DQN, while both agents maintain near identical in-distribution performance after unlearning. These findings indicate that SISA provides an efficient unlearning mechanism for RL-based ransomware detection, supporting privacy-aware deployment without compromising security effectiveness.
要約:
Modern AI agents routinely depend on secrets such as API keys and SSH credentials, yet the dominant deployment model still exposes those secrets directly to the agent process through environment variables, local files, or forwarding sockets. This design fails against prompt injection, tool misuse, and model-controlled exfiltration because the agent can both use and reveal the same bearer credential. We present CapSeal, a capability-sealed secret mediation architecture that replaces direct secret access with constrained invocations through a local trusted broker. CapSeal combines capability issuance, schema-constrained HTTP execution, broker-executed SSH actions, anti-replay session binding, policy evaluation, and tamper-evident audit trails. We describe a Rust prototype integrated with an MCP-facing adapter, formulate conditional security goals for non-disclosure, constrained use, replay resistance, and auditability, and define an evaluation plan spanning prompt injection, tool misuse, and SSH abuse. The resulting system reframes secret handling for agentic systems from handing the model a key to granting the model a narrowly scoped, non-exportable action capability.
要約:
Multi-turn jailbreak attacks progressively erode LLM safety alignment across seemingly innocuous conversation turns, achieving success rates exceeding 90% against state-of-the-art models. Existing alignment-based and guardrail methods suffer from three key limitations: they require costly weight modification, evaluate each turn independently without modeling cumulative safety erosion, and detect attacks only after harmful content has been generated. To address these limitations, we first formulate the proactive early jailbreak detection problem with a new metric, detection lead, that measures how early an attack can be detected before the LLM complies. We then propose SAFEDREAM, a lightweight world-model-based framework that operates as an external module without modifying the LLM's weights. SAFEDREAM introduces three components: (1) a safety state world model that encodes LLM hidden states into a compact safety representation and predicts how it evolves across turns, (2) CUSUM detection that accumulates weak per-turn risk signals into reliable evidence, and (3) contrastive imagination that simultaneously rolls out attack and benign futures in latent space to issue early alarms before jailbreaks occur. On three multi-turn jailbreak benchmarks (XGuard-Train, SafeDialBench, SafeMTData) against 8 baselines, SAFEDREAM achieves the best detection timeliness across all benchmarks (1.06-1.20 turns before compliance) while maintaining competitive false positive rates and outperforming baselines in detection quality.
要約:
Academic examination systems worldwide continue to rely on centralised, opaque record-keeping that is often vulnerable to credential forgery, result tampering, examiner bias, and the absence of transparent re-evaluation pathways. Existing blockchain-based approaches in education focus predominantly on post-hoc certificate storage or online-only examination portals, leaving the complete onsite examination lifecycle, from conducting exams through scrutiny, largely unaddressed. This paper proposes ParikkhaChain, a blockchain-based framework that covers the entire examination lifecycle of an onsite examination system with three distinguishing contributions: (i) anonymous script evaluation through cryptographic hashing of answer scripts before examiner access, thereby eliminating identity-based bias; (ii) a transparent evaluation and scrutiny workflow backed by an immutable on-chain audit trail that records every mark submission and grade revision; and (iii) inclusion of privacy-preserving verification using zero-knowledge proofs and off-chain storage mechanisms. The system is architected around four Solidity smart contracts deployed on the Ethereum blockchain. The proposed architecture is the first initiative to our knowledge to support physical examination process, anonymous marking, and re-evaluation transparency. We successfully simulate full exam cycles of an onsite exam to grade-sheet generation using a working prototype on a large scale of 100 courses and hundreds of teachers and students. The experimental results show that the system can manage online examinations of hundreds of courses, students and faculties efficiently with great throughput, low storage, and transaction cost. Our codebase is available in open source form at https://github.com/AhmedRumi/CSE6608-ParikkhaChain
要約:
Timing side-channel attacks exploit variations in program execution time to recover sensitive information. Cryptographic implementations are especially vulnerable to these attacks, since even small timing differences in operations such as modular exponentiation or key comparisons can be exploited to extract highly sensitive information, such as secret keys. To mitigate this threat, implementations of programs that handle sensitive information are often expected to adhere to constant-time principles, ensuring that execution behavior does not depend on secret inputs. However, validating the constant-time property of programs remains a major challenge in cryptography development. Formal method approaches to verify constant-time implementations rely on abstractions that often fail to capture real execution behavior, while timing-based measurement techniques are highly sensitive to noise from other programs and even hardware environments. In this work, we propose a novel approach for verifying constant-time programs based on dynamic analysis of low-level execution traces. Our method measures instruction sequences across multiple input values for any given binary and targeted function. Any variations in the instruction mix distribution for any given pair of traces indicate a deviation from the constant-time principle and behavior. We developed an open-source tool called DALC-CT, for the constant-time verification of programs using this approach. We evaluated it on a set of well-known constant-time and non-constant-time examples, achieving a perfect detection of issues. Our results demonstrate that analyzing the logical execution of programs via instruction trace comparisons provides a lightweight and reliable way to verify the constant-time property of programs.
要約:
Privacy-preserving machine learning (PPML) has become increasingly important in applications where sensitive data must remain confidential. Homomorphic Encryption (HE) enables computation directly on encrypted data, allowing neural network inference without revealing raw inputs. While prior works have largely focused on inference over a single encrypted image, batch processing of encrypted inputs lags behind, despite being critical for high-throughput inference scenarios and training-oriented workloads.
In this work, we address this gap by developing optimized algorithms for batched HE-friendly neural networks. We also introduced a pipeline architecture designed to maximize resource efficiency for different batch size execution. We implemented these algorithms and evaluated our work using HE-friendly ResNet-20 and ResNet-34 models on encrypted CIFAR-10 and CIFAR-100 datasets, respectively.
For ResNet-20, our approach achieves an amortized inference time of 8.86 seconds per image when processing a batch of 512 encrypted images, with a peak memory usage of 98.96 GB. These results represent a 1.78x runtime improvement and a 3.74x reduction in memory usage compared to the state-of-the-art design. For the deeper ResNet-34 model, we achieve an amortized inference time of 28.14 on a batch of 256 encrypted images using 246.78GB of RAM
要約:
We present enclawed, a hard-fork hardening framework built on top of the OpenClaw single-user personal artificial intelligence (AI) assistant gateway. enclawed targets deployments that need attestable peer trust, deny-by-default external connectivity, signed-module loading, and a tamper-evident audit trail typically regulated industries such as financial services, healthcare, defense contracting, regulated R&D, and government enclaves. The framework ships in two flavors: an open flavor that preserves OpenClaw compatibility while still emitting audit, classification, and data-loss-prevention (DLP) signals, and an enclaved flavor that activates strict allowlists, Federal Information Processing Standards (FIPS) cryptographic-module assertion, mandatory module-manifest signature verification, and high-assurance peer attestation for the Model Context Protocol (MCP). The classification ladder is fully data-driven: a deploying organization selects from five built-in presets (generic, US-government, healthcare, financial services, three-tier) or supplies its own JSON. We accompany the implementation with a security review, a 204-case test suite (146 unit tests, 58 adversarial pen-tests for tamper detection, signature forgery, egress bypass, trust-root mutation, DLP evasion, prompt injection, and code injection), real-time human-in-the-loop control (per-agent pause / resume / stop and approval queues), a memory-bounded secure transaction buffer with rollback (default cap 50% of system RAM, configurable), a strict-mode TypeScript typecheck of all 22 framework files, and a GitHub Actions workflow ready for continuous integration. enclawed is a hardening framework, not an accredited compliance certification. The deploying organization remains responsible for hardware, validated cryptographic modules, certified facilities, and assessor sign-off.
要約:
AI agents increasingly call external tools (file system, network, APIs) through the Model Context Protocol (MCP). These tool calls are the agent's syscalls -- privileged operations with side effects on shared state -- yet today's safety enforcement lives entirely in userspace, where a 10-line script can bypass it. I propose Governed MCP, a kernel-resident tool governance gateway built on a logit-based safety primitive (ProbeLogits, companion paper: arXiv:2604.11943). The gateway interposes on every MCP tool call in a 6-layer pipeline: schema validation, trust tier check, rate limit, adversarial pre-filter, ProbeLogits gate (the load-bearing semantic check), and constitutional policy match, with a Blake3-hashed audit chain.
I implement Governed MCP in Anima OS, a bare-metal x86_64 OS in approximately 86,000 lines of Rust. The five non-inference layers add 65.3 microseconds of overhead per call; ProbeLogits adds 65 ms (per-token-class semantic decision) on 7B Q4_0. A 4-config ablation on a 101-prompt MCP-domain benchmark shows that removing the ProbeLogits layer collapses F1 from 0.773 to 0.327 (Delta F1 = -0.446) -- hand-rule firewalling alone is insufficient. All 15 WASM-to-system host functions in the runtime route through the gateway (complete mediation of the WASM ABI surface; the scope and caveats of this claim are stated in Section 4.6); a 10-LoC userspace bypass that defeats existing guardrail libraries is structurally impossible against the kernel-resident gate.
要約:
The evolution from static ranking models to Agentic Recommender Systems (Agentic RecSys) empowers AI agents to maintain long-term user profiles and autonomously plan service tasks. While this paradigm shift enhances personalization, it introduces a vulnerability: reliance on Long-term Memory (LTM). In this paper, we uncover a threat termed "Visual Inception." Unlike traditional adversarial attacks that seek immediate misclassification, Visual Inception injects triggers into user-uploaded images (e.g., lifestyle photos) that act as "sleeper agents" within the system's memory. When retrieved during future planning, these poisoned memories hijack the agent's reasoning chain, steering it toward adversary-defined goals (e.g., promoting high-margin products) without prompt injection. To mitigate this, we propose CognitiveGuard, a dual-process defense framework inspired by human cognition. It consists of a System 1 Perceptual Sanitizer (diffusion-based purification) to cleanse sensory inputs and a System 2 Reasoning Verifier (counterfactual consistency checks) to detect anomalies in memory-driven planning. Extensive experiments on a mock e-commerce agent environment demonstrate that Visual Inception achieves about 85% Goal-Hit Rate (GHR), while CognitiveGuard reduces this risk to around 10% with configurable latency trade-offs (about 1.5s in lite mode to about 6.5s for full sequential verification), without quality degradation under our setup.
要約:
Final FIPS and PKIX standards for ML-KEM and ML-DSA fix the normative floor, but operational assurance in post-quantum X.509 still depends on accountable checks across certificate-profile semantics, SubjectPublicKeyInfo representation, and private-key-container import. We present a workflow-centric assurance framework for ML-KEM and ML-DSA in the narrow executable profile pkix-core. The framework reifies 17 final-standards requirements into an assurance registry indexed by owner, stage, detector kind, normative strength, and mode-specific action; groups them into three operator gate packs; spans certificate/profile, SPKI/public-key, and private-key-container/import surfaces; and evaluates them through a frozen mutation-based corpus with bounded public-appendix and cross-tool supporting evidence.
Across a controlled corpus of 48 artifacts (21 valid, 27 invalid), the artifact detects all expected invalid cases in both strict and deployable modes with zero false positives. Strict blocks all 17 active requirements; deployable preserves the same detection coverage while downgrading exactly one exercised ML-KEM canonicality condition from block to warning. On the importer-owned private-key surface, all 7 active requirements are covered, with 7/7 expected invalid detections and no open detector gaps. On a comparable certificate subset, a frozen JZLint baseline meets 5/10 expected invalid detections and fatally rejects 3 valid ML-KEM certificates, whereas the local artifact meets 10/10 with no fatal valid rejections. A bounded public appendix and a cross-tool matrix further show that parse acceptance and policy conformance diverge materially. Overall, the results support an operational X.509 assurance workflow for CA pre-issuance and private-key import that extends prior PQ public-key linting work.
要約:
Prior work has demonstrated that functionally correct yet vulnerable outputs arise systematically in threat-oriented settings, where adversarial or implicit channels are used to induce security failures in code agents and automated patching workflows. This note introduces a complementary but distinct framing: False Security Confidence (FSC), which studies the same surface phenomenon from a measurement-first perspective in ordinary, non-attack-framed generation tasks. Our interest is not in whether attacks can produce such outputs, but in how frequently and in what forms they appear absent explicit attack pressure, and whether conventional functional evaluation reliably detects them. We formalize FSC rate as the prevalence of security failure within the set of functionally correct outputs, distinguish it from prior joint functional-security metrics such as SAFE and outcome-driven evaluation frameworks such as CWEval, define a three-ecosystem task view for studying how FSC manifests across general-purpose programming, deployment-context tasks, and security-explicit programming, and identify FSC-hard as a practically important refinement layer in which static analyzers miss vulnerabilities that remain dynamically triggerable. This technical report is intentionally scoped as a framework statement rather than a full empirical paper: its purpose is to establish terminology, measurement boundaries, and study design commitments for subsequent large-scale evaluation.
要約:
The integration of large language models (LLMs) into electronic design automation (EDA) workflows has introduced powerful capabilities for RTL generation, verification, and design optimization, but also raises critical security concerns. Malicious LLM outputs in this domain pose hardware-level threats, including hardware Trojan insertion, side-channel leakage, and intellectual property theft, that are irreversible once fabricated into silicon. Such requests often exploit semantic disguise, embedding adversarial intent within legitimate engineering language that existing safety mechanisms, trained on general-purpose hazards, fail to detect. No benchmark exists to evaluate LLM vulnerability to such domain-specific threats. We present the HarmChip benchmark to assess jailbreak susceptibility in hardware security, spanning 16 hardware security domains, 120 threats, and 360 prompts at two difficulty levels. Evaluation of state-of-the-art LLMs reveals an alignment paradox: They refuse legitimate security queries while complying with semantically disguised attacks, exposing blind spots in safety guardrails and underscoring the need for domain-aware safety alignment.
要約:
Model Context Protocol (MCP) is a rapidly adopted standard for defining and invoking external tools in LLM applications. The multi-layered architecture of MCP introduces new attack surfaces such as tool poisoning, in addition to traditional prompt injection. Existing defense systems suffer from limitations including high false positive rates, API dependency, or white-box access requirements. In this study, we propose CASCADE, a three-tiered cascaded defense architecture for MCP-based systems: (i) Layer 1 performs fast pre-filtering using regex, phrase weighting, and entropy analysis; (ii) Layer 2 conducts semantic analysis via BGE embedding with an Ollama Llama3 fallback mechanism; (iii) Layer 3 applies pattern-based output filtering. Evaluation on a dataset of 5,000 samples yielded 95.85% precision, 6.06% false positive rate, 61.05% recall, and 74.59% F1-score. Analysis across 31 attack types categorized into 6 tiers revealed high detection rates for data exfiltration (91.5%) and prompt injection (84.2%), while semantic attack (52.5%) and tool poisoning (59.9%) categories showed potential for improvement. A key advantage of CASCADE over existing solutions is its fully local operation, requiring no external API calls
要約:
We present, to our knowledge, the most comprehensive cross-model evaluation of LLM agents on offensive cybersecurity tasks, benchmarking 10 frontier models from 7 providers on all 200 challenges of the NYU CTF Bench. Building on the D-CIPHER multi-agent framework, we extend it with multi-provider backend support, a custom Kali Linux environment with over 100 pre-installed penetration testing tools, and runtime tool-discovery agents. Through a controlled factorial study, we find that the Kali Linux environment yields a +9.5 percentage-point improvement over Ubuntu, while auto-prompting and category-specific tips often degrade performance in well-equipped environments. Among models, Claude 4.5 Opus achieves the highest solve rate (59%), followed by Gemini 3 Pro (52%), with Gemini 3 Flash offering the best cost-efficiency at $0.05 per solve. Asymmetric planner/executor model assignments provide no meaningful benefit while coherent same-model configurations consistently outperform mixed-tier pairings. Our results indicate that environment tooling and model selection emerge as the strongest drivers of performance, whereas prompt engineering interventions show diminishing or negative returns in well-equipped environments. Reported performance reflects both model reasoning ability and compatibility with agent tooling and API integration.
要約:
INTRODUCTION: The proliferation of the amalgamation of IoT and edge computing has increased the demand for decentralised trust and security mechanisms capable of operating across heterogeneous and resource-limited devices. Approaches such as federated learning, Zero Trust architectures, lightweight blockchain and distributed neural models offer alternatives to centralised control. OBJECTIVES: This review examines various state-of-the-art decentralised mechanisms and evaluates their effectiveness in terms of securing IoT networks at the edge. METHODS: Thirty recent studies were analysed to compare how decentralised architectures establish trust, support secure communication and enable intrusion and anomaly detection. Frameworks, such as DFGL-LZTA, SecFedDNN and COSIER were assessed. RESULTS: Decentralised designs enhance privacy, reduce single points of failure and improve adaptive threat response, though challenges remain in scalability, efficiency and interoperability. CONCLUSION: The study identifies key considerations and future research needs for building secure and resilient trust-aware IoT edge ecosystems.
要約:
In the 47th IEEE Symposium on Security and Privacy (IEEE S&P 2026), Gao et al. proposed an efficient and user-friendly secure transformer inference framework, namely Euston. In Euston, a singular value decomposition-based matrix transmission protocol is designed to efficiently transmit input matrices, reducing communication bandwidth by approximately 2.8 times. In this manuscript, we show that this transmission protocol introduces subspace leakage of random masks, enabling the model owner to recover private samples easily. We further validate the effectiveness of the recovery attack through simple experiments on image and language datasets, highlighting a fundamental privacy risk of the protocol design.
要約:
Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence - 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation - only requests sharing the targeted prefix are affected. (3) Persistent accumulation - no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable detection evasion; persistent accumulation then proceeds unchecked, yielding damage amplification bounded only by how long the block remains cached. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime, with negligible overhead. These results argue for integrity protection of prefix blocks before end-to-end exploitation is demonstrated.
要約:
Assessing the security posture of modern computing systems typically requires the use of multiple specialized tools. These tools focus on different aspects such as configuration compliance, file integrity, and vulnerability exposure, and their outputs are often difficult to interpret collectively. This paper introduces the Unified Compliance Aggregator (UCA), a framework that integrates several open-source security tools into a single composite score representing overall system security. The proposed framework combines outputs from Lynis, OpenSCAP (STIG and CIS profiles), AIDE, Tripwire, and Nmap NSE. A normalization process converts heterogeneous outputs into a consistent 0 to 100 scale, followed by weighted aggregation. We also introduce a logarithmic scoring model for file integrity measurements to address limitations observed in prior linear approaches. Experiments were conducted on Ubuntu 22.04 across different hardening levels and environments. Results show consistent improvement in composite scores as systems are hardened, while also revealing contrasting behavior between compliance and file integrity tools. Two case studies, a basic web server and a DVWA-based system illustrate how the framework can be applied in practical scenarios.
要約:
The rapid adoption of open-source Large Language Models (LLMs) in offline and enterprise environments has introduced a largely unexamined security risk like susceptibility to adversarial phishing prompts under static safety configurations. In this work, we systematically investigate this vulnerability through GuardPhish, a large scale multi-vector phishing prompt dataset comprising 70,015 samples spanning web, email, SMS, and voice attack scenarios derived from real world campaigns. Using a deterministic five model ensemble for labeling, we achieve near perfect inter model agreement (Fleiss kappa = 0.9141), with residual disagreements resolved through expert adjudication. By evaluating eight open-source LLMs under fully offline inference conditions, we uncover a substantial enforcement gap like models that correctly identify phishing intent with detection rates up to 96% nevertheless generate actionable phishing content from identical prompts, with attack success rates reaching 98.5% in voice-based scenarios. These findings demonstrate that intent classification alone does not guarantee generative refusal in the absence of dynamic guardrails. To mitigate this risk, we train transformer based classifiers on GuardPhish, achieving up to 98.27% accuracy as modular pre-generation filters deployable without modifying the underlying generative model. Our results highlight a critical weakness in current open-source LLM deployments and provide a reproducible foundation for strengthening defenses against phishing and social engineering attacks.
要約:
Multi-user virtual reality enables immersive interaction. However, rendering avatars for numerous participants on each headset incurs prohibitive computational overhead, limiting scalability. We introduce a framework, Privatar, to offload avatar reconstruction from headset to untrusted devices within the same local network while safeguarding attacks against adversaries capable of intercepting offloaded data. Privatar's key insight is that domain-specific knowledge of avatar reconstruction enables provably private offloading at minimal cost. (1) System level. We observe avatar reconstruction is frequency-domain decomposable via BDCT with negligible quality drop, and propose Horizontal Partitioning (HP) to keep high-energy frequency components on-device and offloads only low-energy components. HP offloads local computation while reducing information leakage to low-energy subsets only. (2) Privacy level. For individually offloaded, multi-dimensional signals without aggregation, worst-case local Differential Privacy requires prohibitive noise, ruining utility. We observe users' expression statistical distribution are slowly changing over time and trackable online, and hence propose Distribution-Aware Minimal Perturbation. DAMP minimizes noise based on each user's expression distribution to significantly reduce its effects on utility, retaining formal privacy guarantee. Combined, HP provides empirical privacy against expression identification attacks. DAMP further augments it to offer a formal guarantee against arbitrary adversaries. On a Meta Quest Pro, Privatar supports 2.37x more concurrent users at 6.5% higher reconstruction loss and 9% energy overhead, providing a better throughout-loss Pareto frontier over quantization, sparsity and local construction baselines. Privatar provides both provable privacy guarantee and stays robust against both empirical and NN-based attacks.
要約:
Ransomware continues to evolve as one of the most disruptive cyber threats, with recent variants increasingly leveraging automated and AI-assisted techniques to evade traditional signature-based defenses. Early detection of such attacks remains a significant challenge, particularly when malicious behavior closely resembles legitimate system activity. This study proposes an explainable attention-based Long Short-Term Memory (LSTM) framework for the early detection of AI assisted ransomware variants through analysis of file system behavioral patterns. The proposed model captures temporal dependencies in file operation sequences, while an attention mechanism highlights critical behavioral indicators associated with ransomware activity. To improve transparency and trust in automated detection systems, explainable artificial intelligence (XAI) techniques are incorporated to interpret model predictions and identify influential behavioral features. Experimental evaluation using ransomware behavioral traces demonstrates that the proposed framework can effectively distinguish malicious activity at early stages of execution with high detection performance and low false-positive rates. The findings suggest that combining sequence-aware deep learning models with explainability mechanisms can significantly enhance the reliability and interpretability of next-generation ransomware defense systems. This work contributes toward the development of intelligent and transparent cyber-defense mechanisms capable of addressing emerging AI-driven malware threats.
要約:
Network Intrusion Detection Systems (NIDS) have been studied for decades. Hundreds of papers have, e.g., proposed ways to enhance, harden or bypass NIDS. However, the findings of prior literature are hardly reflected in real-world operational contexts. Such a disconnection is problematic for research itself: it is unclear what scenario envisioned by prior work can be used as a baseline for future advancements.
We argue that a key reason for this disconnection is a fundamental misunderstanding of intrinsic characteristics of NIDS. For instance, the fact that a compromised NIDS cannot be expected to work well; the fact that some evaluations are done without carrying out any experiment in a (even synthetic) "real" network; the fact that security operators triage high-level reports -- and not individual samples flagged by some classifier. In this SoK, which is primarily a reflective piece, we first constructively highlight such quintessential properties (without criticizing _any_ work by different authors) by stating three Assertions. Then, we provide recommendations -- further emphasized through an original and reproducible case study that challenges some established practices. Ultimately, we seek to lay a foundation to reshape research on NIDS.
要約:
We release Terminal Wrench, a subset of 331 terminal-agent benchmark environments, copied from the popular open benchmarks that are demonstrably reward-hackable. The data set includes 3,632 hack trajectories and 2,352 legitimate baseline trajectories across three frontier models (Claude Opus 4.6, Gemini 3.1 Pro, GPT-5.4). Each entry preserves the original task definition alongside full attack trajectories that show how the verifier was bypassed. It also includes cases where the task was not solved as intended. The tasks span system administration, machine learning, software engineering, and security challenges; the exploits range from simple output spoofing to stack-frame introspection, standard-library patching, and rootkit-style binary hijacking. Crucially, these exploits are specific to each task, rather than the evaluation harness, making them harder to patch. We also present a monitorability study in which hack trajectories are sanitized or stripped of reasoning traces and then scored by an LLM judge, showing that detection degrades meaningfully when chain-of-thought is removed (AUC drops from 0.97 to 0.92). The data set is publicly available at https://github.com/few-sh/terminal-wrench.
要約:
Understanding vulnerability propagation is essential for assessing how vulnerabilities spread across components of a software package. This supports more accurate impact analysis and enhances threat detection and mitigation. In this paper, we investigate how a small number of vulnerable JavaScript packages contribute to the creation of a disproportionately large number of vulnerable packages. This paper presents insights from 1,515 reported vulnerabilities gathered from a custom-built vulnerability database containing 1,077,946 JavaScript packages sourced from `npm-follower' and their associated dependency networks. Dependency networks were constructed using the deps.dev API, with vulnerabilities identified by parsing package names and version numbers through the Google Open Source Vulnerability API.
Our findings reveal that 61.30% (660,748) of packages are reliant on one or more dependency packages, and 21.60% (232,836) of total packages have at least one known vulnerability throughout their dependency networks -- of which most (42%) are of High severity. We also found that it takes, on average, approximately 4 years and 11 months to fix a vulnerable package from when the first vulnerable version is published on npm -- although publication times of vulnerabilities occur approximately 19 days after a fix is available. Finally, we observe a high concentration of frequently present vulnerabilities throughout dependency networks, with the top-7 most frequent vulnerabilities accounting for 25% of vulnerability cases and the top-23 most frequent accounting for 50%. Based on these findings, we propose recommendations for developers and package managers to mitigate the threat and occurrence of vulnerabilities within the npm dependency network and the broader software repository community.
要約:
Fuzzing has become a widely adopted technique for vulnerability discovery, yet it remains ineffective for structured-input programs due to strict syntactic constraints and limited semantic awareness. Traditional greybox fuzzers rely on mutation-based strategies and coarse-grained coverage feedback, which often fail to generate valid inputs and explore deep execution paths. Recent advances in large language models (LLMs) have shown promise in improving input generation, but existing approaches primarily focus on seed generation and largely overlook the effective use of runtime feedback. In this paper, we propose SDLLMFuzz, a dynamic-static LLM-assisted greybox fuzzing framework for structured-input programs. Our approach integrates LLM-based structure-aware seed generation with static crash analysis, forming a unified feedback loop that iteratively refines test inputs. Specifically, we leverage LLMs to generate syntactically valid and semantically diverse inputs, while extracting rich semantic information from crash artifacts (e.g., core dumps and execution traces) to guide subsequent input generation. This dynamic-static feedback mechanism enables more efficient exploration of complex program behaviors. We evaluate SDLLMFuzz on the Magma benchmark across multiple structured-input programs, including libxml2, libpng, and libsndfile. Experimental results show that SDLLMFuzz significantly outperforms traditional greybox fuzzers and LLM-assisted baselines in terms of bug discovery and time-to-bug. These results demonstrate that combining semantic input generation with feedback-driven refinement is an effective direction for improving fuzzing performance on structured-input programs.
要約:
This paper presents a controlled quasi-experimental developer study examining whether a layer-based security training package is associated with improved security quality in LLM-assisted implementation of an identity-centric Java Spring Boot backend. The study uses a mixed design with a within-subject pre-training versus post-training comparison and an exploratory between-subject expertise factor. Twelve developers completed matched runs under a common interface, fixed model configuration, counterbalanced task sets, and a shared starter project. Security outcomes were assessed via independent manual validation of submitted repositories by the first and second authors. The primary participant-level endpoint was a severity-weighted validated-weakness score. The post-training condition showed a significant paired reduction under an exact Wilcoxon signed-rank test ($p = 0.0059$). In aggregate, validated weaknesses decreased from 162 to 111 (31.5\%), the severity-weighted burden decreased from 432 to 267 (38.2\%), and critical findings decreased from 24 to 5 (79.2\%). The largest reductions were in authorization and object access (53.3\%) and in authentication, credential policy, and recovery weaknesses (44.7\%). Session and browser trust-boundary issues showed minimal change, while sensitive-data and cryptographic weaknesses showed only marginal improvement.
These results suggest that, under the tested conditions, post-training runs reduce validated security burden in LLM-assisted backend development without modifying the model. They do not support replacing secure defaults, static analysis, expert review, or operational hardening.
要約:
Online propaganda detection pipelines expose measurable privacy risks at multiple stages including data collection, feature extraction, and model inference. We conduct a structured analysis of $162$ peer-reviewed studies and formalize the problem using the Propaganda Risk Online Mitigation and Privacy-preserving Tactics (PROMPT) framework. PROMPT models risks $R$ and mitigation strategies $S$ through a mapping $M: R\to S$ guided by a utility function $\alpha\cdot \mathrm{PrivacyGain}(s_j) - \beta\cdot \mathrm{PerfLoss}(s_j) - \gamma\cdot \mathrm{Cost}(s_j)$, with tunable $(\alpha,\beta,\gamma)$ enabling stakeholders to balance privacy, accuracy, and deployment costs. To assess practical adoption, we introduce a compliance score that quantifies the alignment of existing methods with GDPR, CCPA etc. requirements. Our evaluation shows that many widely used pipelines remain non-compliant, particularly in metadata handling and user-level aggregation. We further present empirical fine-tuning experiments on transformer-based encoders and decoders under synthetic perturbation, demonstrating a monotonic privacy-utility trade-off: with $q = 0.05$ performance decreased by 1-2% F$_1$, while at $q = 0.20$ the reduction reached 13-14%. These results establish quantitative baselines for privacy costs in propaganda detection. Our contributions include a formal risk-to-defense mapping, a compliance-oriented auditing metric, and experimental evidence of privacy-performance trade-offs, providing a technical foundation for building regulation-compliant and privacy-aware detection systems.
要約:
Code secrets are sensitive assets for software developers, and their leakage poses significant cybersecurity risks. While the rapid development of AI code assistants powered by Code Large Language Models (CLLMs), CLLMs are shown to inadvertently leak such secrets due to a notorious memorization phenomenon. This study first reveals that Byte-Pair Encoding (BPE) tokenization leads to unexpected behavior of secret memorization, which we term as \textit{gibberish bias}. Specifically, we identified that some secrets are among the easiest for CLLMs to memorize. These secrets yield high character-level entropy, but low token-level entropy. Then, this paper supports the biased claim with numerical data. We identified that the roots of the bias are the token distribution shift between the CLLM training data and the secret data. We further discuss how gibberish bias manifests under the ``larger vocabulary'' trend. To conclude the paper, we discuss potential mitigation strategies and the broader implications on current tokenizer design.
要約:
A nearest-neighbor framework is a fundamental tool for various applications involving Large Language Models (LLMs) and Visual Language Models (VLMs). Vectors used for nearest-neighbor searches have richer information for similarity searches. This information leads to security risks, such as embedding inversion and membership attacks. Therefore, Privacy-Preserving Approximate Nearest-Neighbor (PP-ANN) approaches are necessary for highly confidential data. However, conventional PP-ANN approaches based on a Trusted Execution Environment (TEE) or Fully Homomorphic Encryption (FHE) do not achieve practical security or performance. Additionally, conventional approaches focus on the search process rather than database generation for nearest-neighbor. To address these issues, we propose a Privacy-Preserving Product-Quantization Approximate Nearest Neighbor (PPPQ-ANN) framework. PPPQ-ANN provides a multi-layered security structure for vectors based on a hybrid of FHE and TEE. Additionally, PPPQ-ANN minimizes FHE ciphertext computations by combining Product-Quantization (PQ) with optimized data packing. We demonstrate the performance of PPPQ-ANN on million-scale datasets. As a result, PPPQ-ANN achieves database generation in less than 2 hours and more than 50 QPS in a sequential search while preserving privacy. Therefore, PPPQ-ANN optimizes the trade-off between security and performance by utilizing a hybrid of FHE and TEE, achieving practical performance while preserving privacy.
要約:
Software vulnerabilities remain one of the most persistent threats to modern digital infrastructure. While static application security testing (SAST) tools have long served as the first line of defense, they suffer from high false-positive rates. This article presents TitanCA, a collaborative project between Singapore Management University and GovTech Singapore that orchestrates multiple large language model (LLM)-powered agents into a unified vulnerability discovery pipeline. Applied in open-source software, TitanCA has discovered 203 confirmed zero-day vulnerabilities and yielded 118 CVEs. We describe the four-module architecture, i.e., matching, filtering, inspection, and adaptation, and share key lessons from building and deploying an LLM-based vulnerability discovery solution in practice.
要約:
Large Language Models (LLMs) have demonstrated remarkable capabilities across various cybersecurity tasks, including vulnerability classification, detection, and patching. However, their potential in automated vulnerability report documentation and analysis remains underexplored. We present RAVEN (Retrieval Augmented Vulnerability Exploration Network), a framework leveraging LLM agents and Retrieval Augmented Generation (RAG) to synthesize comprehensive vulnerability analysis reports. Given vulnerable source code, RAVEN generates reports following the Google Project Zero Root Cause Analysis template. The framework uses four modules: an Explorer agent for vulnerability identification, a RAG engine retrieving relevant knowledge from curated databases including Google Project Zero reports and CWE entries, an Analyst agent for impact and exploitation assessment, and a Reporter agent for structured report generation. To ensure quality, RAVEN includes a task specific LLM Judge evaluating reports across structural integrity, ground truth alignment, code reasoning quality, and remediation quality. We evaluate RAVEN on 105 vulnerable code samples covering 15 CWE types from the NIST-SARD dataset. Results show an average quality score of 54.21%, supporting the effectiveness of our approach for automated vulnerability documentation.
要約:
Intrusion detection systems (IDSs) for 5G networks must handle complex, high-volume traffic. Although opaque "black-box" models can achieve high accuracy, their lack of transparency hinders trust and effective operational response. We propose ExAI5G, a framework that prioritizes interpretability by integrating a Transformer-based deep learning IDS with logic-based explainable AI (XAI) techniques. The framework uses Integrated Gradients to attribute feature importance and extracts a surrogate decision tree to derive logical rules. We introduce a novel evaluation methodology for LLM-generated explanations, using a powerful evaluator LLM to assess actionability and measuring their semantic similarity and faithfulness. On a 5G IoT intrusion dataset, our system achieves 99.9\% accuracy and a 0.854 macro F1-score, demonstrating strong performance. More importantly, we extract 16 logical rules with 99.7\% fidelity, making the model's reasoning transparent. The evaluation demonstrates that modern LLMs can generate explanations that are both faithful and actionable, indicating that it is possible to build a trustworthy and effective IDS without compromising performance for the sake of marginal gains from an opaque model.
要約:
Anomaly-based Intrusion Detection Systems (IDSs) ensure protection against malicious attacks on networked systems. While deep learning-based IDSs achieve effective performance, their limited trustworthiness due to black-box architectures remains a critical constraint. Despite existing explainable techniques offering insight into the alarms raised by IDSs, they lack process-based explanations grounded in packet-level sequencing analysis. In this paper, we propose a method that employs process mining techniques to enhance anomaly-based IDSs by providing process-based alarm severity ratings and explanations for alerts. Our method prioritizes critical alerts and maintains visibility into network behavior, while minimizing disruption by allowing misclassified benign traffic to pass. We apply the method to the publicly available USB-IDS-TC dataset, which includes anomalous traffic affected by different variants of the Slowloris DoS attack. Results show that our method is able to discriminate between low- to very-high-severity alarms while preserving up to 99.94% recall and 99.99% precision, effectively discarding false positives while providing different degrees of severity for the true positives.
要約:
While attack graphs are useful for identifying major cybersecurity threats affecting a system, they do not provide operational support for determining the likelihood of having a known vulnerability exploited, or that critical system nodes are likely to be compromised. In this paper, we perform dynamic risk assessment by combining Bayesian Attack Graphs (BAGs) and online monitoring of system behavior through process mining. Specifically, the proposed approach applies process mining techniques to characterize malicious network traffic and derive evidence regarding the probability of having a vulnerability actively exploited. This evidence is then provided to a BAG, which updates its conditional probability tables accordingly, enabling dynamic assessment of vulnerability exploitation. We apply our method to a cybersecurity testbed instantiating several machines deployed on different subnets and affected by several CVE vulnerabilities. The testbed is stimulated with both benign traffic and malicious behavior, which simulates network attack patterns aimed at exploiting the CVE vulnerabilities. The results indicate that our proposal effectively detects whether vulnerabilities are being actively exploited, allowing for an updated assessment of the probability of system compromise.
要約:
Electronic voting systems must balance public verifiability with voter privacy and coercion resistance. Existing cryptographic protocols typically achieve end-to-end verifiability by revealing vote distributions, relying on trusted clients, or enabling transferable receipts - design choices that often compromise trust or privacy in real-world deployments.
We present ACE, a voting protocol that reconciles public auditability with strong privacy guarantees. The protocol combines a publicly verifiable, tally-hiding aggregation mechanism with an Audit-or-Cast challenge that enforces cast-as-intended even under untrusted client assumptions. Tallier-side re-randomization eliminates persistent links between voters and public records, yielding information-theoretic receipt-freeness assuming at least one honest tallier.
We formalize the security of ACE and show that it simultaneously achieves end-to-end verifiability, publicly tally-hiding results, and strong receipt-freeness without trusted clients.
要約:
Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such as SVIP leave a parallel-serve side-channel, since a dishonest provider can route the verifier's probe to the advertised model while serving ordinary users from a substitute. We propose a commit-open protocol that closes this gap. Before any opening request, the provider commits via a Merkle tree to a per-position sparse-autoencoder (SAE) feature-trace sketch of its served output at a published probe layer. A verifier opens random positions, scores them against a public named-circuit probe library calibrated with cross-backend noise, and decides with a fixed-threshold joint-consistency z-score rule. We instantiate the protocol on three backbones -- Qwen3-1.7B, Gemma-2-2B, and a 4.5x scale-up to Gemma-2-9B with a 131k-feature SAE. Of 17 attackers spanning same-family lifts, cross-family substitutes, and rank-<=128 adaptive LoRA, all are rejected at a shared, scale-stable threshold; the same attackers all evade a matched SVIP-style parallel-serve baseline. A white-box end-to-end attack that backpropagates through the frozen SAE encoder does not close the margin, and a feature-forgery attacker that never runs M_hon is bounded in closed form by an intrinsic-dimension argument. Commitment adds <=2.1% to forward-only wall-clock at batch 32.
要約:
Large Language Model (LLM) agents provide powerful automation capabilities, but they also create a substantially broader attack surface than traditional applications due to their tight integration with non-deterministic models and third-party services. While current deployments primarily rely on cloud-hosted services, emerging designs increasingly execute agents directly on edge devices to reduce latency and enhance user privacy. However, securely hosting such complex agent pipelines on edge devices remains challenging. These deployments must protect proprietary assets (e.g., system prompts and model weights) and sensitive runtime state on heterogeneous platforms that are vulnerable to software attacks and potentially controlled by malicious users.
To address these challenges, we present AgenTEE, a system for deploying confidential agent pipelines on edge devices. AgenTEE places the agent runtime, inference engine, and third-party applications into independently attested confidential virtual machines (cVMs) and mediates their interaction through explicit, verifiable communication channels. Built on Arm Confidential Compute Architecture (CCA), a recent extension to Arm platforms, AgenTEE enforces strong system-level isolation of sensitive assets and runtime state. Our evaluation shows that such multi-cVMs system is practical, achieving near-native performance with less than 5.15% runtime overhead compared to commodity OS multi-process deployments.
要約:
Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer classifiers. Both share failure modes that recent work has made concrete. Regular expressions miss paraphrased attacks. Fine-tuned classifiers are vulnerable to adaptive adversaries: a 2025 NAACL Findings study reported that eight published indirect-injection defenses were bypassed with greater than fifty percent attack success rates under adaptive attacks. This work proposes seven detection techniques that each port a specific mechanism from a discipline outside large-language-model security: forensic linguistics, materials-science fatigue analysis, deception technology from network security, local-sequence alignment from bioinformatics, mechanism design from economics, spectral signal analysis from epidemiology, and taint tracking from compiler theory. Three of the seven techniques are implemented in the prompt-shield v0.4.1 release (Apache 2.0) and evaluated in a four-configuration ablation across six datasets including deepset/prompt-injections, NotInject, LLMail-Inject, AgentHarm, and AgentDojo. The local-alignment detector lifts F1 on deepset from 0.033 to 0.378 with zero additional false positives. The stylometric detector adds 11.1 percentage points of F1 on an indirect-injection benchmark. The fatigue tracker is validated via a probing-campaign integration test. All code, data, and reproduction scripts are released under Apache 2.0.
要約:
This paper investigates subcodes of lambda-Gabidulin codes, viewed as rank-metric analogues of generalized Reed--Solomon codes, and their applications to compact-ciphertext cryptosystems. We first analyze subspace and generalized subspace subcodes of lambda-Gabidulin codes and relate them to corresponding subcodes of classical Gabidulin codes through coordinate-wise scaling. This relation yields cardinality bounds and structural properties for these families. When the extension degree equals the code length, we further characterize Gabidulin subspace subcodes in terms of linearized polynomials, which gives an explicit description of their encoding and dimension. We also study the matrix images of these subcodes over the base field through their stabilizer and annihilator algebras, showing that subspace restrictions may preserve nontrivial algebraic invariants despite the loss of extension-field linearity. Motivated by these results, we propose a generator-matrix-based construction of random subcodes designed to avoid such invariants. This construction is then used to design McEliece-like and Niederreiter-like encryption schemes in the MinRank setting. Among the parameter sets considered in this work, the most compact ciphertexts are obtained from random subcodes of classical Gabidulin codes. At the 128-, 192-, and 256-bit security levels, the resulting $\mathsf{LGS}$-Niederreiter instances achieve the smallest ciphertext sizes among the compared schemes, while maintaining competitive public-key sizes.
要約:
State-of-the-art Differentially Private (DP) synthetic data generators such as MST and AIM are widely used, yet tightly auditing their privacy guarantees remains challenging. We introduce a Gaussian Differential Privacy (GDP)-based auditing framework that measures privacy via the full false-positive/false-negative tradeoff. Applied to MST and AIM under worst-case settings, our method provides the first tight audits in the strong-privacy regime. For $(\epsilon,\delta)=(1,10^{-2})$, we obtain $\mu_{emp}\approx0.43$ vs. implied $\mu=0.45$, showing a small theory-practice gap.
Our code is publicly available: https://github.com/sassoftware/dpmm.
要約:
Smart contracts extended blockchain functionality beyond simple transactions, powering complex applications like decentralized finance (DeFi). However, this complexity introduces serious security challenges, including price manipulation and inflation attacks. Despite the development of various security tools, the rapid rise in financially motivated exploits continues to pose a significant threat to the blockchain ecosystem. These financially motivated exploits often stem from Monetarily Exploitable Vulnerabilities (MEVuls), which refer to vulnerabilities arising from exploitable implementations in monetary transactions or value-transfer logic. Due to their complexity, intricate chains of function calls, multifaceted logic, and diverse manifestations across different smart contracts, MEVuls are particularly challenging for current security tools to identify. Instead of providing actionable insights, existing tools frequently generate excessive warnings that overwhelm developers without effectively mitigating risks. To address the challenge of recognizing MEVuls, we first formalize MEVuls based on common real-world financial exploits. Then, we introduce FAUDITOR, a specialized fuzzer designed to detect MEVuls in smart contracts. The key insight is that leveraging smart contracts' finance-related interfaces directly exposes critical vulnerabilities, making detection more targeted. We further integrate auditors' reports using NLP to extract valuable insights on exploitation patterns, enabling a more informed search strategy. Additionally, FAUDITOR employs a self-learning mechanism that refines its detection strategies over time, allowing it to improve based on prior fuzzing results. In our evaluation, FAUDITOR impressively reveals 220 zero-day MEVuls. Meanwhile, compared to existing fuzzers, FAUDITOR detects vulnerabilities faster and achieves better instruction coverage.
要約:
Open-weight language models can be rendered unsafe through several distinct interventions, but the resulting models may differ substantially in capabilities, behavioral profile, and internal failure mode. We study behavioral and mechanistic properties of jailbroken models across three unsafe routes: harmful supervised fine-tuning (SFT), harmful reinforcement learning with verifiable rewards (RLVR), and refusal-suppressing abliteration. All three routes achieve near-ceiling harmful compliance, but they diverge once we move beyond direct harmfulness. RLVR-jailbroken models show minimal degradation and preserve explicit harm recognition in a structured self-audit: they are able to identify harmful prompts and describe how a safe LLM should respond, yet they comply with the harmful request. With RLVR, harmful behavior is strongly suppressed by a reflective safety scaffold: when a harmful prompt is prepended with an instruction to reflect on safety standards, harmful behavior drops close to the baseline. Category-specific RLVR jailbreaks generalize broadly across harmfulness domains. Models jailbroken with SFT show the largest collapse in explicit safety judgments, the highest behavioral drift, and a substantial capability loss on standard benchmarks. Abliteration is family-dependent in both self-audit and response to a reflective safety scaffold. Mechanistic and repair analyses further separate the routes: abliteration is consistent with localized refusal-feature deletion, RLVR with preserved safety geometry but retargeted policy behavior, and SFT with broader distributed drift. Targeted repair partially recovers RLVR-jailbroken models, but has little effect on SFT-jailbroken models. Together, these results show that jailbreaks can produce vastly different properties despite similar harmfulness, with models jailbroken via RLVR showing remarkable similarity to the base model.
要約:
Privacy policies are intended to inform users about how software systems collect and handle data, yet they often remain vague or incomplete. This paper presents an empirical study of patterns in log-related statements within privacy policies and their alignment with privacy disclosures observed in Android application logs. We analyzed 1,000 Android apps across multiple categories, generating 86,836,964 log entries. Our findings reveal that while most applications (88.0%) provide privacy policies, only 28.5% explicitly mention logging practices. Among those that reference logging, most clearly describe what information is logged; however, 27.7% of log-related statements remain overly simplistic or vague, offering limited insight into actual data collection. We further observed widespread privacy leakages in application logs, with 67.6% of apps leaking sensitive information not mentioned in their policies. Alarmingly, only 4% of applications demonstrated consistent alignment between declared policy contents and actual logged data. These findings highlight that current privacy policies provide incomplete or ambiguous descriptions of logging practices, which frequently do not align with actual logging behaviors.
要約:
The proliferation of open-source software (OSS) has made software supply chains prime targets for attacks like Package Confusion, where adversaries publish malicious packages with names deceptively similar to legitimate ones. To protect against such attacks and safeguard the use of OSS, multiple confusion detection methods have been proposed. However, existing methods are limited to single-signal retrieval strategies (relying solely on lexical or semantic metrics), struggle with high false positive rates (FPR), and are vulnerable to adversarial evasion. Critically, as content-agnostic approaches, they fundamentally fail to distinguish benign packages with high naming similarity from malicious, code-dissimilar impersonations, leading to persistent high FPR. To address these limitations, we introduce AgentGuard, a novel multi-agents based framework for package confusion detection. Specifically, it first discovers potential confusion targets using fine-tuned word embedding models with hybrid similarity search. After that, It subsequently evaluates risk via a fused machine learning model that uniquely combines: (1) a multi-dimensional metadata group and (2) a novel package content analysis group, to reduce the FPR and mitigate the impact of adversarial evasion. To assess the effectiveness of AgentGuard, we evaluate it on challenging ConfuDB and NeupaneDB datasets. Our results demonstrate that AgentGuard significantly outperforms state-of-the-art baselines, ConfuGuard and Typomind, improving precision by 12\%-49\% while simultaneously reducing the FPR by 11\%-35\%, and effectively discovers the confused package.
要約:
This study investigates the applicability of authorship attribution based on stylistic features to support actor analysis in threat intelligence. As a foundational step toward future application to dark web forums, we conducted experiments using Japanese review data from clear web sources. We constructed datasets from Rakuten Ichiba reviews and compared four methods: TF-IDF with logistic regression (TF-IDF+LR), BERT embeddings with logistic regression (BERT-Emb+LR), BERT fine-tuning (BERT-FT), and metric learning with $k$-nearest neighbors (Metric+kNN). Results showed that BERT-FT achieved the best performance; however, training became unstable as the number of authors scaled to several hundred, where TF-IDF+LR proved superior in terms of accuracy, stability, and computational cost. Furthermore, Top-$k$ evaluation demonstrated the utility of candidate screening, and error analysis revealed that boilerplate text, topic dependency, and short text length were primary factors causing misclassification.
要約:
Rice's theorem states that no non-trivial semantic property of programs is decidable. Classical proofs proceed by reduction from the halting problem, invoking the law of excluded middle (LEM) twice: once through diagonalization, and once through a case split on whether the always-diverging program bot satisfies the property in question. We present a proof that is constructive relative to the undecidability of Hilbert's Tenth Problem (MRDP): valid in intuitionistic logic, requiring neither diagonalization nor self-reference, and adding no classical reasoning beyond the MRDP assumption itself.
The key idea is a two-witness construction. Given a non-trivial property P, we attach to each Diophantine polynomial D a pair of programs S^0_D, S^1_D that behave like the negative and positive witnesses for P when D is solvable, and both diverge identically when it is not. A hypothetical decider for P would therefore decide Diophantine solvability via the difference delta_D = DecideP(S^1_D) - DecideP(S^0_D) -- contradicting the MRDP theorem. The argument is structured as two separate implications, never asserting a disjunction about solvability, and never examining P(bot). The undecidability of the halting problem follows as an immediate corollary: a single application of Rice's theorem to the Terminates property.
A formalization in the Rocq proof assistant confirms both results within a step-indexed model of computation, with the undecidability of Hilbert's Tenth Problem as the sole external axiom. Both Rice_Theorem and Halting_Problem are closed under the global context.
要約:
The rapid proliferation of Multimodal Large Language Models (MLLMs) has enabled mobile agents to execute high-stakes financial transactions, but their adversarial robustness remains underexplored. We identify Visual Dominance Hallucination (VDH), where imperceptible visual cues can override textual price evidence in screenshot-based, price-constrained settings and lead agents to irrational decisions. We propose PriceBlind, a stealthy white-box adversarial attack framework for controlled screenshot-based evaluation. PriceBlind exploits the modality gap in CLIP-based encoders via a Semantic-Decoupling Loss that aligns the image embedding with low-cost, value-associated anchors while preserving pixel-level fidelity. On E-ShopBench, PriceBlind achieves around 80% ASR in white-box evaluation; under a simplified single-turn coordinate-selection protocol, Ensemble-DI-FGSM transfers with roughly 35-41% ASR across GPT-4o, Gemini-1.5-Pro, and Claude-3.5-Sonnet. We also show that robust encoders and Verify-then-Act defenses reduce ASR substantially, though with some clean-accuracy trade-off.
要約:
This paper proposes a novel privacy-preserving semantic segmentation method that can use independent keys for each client and image. In the proposed method, the model creator and each client encrypt images using locally generated keys, and model training and inference are conducted on the encrypted images. To mitigate performance degradation, an image encryption method is applied to model training in addition to the generation of test images. In experiments, the effectiveness of the proposed method is confirmed on the Cityscapes dataset under the use of a vision transformer-based model, called SETR.
要約:
Decentralized Autonomous Organizations (DAOs) are inclined explore Small Language Models (SLMs) as edge-native constitutional firewalls to vet proposals and mitigate semantic social engineering. While scaling inference-time compute (System 2) enhances formal logic, its efficacy in highly adversarial, cryptoeconomic governance environments remains underexplored. To address this, we introduce Sentinel-Bench, an 840-inference empirical framework executing a strict intra-model ablation on Qwen-3.5-9B. By toggling latent reasoning across frozen weights, we isolate the impact of inference-time compute against an adversarial Optimism DAO dataset. Our findings reveal a severe compute-accuracy inversion. The autoregressive baseline (System 1) achieved 100% adversarial robustness, 100% juridical consistency, and state finality in under 13 seconds. Conversely, System 2 reasoning introduced catastrophic instability, fundamentally driven by a 26.7% Reasoning Non-Convergence (cognitive collapse) rate. This collapse degraded trial-to-trial consensus stability to 72.6% and imposed a 17x latency overhead, introducing critical vulnerabilities to Governance Extractable Value (GEV) and hardware centralization. While rare (1.5% of adversarial trials), we empirically captured "Reasoning-Induced Sycophancy," where the model generated significantly longer internal monologues (averaging 25,750 characters) to rationalize failing the adversarial trap. We conclude that for edge-native SLMs operating under Byzantine Fault Tolerance (BFT) constraints, System 1 parameterized intuition is structurally and economically superior to System 2 iterative deliberation for decentralized consensus.
Code and Dataset: https://github.com/smarizvi110/sentinel-bench
要約:
Continuous glucose monitors (CGMs) used in diabetes care collect rich personal health data that could improve day-to-day self-management. However, current patient platforms only offer static summaries which do not support inquisitive user queries. Large language models (LLMs) could enable free-form inquiries about continuous glucose data, but deploying them over sensitive health records raises privacy and accuracy concerns. In this paper, we present CGM-Agent, a privacy-preserving framework for question answering over personal glucose data. In our design, the LLM serves purely as a reasoning engine that selects analytical functions. All computation occurs locally, and personal health data never leaves the user's device. For evaluation, we construct a benchmark of 4,180 questions combining parameterized question templates with real user queries and ground truth derived from deterministic program execution. Evaluating 6 leading LLMs, we find that top models achieve 94\% value accuracy on synthetic queries and 88\% on ambiguous real-world queries. Errors stem primarily from intent and temporal ambiguity rather than computational failures. Additionally, lightweight models achieve competitive performance in our agent design, suggesting opportunities for low-cost deployment. We release our code and benchmark to support future work on trustworthy health agents.
要約:
Users increasingly rely on consumer-facing generative AI (GenAI) for tasks ranging from everyday needs to sensitive use cases. Yet, it remains unclear whether and how existing security and privacy (S&P) communications in GenAI tools shape users' adoption decisions and subsequent experiences. Understanding how users seek, interpret, and evaluate S&P information is critical for designing usable transparency that users can trust and act on. We conducted semi-structured interviews and design sessions with 21 U.S. GenAI users. We find that available S&P information rarely drove initial adoption in practice, as participants often perceived it as incomplete, ineffective, or lacking credibility. Instead, they relied on rough proxies, such as popularity, to infer S&P practices. After adoption, uncertainty about S&P practices constrained participants' willingness to use GenAI tools, particularly in high-stakes contexts, and, in some cases, contributed to discontinued use. Participants therefore called for transparency that supports decision-making and use, including trustworthy information (e.g., independent evaluations) and usable interfaces (e.g., on-demand disclosure). We synthesize participants' desired design practices into five dimensions to facilitate systematic future investigation into best practices. We conclude with recommendations for researchers, designers, and policymakers to improve S&P transparency in consumer-facing GenAI.
要約:
Monotone Boolean functions are a structurally important class of Boolean functions, but their restricted form imposes strong limitations on achievable nonlinearity. In this paper, we investigate whether evolutionary computation can evolve monotone Boolean functions with high nonlinearity, both in the balanced and imbalanced settings. We consider three solution encodings: the standard truth table representation, a balanced truth table encoding that preserves Hamming weight, and a symbolic tree-based genetic programming representation. To guide the search toward monotone increasing functions, we introduce a non-monotonicity penalty and combine it with fitness functions targeting balancedness and nonlinearity. Experimental results are reported for dimensions from $n=5$ to $n=14$. The results show that evolutionary search can discover monotone Boolean functions with nonlinearities clearly exceeding those of majority functions, and in several cases approaching the best currently known values for monotone functions. At the same time, the experiments reveal substantial differences between encodings: the balanced truth table encoding performs poorly for larger dimensions, while the standard truth table and genetic programming encodings remain competitive, with genetic programming becoming especially relevant in the largest tested dimensions.
要約:
Small modular nuclear reactors (SMRs) are redefining the energy generation landscape by enabling the deployment of modular, scalable, and pre-built power units that can be used to build distributed autonomous microgrids for critical infrastructure and burgeoning AI factories. Often, these microgrids are linked together to provide a resilient, decentralized power generation infrastructure. Consequently, the cybersecurity of microgrids is of critical importance. In this work, we propose a quantum augmented network framework for resilient microgrids. We integrate the ideas of secure quantum networking, quantum anonymous notification, and quantum random number generation to strengthen the integrity, confidentiality, and privacy of microgrid networks. To substantiate the possible benefits of using quantum augmented microgrids, we simulate a practical high-impact classical attack: a traffic analysis and priority-action spoofing campaign that can (1) deanonymize the anonymous notification for a high-priority action, (2) force excessive key usage, and (3) induce harmful allow/block operations at the control level. We quantify how these attacks affect information leakage, spoof acceptance, key sufficiency, and operational outcomes such as latency, deadline misses, unserved energy, etc. This quantum augmented microgrid (QuAM) framework lets us evaluate trade-offs between privacy, availability, and the operational cost of mitigation (cover traffic, verification delays, and key-rotation policies), further paving the path for the study of more nuanced attacks that arise due to the use of quantum-classical integrated frameworks.
要約:
Autonomous systems increasingly execute actions that directly modify shared state, creating an urgent need for precise control over which transitions are permitted to occur. Existing governance mechanisms evaluate policies prior to execution or reconstruct behavior post hoc, but do not enforce admissibility at the exact moment a state transition is committed. We introduce the atomic decision boundary, a structural property of admission control systems in which the decision and the resulting state transition are jointly determined as a single indivisible step. Formalizing execution as a labeled transition system (LTS), we distinguish two classes: atomic systems, where evaluation and transition are coupled within a single LTS step, and split evaluation systems, where they are separate transitions that may be interleaved by environmental actions. Under realistic concurrent environments, we prove that no construction can make a split system equivalent to an atomic system with respect to admissibility under all execution traces. This limitation is structural, not a matter of policy expressiveness or state availability. We further formalize the Escalate outcome -- absent from classical TOCTOU analyses -- and show its resolution is itself subject to the atomic boundary requirement. We map RBAC and OPA to the split model and contrast them with atomic systems. Admissibility is a property of execution, not evaluation. This paper is the formal foundation of a 4-paper Agent Governance Series: ACP/Paper 1 (arXiv:2603.18829), IML/Paper 2 (10.5281/zenodo.19643761), Fair Allocation/Paper 3 (10.5281/zenodo.19643928), Irreducibility/Paper 4 (10.5281/zenodo.19643950).
要約:
Autonomous agent systems are governed by enforcement mechanisms that flag hard constraint violations at runtime. The Agent Control Protocol identifies a structural limit of such systems: a correctly-functioning enforcement engine can enter a regime in which behavioral drift is invisible to it, because the enforcement signal operates below the layer where deviation is measurable. We show that enforcement-based governance is structurally unable to determine whether an agent's behavior remains within the admissible behavior space A0 established at admission time. Our central result, the Non-Identifiability Theorem, proves that A0 is not in the sigma-algebra generated by the enforcement signal g under the Local Observability Assumption, which every practical enforcement system satisfies. The impossibility arises from a fundamental mismatch: g evaluates actions locally against a point-wise rule set, while A0 encodes global, trajectory-level behavioral properties set at admission time. We define the Invariant Measurement Layer (IML), which bypasses this limitation by retaining direct access to the generative model of A0. We prove an information-theoretic impossibility for enforcement-based monitoring; separately, we show IML detects admission-time drift with provably finite detection delay, operating in the region where enforcement is structurally blind. Validated across four settings: three drift scenarios (300 and 1000 steps), a live n8n webhook pipeline, and a LangGraph StateGraph agent -- enforcement triggers zero violations while IML detects each drift type within 9-258 steps. Paper 2 of a 4-paper Agent Governance Series: atomic boundaries (P0, 10.5281/zenodo.19642166), ACP enforcement (P1, arXiv:2603.18829), fair allocation (P3, 10.5281/zenodo.19643928), irreducibility (P4, 10.5281/zenodo.19643950).
要約:
(1)Cross-border data transfers have become a matter of daily occurrence against the backdrop of the development of cloud computing and artificial intelligence. Consequently, where a data leak gives rise to civil liability, the determination of that liability inevitably assumes an international dimension involving foreign elements. (2)As is starkly demonstrated by secret sharing technology in cloud computing, fragments of data may be presumed to be distributed across multiple jurisdictions on a global scale. This renders traditional private international law measures -- predicated on the identification of a physical location -- inadequate for the purposes of determining the applicable law, a difficulty that is particularly acute in relation to non-contractual obligations. (3)Bearing in mind the typical scenario encountered in practice -- in which a Data Subject brings a claim for damages against a SaaS (Software as a Service) provider, which in turn seeks recourse against an IaaS (Infrastructure as a Service) or PaaS (Platform as a Service) provider -- a characteristic feature of such cases is the concurrence of contractual and non-contractual obligations. Taking this feature into account, it is possible to determine the applicable law governing non-contractual obligations through party autonomy -- by aligning it with the law governing the contractual obligation as selected by the parties, an approach that may be termed private ordering. This serves to overcome the difficulties associated with the identification of a physical location and, at the same time, contributes to ensuring the foreseeability of the parties.
要約:
Zero-knowledge proof (ZKP) provers remain costly because multi-scalar multiplication (MSM) and number-theoretic transforms (NTTs) dominate runtime as they need significant computation. AI ASICs such as TPUs provide massive matrix throughput and SotA energy efficiency. We present MORPH, the first framework that reformulates ZKP kernels to match AI-ASIC execution. We introduce Big-T complexity, a hardware-aware complexity model that exposes heterogeneous bottlenecks and layout-transformation costs ignored by Big-O. Guided by this analysis, (1) at arithmetic level, MORPH develops an MXU-centric extended-RNS lazy reduction that converts high-precision modular arithmetic into dense low-precision GEMMs, eliminating all carry chains, and (2) at dataflow level, MORPH constructs a unified-sharding layout-stationary TPU Pippenger MSM and optimized 3/5-step NTT that avoid on-TPU shuffles to minimize costly memory reorganization. Implemented in JAX, MORPH enables TPUv6e8 to achieve up-to 10x higher throughput on NTT and comparable throughput on MSM than GZKP. Our code: https://github.com/EfficientPPML/MORPH.
要約:
To ensure programs do not leak private data, we often want to be able to provide formal guarantees ensuring such data is handled correctly. Often, we cannot keep such data secret entirely; instead programmers specify how private data may be declassified. While security definitions for declassification exist, they mostly do not handle higher-order programs. In fact, in the higher-order setting no compositional security definition exists for intensional information-flow properties such as where declassification, which allows declassification in specific parts of a program. We use logical relations to build a model (and thus security definition) of where declassification. The key insight required for our model is that we must stop enforcing indistinguishability once a \emph{relevant declassification} has occurred. We show that the resulting security definition provides more security than the most related previous definition, which is for the lower-order setting. This paper is an extended version of the paper of the same name published at OOPSLA 2023 ([21]).
要約:
We discuss a class of cellular automata (CA) able to produce long random strings, starting from short "seed" strings. The approach uses two principles borrowed from cryptography: diffusion and confusion. We show numerically that the strings are pseudo-random using three approaches based on: Fourier transform, entropy estimation, and compression. An application to cryptography is also included with the corresponding Python code.
要約:
With the rapid advancement of machine learning (ML), ML-based Android malware detection has gained significant popularity due to its ability to automatically learn malicious patterns from Android apps. However, the lack of an in-depth and systematic analysis of existing research makes it difficult to obtain a holistic understanding of the state of the art in this field. In this work, we present the most comprehensive investigation to date of ML-based Android malware detection systems, combining both empirical and quantitative analyses. We first organize prior work into a unified taxonomy based on Android app representations and the ML modeling pipeline. Building on this taxonomy, we design a general-purpose framework for ML-based Android malware detection and re-implement 12 representative approaches from three research communities -- software engineering, security, and machine learning. Using this framework, we conduct a large-scale evaluation across three key dimensions: detection effectiveness, robustness to real-world challenges, and efficiency. Despite extensive research efforts and encouraging results, our findings reveal that existing learning-based Android malware detectors still face significant challenges, including vulnerability to malware evolution and susceptibility to adversarial attacks. We attribute these limitations to the detectors' ability to capture and leverage malware semantics, defined as semantic information that characterizes malicious behaviors derived from APK features. Finally, we summarize our key insights and provide actionable recommendations to guide future research in this domain.
要約:
As the right to be forgotten has been legislated worldwide, many studies attempt to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning is to make a trained model to remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine unlearning and discuss their differences, connections and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we use two parts to introduce: firstly, we classify centralized unlearning into exact unlearning and approximate unlearning; secondly, we offer a detailed introduction to the techniques of these methods. Besides the centralized unlearning, we notice some studies about distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as the two representative directions. After introducing unlearning methods, we review studies about unlearning verification. Moreover, we consider the privacy and security issues essential in machine unlearning and organize the latest related literature. Finally, we discuss the challenges of various unlearning scenarios and address the potential research directions.
要約:
Large language models (LLMs) have revolutionized various applications, making robust safety alignment essential to prevent harmful outputs. Current safety alignment techniques, however, harbor inherent vulnerabilities due to their reliance on logit suppression. In this work, we identify critical logit-level vulnerabilities by introducing Semantic-sensitive Alignment and Generation (SSAG), a method designed to systematically manipulate output-layer logits without altering model parameters. Experiments on five popular LLMs show that SSAG exposes harmful responses with a 95% success rate while reducing response time by 86%. VulMine also demonstrates superior attack efficacy, achieving an average ASR of up to 77% against strong defensive mechanisms. These findings reveal crucial weaknesses in existing alignment methods, highlighting an urgent need for improved vulnerability detection and robust safety alignment strategies. Our code is available on github.
要約:
Between 2021 and 2023, crypto assets valued at over \$US2.6 billion were stolen via attacks on "bridges" -- decentralized services designed to allow inter-blockchain exchange. While the individual exploits in each attack vary, a single design flaw underlies them all: the lack of end-to-end value accounting in cross-chain transactions. In this paper, we empirically analyze 10 million transactions used by key bridges during this period. We show that a simple invariant that balances cross-chain inflows and outflows is compatible with legitimate use, yet precisely identifies every known attack (and several likely attacks) in this data. Further, we show that this approach is not only sufficient for post-hoc audits, but can be implemented in-line in existing bridge designs to provide generic protection against a broad array of bridge vulnerabilities.
要約:
The U.S. Decennial Census serves as the foundation for many high-profile policy decision-making processes, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could stronger privacy guarantees be obtained for the 2020 U.S. Census compared to their published guarantees, or equivalently, had the privacy budgets been fully utilized?
In this paper, we address this question affirmatively by demonstrating that the 2020 U.S. Census provides significantly stronger privacy protections than its nominal guarantees suggest at each of the eight geographical levels, from the national level down to the block level. This finding is enabled by our precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across these geographical levels. Our analysis reveals that the Census Bureau introduced unnecessarily high levels of noise to meet the specified privacy guarantees for the 2020 Census. Consequently, we show that noise variances could be reduced by $15.08\%$ to $24.82\%$ while maintaining nearly the same level of privacy protection for each geographical level, thereby improving the accuracy of privatized census statistics. We empirically demonstrate that reducing noise injection into census statistics mitigates distortion caused by privacy constraints in downstream applications of private census data, illustrated through a study examining the relationship between earnings and education.
要約:
We describe a protocol for creating, updating, and transferring digital assets securely, with strong privacy and self-custody features for the initial owner based upon the earlier work of Goodell, Toliver, and Nakib. The architecture comprises three components: a mechanism to unlink counterparties in the transaction channel, a mechanism for oblivious transactions, and a mechanism to prevent service providers from equivocating. We present an approach for the implementation of these components.
要約:
AI coding assistants are widely used for tasks like code generation. These tools now require large and complex contexts, automatically sourced from various origins$\unicode{x2014}$across files, projects, and contributors$\unicode{x2014}$forming part of the prompt fed to underlying LLMs. This automatic context-gathering introduces new vulnerabilities, allowing attackers to subtly poison input to compromise the assistant's outputs, potentially generating vulnerable code or introducing critical errors. We propose a novel attack, Cross-Origin Context Poisoning (XOXO), that is challenging to detect as it relies on adversarial code modifications that are semantically equivalent. Traditional program analysis techniques struggle to identify these perturbations since the semantics of the code remains correct, making it appear legitimate. This allows attackers to manipulate coding assistants into producing incorrect outputs, while shifting the blame to the victim developer. We introduce a novel, task-agnostic, black-box attack algorithm GCGS that systematically searches the transformation space using a Cayley Graph, achieving a 75.72% attack success rate on average across five tasks and eleven models, including GPT 4.1 and Claude 3.5 Sonnet v2 used by popular AI coding assistants. Furthermore, defenses like adversarial fine-tuning are ineffective against our attack, underscoring the need for new security measures in LLM-powered coding tools.
要約:
Whilst many key exchange and digital signature methods use the NIST P256 (secp256r1) and secp256k1 curves, there is often a demand for increased security. With these curves, we have a 128-bit security. These security levels can be increased to 256-bit security with NIST P-521 Curve 448 and Brainpool-P512. This paper outlines a new curve - Eccfrog512ck2 - and which provides 256-bit security and enhanced performance over NIST P-521. Along with this, it has side-channel resistance and is designed to avoid weaknesses such as related to the MOV attack. It shows that Eccfrog512ck2 can have a 61.5% speed-up on scalar multiplication and a 33.3% speed-up on point generation over the NIST P-521 curve.
要約:
The rise of QR code-based phishing ("Quishing") poses a growing cybersecurity threat, as attackers increasingly exploit QR codes to bypass traditional phishing defenses. Existing detection methods predominantly focus on URL analysis, which requires the extraction of the QR code payload, and may inadvertently expose users to malicious content. Moreover, QR codes can encode various types of data beyond URLs, such as Wi-Fi credentials and payment information, making URL-based detection insufficient for broader security concerns. To address these gaps, we propose the first framework for quishing detection that directly analyzes QR code structure and pixel patterns without extracting the embedded content. We generated a dataset of phishing and benign QR codes and we used it to train and evaluate multiple machine learning models, including Logistic Regression, Decision Trees, Random Forest, Na\"ive Bayes, LightGBM, and XGBoost. Our best-performing model (XGBoost) achieves an AUC of 0.9106, demonstrating the feasibility of QR-centric detection. Through feature importance analysis, we identify key visual patterns correlated with phishing labels and refine our feature set by removing non-informative pixels, improving performance to an AUC of 0.9133 with a reduced feature space. Our findings reveal that the structural features of QR code correlate strongly with phishing risk. This work establishes a foundation for quishing mitigation and highlights the potential of direct QR analysis as a critical layer in modern phishing defenses.
要約:
Large Language Models (LLMs) have achieved tremendous success in various tasks, yet concerns about their safety and security have emerged. In particular, they pose risks of generating harmful content and are vulnerable to jailbreaking attacks, creating unaddressed security issues regarding their deployments. In the context of software engineering for artificial intelligence (SE4AI) techniques, model-based analysis has demonstrated notable potential for analyzing and monitoring machine learning models, particularly in stateful deep neural networks. However, it suffers from scalability issues when extended to LLMs due to their vast feature spaces. In this paper, we aim to address the scalability issue of model-based analysis techniques for safeguarding LLM-scale models. Motivated by the recent discovery of low-dimensional safety-critical representations that emerged in LLMs, we propose ReGA, a model-based analysis framework with Representation-Guided Abstraction, to safeguard LLMs against harmful prompts and generations. By leveraging safety-critical representations, which are key directions in hidden states that indicate safety-related concepts, ReGA effectively narrows the scalability gap when developing the abstract model for safety modeling. Our comprehensive evaluation shows that ReGA performs sufficiently well in distinguishing between safe and harmful inputs, achieving an AUROC of 0.975 at the prompt level and 0.985 at the conversation level. Additionally, ReGA exhibits robustness to real-world attacks and generalization across different safety perspectives, outperforming existing safeguard paradigms in terms of interpretability and scalability. Overall, ReGA serves as an efficient and scalable solution to enhance LLM safety by integrating representation engineering with model-based abstraction, paving the way for new paradigms to utilize software insights for AI safety.
要約:
Provenance graph analysis plays a vital role in intrusion detection, particularly against Advanced Persistent Threats (APTs), by exposing complex attack patterns. While recent systems combine graph neural networks (GNNs) with natural language processing (NLP) to capture structural and semantic features, their effectiveness is limited by class imbalance in real-world data. To address this, we introduce PROVSYN, a novel hybrid provenance graph synthesis framework, which comprises three components: (1) graph structure synthesis via heterogeneous graph generation models, (2) textual attribute synthesis via fine-tuned Large Language Models (LLMs), and (3) five-dimensional fidelity evaluation. Experiments on six benchmark datasets demonstrate that PROVSYN consistently produces higher-fidelity graphs across the five evaluation dimensions compared to four strong baselines. To further demonstrate the practical utility of PROVSYN, we utilize the synthesized graphs to augment training datasets for downstream APT detection models. The results show that PROVSYN effectively mitigates data imbalance, improving normalized entropy by up to 35%, and enhances the generalizability of downstream detection models, achieving an accuracy improvement of up to 38%.
要約:
We propose the notion of succinct oblivious tensor evaluation (OTE), where two parties compute an additive secret sharing of a tensor product of two vectors $\mathbf{x} \otimes \mathbf{y}$, exchanging two simultaneous messages. Crucially, the size of both messages and of the CRS is independent of the dimension of $\mathbf{x}$.
We present a construction of OTE with optimal complexity from the standard learning with errors (LWE) problem. Then we show how this new technical tool enables a host of cryptographic primitives, all with security reducible to LWE, such as:
* Adaptively secure laconic function evaluation for depth-$D$ functions $f:\{0, 1\}^m\rightarrow\{0, 1\}^\ell$ with communication $m+\ell+D\cdot \mathrm{poly}(\lambda)$.
* A trapdoor hash function for all functions.
* An (optimally) succinct homomorphic secret sharing for all functions.
* A rate-$1/2$ laconic oblivious transfer for batch messages, which is best possible.
In particular, we obtain the first laconic function evaluation scheme that is adaptively secure from the standard LWE assumption, improving upon Quach, Wee, and Wichs (FOCS 2018).
As a key technical ingredient, we introduce a new notion of \emph{adaptive lattice encodings}, which may be of independent interest.
要約:
Trusted Execution Environments (TEEs) have become a cornerstone of confidential computing, attracting significant attention from academia and industry. To support secure and scalable application deployment on confidential clouds, TEE containers (Tcons) have been introduced as middleware to shield applications from malicious operating systems and orchestration layers while preserving usability. In this paper, we present the first comprehensive analysis of Tcons, focusing on three critical layers: OS interfaces, encrypted I/O, and orchestration mechanisms. To enable systematic evaluation, we design TBouncer, an automated analyzer that precisely exercises and benchmarks Tcon isolation boundaries. Our study uncovers fundamental flaws in existing Tcons, leading to exploitable vulnerabilities such as code execution, denial-of-service, and information leakage. In total, we identify six attack vectors, twelve new bugs, and three CVEs. These findings provide new insights into the underestimated attack surface of Tcons and highlight key directions for building more secure and trustworthy container solutions.
要約:
Whilst many key exchange and digital signature systems still rely on NIST P-256 (secp256r1) and secp256k1, offering around 128-bit security, there is an increasing demand for transparent and reproducible curves at the 256-bit security level. Standard higher-security options include NIST P-521, Curve448, and Brainpool-P512. This paper presents ECCFROG522PP ("Presunto Powered"), a 522-bit prime-field elliptic curve that delivers security in the same classical approx 260-bit ballpark as NIST P-521, but with a fundamentally different design philosophy. All of the curve parameters are deterministically derived from a fixed public seed via BLAKE3, with zero hidden choices. The curve has prime order (cofactor = 1), a verified twist with a proven approx 505-bit prime factor, safe embedding degree (greater than or equal to 14), and passes anti-MOV checks up to k less than or equal to 200 and CM discriminant sanity up to 100k. Unlike prior opaque or ad-hoc constructions, ECCFROG522PP is fully reproducible: anyone can regenerate and verify it byte-for-byte using the published scripts. The intent is not to outperform NIST P-521 in raw speed, but to maximise trust, verifiability, and long-term auditability in a practical curve of equivalent security level
要約:
Large language models(LLMs) are increasingly integrated with external systems through the Model Context Protocol(MCP),which standardizes tool invocation and has rapidly become a backbone for LLM-powered applications.While this paradigm enhances functionality,it also introduces a fundamental security shift:LLMs transition from passive information processors to autonomous orchestrators of task-oriented toolchains,expanding the attack surface,elevating adversarial goals from manipulating single outputs to hijacking entire execution flows.In this paper,we identify and characterize a systematic privacy-leakage attack pattern,termed Parasitic Toolchain Attacks,instantiated as MCP Unintended Privacy Disclosure(MCP-UPD).These attacks require no direct victim interaction;instead,adversaries embed malicious instructions into external data sources that LLMs access during legitimate tasks.Unlike traditional prompt injection and tool poisoning attacks,our attack targets the interconnected toolchain itself,assembling multiple legitimate tools into a coordinated workflow whose combined behavior accomplishes malicious objectives.In MCP-UPD,the malicious logic infiltrates the toolchain and unfolds in three phases:Parasitic Ingestion,Privacy Collection,and Privacy Disclosure,culminating in stealthy exfiltration of private data.Our root cause analysis reveals that MCP lacks both context-tool isolation and least-privilege enforcement,enabling adversarial instructions to propagate unchecked into sensitive tool invocations.To assess the severity,we design MCP-SEC and conduct the first large-scale security census of the MCP ecosystem,analyzing 12,230 tools across 1,360 servers.Our findings show that the MCP ecosystem is rife with real-world exploitable gadgets and diverse attack methods,underscoring systemic risks in MCP platforms and the urgent need for defense mechanisms in LLM-integrated environments.
要約:
Solving systems of Boolean equations is a fundamental task in symbolic computation and algebraic cryptanalysis, with wide-ranging applications in cryptography, coding theory, and formal verification. Among existing approaches, the Boolean Characteristic Set (BCS) method[1] has emerged as one of the most efficient algorithms for tackling such problems. However, its performance is highly sensitive to the ordering of variables, with solving times varying drastically under different orderings for fixed variable counts n and equations size m. To address this challenge, this paper introduces a novel optimization framework that synergistically integrates machine learning (ML)-based time prediction with simulated annealing (SA) to efficiently identify high-performance variables orderings. Weconstruct a dataset comprising variable frequency spectrum X and corresponding BCS solving time t for benchmark systems(e.g., n = m = 28). Utilizing this data, we train an accurate ML predictor ft(X) to estimate solving time for any given variables ordering. For each target system, ft serves as the cost function within an SA algorithm, enabling rapid discovery of low-latency orderings that significantly expedite subsequent BCS execution. Extensive experiments demonstrate that our method substantially outperforms the standard BCS algorithm[1], Gr\"obner basis method [2] and SAT solver[3], particularly for larger-scale systems(e.g., n = 32). Furthermore, we derive probabilistic time complexity bounds for the overall algorithm using stochastic process theory, establishing a quantitative relationship between predictor accuracy and expected solving complexity. This work provides both a practical acceleration tool for algebraic cryptanalysis and a theoretical foundation for ML-enhanced combinatorial optimization in symbolic computation.
要約:
Memory safety bugs, such as buffer overflows and use-after-frees, are the leading causes of software safety issues in production. Software-based approaches, e.g., Address Sanitizer (ASAN), can detect such bugs with high precision, but with prohibitively high overhead. ARM's Memory Tagging Extension (MTE) offers a promising alternative to detect these bugs in hardware with a much lower overhead. In this paper, we perform a thorough investigation of the first production implementation of ARM MTE (Google Pixel 8) and observe that MTE can only achieve coarse precision in bug detection compared with software-based approaches such as ASAN, mainly due to its 16-byte tag granularity. To address this issue, we present NANOTAG, a system to probabilistically detect buffer overflows at byte granularity in unmodified MTE-enabled binaries with minimal changes to memory allocators, introducing an explicit detection-performance tradeoff for in-house testing. NANOTAG detects buffer overflows at byte granularity by setting up a tripwire for tag granules that may require intra-granule overflow detection. The memory access to the tripwire causes additional overflow detection in the software while using MTE's hardware to detect bugs for the rest of the accesses. We implement NANOTAG based on the Scudo Hardened Allocator, the default memory allocator on Android since Android 11. Our evaluation results across popular benchmarks and real-world case studies show that NANOTAG detects nearly as many memory safety bugs as ASAN while incurring similar run-time overhead to Scudo Hardened Allocator in MTE SYNC mode.
要約:
With advances in wireless communication and growing spectrum scarcity, Spectrum Access Systems (SASs) offer an opportunistic solution but face significant security challenges. Regulations require disclosure of location coordinates and transmission details, exposing user privacy and anonymity during spectrum queries, while the database operations themselves permit Denial-of-Service (DoS) attacks. As location-based services, SAS is also vulnerable to compromised or malicious users conducting spoofing attacks. These threats are further amplified given the advances in quantum computing. Thus, we propose QPADL, the first post-quantum (PQ) secure framework that simultaneously ensures privacy, anonymity, location verification, and DoS resilience while maintaining efficiency for large-scale spectrum access systems. QPADL introduces SAS-tailored private information retrieval for location privacy, a PQ-variant of Tor for anonymity, and employs advanced signature constructions for location verification alongside client puzzle protocols and rate-limiting technique for DoS defense. We formally assess its security and conduct a comprehensive performance evaluation, incorporating GPU parallelization and optimization strategies to demonstrate practicality and scalability.
要約:
The proliferation of open-source code and large language models (LLMs) for code generation has amplified the risks of unauthorized reuse and intellectual property infringement. Source code watermarking offers a potential solution, yet existing methods typically encode watermarks through identifiers, local code patterns, or limited handcrafted edits, leaving them vulnerable to renaming, refactoring, and adaptive watermark removal. These limitations hinder the joint achievement of robustness, capacity, generalization, and deployment efficiency. We propose CLASP, a Code LLM-Assisted Semantic-Preserving watermarking framework that enables training-free, plug-and-play watermarking for source code. CLASP embeds watermark bits within a fixed space of semantics-preserving transformations, enabling automated watermark insertion with higher capacity while remaining reusable across programming languages and less dependent on brittle lexical features. To recover the watermark, CLASP uses reference-code retrieval and differential comparison to identify transformation traces, avoiding task-specific model training while improving robustness to structural edits and adaptive attacks. Experiments across multiple programming languages show that CLASP consistently outperforms existing baselines in watermark extraction accuracy and robustness, while maintaining code quality under both random removal and adaptive de-watermarking attacks.
要約:
When users submit queries to Large Language Models (LLMs), their prompts can often contain sensitive data, forcing a difficult choice: Send the query to a powerful proprietary LLM providers to achieving state-of-the-art performance and risk data exposure, or relying on smaller, local models guarantees data privacy but often results in a degradation of task performance. Prior approaches have relied on static pipelines that use LLM rewriting, which shatters linguistic coherence and indiscriminately removes privacy-sensitive information, including task-critical content. We reformulate this challenge (Privacy-Conscious Delegation) as a sequential decision-making problem and introduce a novel reinforcement learning (RL) framework called Privacy-R1 to solve it. Our framework trains an agent to dynamically route text chunks, learning a policy that optimally balances the trade-off between privacy leakage and task performance. It implicitly distinguishes between replaceable Personally Identifiable Information (PII) (which it shields locally) and task-critical PII (which it strategically sends to the remote model for maximal utility). To validate our approach in complex scenarios, we also introduce a new medical dataset with high PII density. Our framework achieves a new state-of-the-art on the privacy-utility frontier, demonstrating the necessity of learned, adaptive policies for deploying LLMs in sensitive environments. Dataset can be found at: https://github.com/zackhuiiiii/Privacy-R1.
要約:
Evaluating the usefulness of data before purchase is essential when obtaining data for high-quality machine learning models, yet both model builders and data providers are often unwilling to reveal their proprietary assets.
We present PrivaDE, a privacy-preserving protocol that allows a model owner and a data owner to jointly compute a utility score for a candidate dataset without fully exposing model parameters, raw features, or labels. PrivaDE provides strong security against malicious behavior and can be integrated into blockchain-based marketplaces, where smart contracts enforce fair execution and payment. To make the protocol practical, we propose optimizations to enable efficient secure model inference, and a model-agnostic scoring method that uses only a small, representative subset of the data while still reflecting its impact on downstream training. Evaluation shows that PrivaDE performs data evaluation effectively, achieving online runtimes within 15 minutes even for models with millions of parameters.
Our work lays the foundation for fair and automated data marketplaces in decentralized machine learning ecosystems.
要約:
The ever increasing complexity of hardware platforms poses a challenge to systems programmers. Correctly programming a multitude of components, providing functionality and security, is difficult: semantics of individual units are described in prose, underspecified, and prone to inaccuracies. Rigorous statements about platform security are often impossible.
We introduce a domain-specific language to describe hardware semantics, assumptions about software behavior, and desired security properties. We then create machine-readable specifications for a diverse set of eight platforms from their reference manuals, and formally prove their (in-)security. In addition to security proofs about memory confidentiality and integrity, we discover a handful of documentation errors. Finally, our analysis also revealed a vulnerability on a real-world server chip, which was confirmed by the vendor to apply to a wide family of deployed network appliances. Our tooling offers system integrators a way of formally describing security properties for whole platforms, and the means to find counterexamples, or proving them correct.
要約:
Despite extensive safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. However, existing methods generally lack the capability for continuous learning and self-evolution from interactions, limiting the diversity and adaptability of attack strategies. To address this, we propose ASTRA, an automated framework capable of autonomously discovering, retrieving, and evolving attack strategies. ASTRA operates on a closed-loop ``attack-evaluate-distill-reuse'' mechanism, which not only generates attack prompts but also automatically distills reusable strategies from every interaction. To systematically manage these strategies, we introduce a dynamic three-tier strategy library (Effective, Promising, and Ineffective) that categorizes strategies based on performance. This hierarchical memory mechanism enables the framework to enhance efficiency by leveraging successful patterns while optimizing the exploration space by avoiding known failures. Extensive experiments in a black-box setting demonstrate that ASTRA significantly outperforms existing baselines.
要約:
Large Vision-Language Models (LVLMs) are vulnerable to a growing array of multimodal jailbreak attacks, necessitating defenses that are both generalizable to novel threats and efficient for practical deployment. Many current strategies fall short, either targeting specific attack patterns, which limits generalization, or imposing high computational overhead. While lightweight anomaly-detection methods offer a promising direction, we find that their common one-class design tends to confuse unseen benign inputs with malicious ones, leading to unreliable over-rejection. To address this, we propose Representational Contrastive Scoring (RCS), a framework built on a key insight: the most potent safety signals reside within the LVLM's own internal representations. Our approach inspects the internal geometry of these representations, learning a lightweight projection to maximally separate benign and malicious inputs in safety-critical layers. This enables a simple yet powerful contrastive score that differentiates true malicious intent from mere distribution shift. Our instantiations, MCD (Mahalanobis Contrastive Detection) and KCD (K-nearest Contrastive Detection), achieve state-of-the-art performance on a challenging evaluation protocol designed to test generalization to unseen attack types. This work demonstrates that effective jailbreak detection can be achieved by applying simple, interpretable statistical methods to the internal representations, offering a practical path towards safer LVLM deployment.
要約:
Romance-baiting scams have become a major source of financial and emotional harm worldwide. These operations are run by organized crime syndicates that traffic thousands of people into forced labor, requiring them to build emotional intimacy with victims over weeks of text conversations before pressuring them into fraudulent cryptocurrency investments. Because the scams are inherently text-based, they raise urgent questions about the role of Large Language Models (LLMs) in both current and future automation.
We investigate this intersection by interviewing 145 insiders and 5 scam victims, performing a blinded long-term conversation study comparing LLM scam agents to human operators, and executing an evaluation of commercial safety filters. Our findings show that LLMs are already widely deployed within scam organizations, with 87% of scam labor consisting of systematized conversational tasks readily susceptible to automation. In a week-long study, an LLM agent not only elicited greater trust from study participants (p=0.007) but also achieved higher compliance with requests than human operators (46% vs. 18% for humans). Meanwhile, popular safety filters detected 0.0% of romance baiting dialogues. Together, these results suggest that romance-baiting scams may be amenable to full-scale LLM automation, while existing defenses remain inadequate to prevent their expansion.
要約:
In this paper, we introduce Sark, a reference architecture for transferring unforgeable, stateful, oblivious (USO) assets. We describe the motivation, design, and implementation of the core subsystems of Sark, Porters, which accumulate and roll-up commitments from Clients, and Sloop, a permissioned, crash fault-tolerant (CFT) blockchain system. We analyse the operation of the system using the `CIA Triad': Confidentiality, Availability, and Integrity. We then introduce the concept of \textit{local centrality} and use it to address design trade-offs related to decentralization. Finally, we point to future work on Byzantine fault-tolerance (BFT), and mitigating the local centrality of Porters.
要約:
Federated learning (FL) addresses privacy and data-silo issues in the training of large language models (LLMs). Most prior work focuses on improving the efficiency of federated learning for LLMs (FedLLM). However, security in open federated environments, particularly defenses against malicious clients, remains underexplored. To investigate the security of FedLLM, we conduct a preliminary study to analyze potential attack surfaces and defensive characteristics from the perspective of LoRA updates. We find two key properties of FedLLM: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA updates exhibit distinct behavioral patterns that can be effectively distinguished by lightweight classifiers. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for FedLLM, which constructs defenses across three levels: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on each client's local LoRA updates, treating them as high-dimensional behavioral features and using a lightweight classifier to determine whether they are malicious. Extensive experiments demonstrate that Safe-FedLLM effectively improves FedLLM's robustness against malicious clients while maintaining competitive performance on benign data. Notably, our method effectively suppresses the impact of malicious data without significantly affecting training speed, and remains effective even under high malicious client ratios.
要約:
The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasions. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective under practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework that evades black-box detectors based on style transfer. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders. Specifically, MASH achieves an average Attack Success Rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while maintaining superior linguistic quality.
要約:
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective (e.g., from "summarizing emails" to "phishing users"). In this paper, we argue that this perspective is incomplete and highlight a critical vulnerability in Reasoning Alignment. We expose the inherent fragility of current alignment techniques by proposing a new adversarial prompt attack paradigm: Reasoning Hijacking. To demonstrate this vulnerability, we instantiate it via the Criteria Attack, which subverts model judgments by injecting spurious decision criteria without altering the high-level task goal. Unlike Goal Hijacking, which attempts to override the system prompt, Reasoning Hijacking keeps the task goal intact but manipulates the model's decision-making logic by injecting spurious reasoning shortcuts. Through extensive experiments on three different tasks (toxic comment, negative review, and spam detection), we demonstrate that even state-of-the-art models are highly fragile, consistently prioritizing injected heuristic shortcuts over rigorous semantic analysis. Crucially, because the model's explicit intent remains aligned with the user's instructions, these attacks can bypass defenses designed to detect goal deviation (e.g., SecAlign, StruQ), revealing a fundamental blind spot in the current safety landscape. Data and code are available at https://github.com/Yuan-Hou/criteria_attack.
要約:
Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text shared online, enabling rapid and large-scale privacy breaches. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. Moreover, they are inherently limited as altering user text to hide sensitive cues still allows attribute inference to occur through models' reasoning capabilities. To address these limitations, we propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS). TRACE leverages attention mechanisms and inference chain generation to identify and anonymize privacy-leaking textual elements, while RPS employs a lightweight two-stage optimization strategy to induce model rejection behaviors, thereby preventing attribute inference. Evaluations across diverse LLMs show that TRACE-RPS reduces attribute inference accuracy from around 50\% to below 5\% on open-source models. In addition, our approach offers strong cross-model generalization, prompt-variation robustness, and utility-privacy tradeoffs. Our code is available at https://github.com/Jasper-Yan/TRACE-RPS.
要約:
Agentic systems based on large language models (LLMs) operate not merely as text generators but as autonomous entities that dynamically retrieve information and invoke tools. This execution model shifts the attack surface from traditional build-time artifacts to inference-time dependencies, exposing agents to manipulation through untrusted data and probabilistic capability resolution. While prior work has examined model-level vulnerabilities, security risks arising from the complex, cyclic runtime behavior of agents remain fragmented.
This paper systematizes existing research into a unified runtime framework. We categorize threats into data supply chain attacks (distinguishing between transient context injection and persistent memory poisoning) and tool supply chain attacks (spanning discovery, implementation, and invocation phases). Crucially, we identify the emergence of the Viral Agent Loop, where agents effectively become vectors for self-propagating generative worms that require no code vulnerabilities to spread. We argue for a transition to a Zero-Trust Runtime Architecture, where context is treated as untrusted control flow, and tool execution is bounded by cryptographic provenance rather than semantic likelihood.
要約:
Quantum Key Distribution (QKD) is a foundational cryptographic protocol that ensures information-theoretic security. However, classical protocols such as BB84, though favored for their simplicity, offer limited resistance to eavesdropping, and perform poorly under realistic noise conditions. Recent research has explored the use of discrete-time Quantum Walks (QWs) to enhance QKD schemes. In this work, we specifically focus on a one-way QKD protocol, where security depends exclusively on the underlying Quantum Walk (QW) topology, rather than the details of the protocol itself. Our paper introduces a novel protocol based on QWs over a hypercube topology and demonstrates that, under identical parameters, it provides significantly enhanced security and noise resistance compared to the circular topology (i.e., state-of-the-art), thereby strengthening protection against eavesdropping. Furthermore, we introduce an efficient and extensible simulation framework for one-way QKD protocols based on QWs, supporting both circular and hypercube topologies. Implemented with IBM's software development kit for quantum computing (i.e., Qiskit), our toolkit enables noise-aware analysis under realistic noise models. To support reproducibility and future developments, we release our entire simulation framework as open-source. This contribution establishes a foundation for the design of topology-aware QKD protocols that combine enhanced noise tolerance with topologically driven security.
要約:
Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the $\textit{Integrity Clash}$, a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.
要約:
Equipping LLM agents with real-world tools can substantially improve productivity. However, granting agents autonomy over tool use also transfers the associated privileges to both the agent and the underlying LLM. Improper privilege usage may lead to serious consequences, including information leakage and infrastructure damage. While several benchmarks have been built to study agents' security, they often rely on pre-coded tools and restricted interaction patterns. Such crafted environments differ substantially from the real-world, making it hard to assess agents' security capabilities in critical privilege control and usage. Therefore, we propose GrantBox, a security evaluation sandbox for analyzing agent privilege usage. GrantBox automatically integrates real-world tools and allows LLM agents to invoke genuine privileges, enabling the evaluation of privilege usage under prompt injection attacks. Our results indicate that while LLMs exhibit basic security awareness and can block some direct attacks, they remain vulnerable to more sophisticated attacks, resulting in an average attack success rate of 84.80% in carefully crafted scenarios.
要約:
Java applications are prone to vulnerabilities stemming from the insecure use of security-sensitive APIs, such as file operations enabling path traversal or deserialization routines allowing remote code execution. These sink APIs encode critical information for vulnerability discovery: the program-specific constraints required to reach them and the exploitation conditions necessary to trigger security flaws. Despite this, existing fuzzers largely overlook such vulnerability-specific knowledge, limiting their effectiveness.
We present GONDAR, a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. GONDAR first identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering. It then deploys two specialized agents that work collaboratively with a coverage-guided fuzzer: an exploration agent generates inputs to reach target call sites by iteratively solving path constraints, while an exploitation agent synthesizes proof-of-concept exploits by reasoning about and satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback, complementing each other. We evaluated GONDAR on real-world Java benchmarks, where it discovers four times more vulnerabilities than Jazzer, the state-of-the-art Java fuzzer. Notably, an earlier GONDAR version contributed to Team Atlanta's first-place CRS in the DARPA AI Cyber Challenge, and is integrated into OSS-CRS, a sandbox project in The Linux Foundation's OpenSSF, to analyze open-source Java projects, where it has already uncovered a zero-day vulnerability.
要約:
Adams Bridge, a hardware accelerator for ML-DSA and ML-KEM designed for the Caliptra root of trust, masks 1 of its Inverse Number Theoretic Transform (INTT) layers and relies on shuffling for the remainder, claiming per-butterfly Correlation Power Analysis (CPA) complexities of 2^46 (ML-DSA) and 2^96 (ML-KEM). We evaluate these claims against published side-channel literature across seven analysis tracks with confidence-rated evidence. Register-Transfer Level (RTL) analysis confirms that the design's Random Start Index (RSI) shuffling provides 6 bits of entropy per layer (64 orderings) rather than the 296 bits of a full random permutation assumed in its scaling argument, with effective margins below the designers' estimates. A soft-analytical attack pipeline demonstrates a 37-bit enumeration reduction, independent of Belief Propagation (BP) gains, quantifying the attack-model gap without achieving key recovery. Full-scale BP on the complete INTT factor graph achieves 100% coefficient recovery over the single-layer baseline, resolving whether BP gains scale to production-size Number Theoretic Transform (NTT) structures. A genie-aided information-theoretic bound shows observations contain sufficient mutual information for full recovery at SNRxN as low as 15. Layer-ablation analysis identifies four necessary conditions governing BP convergence. Observation topology, not count, determines recovery: 4 evenly spread layers achieve 100% while 4 consecutive layers achieve 0%, yielding a practical countermeasure design tool. Strategic masking of 3 consecutive mid-layers (43% overhead vs. full masking) creates an unrecoverable gap that defeats soft-analytical attacks. We contribute a reusable security margin audit methodology combining RTL verification, epistemic confidence tagging, sensitivity-scenario analysis, and experimental validation applicable to any partially masked NTT accelerator.
要約:
Ransomware and DDoS attacks disproportionately impact hospitals, schools, and small organizations that cannot afford enterprise security solutions. We present ML Defender (aRGus NDR), an open-source network intrusion detection system built in C++20, deployable on commodity hardware at approximately 150-200 USD. ML Defender implements a six-component pipeline over eBPF/XDP packet capture, ZeroMQ transport, and Protocol Buffers serialization, combining a rule-based Fast Detector with an embedded Random Forest classifier. The Maximum Threat Wins policy selects the arithmetic maximum of both scores, using ML inference to suppress false positives. Evaluated against the CTU-13 Neris botnet dataset: F1=0.9985, Precision=0.9969, Recall=1.0000, FPR=0.0002% (2 FP in 12,075 benign flows). The Fast Detector alone produces 6.61% FPR on benign traffic; the ML layer reduces this to zero -- a ~500-fold reduction. Per-class inference latency: 0.24-1.06 microseconds on commodity hardware. Under progressive load testing, the pipeline sustains ~34-38 Mbps with zero packet drops across 2.37 million packets. RAM stable at ~1.28 GB. The bottleneck is VirtualBox NIC emulation, not pipeline logic. All figures are conservative lower bounds; bare-metal characterization is future work. This work was developed through the Consejo de Sabios, a structured multi-LLM peer review methodology. Test-Driven Hardening (TDH) is proposed as a methodology for security-critical distributed systems. ML Defender is released under the MIT license.
要約:
Post-quantum migration in TLS 1.3 should not be understood as a flat substitution problem in which one signature algorithm is replaced by another and deployment cost is inferred directly from primitive-level benchmarks. In certificate-based authentication, the practical effect of a signature family depends on where it appears in the certification hierarchy, how much of that hierarchy is exposed during the handshake, and how cryptographic burden is distributed across client and server roles.
This paper presents a local experimental study of TLS 1.3 authentication strategies built on OpenSSL 3 and oqsprovider. Using a reproducible laboratory, it compares ML-DSA and SLH-DSA across multiple certificate placements, hierarchy depths, and key-exchange modes, including classical, hybrid, and pure post-quantum configurations.
The clearest discontinuity appears when SLH-DSA is placed in the server leaf certificate. In that configuration, handshake latency and server-side compute cost increase by orders of magnitude, while strategies that confine SLH-DSA to upper trust layers and preserve ML-DSA in the interactive leaf remain within a substantially more plausible operational range. The results further show that transport size alone does not explain the heavy regime: once SLH-DSA reaches the leaf, server-side cryptographic cost becomes dominant.
The paper argues that post-quantum TLS migration is best evaluated as a problem of certificate-hierarchy design, chain exposure, and cryptographic cost concentration during live authentication.
要約:
The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source code datasets. While watermarking provides a viable protection mechanism by embedding ownership signals, existing methods rely on detectable trigger-target patterns and are limited to source-code tasks, overlooking other scenarios such as decompilation tasks. In this paper, we propose DuCodeMark, a stealthy and robust dual-purpose watermarking method for code datasets that generalizes across both source-code tasks and decompilation tasks. DuCodeMark parses each code sample into an abstract syntax tree (AST), applies language-specific style transformations to construct stealthy trigger-target pairs, and injects repressible poisoned features into a subset of return-typed samples to enhance robustness against watermark removal or evasion. These features remain inactive during normal training but are activated upon watermark removal, degrading model performance. For verification, DuCodeMark employs a black-box method based on the independent-samples $t$-test. We conduct a comprehensive evaluation of DuCodeMark across 72 settings spanning two code tasks, two programming languages, three CodeLMs, and six decoding temperatures. The results demonstrate that it consistently achieves strong verifiability ($p < 0.05$), high stealthiness (suspicion rate $\leq$ 0.36), robustness against both watermark and poisoning attacks (recall $\leq$ 0.57), and a substantial drop in model performance upon watermark removal (Pass@1 drops by 28.6%), underscoring its practicality and resilience.
要約:
Post-quantum cryptographic (PQC) accelerators implementing ML-KEM (FIPS 203) and ML-DSA (FIPS 204) require side-channel resistance evidence for FIPS 140-3 certification. However, exact masking-verification tools scale only to gadgets of a few thousand cells. We present a four-stage verification hierarchy, D0/D1 structural dependency analysis, fresh-mask refinement, Boolean Single-Authentication Distance Checking (SADC), and arithmetic SADC, that extends sound first-order masking verification to production arithmetic modules. Applied to the 1.17-million-cell Adams Bridge ML-DSA/ML-KEM accelerator, structural analysis completes in seconds across all 30 masked submodules. A multi-cycle extension (MC-D1) reclassifies 12 modules from structurally clean to structurally flagged. On the 5,543-cell ML-KEM Barrett reduction module, the pipeline machine-verifies 198 of 363 structurally flagged wires (54.5%) as first-order secure, reports 165 as candidate insecure for designer triage (a sound upper bound), and leaves 0 indeterminate. Every verdict is cross validated by Z3 and CVC5 with 0 disagreements across 363 wires. The result narrows manual review from hundreds of structural flags to 165 actionable candidates with mathematical certificates, enabling pre-silicon side-channel evidence generation on production ML-KEM hardware.
要約:
We present a lower bound for Pauli Manipulation Detection (PMD) codes, a class of quantum codes that detect every Pauli error with high probability. Our lower bound reveals the first trade-off between the error parameter and the coding rate. Specifically, we show that every $q$-ary PMD code of length $n$ and coding rate $R$ must satisfy $R \leq 1 - \frac{2}{n}\log_q\left(\frac{1}{\epsilon}\right) + o(1)$, where $\epsilon$ is the error parameter.
要約:
We describe a vulnerability in language models (LMs) trained with user feedback, whereby a single user can persistently alter LM knowledge and behavior given only the ability to provide prompts and upvote / downvote feedback on LM outputs. To implement the attack, the attacker prompts the LM to stochastically output either a "poisoned" or benign response, then upvotes the poisoned response or downvotes the benign one. When feedback signals are used in a subsequent preference tuning behavior, LMs exhibit increased probability of producing poisoned responses even in contexts without malicious prompts. We show that this attack can be used to (1) insert factual knowledge the model did not previously possess, (2) modify code generation patterns in ways that introduce exploitable security flaws, and (3) inject fake financial news. Our finding both identifies a new qualitative feature of language model preference tuning (showing that it even highly restricted forms of preference data can be used to exert fine-grained control over behavior), and a new attack mechanism for LMs trained with user feedback (extending work on pretraining-time data poisoning and deployment-time prompt injection).
要約:
Long-context large language models (LLMs), such as Gemini-2.5-Pro and Claude-Sonnet-4, are increasingly used to empower advanced AI systems, including retrieval-augmented generation (RAG) pipelines and autonomous agents. In these systems, an LLM receives an instruction along with a context--often consisting of texts retrieved from a knowledge database or memory--and generates a response that is contextually grounded by following the instruction. Recent studies have designed solutions to trace back to a subset of texts in the context that contributes most to the response generated by the LLM. These solutions have numerous real-world applications, including performing post-attack forensic analysis and improving the interpretability and trustworthiness of LLM outputs. While significant efforts have been made, state-of-the-art solutions such as TracLLM often lead to a high computation cost, e.g., it takes TracLLM hundreds of seconds to perform traceback for a single response-context pair. In this work, we propose AttnTrace, a new context traceback method based on the attention weights produced by an LLM for a prompt. To effectively utilize attention weights, we introduce two techniques designed to enhance the effectiveness of AttnTrace, and we provide theoretical insights for our design choice. We also perform a systematic evaluation for AttnTrace. The results demonstrate that AttnTrace is more accurate and efficient than existing state-of-the-art context traceback methods. We also show that AttnTrace can improve state-of-the-art methods in detecting prompt injection under long contexts through the attribution-before-detection paradigm. As a real-world application, we demonstrate that AttnTrace can effectively pinpoint injected instructions in a paper designed to manipulate LLM-generated reviews. The code is at https://github.com/Wang-Yanting/AttnTrace.
要約:
In the EU, the General Data Protection Regulation and the ePrivacy Directive mandate consent for the use of personal data for the purpose of behavioural advertising and tracking technologies. However, the ubiquity of consent banners has led to widespread consent fatigue and questions about the effectiveness of these mechanisms in protecting data subjects' data. To simplify digital laws and make the EU more competitive, the EU Commission recently proposed the Digital Omnibus, introducing a new Article 88b GDPR to express data subjects' choices in a technical way. While the Digital Omnibus is under legislative negotiation, California residents and residents of other US states can already exercise their rights via Global Privacy Control (GPC), a privacy signal to automatically broadcast a legally binding opt-out request to websites. In light of the Digital Omnibus, we evaluate to which extent GPC can be adapted to the EU legal framework to reduce consent banners, mitigate consent fatigue, and improve data protection for EU users.
GPC is based on a technical specification, currently being standardised at the World Wide Web Consortium. By sending a GPC signal, data subjects can express their refusal or withdrawal of consent under the GDPR to the use of their personal data for cross-context ad targeting and, in some cases, to express their objection under the GDPR against the use of their data for such purposes. Our evaluation identifies friction between the GPC specification and current EU data protection law. In the longer term, it would be possible for the EU legislator to amend EU laws, as proposed in the current Digital Omnibus, in such a way that internet users can use automated signals to express choices about personal data use and online tracking. In the shorter term, websites and companies who conduct online tracking can already honour GPC.
要約:
Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully represent the true ``forgetting scope'' learned by the model. We formalize two distinct unlearning granularities, domain-level and instance-level, and propose \BiForget, an automated framework for synthesizing high-quality forget sets. Unlike prior work relying on \emph{external} generators, \BiForget exploits the target model per se to elicit data that matches its internal knowledge distribution through seed-guided and adversarial prompting. Our experiments across diverse benchmarks show that it achieves a superior balance of relevance, diversity, and efficiency. Quantitatively, in the Harry Potter domain, it improves relevance by ${\sim}20$ and diversity by ${\sim}$0.05 while \emph{halving} the total data size compared to SOTAs. Ultimately, it facilitates more robust forgetting and better utility preservation, providing a more rigorous foundation for evaluating LLM unlearning.
要約:
Post-market fairness monitoring is now mandated to ensure fairness and accountability for high-risk employment AI systems under emerging regulations such as the EU AI Act. However, effective fairness monitoring often requires access to sensitive personal data, which is subject to strict legal protections under data protection law. Multi-party computation (MPC) offers a promising technical foundation for compliant post-market fairness monitoring, enabling the secure computation of fairness metrics without revealing sensitive attributes. Despite growing technical interest, the operationalization of MPC-based fairness monitoring in real-world hiring contexts under concrete legal, industrial, and usability constraints remains unknown. This work addresses this gap through a co-design approach integrating technical, legal, and industrial expertise. We identify practical design requirements for MPC-based fairness monitoring, develop an end-to-end, legally compliant protocol spanning the full data lifecycle, and empirically validate it in a large-scale industrial setting. Our findings provide actionable design insights as well as legal and industrial implications for deploying MPC-based post-market fairness monitoring in algorithmic hiring systems.
要約:
Neurosymbolic approaches leveraging Large Language Models (LLMs) with formal methods have recently achieved strong results on mathematics-oriented theorem-proving benchmarks. However, success on competition-style mathematics does not by itself demonstrate the ability to construct proofs about real-world implementations. We address this gap with a benchmark derived from an industrial cryptographic library whose assembly routines are already verified in HOL Light. s2n-bignum is a library used at AWS for providing fast assembly routines for cryptography, and its correctness is established by formal verification. The task of formally verifying this library has been a significant achievement for the Automated Reasoning Group. It involved two tasks: (1) precisely specifying the correct behavior of a program as a mathematical proposition, and (2) proving that the proposition is correct. In the case of s2n-bignum, both tasks were carried out by human experts. In \textit{s2n-bignum-bench}, we provide the formal specification and ask the LLM to generate a proof script that is accepted by HOL Light within a fixed proof-check timeout. To our knowledge, \textit{s2n-bignum-bench} is the first public benchmark focused on machine-checkable proof synthesis for industrial low-level cryptographic assembly routines in HOL Light. This benchmark provides a challenging and practically relevant testbed for evaluating LLM-based theorem proving beyond competition mathematics. The code to set up and use the benchmark is available here: \href{https://github.com/kings-crown/s2n-bignum-bench}{s2n-bignum-bench}.