要約:
Large Language Model (LLM) agents have achieved rapid adoption and demonstrated remarkable capabilities across a wide range of applications. To improve reasoning and task execution, modern LLM agents would incorporate memory modules or retrieval-augmented generation (RAG) mechanisms, enabling them to further leverage prior interactions or external knowledge. However, such a design also introduces a group of critical privacy vulnerabilities: sensitive information stored in memory can be leaked through query-based attacks. Although feasible, existing attacks often achieve only limited performance, with low attack success rates (ASR). In this paper, we propose ADAM, a novel privacy attack that features data distribution estimation of a victim agent's memory and employs an entropy-guided query strategy for maximizing privacy leakage. Extensive experiments demonstrate that our attack substantially outperforms state-of-the-art ones, achieving up to 100% ASRs. These results thus underscore the urgent need for robust privacy-preserving methods for current LLM agents.
要約:
Reinforcement Learning with Verifiable Rewards (RLVR) is an emerging paradigm that significantly boosts a Large Language Model's (LLM's) reasoning abilities on complex logical tasks, such as mathematics and programming. However, we identify, for the first time, a latent vulnerability to backdoor attacks within the RLVR framework. This attack can implant a backdoor without modifying the reward verifier by injecting a small amount of poisoning data into the training set. Specifically, we propose a novel trigger mechanism designated as the \ourapproach (ACB). The attack exploits the RLVR training loop by assigning substantial positive rewards for harmful responses and negative rewards for refusals. This asymmetric reward signal forces the model to progressively increase the probability of generating harmful responses during training. Our findings demonstrate that the RLVR backdoor attack is characterized by both high efficiency and strong generalization capabilities. Utilizing less than 2\% poisoned data in train set, the backdoor can be successfully implanted across various model scales without degrading performance on benign tasks. Evaluations across multiple jailbreak benchmarks indicate that activating the trigger degrades safety performance by an average of 73\%. Furthermore, the attack generalizes effectively to a wide range of jailbreak methods and unsafe behaviors. Code is available at https://github.com/yuki-younai/Backdoor_in_RLVR.
要約:
Large Reasoning Models (LRMs) have achieved remarkable performance across diverse domains, yet their decision-making under conflicting objectives remains insufficiently understood. This work investigates how LRMs respond to harmful queries when confronted with two categories of conflicts: internal conflicts that pit alignment values against each other and dilemmas, which impose mutually contradictory choices, including sacrificial, duress, agent-centered, and social forms. Using over 1,300 prompts across five benchmarks, we evaluate three representative LRMs - Llama-3.1-Nemotron-8B, QwQ-32B, and DeepSeek R1 - and find that conflicts significantly increase attack success rates, even under single-round non-narrative queries without sophisticated auto-attack techniques. Our findings reveal through layerwise and neuron-level analyses that safety-related and functional representations shift and overlap under conflict, interfering with safety-aligned behavior. This study highlights the need for deeper alignment strategies to ensure the robustness and trustworthiness of next-generation reasoning models. Our code is available at https://github.com/DataArcTech/ConflictHarm. Warning: This paper contains inappropriate, offensive and harmful content.
要約:
We study whether in-domain pretraining of Bidirectional Encoder Representations from Transformer (BERT) model improves subdomain-level detection of exfiltration at low false positive rates. While previous work mostly examines fine-tuned generic Transformers, it does not aim to isolate the effect of pretraining on the downstream task of classification. To address this gap, we develop a controlled pipeline where we freeze operating points on validation and transfer them to the test set, thus enabling clean ablations across different label and pretraining budgets. Our results show significant improvements in the left tail of the Receiver Operating Characteristic (ROC) curve, especially against randomly initialized baseline. Additionally, within pretrained model variants, increasing the number of pretraining steps helps the most when more labeled data are available for fine-tuning.
要約:
We design and develop a secret-sharing-scheme-based cyberattack detection model(S3CDM)that can detect unauthorized or illegal activities (especially insider attacks) and protect sensitive information within complex network infrastructures of large organizations. The model splits a secret among a group of legitimate participants or components for authentication, integration and detection of unauthorized activities. Traditional Shamir's polynomial interpolation based and our own hash function based schemes are utilized in the model, they both are practical and efficient to make sure the communications between different components are secure and any unauthorized activities can be detected. The model offers a flexible multi-factor authentication method to enhance the overall system security. Probability analysis [3] shows that multiple component model is more resistant against cyberattacks than the single component one. To demonstrate the feasibility, we implement the S3CDM in three parts on Google Cloud Platform, i.e., the front end UI (User Interface) running on an HTTP server, the back end individual services written in Python, and a PostgreSQL database. Docker is used to manage the start and stop of individual services and their URLs. We demonstrate how to use the UI with a use case of simulation of broken path in details.
要約:
Transformer inference in machine-learning-as-a-service (MLaaS) raises privacy concerns for sensitive user inputs. Prior secure solutions that combine fully homomorphic encryption (FHE) and secure multiparty computation (MPC) are bottlenecked by inefficient FHE kernels, communication-heavy MPC protocols, and expensive FHE-MPC conversions. We present EncFormer, a two-party private Transformer inference framework that introduces Stage Compatible Patterns so that FHE kernels compose efficiently, reducing repacking and conversions. EncFormer also provides a cost analysis model built around a minimal-conversion baseline, enabling principled selection of FHE-MPC boundaries. To further reduce communication, EncFormer proposes a secure complex CKKS-MPC conversion protocol and designs communication-efficient MPC protocols for nonlinearities. With GPU optimizations, evaluations on GPT- and BERT-style models show that EncFormer achieves 1.4x-30.4x lower online MPC communication and 1.3x-9.8x lower end-to-end latency against prior hybrid FHE-MPC systems, and 1.9x-3.5x lower end-to-end latency on BERT-base than FHE-only pipelines under a matched backend, while maintaining near-plaintext accuracy on selected GLUE tasks.
要約:
Large language models (LLMs) have recently emerged as promising tools for augmenting Security Operations Center (SOC) workflows, with vendors increasingly marketing autonomous AI solutions for SOCs. However, there remains a limited empirical understanding of how such tools are used, perceived, and adopted by real-world security practitioners. To address this gap, we conduct a mixed-methods analysis of discussions in cybersecurity-focused forums to learn how a diverse group of practitioners use and perceive modern LLM tools for security operations. More specifically, we analyzed 892 posts between December 2022 and September 2025 from three cybersecurity-focused forums on Reddit, and, using a combination of qualitative coding and statistical analysis, examined how security practitioners discuss LLM tools across three dimensions: (1) their stated tools and use cases, (2) the perceived pros and cons of each tool across a set of critical factors, and (3) their adoption of such tools and the expected impacts on the cybersecurity industry and individual analysts. Overall, our findings reveal nuanced patterns in LLM tools adoption, highlighting independent use of LLMs for low-risk, productivity-oriented tasks, alongside active interest around enterprise-grade, security-focused LLM platforms. Although practitioners report meaningful gains in efficiency and effectiveness in LLM-assisted workflows, persistent issues with reliability, verification overheads, and security risks sharply constrain the autonomy granted to LLM tools. Based on these results, we also provide recommendations for developing and adopting LLM tools to ensure the security of organizations and the safety of cybersecurity practitioners.
要約:
The rapid development and integration of intelligent technologies in the Internet of Vehicles (IoV) have revolutionized transportation systems by enhancing connectivity, automation, and safety. However, the complexity and connectivity of IoV networks also introduce security challenges, including data privacy concerns, cyber threats, and system vulnerabilities. This paper surveys the role of Edge Computing (EC), Machine Learning (ML), and Deep Learning (DL) in strengthening IoV security frameworks. It examines the synergy between these technologies, highlighting their individual capabilities and their collective impact on enhancing threat detection, response times, and adaptive security. Through real world case studies and practical deployments, we demonstrate how EC, ML, and DL are currently improving security and operational efficiency in IoV systems. The paper also identifies key research gaps and future directions for further advancements in IoV security, including the need for scalable, privacy preserving solutions and robust defense mechanisms against emerging cyber threats. By integrating EC, ML, and DL, this work lays the groundwork for developing adaptive, efficient, and resilient IoV security infrastructures capable of addressing evolving challenges in the transportation ecosystem.
要約:
Large Language Model (LLM) agents are increasingly integrated into critical systems, leveraging external tools to interact with the real world. However, this capability exposes them to Indirect Prompt Injection (IPI), where attackers embed malicious instructions into retrieved content to manipulate the agent into executing unauthorized or unintended actions. Existing defenses predominantly focus on the pre-processing stage, neglecting the monitoring of the model's actual behavior. In this paper, we propose PlanGuard, a training-free defense framework based on the principle of Context Isolation. Unlike prior methods, PlanGuard introduces an isolated Planner that generates a reference set of valid actions derived solely from user instructions. In addition, we design a Hierarchical Verification Mechanism that first enforces strict hard constraints to block unauthorized tool invocations, and subsequently employs an Intent Verifier to validate whether parameter deviations are benign formatting variances or malicious hijacking. Experiments on the InjecAgent benchmark demonstrate that PlanGuard effectively neutralizes these attacks, reducing the Attack Success Rate (ASR) from 72.8% to 0%, while maintaining an acceptable False Positive Rate of 1.49%. Furthermore, our method is model-agnostic and highly compatible.
要約:
Apple AirTags use Apple's Find My network: when nearby iDevices detect a lost tag, they anonymously forward an encrypted location report to Apple, which the tag's owner can then fetch to locate the item. That encryption protects privacy -- neither the finder nor Apple learns the owner's identity -- but it also prevents Apple from validating the correctness of received reports.
We show that this design weakness can be exploited: using a relay attack, we can inject manipulated location reports so the Find My service reports a false position for a lost AirTag. The same technique can be used to deny recovery of a targeted tag (a focused DoS), since the owner is misled about its whereabouts.
要約:
Client-side privacy rewriting is crucial for deploying LLMs in privacy-sensitive domains. However, existing approaches struggle to balance privacy and utility. Full-text methods often distort context, while span-level approaches rely on impractical manual masks or brittle static dictionaries. Attempts to automate localization via prompt-based LLMs prove unreliable, as they suffer from unstable instruction following that leads to privacy leakage and excessive context scrubbing. To address these limitations, we propose DAMPER (Domain-Aware Mask-free Privacy Extraction and Rewriting). DAMPER operationalizes latent privacy semantics into compact Domain Privacy Prototypes via contrastive learning, enabling precise, autonomous span localization. Furthermore, we introduce a Prototype-Guided Preference Alignment, which leverages learned prototypes as semantic anchors to construct preference pairs, optimizing a domain-compliant rewriting policy without human annotations. At inference time, DAMPER integrates a sampling-based Exponential Mechanism to provide rigorous span-level Differential Privacy (DP) guarantees. Extensive experiments demonstrate that DAMPER significantly outperforms existing baselines, achieving a superior privacy-utility trade-off.
要約:
Toxicity and harassment are widespread in the video-gaming context. Especially in competitive online multiplayer scenarios, gamers oftentimes send harmful messages to other players (teammates or opponents) whose consequences span from mild annoyance to withdrawal and depression. Abundant prior work tackled these problems, e.g., pointing out the negative effects of toxic interactions. However, few works proposed countermeasures specifically developed and tested on textual messages sent during a match -- i.e., when the "harassment" actually occurs. We posit that such a scarcity stems from the lack of high-quality datasets that can be used to devise "automated" detectors based on natural-language processing (NLP) and machine learning (ML), and which can -- ideally -- mitigate the harm of toxic comments during a gaming session. This work provides a foundation for addressing the problem of toxicity and harassment in video games. First, through a systematic literature review (n=1,039), we provide evidence that only few works proposed ML/NLP-based detectors of toxicity/harassment during live matches. Then, we partner-up with 8 expert League of Legend (LoL) players and create a fine-grained labelled dataset, L2DTnH, containing 1.4k toxic and 13.8k non-toxic messages exchanged during LoL matches. We use L2DTnH to develop a detector that we then empirically show outperforms general-purpose and state-of-the-art toxicity detectors reliant on NLP. To further demonstrate the practicality of our resources, we test our detector on game-related data beyond that included in L2DTnH; and we develop a Web-browser extension that flags toxic content in Webpages -- without querying third-party servers owned by AI companies. We publicly release all of our resources. Our contributions pave the way for more applied research devoted to fighting the spread of toxicity and harassment in video games.
要約:
We provide an approach that closely estimates an organization's cyber resources directly from vulnerability timestamps, using a non-stationary queueing framework. Traditional attack-surface metrics operate on static snapshots, ignoring the core attack-defense dynamics within information systems, which exhibit bursty, heavy-tailed, and capacity-constrained behavior. Our approach to modeling such dynamics is based on a queueing abstraction of attack surfaces. We utilize a segmentation method to identify piecewise-stationary regimes via Gaussian mixture modeling (GMM) of queue length distributions. We fit segment-specific arrival, service, and resource parameters through the minimization of Kullback--Leibler divergence (KL) between the empirical and estimated distributions. Applied to both large-scale software supply chain data and multi-year private logistics enterprise cyber-ticket workflows, the model estimates organizational resources, measured in the time-varying active personnel and output rate per personnel, solely from bug report and fix timings for software supply chains, and discovery and patch timestamps in the enterprise setting. Our results provide 91--96\% accuracy in resource estimation, making the dynamic queueing framework a compelling approach for understanding attack surface dynamics. Further, our framework exposes resource bottlenecks, establishing a foundation for predictive workforce planning, patch-race modeling, and proactive cyber-risk management.
要約:
In what way could a data breach involving government-issued IDs such as passports, driver's licenses, etc., rival a random voluntary disclosure on a nondescript social-media platform? At first glance, the former appears more significant, and that is a valid assessment. The disclosed data could contain an individual's date of birth and address; for all intents and purposes, a leak of that data would be disastrous. Given the threat, the latter scenario involving an innocuous online post seems comparatively harmless--or does it? From that post and others like it, a forensic linguist could stylometrically uncover equivalent pieces of information, estimating an age range for the author (adolescent or adult) and narrowing down their geographical location (specific country). While not an exact science--the determinations are statistical--stylometry can reveal comparable, though noticeably diluted, information about an individual. To prevent an ID from being breached, simply sharing it as little as possible suffices. Preventing the leakage of personal information from written text requires a more complex solution: adversarial stylometry. In this paper, we explore how performing homoglyph substitution--the replacement of characters with visually similar alternatives (e.g., "h" $\texttt{[U+0068]}$ $\rightarrow$ "h" $\texttt{[U+04BB]}$)--on text can degrade stylometric systems.
要約:
Large language models remain vulnerable to jailbreak attacks -- inputs designed to bypass safety mechanisms and elicit harmful responses -- despite advances in alignment and instruction tuning. We propose Head-Masked Nullspace Steering (HMNS), a circuit-level intervention that (i) identifies attention heads most causally responsible for a model's default behavior, (ii) suppresses their write paths via targeted column masking, and (iii) injects a perturbation constrained to the orthogonal complement of the muted subspace. HMNS operates in a closed-loop detection-intervention cycle, re-identifying causal heads and reapplying interventions across multiple decoding attempts. Across multiple jailbreak benchmarks, strong safety defenses, and widely used language models, HMNS attains state-of-the-art attack success rates with fewer queries than prior methods. Ablations confirm that nullspace-constrained injection, residual norm scaling, and iterative re-identification are key to its effectiveness. To our knowledge, this is the first jailbreak method to leverage geometry-aware, interpretability-informed interventions, highlighting a new paradigm for controlled model steering and adversarial safety circumvention.
要約:
Electronic cash (e-cash) is a digital alternative to physical currency that allows anonymous transactions between users and merchants. Typically, coins in an e-cash scheme are only dispensed through a central bank. A drawback of this approach is that the bank is always on the critical path during withdrawals, and if a reliable connection to the bank is temporarily unavailable, users may be unable to withdraw coins in a timely fashion. As with physical currency, there are benefits to supporting a decentralized infrastructure where withdrawals can be performed without involving the bank in the critical path.
We propose the design of a new cryptographic bearer token that can be dispensed by automatic teller machines (ATM) in a fully offline e-cash scheme. Such bearer tokens provide anonymity, unforgeability and untraceability, i.e., users cannot be tracked by their spending activities or the locations of withdrawal. We formalize the requirements of an e-cash scheme with multiple issuers and propose an efficient design building on top of the compact e-cash protocol of Camenisch et al. (EUROCRYPT 2005). Our construction leverages an unforgeable and doubly-anonymous voucher that allows a one-time transfer of coins between an ATM and a user, while hiding their identities from parties not involved in the transaction.
要約:
We develop a queueing-theoretic framework to model the temporal evolution of cyber-attack surfaces, where the number of active vulnerabilities is represented as the backlog of a queue. Vulnerabilities arrive as they are discovered or created, and leave the system when they are patched or successfully exploited. Building on this model, we study how automation affects attack and defense dynamics by introducing an AI amplification factor that scales arrival, exploit, and patching rates. Our analysis shows that even symmetric automation can increase the rate of successful exploits. We validate the model using vulnerability data collected from an open source software supply chain and show that it closely matches real-world attack surface dynamics. Empirical results reveal heavy-tailed patching times, which we prove induce long-range dependence in vulnerability backlog and help explain persistent cyber risk. Utilizing our queueing abstraction for the attack surface, we develop a systematic approach for cyber risk mitigation. We formulate the dynamic defense problem as a constrained Markov decision process with resource-budget and switching-cost constraints, and develop a reinforcement learning (RL) algorithm that achieves provably near-optimal regret. Numerical experiments validate the approach and demonstrate that our adaptive RL-based defense policies significantly reduce successful exploits and mitigate heavy-tail queue events. Using trace-driven experiments on the ARVO dataset, we show that the proposed RL-based defense policy reduces the average number of active vulnerabilities in a software supply chain by over 90% compared to existing defense practices, without increasing the overall maintenance budget. Our results allow defenders to quantify cumulative exposure risk under long-range dependent attack dynamics and to design adaptive defense strategies with provable efficiency.
要約:
As artificial intelligence (AI) systems grow more powerful, autonomous, and embedded in critical infrastructure, their identification and traceability become foundational to regulatory oversight and sustainable digital governance. In digitally transformed enterprises, long-term sustainability depends on transparent, accountable, and lifecycle-governed AI systems, all of which require verifiable identity. This study proposes a conceptual and architectural framework for AI identification, combining technical and governance mechanisms to support lifecycle accountability. The framework integrates five components: model fingerprinting, cryptographic hashing, blockchain-based registration, zero-knowledge proof (ZKP)-based proof of possession, and post-deployment structural change screening. We introduce a dual-layer identifier, consisting of a machine-verifiable primary hash and a human-readable secondary identifier, anchored in a tamper-resistant registry. Identity validation is supported by selective ZKP-based verification at governance-defined checkpoints, while post-deployment changes are monitored using Lempel--Ziv Jaccard Distance (LZJD) as a governance-oriented screening signal rather than a semantic performance metric. The framework establishes an enforceable and transparent identity infrastructure that enables continuity, auditability, and policy-aligned oversight across AI system lifecycles. By embedding AI identification within enterprise architecture and governance processes, the proposed approach supports sustainable innovation, strengthens institutional accountability, and provides a foundation for selective, policy-defined verification during digital transformation.
要約:
We give a public key encryption scheme with plausible quasi-exponential security based on the conjectured intractability of two constraint satisfaction problems (CSPs), both of which are instantiated with a corruption rate of $1 - o(1)$. First, we conjecture the hardness of a new large alphabet random predicate CSP (LARP-CSP) defined over an arbitrary but strongly expanding factor graph, where the vast majority of predicate outputs are replaced with random outputs. Second, we conjecture the hardness of the standard $k$XOR problem defined over a random factor graph, again where the vast majority of parity computations are replaced with random bits. In support of our hardness conjecture for LARP-CSPs, we give a variety of lower bounds, ruling out many natural attacks including all known attacks that exploit non-random factor graphs.
Our public key encryption scheme is the first to leverage high corruption CSPs while simultaneously achieving a plausible security level far above quasi-polynomial. At the heart of our work is a new method for planting cryptographic trapdoors based on the label extended factor graph for a CSP.
Along the way to achieving our result, we give the first uniform construction of an error-correcting code that has an expanding, low density generator matrix while simultaneously allowing for efficient decoding from a $1 - o(1)$ fraction of corruptions.
著者: Saket Jha (Indian Institute of Technology Kharagpur, India), Karthikeya S. M. Yelisetty (Indian Institute of Technology Kharagpur, India), Singabattu Sathya (Indian Institute of Technology Kharagpur, India), Shamik Sural (Indian Institute of Technology Kharagpur, India)
要約:
Recent advances in research on Attribute-based Access Control (ABAC) has led to the development of several ingenious methods for representing and enforcing organizational security policies. However, so far little effort has been spent towards building a tool for generating large-scale synthetic datasets that can be used to test the developed ABAC systems. In this paper, we address this shortcoming by building MuSimA - a web-based tool for generating ABAC datasets with user-specified probability distributions of attribute values. It supports multi-modal input, i.e., users can provide specifications either as a structured JSON file or as a combination of a minimal JSON along with hand-drawn distribution sketches. In the latter case, a Large Language Model is used to automatically extract appropriate distribution parameters from the sketches. The generated synthetic ABAC data matching the input specifications can be downloaded by the user. For studying scalability of algorithms and methods related to ABAC, data can be generated for varying sizes and complexities. We make MuSimA freely available for use by the research community.
要約:
Deepfake content on social networks is increasingly produced through multiple \emph{sequential} edits to biometric data such as facial imagery. Consequently, the final appearance of an image often reflects a latent chain of operations rather than a single manipulation. Recovering these editing histories is essential for visual provenance analysis, misinformation auditing, and forensic or platform moderation workflows that must trace the origin and evolution of AI-generated media. However, existing datasets predominantly focus on single-step editing and overlook the cumulative artifacts introduced by realistic multi-step pipelines. To address this gap, we introduce Sequential Editing in Diffusion (\textbf{SEED}), a large-scale benchmark for sequential provenance tracing in facial imagery. SEED contains over 90K images constructed via one to four sequential attribute edits using diffusion-based editing pipelines, with fine-grained annotations including edit order, textual instructions, manipulation masks, and generation models. These metadata enable step-wise evidence analysis and support forgery detection, sequence prediction. To benchmark the challenges posed by SEED, we evaluate representative analysis strategies and observe that spatial-only approaches struggle under subtle and distributed diffusion artifacts, especially when such artifacts accumulate across multiple edits. Motivated by this observation, we further establish \textbf{FAITH}, a frequency-aware Transformer baseline that aggregates spatial and frequency-domain cues to identify and order latent editing events. Results show that high-frequency signals, particularly wavelet components, provide effective cues even under image degradation. Overall, SEED facilitates systematic study of sequential provenance tracing and evidence aggregation for trustworthy analysis of AI-generated visual content.
要約:
The Model Context Protocol (MCP) is a new and emerging technology that extends the functionality of large language models, improving workflows but also exposing users to a new attack surface. Several studies have highlighted related security flaws, but MCP attack detection remains underexplored. To address this research gap, this study develops and evaluates a range of supervised machine learning approaches, including both traditional and deep-learning models. We evaluated the systems on the detection of malicious MCP tool descriptions in two scenarios: (1) a binary classification task distinguishing malicious from benign tools, and (2) a multiclass classification task identifying the attack type while separating benign from malicious tools. In addition to the machine learning models, we compared a rule-based approach that serves as a baseline. The results indicate that several of the developed models achieved 100\% F1-score on the binary classification task. In the multiclass scenario, the SVC and BERT models performed best, achieving F1 scores of 90.56\% and 88.33\%, respectively. Confusion matrices were also used to visualize the full distribution of predictions often missed by traditional metrics, providing additional insight for selecting the best-fitting solution in real-world scenarios. This study presents an addition to the MCP defence area, showing that machine learning models can perform exceptionally well in separating malicious and benign data points. To apply the solution in a live environment, a middleware was developed to classify which MCP tools are safe to use before execution, and block the ones that are not safe. Furthermore, the study shows that these models can outperform traditional rule-based solutions currently in use in the field.
要約:
Computer-use agents (CUAs) can now autonomously complete complex tasks in real digital environments, but when misled, they can also be used to automate harmful actions programmatically. Existing safety evaluations largely target explicit threats such as misuse and prompt injection, but overlook a subtle yet critical setting where user instructions are entirely benign and harm arises from the task context or execution outcome. We introduce OS-BLIND, a benchmark that evaluates CUAs under unintended attack conditions, comprising 300 human-crafted tasks across 12 categories, 8 applications, and 2 threat clusters: environment-embedded threats and agent-initiated harms. Our evaluation on frontier models and agentic frameworks reveals that most CUAs exceed 90% attack success rate (ASR), and even the safety-aligned Claude 4.5 Sonnet reaches 73.0% ASR. More interestingly, this vulnerability becomes even more severe, with ASR rising from 73.0% to 92.7% when Claude 4.5 Sonnet is deployed in multi-agent systems. Our analysis further shows that existing safety defenses provide limited protection when user instructions are benign. Safety alignment primarily activates within the first few steps and rarely re-engages during subsequent execution. In multi-agent systems, decomposed subtasks obscure the harmful intent from the model, causing safety-aligned models to fail. We will release our OS-BLIND to encourage the broader research community to further investigate and address these safety challenges.
要約:
The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source code datasets. While watermarking provides a viable protection mechanism by embedding ownership signals, existing methods rely on detectable trigger-target patterns and are limited to source-code tasks, overlooking other scenarios such as decompilation tasks. In this paper, we propose DuCodeMark, a stealthy and robust dual-purpose watermarking method for code datasets that generalizes across both source-code tasks and decompilation tasks. DuCodeMark parses each code sample into an abstract syntax tree (AST), applies language-specific style transformations to construct stealthy trigger-target pairs, and injects repressible poisoned features into a subset of return-typed samples to enhance robustness against watermark removal or evasion. These features remain inactive during normal training but are activated upon watermark removal, degrading model performance. For verification, DuCodeMark employs a black-box method based on the independent-samples $t$-test. We conduct a comprehensive evaluation of DuCodeMark across 72 settings spanning two code tasks, two programming languages, three CodeLMs, and six decoding temperatures. The results demonstrate that it consistently achieves strong verifiability ($p < 0.05$), high stealthiness (suspicious rate $\leq$ 0.36), robustness against both watermark and poisoning attacks (recall $\leq$ 0.57), and a substantial drop in model performance upon watermark removal (Pass@1 drops by 28.6%), underscoring its practicality and resilience.
要約:
Downfall is a side-channel attack that leaks values in vector registers from a process to another on the same CPU core. This attack enables an attacker to achieve serious outcomes (e.g., stealing AES keys), and there is no fundamental countermeasure besides applying microcode-based hardware patches. Although the impact of this attack is discussed by the original paper and by Intel to some extent, it is still unclear whether programs used in daily computing activities of normal users are affected by Downfall. This paper thoroughly analyzes the usage of vector registers in widely used applications to assess the impact of Downfall on them. In particular, we collect all packages (over 133~K) provided by the four latest long-term support versions of Ubuntu and measure various metrics on vector instructions. Our findings include that over 60% of all binary files contained in the packages use at least one vector register, and that some highly popular packages such as apt might also be affected by Downfall.
要約:
Large Language Models (LLMs), despite their impressive capabilities across domains, have been shown to be vulnerable to backdoor attacks. Prior backdoor strategies predominantly operate at the token level, where an injected trigger causes the model to generate a specific target word, choice, or class (depending on the task). Recent advances, however, exploit the long-form reasoning tendencies of modern LLMs to conduct reasoning-level backdoors: once triggered, the victim model inserts one or more malicious reasoning steps into its chain-of-thought (CoT). These attacks are substantially harder to detect, as the backdoored answer remains plausible and consistent with the poisoned reasoning trajectory. Yet, defenses tailored to this type of backdoor remain largely unexplored. To bridge this gap, we propose Critical-CoT, a novel defense mechanism that conducts a two-stage fine-tuning (FT) process on LLMs to develop critical thinking behaviors, enabling them to automatically identify potential backdoors and refuse to generate malicious reasoning steps. Extensive experiments across multiple LLMs and datasets demonstrate that Critical-CoT provides strong robustness against both in-context learning-based and FT-based backdoor attacks. Notably, Critical-CoT exhibits strong cross-domain and cross-task generalization. Our code is available at hthttps://github.com/tuanvu171/Critical-CoT.
要約:
The Self-Sovereign Identity (SSI) paradigm is instrumental for decentralised identity management, allowing an entity to create, manage, and present their digital credentials without relying on centralised authorities. Credential selective disclosure is one of the most attractive privacy-preserving features of SSI, allowing users to reveal only the minimum necessary information from their credentials. However, current selective disclosure mechanisms primarily focus on protecting the privacy of credential Holders, while offering limited protection to the Verifiers of credentials. Indeed, the specific credential information requested by a Verifier can inadvertently reveal to credential Holders sensitive information, including internal decision-making criteria, business rules, or strategic plans. In this work, we address this threat by proposing, to the best of our knowledge, the first approach that enforces mutual privacy in credential exchanges. To this end, we introduce COD-ssi (Claim Oblivious Disclosure for SSI), a novel framework that leverages Oblivious Pseudorandom Functions to allow Verifiers to selectively access a subset of claims without revealing which specific claims were accessed to the credential Holder. The security of our solution is formally verified and its feasibility is assessed through the experimental evaluation of our open-source prototype implementation. These results show that provable mutual privacy in the context of SSI can be achieved with just moderate computational and communication overhead.
要約:
Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies reveal that such leakage can be executed through adaptive and iterative attack strategies (named RAG extraction attack), while effective countermeasures remain notably lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target or oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation. Extensive evaluations against existing attacks demonstrate that CanaryRAG provides robust defense, achieving substantially lower chunk recovery rates than state-of-the-art baselines while imposing negligible impact on task performance and inference latency. Moreover, as a plug-and-play solution, CanaryRAG can be seamlessly integrated into arbitrary RAG pipelines without requiring retraining or structural modifications, offering a practical and scalable safeguard for proprietary data.
要約:
In recent years, many countries have started enacting laws to safeguard privacy of personal data of their citizens collected and maintained by various enterprises through websites, mobile apps, and other means. It is imperative that the privacy policies of these enterprises respect the provisions of the applicable law. In this paper, we show how such organizational privacy policies can be efficiently checked against a prevalent law. Our novel approach named APLiance (\underline{A}BAC framework for \underline{P}olicy-\underline{L}aw Compl\underline{iance}) models the requirements of the different sections of a privacy law in the form of Attribute-based Access Control (ABAC) rules and the clauses of a privacy policy as a sequence of implied access requests. A policy is considered to be compliant with the law if these access requests are permitted by the corresponding ABAC rules. Although APLiance can be used in any policy-law setting, we demonstrate its effectiveness in the context of the recently introduced Digital Personal Data Protection Act of India. A browser plugin has been developed and publicly released for real time compliance checking using APLiance whenever a user visits the privacy policy page of a website.
要約:
Watermarking provides a critical safeguard for large language model (LLM) services by facilitating the detection of LLM-generated text. Correspondingly, stealing watermark algorithms (SWAs) derive watermark information from watermarked texts generated by victim LLMs to craft highly targeted adversarial attacks, which compromise the reliability of watermarks. Existing SWAs rely on fixed strategies, overlooking the non-uniform distribution of stolen watermark information and the dynamic nature of real-world LLM generation processes. To address these limitations, we propose Adaptive Stealing (AS), a novel SWA featuring enhanced design flexibility through Position-Based Seal Construction and Adaptive Selection modules. AS operates by defining multiple attack perspectives derived from distinct activation states of contextually ordered tokens. During attack execution, AS dynamically selects the optimal perspective based on watermark compatibility, generation priority, and dynamic generation relevance. Our experiments demonstrate that AS significantly increases steal efficiency against target watermarks under identical experimental conditions. These findings highlight the need for more robust LLM watermarks to withstand potential attacks. We release our code to the community for future research\footnote{https://github.com/DrankXs/AdaptiveStealingWatermark}.
要約:
Deep neural networks remain highly vulnerable to adversarial perturbations, limiting their reliability in security- and safety-critical applications. To address this challenge, we introduce QShield, a modular hybrid quantum-classical neural network (HQCNN) architecture designed to enhance the adversarial robustness of classical deep learning models. QShield integrates a conventional convolutional neural network (CNN) backbone for feature extraction with a quantum processing module that encodes the extracted features into quantum states, applies structured entanglement operations under realistic noise models, and outputs a hybrid prediction through a dynamically weighted fusion mechanism implemented via a lightweight multilayer perceptron (MLP). We systematically evaluate both classical and hybrid quantum-classical models on the MNIST, OrganAMNIST, and CIFAR-10 datasets, using a comprehensive set of robustness, efficiency, and computational performance metrics. Our results demonstrate that classical models are highly vulnerable to adversarial attacks, whereas the proposed hybrid models with entanglement patterns maintain high predictive accuracy while substantially reducing attack success rates across a wide range of adversarial attacks. Furthermore, the proposed hybrid architecture significantly increased the computational cost required to generate adversarial examples, thereby introducing an additional layer of defense. These findings indicate that the proposed modular hybrid architecture achieves a practical balance between predictive accuracy and adversarial robustness, positioning it as a promising approach for secure and reliable machine learning in sensitive and safety-critical applications.
要約:
Existing methods for detection rule generation are tightly coupled to specific input-output combinations, requiring dedicated pipelines for each. We formalize this problem as a unified mapping f:C*L->R and characterize optimal rules through semantic distance. We propose UniRule, an agentic RAG framework built on dual semantic projection spaces: detection intent and detection logic. This design enables retrieval and generation across arbitrary contexts and target languages within a single system. Experiments across 12 scenarios (3 languages, 4 context types, 12,000 pairwise comparisons) show that UniRule significantly outperforms pure LLM generation with a Bradley-Terry coefficient of 0.52, validating semantic projection as an effective abstraction for unified rule generation. Together, the formalization, method, and evaluation provide an initial framework for studying detection rule generation as a unified task.
要約:
Over the years, many techniques have been introduced to protect integrated circuits (ICs) from hardware security threats that emerged in the globalized IC manufacturing supply chain, such as overproduction and piracy. However, most of these techniques have been rendered inefficient since they do not rely on provably secure algorithms. Moreover, the previously proposed techniques using cryptography algorithms lead to a significant increase in hardware complexity and are vulnerable to the removal and power analysis attacks. In this paper, we propose a compound IC protection mechanism that uses a lightweight cryptography algorithm with prominent logic locking and hardware obfuscation techniques. Experimental results show that the secure designs generated by the developed tool have significantly less hardware complexity when compared to those generated by previously proposed techniques using cryptography algorithms and are resilient to existing removal, algebraic, and logic locking attacks.
要約:
Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models to bypass built-in security constraints and generate unethical or unsafe content. Among various jailbreak techniques, multi-turn jailbreak attacks are more covert and persistent than single-turn counterparts, exposing critical vulnerabilities of LLMs.
However, existing multi-turn jailbreak methods suffer from two fundamental limitations that affect the actual impact in real-world scenarios: (a) As models become more context-aware, any explicit harmful trigger is increasingly likely to be flagged and blocked; (b) Successful final-step triggers often require finely tuned, model-specific contexts, making such attacks highly context-dependent. To fill this gap, we propose \textit{Salami Slicing Risk}, which operates by chaining numerous low-risk inputs that individually evade alignment thresholds but cumulatively accumulate harmful intent to ultimately trigger high-risk behaviors, without heavy reliance on pre-designed contextual structures. Building on this risk, we develop Salami Attack, an automatic framework universally applicable to multiple model types and modalities.
Rigorous experiments demonstrate its state-of-the-art performance across diverse models and modalities, achieving over 90\% Attack Success Rate on GPT-4o and Gemini, as well as robustness against real-world alignment defenses. We also proposed a defense strategy to constrain the Salami Attack by at least 44.8\% while achieving a maximum blocking rate of 64.8\% against other multi-turn jailbreak attacks. Our findings provide critical insights into the pervasive risks of multi-turn jailbreaking and offer actionable mitigation strategies to enhance LLM security.
要約:
IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training practically impossible without discarding semantic interpretability or introducing data integrity violations. No prior work has addressed both problems with a formally specified, reproducible methodology. This paper does. We introduce BRIDGE (Benchmark Reference for IoT Domain Generalisation Evaluation), the first formally specified heterogeneous multi-dataset benchmark for IoT intrusion detection, unifying CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, and N-BaIoT through a 46-feature semantic canonical vocabulary grounded in CICFlowMeter nomenclature, with genuine-equivalence-only feature mapping, explicit zero-filling, and per-dataset coverage from 15% to 93%. A leave-one-dataset-out (LODO) protocol makes the generalisation gap precisely measurable: all five evaluated architectures achieve mean LODO F1 between 0.39 and 0.47, and we establish the first community generalisation baseline at mean LODO F1 = 0.5577, a result that shifts the agenda from single-benchmark optimisation toward cross-environment generalisation. We propose TCH-Net, a multi-branch network fusing a three-path Temporal branch (residual convolutional-BiGRU, stride-downsampled BiGRU, pre-LayerNorm Transformer), a provenance-conditioned Contextual branch, and a Statistical branch via Cross-Branch Gated Attention Fusion (CB-GAF) with learnable sigmoid gates for dynamic feature-wise mixing. Across five random seeds, TCH-Net achieves F1 = 0.8296 +/- 0.0028, AUC = 0.9380 +/- 0.0025, and MCC = 0.6972 +/- 0.0056, outperforming all twelve baselines (p < 0.05, Wilcoxon) and recording the highest LODO F1 overall. BRIDGE and the full pipeline are at https://github.com/Ammar-ss/TCH-Net.
要約:
Embedding-as-a-Service (EaaS) has become an important semantic infrastructure for natural language and multimedia applications, but it is highly vulnerable to model stealing and copyright infringement. Existing EaaS watermarking methods face a fundamental robustness--utility--verifiability tension: trigger-based methods are fragile to paraphrasing, transformation-based methods are sensitive to dimensional perturbation, and region-based methods may incur false positives due to coincidental geometric affinity.
To address this problem, we propose GeoMark, a geometry-aware localized watermarking framework for EaaS copyright protection. GeoMark uses a natural in-manifold embedding as a shared watermark target, constructs geometry-separated anchors with explicit target--anchor margins, and activates watermark injection only within adaptive local neighborhoods. This design decouples where watermarking is triggered from what ownership is attributed to, achieving localized triggering and centralized attribution.
Experiments on four benchmark datasets show that GeoMark preserves downstream utility and geometric fidelity while maintaining robust copyright verification under paraphrasing, dimensional perturbation, and CSE (Clustering, Selection, Elimination) attacks, with improved verification stability and low false-positive risk.
要約:
We consider threshold secret sharing schemes based on cellular automata (CA) that allows for anonymous reconstruction, meaning that the secret can be recovered only as a function of the shares, without knowing the participants' identities. To this end, we revisit the basic characterization of $(2,n)$ threshold schemes based on CA in terms of Mutually Orthogonal Latin Squares (MOLS), and redefine the secret space as the MOLS family itself, showing that the new resulting scheme enables anonymous reconstruction of secret CA rules. Finally, we discuss the trade-off between the number of secret CA that can be shared and the computational complexity of the recovery phase.
要約:
Security operations in smart cities demand detection systems that balance accuracy with response time. While ensemble methods like Random Forest achieve high accuracy, their computational overhead impedes real-time forensic triage. We present the first systematic evaluation of TabPFNv2.5, a transformer-based foundation model, against traditional ensemble classifiers for IoT intrusion detection. Using the TON IoT dataset, we demonstrate that TabPFNv2.5 achieves 40 faster inference than Random Forest while maintaining 97% binary classification accuracy. We propose a hybrid pipeline in which TabPFNv2.5 performs rapid threat screening, while ensemble models handle detailed classification. Our analysis reveals that scanning attacks remain the hardest to detect (F1: 69.8%) and cross-device generalization depends critically on feature similarity. These findings establish foundation models as viable components for time-sensitive IoT security operations
要約:
SMS Phishing (also known as 'smishing') is a growing deceptive social engineering (SE) attack that leverages mobile SMS to conduct cybercrimes such as stealing sensitive information or spreading malware by tricking users into interacting with attackers' messages (e.g., responding to or clicking URLs). This threat has increased rapidly in recent years, causing $470M in financial losses for United States users in 2024 alone. This threat is also evolving rapidly, meaning that attackers continually adapt their tactics, reshaping the landscape. There is a significant body of literature on investigating smishing attacks and defenses. However, there is no systematic review that reflects the current attack and defense landscape along with available resources (i.e., relevant datasets). This motivates us to systematize the current smishing research efforts, including the following four research pillars: (a) user perception and susceptibility, (b) attack characterization, (c) defense landscape, and (d) smishing datasets. This leads us to propose novel future research directions towards effectively mitigating smishing attacks.
要約:
AI agents that pay for resources via the x402 protocol embed payment metadata - resource URLs, descriptions, and reason strings - in every HTTP payment request. This metadata is transmitted to the payment server and to the centralised facilitator API before any on-chain settlement occurs; neither party is typically bound by a data processing agreement. We present presidio-hardened-x402, the first open-source middleware that intercepts x402 payment requests before transmission to detect and redact personally identifiable information (PII), enforce declarative spending policies, and block duplicate replay attempts. To evaluate the PII filter, we construct a labeled synthetic corpus of 2,000 x402 metadata triples spanning seven use-case categories, and run a 42-configuration precision/recall sweep across two detection modes (regex, NLP) and five confidence thresholds. The recommended configuration (mode=nlp, min_score=0.4, all entity types) achieves micro-F1 = 0.894 with precision 0.972, at a p99 latency of 5.73ms - well within the 50ms overhead budget. The middleware, corpus, and all experiment code are publicly available at https://github.com/presidio-v/presidio-hardened-x402.
要約:
The application of Machine Learning techniques in code generation is now a common practice for most developers. Tools such as ChatGPT from OpenAI leverage the natural language processing capabilities of Large Language Models to generate machine code from natural language descriptions. In the cybersecurity field, red teams can also take advantage of generative models to build malicious code generators, providing more automation to Pentest audits. However, the application of Large Language Models in malicious code generation remains challenging due to the lack of data to train and evaluate offensive code generators. In this work, we propose RedShell, a tool that allows ethical hackers to generate malicious PowerShell code. We also introduce a ground truth dataset, combining publicly available code samples to fine-tune models in malicious PowerShell generation. Our experiments demonstrate the strong capabilities of RedShell in generating syntactically valid PowerShell, with fewer than 10% of the generated samples resulting in parse errors. Furthermore, our specialized model was able to produce samples that were semantically consistent with reference snippets, achieving a competitive performance on standard output similarity metrics such as Edit Distance and METEOR, with their mean similarity scores exceeding 50% and 40%, respectively. This work sheds light on the state-of-the-art research in the field of Generative AI applied to Pentesting, and also serves as a steppingstone for future advancements, highlighting the potential benefits these models hold within such controlled environments.
要約:
Traditionally, industrial control systems (ICS) were designed without security in mind, prioritizing availability and real-time communication. As these systems increasingly become targets of powerful adversaries, security can no longer be neglected. Driven by flexibility and automation needs, ICS are transitioning from wired to 5G communication, introducing new attack surfaces and a less reliable communication medium, thereby exacerbating existing security challenges. Given their critical role in society, a comprehensive evaluation of their security is imperative. To this end, we introduce SWICS, a fully virtual testbed simulating an ICS in a realistic 5G environment, and study how this transition affects security under varying channel conditions. Our results show three key findings: under optimal channel conditions, industrial 5G networks can achieve resilience comparable to wired systems, while degraded channel conditions can amplify traditional attacks, threaten system stability, and undermine detection mechanisms based on predictable traffic patterns. We further demonstrate the inherent limits of securing 5G channels for ICS through eavesdropping and jamming on the open-air interface. Our work highlights the interplay between security and 5G channel conditions, showing that traditional security controls may no longer be sufficient and motivating further research.
要約:
Large language model (LLM) watermarking has emerged as a promising approach for detecting and attributing AI-generated text, yet its robustness to black-box spoofing remains insufficiently evaluated. Existing evaluation methods often demand extensive datasets and white-box access to algorithmic internals, limiting their practical applicability. In this paper, we study watermark resilience against spoofing fundamentally from a distributional perspective. We first establish a \textit{local capacity bottleneck}, which theoretically characterizes the probability mass that can be reallocated under KL-bounded local updates while preserving semantic fidelity. Building on this, we propose RLSpoofer, a reinforcement learning-based black-box spoofing attack that requires only 100 human-watermarked paraphrase training pairs and zero access to the watermarking internals or detectors. Despite weak supervision, it empowers a 4B model to achieve a 62.0\% spoof success rate with minimal semantic shift on PF-marked texts, dwarfing the 6\% of baseline models trained on up to 10,000 samples. Our findings expose the fragile spoofing resistance of current LLM watermarking paradigms, providing a lightweight evaluation framework and stressing the urgent need for more robust schemes.
著者: Lara D'Agata, Carlos Agull\'o-Domingo, \'Oscar Vera-L\'opez, Kaustubh Shivdikar, Ardhi W. B. Yudha, Ferhat Yaman, David Kaeli, Jos\'e L. Abell\'an, Ian Colbert, Jos\'e Cano
要約:
Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE presents a promising opportunity for progress, with applications ranging from machine learning to information security. We target the most computationally intensive operation in deep neural networks from a hardware perspective, matrix multiplication (matmul), and adapt it for execution on AMD GPUs. We propose a new optimized method that improves the runtime and complexity of ciphertext matmul by using FIDESlib, a recent open-source FHE library designed specifically for GPUs. By exploiting sparsity in both operands, our sparse matmul implementation outperforms its CPU counterpart by up to $3.0\times$ and reduces the time complexity from cubic to semi-linear, demonstrating an improvement over existing FHE matmul implementations.
要約:
From production to consumption, ensuring food quality and traceability depends on reliable monitoring of environmental conditions across the supply chain. Ambient sensing devices can collect relevant data such as temperature and humidity, but ensuring its integrity among stakeholders remains a challenge. This work presents AmBox, a system that enables device-to-blockchain ambient sensing for food traceability. AmBox connects sensors to a blockchain, ensuring secure, verifiable, and tamper-resistant data collection with minimal intermediaries. It manages sensor commissioning and operation with the adequate business context. AmBox can operate with standalone nodes or within a distributed node-mote architecture, allowing flexible deployment at different points along the supply chain. A prototype using Raspberry Pi and ESP32 hardware can record sensor data directly on Hyperledger Fabric. Experimental results show that AmBox provides timely and reliable data that can increase transparency and trust between the supply chain stakeholders.
要約:
Smishing (SMS phishing) has become a serious cybersecurity threat, especially for elderly and cyber-unaware individuals, causing financial loss and undermining user trust. Although prior work has focused on detecting smishing at the level of individual messages, real-world attackers often rely on multi-stage social engineering, gradually manipulating victims through extended conversations before attempting to steal sensitive information. Despite the existence of several datasets for single-message smishing detection, datasets capturing conversational smishing remain largely unavailable, limiting research on multi-turn attack detection. To address this gap, this paper presents a synthetically generated dataset of 3,201 labeled multi-round conversations designed to emulate realistic conversational smishing attacks. The dataset reflects diverse attacker strategies and victim responses across multiple stages of interaction. Using this dataset, we establish baseline performance by evaluating eight models, including traditional machine learning approaches (Logistic Regression, Random Forest, Linear SVM, and XGBoost) and transformer-based architectures (DistilBERT and Longformer), with both engineered conversational features and TF-IDF text representations. Experimental results show that TF-IDF-based models consistently outperform those using engineered features alone. The best-performing model, XGBoost with TF-IDF features, achieves 72.5% accuracy and a macro F1 score of 0.691, surpassing both transformer models. Our analysis suggests that transformer performance is limited primarily by input-length constraints and the relatively small size of the training data. Overall, the results highlight the value of lexical signals in conversational smishing detection and demonstrate the usefulness of the proposed dataset for advancing research on defenses against multi-turn social engineering attacks.
要約:
Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human intervention. While attackers take advantage of dark LLMs such as XXXGPT and WolfGPT to produce malicious code, ethical hackers can follow similar approaches to automate traditional pentesting workflows. In this work, we present RedShell, a privacy-preserving, hardware-efficient framework that leverages fine-tuned LLMs to assist pentesters in generating offensive PowerShell code targeting Microsoft Windows vulnerabilities. RedShell was trained on a malicious PowerShell dataset from the literature, which we further enhanced with manually curated code samples. Experiments show that our framework achieves over 90% syntactic validity in generated samples and strong semantic alignment with reference pentesting snippets, outperforming state-of-the-art counterparts in distance metrics such as edit distance (above 50% average code similarity). Additionally, functional experiments emphasize the execution reliability of the snippets produced by RedShell in a testing scenario that mirrors real-world settings. This work sheds light on the state-of-the-art research in the field of Generative AI applied to malicious code generation and automated testing, acknowledging the potential benefits that LLMs hold within controlled environments such as pentesting.
要約:
Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. This vulnerability manifests across three primary attack channels: web and local content injection, MCP server injection, and skill file injection. To address these vulnerabilities, we introduce \textsc{ClawGuard}, a novel runtime security framework that enforces a user-confirmed rule set at every tool-call boundary, transforming unreliable alignment-dependent defense into a deterministic, auditable mechanism that intercepts adversarial tool calls before any real-world effect is produced. By automatically deriving task-specific access constraints from the user's stated objective prior to any external tool invocation, \textsc{ClawGuard} blocks all three injection pathways without model modification or infrastructure change. Experiments across five state-of-the-art language models on AgentDojo, SkillInject, and MCPSafeBench demonstrate that \textsc{ClawGuard} achieves robust protection against indirect prompt injection without compromising agent utility. This work establishes deterministic tool-call boundary enforcement as an effective defense mechanism for secure agentic AI systems, requiring neither safety-specific fine-tuning nor architectural modification. Code is publicly available at https://github.com/Claw-Guard/ClawGuard.
要約:
Smart-home users increasingly want to control their homes in natural language rather than assemble rules, dashboards, and API integrations by hand. At the same time, real deployments are brittle: devices fail, integrations break, and recoveries often require manual intervention. Existing agent toolkits are effective for session-scoped delegation, but smart-home control operates under a different scenario: it is persistent, event-driven, failure-prone, and tied to physical devices with no shared context window. We present HearthNet, an edge multi-agent orchestration system for smart homes. HearthNet deploys a small set of persistent, role-specialized LLM agents at the home hub, where they coordinate through MQTT, Git-backed shared state, and root-issued actuation leases to govern heterogeneous devices through thin adapters. This design externalizes context, preserves execution history, and separates planning, verification, authorization, and actuation across explicit boundaries. Our current prototype runs on commodity edge hardware and Android devices; it keeps orchestration, state management, and device control on-premise while using hosted LLM APIs for inference. We demonstrate the system through three live scenarios: intent-driven multi-agent coordination from ambiguous natural language, conflict resolution with timeline-based tracing, and rejection of stale or unauthorized commands before device actuation.
要約:
In this report we flesh out a sketch by Krachun and Kazanin to prove that for a certain family of Reed-Solomon codes, proximity gaps fail at radii that are $O(1/\log n)$ below the capacity rate of the code, where $n$ is the length of the code.
要約:
Encrypted cloning enables the redundant storage of an unknown qubit while remaining compatible with the no-cloning theorem, since only one clone can later be recovered through key-consuming decryption. Because encryption in this protocol is introduced to enable cloning-compatible redundancy rather than to guarantee confidentiality by design, its secrecy properties must be assessed explicitly. Here we classify the subsets of the encrypted-clone storage register into authorized, completely non-informative, and partially informative sets. We show that intermediate non-authorized subsets may retain only a restricted residual dependence on the input state, and we characterize exactly when this dependence occurs. The resulting leakage pattern is parity-dependent, revealing a structural confidentiality limitation of encrypted cloning.
要約:
ERC-4337, the Ethereum account abstraction standard, simplifies account management and transaction fee payment in decentralized applications by introducing programmable smart contract wallets and gas sponsorship via paymasters. However, its heavy reliance on on-chain validation and frequent state updates incurs substantial gas overhead, leading to performance bottlenecks and limiting scalability in large-scale deployments. To mitigate these issues, we propose GasLiteAA, a framework that optimize ERC-4337 by offloading paymaster logic to Trusted Execution Environments (TEE). GasLiteAA delegates the secure execution of stateful gas sponsorship logic and user quota management to TEE, enforcing validation rules off-chain while anchoring their integrity on-chain via lightweight cryptographic attestations. This verifiable offloading architecture significantly reduces on-chain computation and storage costs without sacrificing verifiability or decentralization. Experimental results demonstrate that GasLiteAA substantially lowers transaction fees, while remaining fully compatible with Ethereum Layer 1. By balancing security, efficiency, and deployability, GasLiteAA provides a practical and scalable approach to gas sponsorship for account-abstraction-based decentralized applications.
要約:
The rapid growth of generative AI has introduced new challenges in content moderation and digital forensics. In particular, benign AI-generated images can be paired with harmful or misleading text, creating difficult-to-detect misuse. This contextual misuse undermines the traditional moderation framework and complicates attribution, as synthetic images typically lack persistent metadata or device signatures. We introduce a steganography enabled attribution framework that embeds cryptographically signed identifiers into images at creation time and uses multimodal harmful content detection as a trigger for attribution verification. Our system evaluates five watermarking methods across spatial, frequency, and wavelet domains. It also integrates a CLIP-based fusion model for multimodal harmful-content detection. Experiments demonstrate that spread-spectrum watermarking, especially in the wavelet domain, provides strong robustness under blur distortions, and our multimodal fusion detector achieves an AUC-ROC of 0.99, enabling reliable cross-modal attribution verification. These components form an end-to-end forensic pipeline that enables reliable tracing of harmful deployments of AI-generated imagery, supporting accountability in modern synthetic media environments. Our code is available at GitHub: https://github.com/bli1/steganography
要約:
Learned classifiers deployed in agentic pipelines face a fundamental reliability problem: predictions are probabilistic inferences, not verified conclusions, and acting on them without grounding in observable evidence leads to compounding failures across downstream stages. Software vulnerability analysis makes this cost concrete and measurable. We address this through a unified cross-language vulnerability lifecycle framework built around three LLM-driven reasoning stages-hybrid structural-semantic detection, execution-grounded agentic validation, and validation-aware iterative repair-governed by a strict invariant: no repair action is taken without execution-based confirmation of exploitability. Cross-language generalization is achieved via a Universal Abstract Syntax Tree (uAST) normalizing Java, Python, and C++ into a shared structural schema, combined with a hybrid fusion of GraphSAGE and Qwen2.5-Coder-1.5B embeddings through learned two-way gating, whose per-sample weights provide intrinsic explainability at no additional cost. The framework achieves 89.84-92.02% intra-language detection accuracy and 74.43-80.12% zero-shot cross-language F1, resolving 69.74% of vulnerabilities end-to-end at a 12.27% total failure rate. Ablations establish necessity: removing uAST degrades cross-language F1 by 23.42%, while disabling validation increases unnecessary repairs by 131.7%. These results demonstrate that execution-grounded closed-loop reasoning is a principled and practically deployable mechanism for trustworthy LLM-driven agentic AI.
要約:
Differential privacy is a mathematical notion of data privacy that has fast become the de facto standard in privacy-preserving data analysis. Recently a lot of work has focused on differential privacy in the quantum setting. Continuing on this line of study, we investigate how to answer counting queries on a quantum encoded dataset with differential privacy. An example of a counting query is ``How many people in the dataset are over the age of 25 and with a university education?'' Counting queries form the most basic but nonetheless rich set of statistics extractable from a dataset. We show that answering these queries on a quantum encoded dataset reduces to measuring the amplitude of one of two orthogonal states. We then analyze the differential privacy properties of two algorithms from literature to measure amplitude: one which performs repeated measurements in the computational basis, and the other which utilizes the classic amplitude estimation algorithm. For the first technique, we prove privacy results for the case of counting queries that improve on previously known results on general queries, and show that the mechanism in fact \emph{amplifies} privacy due to inherent randomness. For the second method, we derive a tight bound on maximum possible change in the amplitude if we add or remove a single item in the dataset, a quantity called global sensitivity which is central in making an algorithm differentially private. We then show a differentially private version of the amplitude estimation algorithm for counting queries. We also discuss how these methods can be outsourced to a quantum server to blindly compute counting queries with differential privacy.
要約:
Although LLMs drive automation, it is critical to ensure immense consideration for high-stakes enterprise workflows such as those involving legal matters, risk management, and privacy compliance. For Meta, and other organizations like ours, a single hallucinated clause in such high stakes workflows risks material consequences. We show that by framing hallucination mitigation as a Minimum Bayes Risk (MBR) problem, we can dramatically reduce this risk. Specifically, we introduce a Hybrid Utility MBR (HUMBR) framework that synthesizes semantic embedding similarity with lexical precision to identify consensus without ground-truth references, for which we derive rigorous error bounds. We complement this theoretical analysis with a comprehensive empirical evaluation on widely-used public benchmark suites (TruthfulQA and LegalBench) and also real world data from Meta production deployment. The results from our empirical study show that MBR significantly outperforms standard Universal Self-Consistency. Notably, 81% of the pipeline's suggestions were preferred over human-crafted ground truth, and critical recall failures were virtually eliminated.
要約:
Mobile GUI agents powered by Multimodal Large Language Models (MLLMs) can execute complex tasks on mobile devices. Despite this progress, most existing systems still optimize task success or efficiency, neglecting users' privacy personalization. In this paper, we study the often-overlooked problem of agent personalization. We observe that personalization can induce systematic structural heterogeneity in execution trajectories. For example, privacy-first users often prefer protective actions, e.g., refusing permissions, logging out, and minimizing exposure, leading to logically different execution trajectories from utility-first users. Such variable-length and structurally different trajectories make standard preference optimization unstable and less informative. To address this issue, we propose Trajectory Induced Preference Optimization (TIPO), which uses preference-intensity weighting to emphasize key privacy-related steps and padding gating to suppress alignment noise. Results on our Privacy Preference Dataset show that TIPO improves persona alignment and distinction while preserving strong task executability, achieving 65.60% SR, 46.22 Compliance, and 66.67% PD, outperforming existing optimization methods across various GUI tasks. The code and dataset will be publicly released at https://github.com/Zhixin-L/TIPO.
要約:
Cyber threat intelligence (CTI) analysts must answer complex questions over large collections of narrative security reports. Retrieval-augmented generation (RAG) systems help language models access external knowledge, but traditional vector retrieval often struggles with queries that require reasoning over relationships between entities such as threat actors, malware, and vulnerabilities. This limitation arises because relevant evidence is often distributed across multiple text fragments and documents. Knowledge graphs address this challenge by enabling structured multi-hop reasoning through explicit representations of entities and relationships. However, multiple retrieval paradigms, including graph-based, agentic, and hybrid approaches, have emerged with different assumptions and failure modes. It remains unclear how these approaches compare in realistic CTI settings and when graph grounding improves performance. We present a systematic evaluation of four RAG architectures for CTI analysis: standard vector retrieval, graph-based retrieval over a CTI knowledge graph, an agentic variant that repairs failed graph queries, and a hybrid approach combining graph queries with text retrieval. We evaluate these systems on 3,300 CTI question-answer pairs spanning factual lookups, multi-hop relational queries, analyst-style synthesis questions, and unanswerable cases. Results show that graph grounding improves performance on structured factual queries. The hybrid graph-text approach improves answer quality by up to 35 percent on multi-hop questions compared to vector RAG, while maintaining more reliable performance than graph-only systems.
要約:
The proliferation of autoregressive (AR) image generators demands reliable detection and attribution of their outputs to mitigate misinformation, and to filter synthetic images from training data to prevent model collapse. To address this need, watermarking techniques, specifically designed for AR models, embed a subtle signal at generation time, enabling downstream verification through a corresponding watermark detector. In this work, we study these schemes and demonstrate their vulnerability to both watermark removal and forgery attacks. We assess existing attacks and further introduce three new attacks: (i) a vector-quantized regeneration removal attack, (ii) adversarial optimization-based attack, and (iii) a frequency injection attack. Our evaluation reveals that removal and forgery attacks can be effective with access to a single watermarked reference image and without access to original model parameters or watermarking secrets. Our findings indicate that existing watermarking schemes for AR image generation do not reliably support synthetic content detection for dataset filtering. Moreover, they enable Watermark Mimicry, whereby authentic images can be manipulated to imitate a generator's watermark and trigger false detection to prevent their inclusion in future model training.
要約:
A novel form of inference attack in vertical federated learning (VFL) is proposed, where two parties collaborate in training a machine learning (ML) model. Logistic regression is considered for the VFL model. One party, referred to as the active party, possesses the ground truth labels of the samples in the training phase, while the other, referred to as the passive party, only shares a separate set of features corresponding to these samples. It is shown that the active party can carry out inference attacks on both training and prediction phase samples by acquiring an ML model independently trained on the training samples available to them. This type of inference attack does not require the active party to be aware of the score of a specific sample, hence it is referred to as an agnostic inference attack. It is shown that utilizing the observed confidence scores during the prediction phase, before the time of the attack, can improve the performance of the active party's autonomous ML model, and thus improve the quality of the agnostic inference attack. As a countermeasure, privacy-preserving schemes (PPSs) are proposed. While the proposed schemes preserve the utility of the VFL model, they systematically distort the VFL parameters corresponding to the passive party's features. The level of the distortion imposed on the passive party's parameters is adjustable, giving rise to a trade-off between privacy of the passive party and interpretabiliy of the VFL outcomes by the active party. The distortion level of the passive party's parameters could be chosen carefully according to the privacy and interpretabiliy concerns of the passive and active parties, respectively, with the hope of keeping both parties (partially) satisfied. Finally, experimental results demonstrate the effectiveness of the proposed attack and the PPSs.
要約:
Continuously evolving cyber-attacks against industrial networks reduce the effectiveness of signature-based detection methods. Once malware has infiltrated a network (for example, entering via an unsecured device), it can infect further network nodes and carry out malicious activity. Infected nodes can exhibit unusual behaviour in their use of Address Resolution Protocol (ARP) calls within the network. In order to detect such anomalous nodes, we propose a two-stage method: (i) modelling of ARP call behaviour via hierarchical time series prediction methods, and (ii) exploiting Extreme Value Theory (EVT) to robustly detect whether deviations from expected behaviour are anomalous. EVT is able to handle heavy-tailed distributions which are exhibited by internet traffic. Empirical evaluations on a real-life dataset containing over 10M ARP calls from 362 nodes show that the proposed method results in considerably reduced number of false positives, addressing the problem of alert fatigue commonly reported by security professionals.
要約:
To investigate the effectiveness of the model explanation in detecting adversarial examples, we reproduce the results of two papers, Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples and Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples. And then conduct experiments and case studies to identify the limitations of both works. We find that Attacks Meet Interpretability(AmI) is highly dependent on the selection of hyperparameters. Therefore, with a different hyperparameter choice, AmI is still able to detect Nicholas Carlini's attack. Finally, we propose recommendations for future work on the evaluation of defense techniques such as AmI.
要約:
This paper presents TetraBFT, a novel unauthenticated Byzantine fault tolerant protocol for solving consensus in partial synchrony, eliminating the need for public key cryptography and ensuring resilience against computationally unbounded adversaries. TetraBFT has several compelling features: it necessitates only constant local storage, has optimal communication complexity, satisfies optimistic responsiveness -- allowing the protocol to operate at actual network speeds under ideal conditions -- and can achieve consensus in just 5 message delays, which outperforms all known unauthenticated protocols achieving the other properties listed. We validate the correctness of TetraBFT through rigorous security analysis and formal verification. Furthermore, we extend TetraBFT into a multi-shot, chained consensus protocol, making a pioneering effort in applying pipelining techniques to unauthenticated protocols. This positions TetraBFT as a practical and deployable solution for blockchain systems aiming for high efficiency.
要約:
Since the introduction of the Kolmogorov complexity of binary sequences in the 1960s, there have been significant advancements in the topic of complexity measures for randomness assessment, which are of fundamental importance in theoretical computer science and of practical interest in cryptography. This survey reviews notable research from the past four decades on the linear, quadratic and maximum-order complexities of pseudo-random sequences and their relations with Lempel-Ziv complexity, expansion complexity, 2-adic complexity, and correlation measures.
要約:
Zero-knowledge proofs (ZKPs) enable computational integrity and privacy by allowing one party to prove the truth of a statement without revealing underlying data. Compared with alternatives such as homomorphic encryption and secure multiparty computation, ZKPs offer distinct advantages in universality and minimal trust assumptions, with applications spanning blockchain systems and confidential verification of computational tasks. This survey provides a technical overview of ZKPs with a focus on an increasingly relevant subset called zkSNARKs. Unlike prior surveys emphasizing algorithmic and theoretical aspects, we take a broader view of practical deployments and recent use cases across multiple domains including blockchain privacy, scaling, storage, and interoperability, as well as non-blockchain applications such as voting, authentication, timelocks, and machine learning. To support consistent comparison, we provide (i) a taxonomy of application areas, (ii) evaluation criteria including proof size, prover and verifier time, memory, and setup assumptions, and (iii) comparative tables summarizing key tradeoffs and representative systems. The survey also covers supporting infrastructure, including zero-knowledge virtual machines, domain-specific languages, libraries, and frameworks. While emphasizing zkSNARKs for their prevalence in deployed systems, we compare them with zkSTARKs and Bulletproofs to clarify transparency and performance tradeoffs. We conclude with future research and application directions.
要約:
Decentralized identity systems promise user-controlled identifiers and cross-domain verification without a shared identity provider, yet authentication still reduces to possession of keys or credentials once secrets are leaked, reused, or replayed. We present BioZero, a privacy-preserving biometric authentication protocol for decentralized identity that binds an enrolled identity to a biometric witness without revealing biometric templates, while enabling publicly verifiable on-chain decisions. BioZero combines Pedersen commitment-homomorphic computation, consistency spot-checks, and Groth16 zero-knowledge proofs to achieve identity-bound authentication with succinct on-chain verification. We analyze acceptance soundness, freshness, template privacy, and non-malleability under an open decentralized threat model including replay, timing, brute-force, oracle, and forgery attacks. On an Ethereum testbed, BioZero achieves up to 67.8x lower network-adjusted total authentication latency and up to 266.4x faster client-side proving than a zk-SNARK-only baseline. Verification stays in the millisecond range (28.8-41.2 ms vs. 35.4-77.6 ms). With lambda=1 spot-checking, gas grows from 336,778 to 954,066 as N increases from 2 to 128, becomes lower than the baseline from N>=16, and is 2.59x lower at N=128. LFW experiments on 128D and 512D models show accuracy loss below 1% across practical quantization ranges. These results indicate that BioZero is a practical authentication layer for decentralized biometric identity systems.
要約:
The proliferation of real-world health data enables multi-institutional survival studies, yet privacy constraints preclude centralizing sensitive records. We present a privacy-preserving federated Kaplan--Meier framework based on threshold CKKS (Cheon-Kim-Kim-Song) homomorphic encryption that supports approximate floating-point computation and encrypted aggregation of per-time-point counts while exposing only public outputs. Sites compute aligned at-risk and event tallies on a shared time grid and encrypt compact vectors; a coordinator aggregates ciphertexts; and a decryptor committee produces partial shares fused per block to recover aggregated plaintexts without releasing per-time-point tables. We prove correctness, stability, and slot-optimal vector packing, and derive scaling laws showing that communication grows linearly with the number of sites and predictably with the number of time points. Empirically, using synthetic breast-cancer data (N=60,000) distributed across 500 sites, encrypted federated curves match the pooled oracle to numerical precision. In contrast, plaintext protocols permit trivial reconstruction by subtraction; our threshold-gated design precludes this attack under the stated threat model, enabling high-fidelity survival estimation with predictable overhead and substantially reduced privacy risk.
要約:
With the rapid adoption of large language models (LLMs), conversational AI agents have become widely deployed across real-world applications. To enhance safety, these agents are often equipped with guardrails that moderate harmful content. Identifying the guardrails in an agent thus becomes critical for adversaries to understand the system and design guard-specific attacks. In this work, we introduce AP-Test, a novel approach that leverages guard-specific adversarial prompts to detect the identity of guardrails deployed in black-box AI agents. Our method addresses key challenges in this task, including the influence of safety-aligned LLMs and other guardrails, as well as a lack of principled decision-making strategies. AP-Test employs two complementary testing strategies, input and output guard tests, and a new metric, match score, to enable robust identification. Experiments across diverse agents and four open-source guardrails demonstrate that AP-Test achieves perfect classification accuracy in multiple scenarios. Ablation studies further highlight the necessity of our proposed components. Our findings reveal a practical path toward guardrail identification in real-world AI systems.
要約:
Backdoor attacks pose a serious threat to deep neural networks (DNNs), allowing adversaries to implant triggers for hidden behaviors in inference. Defending against such vulnerabilities is especially difficult in the post-training setting, since end-users lack training data or prior knowledge of the attacks. Model merging offers a cost-effective defense; however, latest methods like weight averaging (WAG) provide reasonable protection when multiple homologous models are available, but are less effective with fewer models and place heavy demands on defenders. We propose a module-switching defense (MSD) for disrupting backdoor shortcuts. We first validate its theoretical rationale and empirical effectiveness on two-layer networks, showing its capability of achieving higher backdoor divergence than WAG, and preserving utility. For deep models, we evaluate MSD on Transformer and CNN architectures and design an evolutionary algorithm to optimize fusion strategies with selective mechanisms to identify the most effective combinations. Experiments show that MSD achieves stronger defense with fewer models in practical settings, and even under an underexplored case of collusive attacks among multiple models--where some models share the same backdoors--switching strategies by MSD deliver superior robustness against diverse attacks. Code is available at https://github.com/weijun-l/module-switching-defense.
要約:
In this work, we present MoCQ, a neuro-symbolic static analysis framework that leverages large language models (LLMs) to automatically generate vulnerability detection patterns. This approach combines the precision and scalability of pattern-based static analysis with the semantic understanding and automation capabilities of LLMs. MoCQ extracts the domain-specific languages for expressing vulnerability patterns and employs an iterative refinement loop with trace-driven symbolic validation that provides precise feedback for pattern correction. We evaluated MoCQ on 12 vulnerability types across four languages (C/C++, Java, PHP, JavaScript). MoCQ achieves detection performance comparable to expert-developed patterns while requiring only hours of generation versus weeks of manual effort. Notably, MoCQ uncovered 46 new vulnerability patterns that security experts had missed and discovered 25 previously unknown vulnerabilities in real-world applications. MoCQ also outperforms prior approaches with stronger analysis capabilities and broader applicability.
要約:
Perceptual hashing is used to detect whether an input image is similar to a reference image with a variety of security applications. Recently, they have been shown to succumb to adversarial input attacks which make small imperceptible changes to the input image yet the hashing algorithm does not detect its similarity to the original image. Property-preserving hashing (PPH) is a recent construct in cryptography, which preserves some property (predicate) of its inputs in the hash domain. Researchers have so far shown constructions of PPH for Hamming distance predicates, which, for instance, outputs 1 if two inputs are within Hamming distance $t$. A key feature of PPH is its strong correctness guarantee, i.e., the probability that the predicate will not be correctly evaluated in the hash domain is negligible. Motivated by the use case of detecting similar images under adversarial setting, we propose the first PPH construction for an $\ell_1$-distance predicate. Roughly, this predicate checks if the two one-sided $\ell_1$-distances between two images are within a threshold $t$. Since many adversarial attacks use $\ell_2$-distance (related to $\ell_1$-distance) as the objective function to perturb the input image, by appropriately choosing the threshold $t$, we can force the attacker to add considerable noise to evade detection, and hence significantly deteriorate the image quality. Our proposed scheme is highly efficient, and runs in time $O(t^2)$. For grayscale images of size $28 \times 28$, we can evaluate the predicate in $0.0784$ seconds when pixel values are perturbed by up to $1 \%$. For larger RGB images of size $224 \times 224$, by dividing the image into 1,000 blocks, we achieve times of $0.0128$ seconds per block for $1 \%$ change, and up to $0.2641$ seconds per block for $14\%$ change.
要約:
Quantum Computing (QC) threatens the cryptographic foundations of Cloud Computing (CC), exposing distributed infrastructures to novel attack vectors. This survey provides comprehensive analysis of quantum-safe cloud security, examining vulnerabilities, transition strategies, and layer-specific countermeasures across nine architectural layers (application, data, runtime, middleware, OS, virtualization, server, storage, networking). We employ STRIDE-based risk assessment aligned with NIST SP 800-30 to evaluate quantum threats through three transition phases: pre-transition (classical cryptography vulnerabilities), hybrid (migration risks), and post-transition (PQC implementation weaknesses including side-channel attacks). Our security framework integrates hybrid cryptographic strategies (algorithmic combiners, dual/composite certificates, protocol-level migration), cryptographic agility, and risk-prioritized mitigation tailored to cloud environments. We benchmark NIST-standardized PQC algorithms for performance and deployment suitability, assess side-channel and implementation vulnerabilities, and analyze quantum-safe strategies from leading CSPs (AWS, Azure, GCP). The survey delivers layer-specific threat taxonomies, likelihood-impact risk matrices, and CSP-informed deployment roadmaps for cloud architects, policymakers, and researchers. We identify six critical research directions: standardization and interoperability, hardware acceleration and performance optimization, AI-enhanced security and threat mitigation, integration with emerging cloud technologies, systemic preparedness and workforce development, and migration frameworks with crypto-agility.
要約:
Chatbot service providers (e.g., OpenAI) rely on tiered subscription plans to generate revenue, offering black-box access to basic models for free users and advanced models to paying subscribers. However, this approach is unprofitable and inflexible. A pay-to-unlock scheme for premium features (e.g., math, coding) offers a more sustainable alternative. Enabling such a scheme requires a feature-locking technique (FLoTE) that is (i) effective in refusing locked features, (ii) utility-preserving for unlocked features, (iii) robust against evasion or unauthorized credential sharing, and (iv) scalable to multiple features and clients. Existing FLoTEs (e.g., password-locked models) fail to meet these criteria. To fill this gap, we present Locket, the first robust and scalable FLoTE to enable pay-to-unlock schemes. We develop a framework for adversarial training and merging of feature-locking adapters, which enables Locket to selectively disable specific features of a model. Evaluation shows that Locket is effective ($100$% refusal rate), utility-preserving ($\leq 7$% utility degradation), robust ($\leq 5$% attack success rate), and scalable to multiple features and clients.
要約:
Software security testing, particularly when enhanced with deep learning models, has become a powerful approach for improving software quality, enabling faster detection of known flaws in source code. However, many approaches miss post-fix latent vulnerabilities that remain even after patches typically due to incomplete fixes or overlooked issues may later lead to zero-day exploits. In this paper, we propose $HYDRA$, a $Hy$brid heuristic-guided $D$eep $R$epresentation $A$rchitecture for predicting latent zero-day vulnerabilities in patched functions that combines rule-based heuristics with deep representation learning to detect latent risky code patterns that may persist after patches. It integrates static vulnerability rules, GraphCodeBERT embeddings, and a Variational Autoencoder (VAE) to uncover anomalies often missed by symbolic or neural models alone. We evaluate HYDRA in an unsupervised setting on patched functions from three diverse real-world software projects: Chrome, Android, and ImageMagick. Our results show HYDRA predicts 13.7%, 20.6%, and 24% of functions from Chrome, Android, and ImageMagick respectively as containing latent risks, including both heuristic matches and cases without heuristic matches ($None$) that may lead to zero-day vulnerabilities. It outperforms baseline models that rely solely on regex-derived features or their combination with embeddings, uncovering truly risky code variants that largely align with known heuristic patterns. These results demonstrate HYDRA's capability to surface hidden, previously undetected risks, advancing software security validation and supporting proactive zero-day vulnerabilities discovery.
要約:
Current LLM-based frameworks for text anonymization usually rely on remote API services from powerful LLMs, which creates an inherent privacy paradox: users must disclose the raw data to untrusted third parties for guaranteed privacy preservation. Moreover, directly migrating current solutions to local small-scale models (LSMs) offers a suboptimal solution with severe utility collapse. Our work argues that this failure stems not merely from the capability deficits of LSMs, but significantly from the inherent irrationality of the greedy adversarial strategies employed by current state-of-the-art (SOTA) methods. To address this drawback, we propose Rational Localized Adversarial Anonymization (RLAA), a fully localized and training-free framework featuring an Attacker-Arbitrator-Anonymizer architecture. We model the anonymization process as a trade-off between Marginal Privacy Gain (MPG) and Marginal Utility Cost (MUC), demonstrating that greedy strategies tend to drift into an irrational state. Instead, RLAA introduces an arbitrator that acts as a rationality gatekeeper, validating the attacker's inference to filter out ghost leaks. This mechanism promotes a rational early-stopping criterion, and structurally prevents utility collapse. Extensive experiments on different benchmarks demonstrate that RLAA achieves a superior privacy-utility trade-off compared to strong baselines.
要約:
Smart contracts are stateful programs deployed on blockchains; they secure over a trillion dollars in transaction value per year. High-stakes smart contracts often rely on timely alerts about external events, but prior work has not analyzed their resilience to an attacker suppressing alerts via bribery. We formalize this challenge in a cryptoeconomic setting as the \emph{alerting problem}, giving rise to a game between a bribing adversary and~$n$ rational participants, who pay a penalty if they are caught deviating from the protocol. We establish a quadratic, i.e.,~$O(n^2)$, upper bound, whereas a straightforward alerting protocol only achieves~$O(n)$ bribery cost.
We present a \emph{simultaneous game} that asymptotically achieves the quadratic upper bound and thus asymptotically-optimal bribery resistance. We then present two protocols that implement our simultaneous game: The first leverages a strong network synchrony assumption. The second relaxes this strong assumption and instead takes advantage of trusted hardware and blockchain proof-of-publication to establish a timed commitment scheme. These two protocols are constant-time but incur a linear storage overhead on the blockchain. We analyze a third, \emph{sequential alerting} protocol that optimistically incurs no on-chain storage overhead, at the expense of~$O(n)$ worst-case execution time. All three protocols achieve asymptotically-optimal bribery costs, but with different resource and performance tradeoffs. Together, they illuminate a rich design space for practical solutions to the alerting problem.
要約:
Fault injection attacks on embedded neural network models have been shown as a potent threat. Numerous works studied resilience of models from various points of view. As of now, there is no comprehensive study that would evaluate the influence of number representations used for model parameters against electromagnetic fault injection (EMFI) attacks.
In this paper, we investigate how four different number representations influence the success of an EMFI attack on embedded neural network models. We chose two common floating-point representations (32-bit, and 16-bit), and two integer representations (8-bit, and 4-bit). We deployed four common image classifiers, ResNet-18, ResNet-34, ResNet-50, and VGG-11, on an embedded memory chip, and utilized a low-cost EMFI platform to trigger faults. Beyond accuracy evaluation, we characterize the injected fault pattern by analyzing the bit error rate, the spatial distribution of corrupted bytes, and the prevalence of 0xFE/0xFF byte values across formats, identifying the mechanisms responsible for the observed differences in resilience.
Our results show that while floating-point representations exhibit almost a complete degradation in accuracy (Top-1 and Top-5) after a single fault injection, integer representations offer better resistance overall. In particular, the 8-bit representation on a relatively large network (VGG-11) retains Top-1 accuracy of around 70% and Top-5 at around 90%.
要約:
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
要約:
Given limited and costly computational infrastructure, resource efficiency is a key requirement for large language models (LLMs). Efficient LLMs increase service capacity for providers and reduce latency and API costs for users. Recent resource consumption threats induce excessive generation, degrading model efficiency and harming both service availability and economic sustainability. This survey presents a systematic review of threats to resource consumption in LLMs. We further establish a unified view of this emerging area by clarifying its scope and examining the problem along the full pipeline from threat induction to mechanism understanding and mitigation. Our goal is to clarify the problem landscape for this emerging area, thereby providing a clearer foundation for characterization and mitigation.
要約:
In Shamir's secret sharing scheme, all participants possess equal privileges. However, in many practical scenarios, it is often necessary to assign different levels of authority to different participants. To address this requirement, Hierarchical Secret Sharing (HSS) schemes were developed, which partitioned all participants into multiple subsets and assigned a distinct privilege level to each. Existing Chinese Remainder Theorem (CRT)-based HSS schemes benefit from flexible share sizes, but either exhibit security flaws or have an information rate less than $\frac{1}{2}$. In this work, we propose a disjunctive HSS scheme and a conjunctive HSS scheme by using the CRT for integer ring and one-way functions. Both schemes are asymptotically ideal and are proven to be secure.
要約:
Capture The Flag (CTF) competitions have established themselves as a highly effective pedagogical tool in cybersecurity education, offering students hands-on experience in realistic attack and defense scenarios. However, organizing and hosting these events requires considerable infrastructure effort, which frequently limits their adoption in academic settings. This paper presents the design, iterative development, and evaluation of a CTF as a Service (CaaS) platform built on Proxmox virtualization, leveraging Infrastructure as Code (IaC) tools such as Terraform and Ansible, container orchestration via Docker Swarm, and load balancing with HAProxy. The system supports both a development-centered workflow, in which challenges are automatically deployed from a Git repository through a CI/CD pipeline, and a deployment-oriented workflow for ad-hoc infrastructure provisioning. The paper describes the design decisions made, the challenges encountered during development, and the solutions implemented to achieve session persistence, external routing, and challenge replicability. The platform is designed to evolve into a CTF hosting service with commercial potential, and future lines of work are outlined regarding automatic scaling, monitoring integration, and frontend standardization.
要約:
With frequently evolving Advanced Persistent Threats (APTs) in cyberspace, traditional security solutions approaches have become inadequate for threat hunting for organizations. Moreover, SOC (Security Operation Centers) analysts are often overwhelmed and struggle to analyze the huge volume of logs received from diverse devices in organizations. To address these challenges, we propose an automated and dynamic threat hunting framework for monitoring evolving threats, adapting to changing network conditions, and performing risk-based prioritization for the mitigation of suspicious and malicious traffic. By integrating Agentic AI with Splunk, an established SIEM platform, we developed a unique threat hunting framework. The framework systematically and seamlessly integrates different threat hunting modules together, ranging from traffic ingestion to anomaly assessment using a reconstruction-based autoencoder, deep reinforcement learning (DRL) with two layers for initial triage, and a large language model (LLM) for contextual analysis. We evaluated the framework against a publicly available benchmark dataset, as well as against a simulated dataset. The experimental results show that the framework can effectively adapt to different SOC objectives autonomously and identify suspicious and malicious traffic. The framework enhances operational effectiveness by supporting SOC analysts in their decision-making to block, allow, or monitor network traffic. This study thus enhances cybersecurity and threat hunting literature by presenting the novel threat hunting framework for security decision-making, as well as promoting cumulative research efforts to develop more effective frameworks to battle continuously evolving cyber threats.
要約:
Digital devices like tablets, media players, and kiosks are increasingly deployed in U.S. prisons. These technologies can enable incarcerated people to access education, communicate with loved ones, and develop vital reentry skills. However, they can also introduce new privacy and security risks for incarcerated people who have little agency over their usage and contracts, and are currently carved out of many consumer protection safeguards. To investigate these issues, we conducted focus groups and interviews with system-impacted people (n=17), i.e., those formerly incarcerated, and their relatives, to investigate experiences with device-related security and privacy vulnerabilities and the power dynamics that affect their use. In our findings, participants describe pervasive surveillance, censorship, and usability problems with the technology available to them, including shifting and seemingly arbitrary usage policies. These policies strain relationships both inside and outside prisons and contribute to negative downstream effects for incarcerated users. We recommend ways to better balance prison security concerns with privacy-related needs of system-impacted individuals by promoting accountability for technology-related decisions, providing public oversight of digital purchasing and use policies, and designing digital tools with them -- the actual end-users -- in mind.
要約:
Modern advanced driver assistance systems (ADAS) rely on deep neural networks (DNNs) for perception and planning. Since DNNs' parameters reside in DRAM during inference, bit flips caused by cosmic radiation or low-voltage operation may corrupt DNN computations, distort driving decisions, and lead to real-world incidents. This paper presents a SpatioTemporal-Aware Fault Injection (STAFI) framework to locate critical fault sites in DNNs for ADAS efficiently. Spatially, we propose a Progressive Metric-guided Bit Search (PMBS) that efficiently identifies critical network weight bits whose corruption causes the largest deviations in driving behavior (e.g., unintended acceleration or steering). Furthermore, we develop a Critical Fault Time Identification (CFTI) mechanism that determines when to trigger these faults, taking into account the context of real-time systems and environmental states, to maximize the safety impact. Experiments on DNNs for a production ADAS demonstrate that STAFI uncovers 29.56x more hazard-inducing critical faults than the strongest baseline.
要約:
Ransomware and DDoS attacks disproportionately impact hospitals, schools, and small organizations that cannot afford enterprise security solutions. We present ML Defender (aRGus NDR), an open-source network intrusion detection system built in C++20, deployable on commodity hardware at approximately 150-200 USD. ML Defender implements a six-component pipeline over eBPF/XDP packet capture, ZeroMQ transport, and Protocol Buffers serialization, combining a rule-based Fast Detector with an embedded Random Forest classifier. The Maximum Threat Wins policy selects the arithmetic maximum of both scores, using ML inference to suppress false positives. Evaluated against the CTU-13 Neris botnet dataset: F1=0.9985, Precision=0.9969, Recall=1.0000, FPR=0.0002% (2 FP in 12,075 benign flows). The Fast Detector alone produces 6.61% FPR on benign traffic; the ML layer reduces this to zero -- a ~500-fold reduction. Per-class inference latency: 0.24-1.06 microseconds on commodity hardware. Under progressive load testing, the pipeline sustains ~34-38 Mbps with zero packet drops across 2.37 million packets. RAM stable at ~1.28 GB. The bottleneck is VirtualBox NIC emulation, not pipeline logic. All figures are conservative lower bounds; bare-metal characterization is future work. This work was developed through the Consejo de Sabios, a structured multi-LLM peer review methodology. Test-Driven Hardening (TDH) is proposed as a methodology for security-critical distributed systems. ML Defender is released under the MIT license.
要約:
We prove that no continuous, utility-preserving wrapper defense-a function $D: X\to X$ that preprocesses inputs before the model sees them-can make all outputs strictly safe for a language model with connected prompt space, and we characterize exactly where every such defense must fail. We establish three results under successively stronger hypotheses: boundary fixation-the defense must leave some threshold-level inputs unchanged; an $\epsilon$-robust constraint-under Lipschitz regularity, a positive-measure band around fixed boundary points remains near-threshold; and a persistent unsafe region under a transversality condition, a positive-measure subset of inputs remains strictly unsafe. These constitute a defense trilemma: continuity, utility preservation, and completeness cannot coexist. We prove parallel discrete results requiring no topology, and extend to multi-turn interactions, stochastic defenses, and capacity-parity settings. The results do not preclude training-time alignment, architectural changes, or defenses that sacrifice utility. The full theory is mechanically verified in Lean 4 and validated empirically on three LLMs.
要約:
While Chain-of-Thought (CoT) prompting has become a standard paradigm for eliciting complex reasoning capabilities in Large Language Models, it inadvertently exposes a new attack surface for backdoor attacks. Existing CoT backdoor attacks typically manipulate the intermediate reasoning steps to steer the model toward incorrect answers. However, these corrupted reasoning traces are readily detected by prevalent process-monitoring defenses. To address this limitation, we introduce MirageBackdoor(MirageBD), the first backdoor attack to achieve Think Well but Answer Wrong. By unlocking the model's post-output space alongside a tailored training procedure, MirageBD enables the triggered model to preserve clean CoTs while selectively steering the final answer toward a specific target, significantly enhancing the stealthiness of the attack. Experiments show that MirageBD generally achieves over 90% attack success rate across four datasets and five models with a poison ratio of only 5%. Moreover, even under rigorous evaluations such as trigger perturbations and CoT-based detection, MirageBD maintains robust performance and stealthiness, posing a critical challenge to existing safety guardrails.
要約:
With the rapid advancement of decentralized applications, smart contract security faces severe challenges, particularly regarding atomicity violations in complex logic such as Oracle and NFT contracts. Rigid rule sets often limit traditional static analyzers and lack deep contextual awareness, leading to high false-positive and false-negative rates when identifying vulnerabilities that depend on intermediate state inconsistencies. To address these limitations, this paper proposes PSR\textsuperscript{2}, a novel collaborative static analysis framework that integrates structural path searching with deterministic semantic reasoning. PSR\textsuperscript{2} utilizes a Graph Structure Analysis Module (GSAM) to identify suspicious execution sequences in control flow graphs and a Semantic Context Analysis Module (SCAM) to extract data dependencies and state facts from abstract syntax trees. A Fusion Decision Module (FDM) then performs formal cross validation to confirm vulnerabilities based on a unified atomicity inconsistency model. Experimental results on 1,600 contract samples demonstrate that PSR\textsuperscript{2} significantly outperforms pattern-matching baselines, achieving an F1-score of 94.69\% in complex ERC-721 scenarios compared to 51.86\% for existing tools. Ablation studies further confirm that our fusion logic effectively reduces the false-positive rate by nearly half compared to single module analysis.
要約:
Large language models (LLMs) have emerged as a powerful tool for synthetic data generation. A particularly important use case is producing synthetic replicas of private text, which requires carefully balancing privacy and utility. We propose Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), which uses private seeds and integrates privacy-preserving strategies, including a formal differential privacy (DP) mechanism in the candidate selection, to generate realistic synthetic data. Comprehensive experiments against state-of-the-art private synthetic data generation methods demonstrate that RPSG achieves high fidelity to private data while providing strong privacy protection.
要約:
Without direct access to the client's data, federated learning (FL) is well-known for its unique strength in data privacy protection among existing distributed machine learning techniques. However, its distributive and iterative nature makes FL inherently vulnerable to various poisoning attacks. To counteract these threats, extensive defenses have been proposed to filter out malicious clients, using various detection metrics. Based on our analysis of existing attacks and defenses, we find that there is a lack of attention to model redundancy. In neural networks, various model parameters contribute differently to the model's performance. However, existing attacks in FL manipulate all the model update parameters with the same strategy, making them easily detectable by common defenses. Meanwhile, the defenses also tend to analyze the overall statistical features of the entire model updates, leaving room for sophisticated attacks. Based on these observations, this paper proposes a generic and attack-agnostic augmentation approach designed to enhance the effectiveness and stealthiness of existing FL poisoning attacks against detection in FL, pointing out the inherent flaws of existing defenses and exposing the necessity of fine-grained FL security. Specifically, we employ a three-stage methodology that strategically constructs, generates, and injects poison (generated by existing attacks) into a pill (a tiny subnet with a novel structure) during the FL training, named as pill construction, pill poisoning, and pill injection accordingly. Extensive experimental results show that FL poisoning attacks enhanced by our method can bypass all the popular defenses, and can gain an up to 7x error rate increase, as well as on average a more than 2x error rate increase on both IID and non-IID data, in both cross-silo and cross-device FL systems.
要約:
Multimodal Large Language Models (MLLMs) have shown great promise but require substantial computational resources during inference. Attackers can exploit this by inducing excessive output, leading to resource exhaustion and service degradation. Prior energy-latency attacks aim to increase generation time by broadly shifting the output token distribution away from the EOS token, but they neglect the influence of token-level Part-of-Speech (POS) characteristics on EOS and sentence-level structural patterns on output counts, limiting their efficacy. To address this, we propose LingoLoop, an attack designed to induce MLLMs to generate excessively verbose and repetitive sequences. First, we find that the POS tag of a token strongly affects the likelihood of generating an EOS token. Based on this insight, we propose a POS-Aware Delay Mechanism to postpone EOS token generation by adjusting attention weights guided by POS information. Second, we identify that constraining output diversity to induce repetitive loops is effective for sustained generation. We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops. Extensive experiments on models like Qwen2.5-VL-3B demonstrate LingoLoop's powerful ability to trap them in generative loops; it consistently drives them to their generation limits and, when those limits are relaxed, can induce outputs with up to 367x more tokens than clean inputs, triggering a commensurate surge in energy consumption. These findings expose significant MLLMs' vulnerabilities, posing challenges for their reliable deployment.
要約:
Long-context large language models (LLMs), such as Gemini-2.5-Pro and Claude-Sonnet-4, are increasingly used to empower advanced AI systems, including retrieval-augmented generation (RAG) pipelines and autonomous agents. In these systems, an LLM receives an instruction along with a context--often consisting of texts retrieved from a knowledge database or memory--and generates a response that is contextually grounded by following the instruction. Recent studies have designed solutions to trace back to a subset of texts in the context that contributes most to the response generated by the LLM. These solutions have numerous real-world applications, including performing post-attack forensic analysis and improving the interpretability and trustworthiness of LLM outputs. While significant efforts have been made, state-of-the-art solutions such as TracLLM often lead to a high computation cost, e.g., it takes TracLLM hundreds of seconds to perform traceback for a single response-context pair. In this work, we propose AttnTrace, a new context traceback method based on the attention weights produced by an LLM for a prompt. To effectively utilize attention weights, we introduce two techniques designed to enhance the effectiveness of AttnTrace, and we provide theoretical insights for our design choice. We also perform a systematic evaluation for AttnTrace. The results demonstrate that AttnTrace is more accurate and efficient than existing state-of-the-art context traceback methods. We also show that AttnTrace can improve state-of-the-art methods in detecting prompt injection under long contexts through the attribution-before-detection paradigm. As a real-world application, we demonstrate that AttnTrace can effectively pinpoint injected instructions in a paper designed to manipulate LLM-generated reviews. The code is at https://github.com/Wang-Yanting/AttnTrace.
要約:
Large language model-powered code agents are rapidly transforming software engineering, yet the security risks of their generated code have become a critical concern. Existing benchmarks have provided valuable insights, but they fail to capture scenarios in which vulnerabilities are actually introduced by human developers, making fair comparisons between humans and agents infeasible. We therefore introduce SecureVibeBench, a benchmark of 105 C/C++ secure coding tasks sourced from 41 projects in OSS-Fuzz for code agents. SecureVibeBench has the following features: (i) realistic task settings that require multi-file edits in large repositories, (ii)~aligned contexts based on real-world open-source vulnerabilities with precisely identified vulnerability introduction points, and (iii) comprehensive evaluation that combines functionality testing and security checking with both static and dynamic oracles. We evaluate 5 popular code agents like OpenHands, supported by 5 LLMs (e.g., Claude sonnet 4.5) on SecureVibeBench. Results show that current agents struggle to produce both correct and secure code, as even the best-performing one, produces merely 23.8\% correct and secure solutions on SecureVibeBench. Our code and data are on https://github.com/iCSawyer/SecureVibeBench.
要約:
"Pebble games," an abstraction from classical reversible computing, have found use in the design of quantum circuits for inherently sequential tasks. Gidney showed that allowing Hadamard basis measurements during pebble games can dramatically improve costs -- an extension termed "spooky pebble games" because the measurements leave temporary phase errors called ghosts. Separately, previous work by Blocki et al. studied the benefits of parallelism in pebble games. In this work we define and study parallel spooky pebble games, showing that parallelism and spookiness can yield impressive gains when used together. First, we show by construction that a line graph of length $\ell$ can be pebbled in depth $2\ell$ (exactly optimal) using space $\leq 2.47\log \ell$. Then, to explore pebbling schemes using even less space, we use a highly optimized $A^*$ search implemented in Julia to find the lowest-depth parallel spooky pebbling possible for a range of concrete line graph lengths $\ell$ given a constant number of pebbles $s$.
We then show that these techniques can significantly reduce the cost of the arithmetic in Regev's factoring algorithm. For example, we find that 4096-bit integers $N$ can be factored in multiplication depth 193, which outperforms the 680 required of previous variants of Regev and the 444 reported by Eker{\aa} and G\"artner for Shor's algorithm. While the space required for Shor's algorithm is considerably less than any variant of Regev's algorithm including ours, and thus Shor likely remains the best candidate for the first quantum factorization of large integers, our results show that implementations of Regev's algorithm are far from fully optimized, and Regev's algorithm may have practical importance in the future. We also believe our pebbling techniques are applicable in quantum cryptanalysis beyond integer factorization, and in quantum circuit compilation more broadly.
著者: Heng Zhao (KTH Royal Institute of Technology), Sara Saeidian (KTH Royal Institute of Technology, Inria Saclay), Tobias J. Oechtering (KTH Royal Institute of Technology)
要約:
Linear queries, as the basis of broad analysis tasks, are often released through privacy mechanisms based on differential privacy (DP), the most popular framework for privacy protection. However, DP adopts a context-free definition that operates independently of the data-generating distribution. In this paper, we revisit the privacy analysis of the Laplace mechanism through the lens of pointwise maximal leakage (PML). We demonstrate that the distribution-agnostic definition of the DP framework often mandates excessive noise. To address this, we incorporate an assumption about the prior distribution by lower-bounding the probability of any single record belonging to any specific class. With this assumption, we derive a tight, context-aware leakage bound for general linear queries, and prove that our derived bound is strictly tighter than the standard DP guarantee and converges to the DP guarantee as this probability lower bound approaches zero. Numerical evaluations demonstrate that by exploiting this prior knowledge, the required noise scale can be reduced while maintaining privacy guarantees.
要約:
Fine-tuned language models pose significant privacy risks, as they may memorize and expose sensitive information from their training data. Membership inference attacks (MIAs) provide a principled framework for auditing these risks, yet existing methods achieve limited detection rates, particularly at the low false-positive thresholds required for practical privacy auditing. We present EZ-MIA, a membership inference attack that exploits a key observation: memorization manifests most strongly at error positions, specifically tokens where the model predicts incorrectly yet still shows elevated probability for training examples. We introduce the Error Zone (EZ) score, which measures the directional imbalance of probability shifts at error positions relative to a pretrained reference model. This principled statistic requires only two forward passes per query and no model training of any kind. On WikiText with GPT-2, EZ-MIA achieves 3.8x higher detection than the previous state-of-the-art under identical conditions (66.3% versus 17.5% true positive rate at 1% false positive rate), with near-perfect discrimination (AUC 0.98). At the stringent 0.1% FPR threshold critical for real-world auditing, we achieve 8x higher detection than prior work (14.0% versus 1.8%), requiring no reference model training. These gains extend to larger architectures: on AG News with Llama-2-7B, we achieve 3x higher detection (46.7% versus 15.8% TPR at 1% FPR). These results establish that privacy risks of fine-tuned language models are substantially greater than previously understood, with implications for both privacy auditing and deployment decisions. Code is available at https://github.com/JetBrains-Research/ez-mia.
要約:
Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer the source of text based on how it sounds, not where it actually comes from. A command hidden in a webpage hijacks an agent simply because it sounds like a user instruction. This is not just behavioral: in the model's internal representations, text that sounds like a trusted source occupies the same space as text that actually is one. We design role probes which measure how models internally perceive "who is speaking", showing that attacker-controllable signals (e.g. syntactic patterns, lexical choice) control role perception. We first test this with CoT Forgery, a zero-shot attack that injects fabricated reasoning into user prompts or ingested webpages. Models mistake the text for their own thoughts, yielding 60% attack success on StrongREJECT across frontier models with near-0% baselines. Strikingly, the degree of role confusion strongly predicts attack success. We then generalize these results to standard agent prompt injections, introducing a unifying framework that reframes prompt injection not as an ad-hoc exploit but as a measurable consequence of how models represent role.
要約:
Quantum computing provides the feasible multi-layered security challenges to classical blockchain systems. Quantum blockchains that relies on quantum key distribution (QKD) to establish secure channels can address this feasible threat. Whereas, there are still architecture limitations to practical security resulted in the measurement devices while implementing the QKD-based blockchains in physical layer. This paper presents a distributed architecture in quantum blockchain to address the connectivity and distance limitations of the QKD-secured networks. A decoupled architecture is designed felicitously so that it pairs a linearly scalable measurement-device-independent (MDI) physical layer with a decentralized consensus.It can optimize the complexity of infrastructure from quadratic to linear scaling, ascribed to leveraging the twin-field (TF) QKD protocol with the MDI-structurized star topology. Additionally, the dual-key stratification strategy transforms symmetric information-theoretic security into publicly auditable forward-secret blockchain evidence. This architecture can integrate information-theoretic security with distributed consensus mechanisms, allowing the scalable system to overcome the potential rate-loss limits inherent in traditional security-weakened blockchains.
要約:
Linguistic steganography involves embedding secret messages within seemingly innocuous texts to enable covert communication. Provable security, which is a long-standing goal and key motivation, has been extended to language-model-based steganography. Previous provably secure approaches have achieved perfect imperceptibility, measured by zero Kullback-Leibler (KL) divergence, but at the expense of embedding capacity. In this paper, we attempt to directly use a classic entropy coding method (range coding) to achieve secure steganography, and then propose an efficient and provably secure linguistic steganographic method with a rotation mechanism. Experiments across various language models show that our method achieves around 100% entropy utilization (embedding efficiency) for embedding capacity, outperforming the existing baseline methods. Moreover, it achieves high embedding speeds (up to 1554.66 bits/s on GPT-2). The code is available at github.com/ryehr/RRC_steganography.