要約:
Global Navigation Satellite System (GNSS) spoofing and jamming threaten maritime navigation by corrupting positions from Automatic Identification System (AIS) transponders. Crucially, raw AIS messages contain communication-layer defects (duplicated MMSIs, timestamp errors, stale retransmissions, and multi-station rebroadcast delays) that can mimic spoofing or jamming. Thus, AIS positions are unreliable without pre-filtering. We propose a three-stage AIS-based framework that (1) uses rule-based diagnostics to discard communication faults, (2) applies an interacting multiple model filter and transmission-interval analysis to extract kinematic-consistency and continuity anomalies, and (3) applies spatiotemporal DBSCAN to group anomalies by multi-vessel coherence and temporal persistence and classify them as sensor faults, spoofing, or jamming. Tested on approximately 966 million AIS messages from Korean coastal waters, the framework detected 17 spoofing and 343 jamming clusters and reduced false alarms by 98.6% relative to naive clustering. These results show that, after rigorous pre-filtering, AIS data can enable wide-area GNSS interference detection without dedicated sensors.
要約:
AI agents that combine large language models with non-AI system components are rapidly emerging in real-world applications, offering unprecedented automation and flexibility. However, this unprecedented flexibility introduces complex security challenges fundamentally different from those in traditional software systems. This paper presents the first systematic and comprehensive survey of AI agent security, including an analysis of the design space, attack landscape, and defense mechanisms for secure AI agent systems. We further conduct case studies to point out existing gaps in securing agentic AI systems and identify open challenges in this emerging domain. Our work also introduces the first systematic framework for understanding the security risks and defense strategies of AI agents, serving as a foundation for building both secure agentic systems and advancing research in this critical area.
要約:
Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it a high-value intellectual property (IP) whose confidentiality remains insufficiently studied.
%
Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world threat of such attacks.
%
To bridge this realism gap, we propose \textit{WebWeaver}, an attack framework that infers the complete LLM-MAS topology by compromising only a single arbitrary agent instead of the administrative agent.
%
Unlike prior approaches, WebWeaver relies solely on agent contexts rather than agent IDs, enabling significantly stealthier inference.
%
WebWeaver further introduces a new covert jailbreak-based mechanism and a novel fully jailbreak-free diffusion design to handle cases where jailbreaks fail.
%
Additionally, we address a key challenge in diffusion-based inference by proposing a masking strategy that preserves known topology during diffusion, with theoretical guarantees of correctness.
%
Extensive experiments show that WebWeaver substantially outperforms state-of-the-art (SOTA) baselines, achieving about 60\% higher inference accuracy under active defenses with negligible overhead.
要約:
In a content delivery network (CDN), resources are strained during peak-time and underutilised in off-peak times when supplying digital content to users. Caching can help balance this. At the off-peak time some content is delivered to users' local caches. During peak time, the use of cached data to serve users' requests relieves strain on the network by reducing repeated transfer of popular content. In \emph{coded caching}, the cache content placement is designed in conjunction with the delivery techniques to optimise network throughput.
Since dissemination of information, as well as the delivery of entertainment, is reliant on CDNs, the security and privacy of cache placement, user demand, and content delivery, are paramount. In much of the literature in \emph{secure coded caching}, security is built on top of solutions that have efficiency in mind, and most current proposals focus on the security of individual parts of the process. A lack of a unifying network model also makes it difficult to compare or combine solutions.
In this survey we analyse the security and privacy requirements of secure coded caching, and evaluate existing schemes in terms of the security provided and the cost of this security provision. We also review the techniques used to achieve secure coded caching and analyse their limitations. In addition, we contextualise secure coded caching in the landscape of other secure content delivery primitives. As a result, we identify and prioritise open security and privacy challenges for the future.
要約:
The \textsc{prim-lwe} problem (Sehrawat, Yeo, and Desmedt, \emph{Theoret.\ Comput.\ Sci.}\ 886, 2021) is a variant of Learning with Errors requiring the secret matrix to have a primitive-root determinant. The dimension-uniform reduction constant is $c(p)=\inf_{m\ge 1}c_m(p)$, where $c_n(p)$ is the density of $n\times n$ matrices over~$\mathbb{F}_p$ with primitive-root determinant. Those authors asked whether $\inf_{p\text{ prime}}c(p)=0$, noting that an affirmative answer would follow from the conjectural infinitude of primorial primes.
We resolve this unconditionally using only Dirichlet's theorem and Mertens' product formula, bypassing the primorial-prime hypothesis entirely. We establish the sharp order $\min_{p\le x}c(p)\asymp 1/\log\log x$, prove that $c(p)$ possesses a continuous purely singular limiting distribution over the primes with support exactly $[0,1/2]$, and derive explicit lower bounds on $c(q)$ for primes of cryptographic interest parameterized solely by~$\omega(q{-}1)$, the number of distinct prime factors of~$q{-}1$. These bounds apply to every prime~$q$ whose predecessor has controlled factorization structure, as measured by~$\omega(q{-}1)$; this includes many NTT-friendly moduli, though NTT-friendliness alone does not imply the needed bound. For the NIST-standardized moduli $q=3329$ (ML-KEM) and $q=8380417$ (ML-DSA), the dimension-uniform expected rejection-sampling overhead~$1/c(q)$ is at most $2.17$ and $3.42$, respectively. As a simple conservative bound, for any prime $q>2^{30}$ one has $1/c(q)\le 1.79\log q$. The worst-case overhead among primes $p\le x$ is $\Theta(\log\log x)$, and pointwise $1/c(q)=O(\log\log q)$.
要約:
Network intrusion detection systems play a crucial role in the security strategy employed by organisations to detect and prevent cyberattacks. Such systems usually combine pattern detection signatures with anomaly detection techniques powered by machine learning methods. However, the commonly proposed machine learning methods present drawbacks such as over-reliance on labeled data and limited generalization capabilities. To address these issues, embedding-based methods have been introduced to learn representations from network data, such as DNS traffic, mainly due to its large availability, that generalise effectively to many downstream tasks. However, current approaches do not properly consider contextual information among DNS queries. In this paper, we tackle this issue by proposing DNS-GT, a novel Transformer-based model that learns embeddings for domain names from sequences of DNS queries. The model is first pre-trained in a self-supervised fashion in order to learn the general behavior of DNS activity. Then, it can be finetuned on specific downstream tasks, exploiting interactions with other relevant queries in a given sequence. Our experiments with real-world DNS data showcase the ability of our method to learn effective domain name representations. A quantitative evaluation on domain name classification and botnet detection tasks shows that our approach achieves better results compared to relevant baselines, creating opportunities for further exploration of large-scale language models for intrusion detection systems. Our code is available at: https://github.com/m-altieri/DNS-GT.
要約:
Large Language Models (LLMs) show remarkable capabilities in understanding natural language and generating complex code. However, as practitioners adopt CodeLLMs for increasingly critical development tasks, research reveals that these models frequently generate functionally correct yet insecure code, posing significant security risks. While multiple approaches have been proposed to improve security in AI-based code generation, combined benchmarks show these methods remain insufficient for practical use, achieving only limited improvements in both functional correctness and security. This stems from a fundamental gap in understanding the internal mechanisms of code generation and the root causes of security vulnerabilities, forcing researchers to rely on heuristics and empirical observations. In this work, we investigate the internal representation of security concepts in CodeLLMs, revealing that models are often aware of vulnerabilities as they generate insecure code. Through systematic evaluation, we demonstrate that CodeLLMs can distinguish between security subconcepts, enabling a more fine-grained analysis than prior black-box approaches. Leveraging these insights, we propose Secure Concept Steering for CodeLLMs (SCS-Code). During token generation, SCS-Code steers LLMs' internal representations toward secure and functional code output, enabling a lightweight and modular mechanism that can be integrated into existing code models. Our approach achieves superior performance compared to state-of-the-art methods across multiple secure coding benchmarks.
要約:
Smart contract-based ecosystems enable decentralized applications without trusted intermediaries, but their immutability and permissionless design also facilitate large-scale fraud. One of the most prevalent attacks is the rug pull, where project operators abruptly withdraw liquidity after artificially inflating token value. Existing detection methods primarily rely on reactive on-chain signals and often suffer from temporal data leakage, limiting their real-world reliability. This paper proposes a leakage-aware framework for early rug-pull detection that integrates on-chain behavioral metrics with temporally aligned Open Source Intelligence (OSINT) signals. We construct a hand-labeled dataset of 1,000 token projects, spanning DeFi and non-DeFi settings, with all features extracted strictly prior to any liquidity withdrawal to preserve causal validity. The dataset combines structural on-chain indicators with external attention signals derived from social media activity and search trends. Within this framework, TabPFN is employed as a core modeling component for learning from multimodal tabular data under strict temporal constraints. Experimental results show that the proposed framework achieves strong discriminative performance and improved probability calibration compared to classical baselines, while maintaining low false-negative rates. By framing rug-pull detection as a causal, multimodal forecasting problem, this work emphasizes the necessity of leakage-resilient evaluation and calibrated risk estimation for deployment in blockchain security systems.
要約:
We construct unclonable encryption (UE) in the Haar random oracle model, where all parties have query access to $U,U^\dagger,U^*,U^T$ for a Haar random unitary $U$. Our scheme satisfies the standard notion of unclonable indistinguishability security, supports reuse of the secret key, and can encrypt arbitrary-length messages. That is, we give the first evidence that (reusable) UE, which requires computational assumptions, exists in "micocrypt", a world where one-way functions may not exist.
As one of our central technical contributions, we build on the recently introduced path recording framework to prove a natural ``unitary reprogramming lemma'', which may be of independent interest.
要約:
Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks. However, their tightly coupled instant-messaging interaction paradigm and high-privilege execution capabilities substantially expand the system attack surface. In this paper, we present a comprehensive security threat analysis of OpenClaw. To structure our analysis, we introduce a five-layer lifecycle-oriented security framework that captures key stages of agent operation, i.e., initialization, input, inference, decision, and execution, and systematically examine compound threats across the agent's operational lifecycle, including indirect prompt injection, skill supply chain contamination, memory poisoning, and intent drift. Through detailed case studies on OpenClaw, we demonstrate the prevalence and severity of these threats and analyze the limitations of existing defenses. Our findings reveal critical weaknesses in current point-based defense mechanisms when addressing cross-temporal and multi-stage systemic risks, highlighting the need for holistic security architectures for autonomous LLM agents. Within this framework, we further examine representative defense strategies at each lifecycle stage, including plugin vetting frameworks, context-aware instruction filtering, memory integrity validation protocols, intent verification mechanisms, and capability enforcement architectures.
要約:
Embedded software used in industrial systems frequently relies on data that ensures the correct and efficient operation of these systems. Thus, companies invest considerable resources in fine-tuning this data, making it their valuable intellectual property (IP). We present a novel protection mechanism for this IP that combines hardware fingerprints with Boolean logic. Unlike usual copy-protection approaches, unauthorised copies of the software still run on cloned devices but suboptimally. According to our security evaluation, only a complex dynamic analysis of the protected software running on the genuine target device can reveal the secret data. This makes the protection offered by our method more difficult to bypass. Notably, our approach does not require additional hardware, relying only on relatively simple updates to the software. We evaluate our protection mechanism by binding the parameters of a PID controller to a microcontroller unit (MCU) by using a physically unclonable function (PUF) based on its SRAM.
要約:
Tool-augmented LLM agents introduce security risks that extend beyond user-input filtering, including indirect prompt injection through fetched content, unsafe tool execution, credential leakage, and tampering with local control files. We present OpenClaw PRISM, a zero-fork runtime security layer for OpenClaw-based agent gateways. PRISM combines an in-process plugin with optional sidecar services and distributes enforcement across ten lifecycle hooks spanning message ingress, prompt construction, tool execution, tool-result persistence, outbound messaging, sub-agent spawning, and gateway startup. Rather than introducing a novel detection model, PRISM integrates a hybrid heuristic-plus-LLM scanning pipeline, conversation- and session-scoped risk accumulation with TTL-based decay, policy-enforced controls over tools, paths, private networks, domain tiers, and outbound secret patterns, and a tamper-evident audit and operations plane with integrity verification and hot-reloadable policy management. We outline an evaluation methodology and benchmark pipeline for measuring security effectiveness, false positives, layer contribution, runtime overhead, and operational recoverability in an agent-runtime setting, and we report current preliminary benchmark results on curated same-slice experiments and operational microbenchmarks. The system targets deployable runtime defense for real agent gateways rather than benchmark-only detection.
要約:
It is widely recognized that practical exercises are crucial for teaching cybersecurity in higher education. However, their setup is not only expensive, time-consuming, and prone to numerous errors, but also requires technical and programming skills to create attack contexts and scripts. To mitigate these drawbacks, this research work proposes an approach that automatically generates scripts and attack contexts based on informal attack scenario descriptions. To isolate business concerns from technological issues, our approach is aligned with the MDA development method. A formal language is proposed to express our Computation Independent model. We rely on the TOSCA standard to describe our Platform Independent Model. We also allow through our approach the generation of several Platform Specific Models. Hence, this research work contributes not only to the overall improvement of attack implementations for cybersecurity training but also to their reuse on various platforms.
要約:
High-privilege LLM agents that autonomously process external documentation are increasingly trusted to automate tasks by reading and executing project instructions, yet they are granted terminal access, filesystem control, and outbound network connectivity with minimal security oversight. We identify and systematically measure a fundamental vulnerability in this trust model, which we term the \emph{Trusted Executor Dilemma}: agents execute documentation-embedded instructions, including adversarial ones, at high rates because they cannot distinguish malicious directives from legitimate setup guidance. This vulnerability is a structural consequence of the instruction-following design paradigm, not an implementation bug. To structure our measurement, we formalize a three-dimensional taxonomy covering linguistic disguise, structural obfuscation, and semantic abstraction, and construct \textbf{ReadSecBench}, a benchmark of 500 real-world README files enabling reproducible evaluation. Experiments on the commercially deployed computer-use agent show end-to-end exfiltration success rates up to 85\%, consistent across five programming languages and three injection positions. Cross-model evaluation on four LLM families in a simulation environment confirms that semantic compliance with injected instructions is consistent across model families. A 15-participant user study yields a 0\% detection rate across all participants, and evaluation of 12 rule-based and 6 LLM-based defenses shows neither category achieves reliable detection without unacceptable false-positive rates. Together, these results quantify a persistent \emph{Semantic-Safety Gap} between agents' functional compliance and their security awareness, establishing that documentation-embedded instruction injection is a persistent and currently unmitigated threat to high-privilege LLM agent deployments.
要約:
Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs on every request and therefore must be fast, deterministic, non-promptable, and auditable. We introduce Mirror, a data-curation design pattern that organizes prompt injection corpora into matched positive and negative cells so that a classifier learns control-plane attack mechanics rather than incidental corpus shortcuts. Using 5,000 strictly curated open-source samples -- the largest corpus supportable under our public-data validity contract -- we define a 32-cell mirror topology, fill 31 of those cells with public data, train a sparse character n-gram linear SVM, compile its weights into a static Rust artifact, and obtain 95.97\% recall and 92.07\% F1 on a 524-case holdout at sub-millisecond latency with no external model runtime dependencies. On the same holdout, our next line of defense, a 22-million-parameter Prompt Guard~2 model reaches 44.35\% recall and 59.14\% F1 at 49\,ms median and 324\,ms p95 latency. Linear models still leave residual semantic ambiguities such as use-versus-mention for later pipeline layers, but within that scope our results show that for L1 prompt injection screening, strict data geometry can matter more than model scale.
要約:
This paper investigates the detectability of popular imagein-image steganography schemes [1, 2, 3, 4, 5]. In this paradigm, the payload is usually an image of the same size as the Cover image, leading to very high embedding rates. We first show that the embedding yields a mixing process that is easily identifiable by independent component analysis. We then propose a simple, interpretable steganalysis method based on the first four moments of the independent components estimated from the wavelet decomposition of the images, which are used to distinguish between the distributions of Cover and Stego components. Experimental results demonstrate the efficiency of the proposed method, with eight-dimensional input vectors attaining up to 84.6% accuracy. This vulnerability analysis is supported by two other facts: the use of keyless extraction networks and the high detectability w.r.t. classical steganalysis methods, such as the SRM combined with support vector machines, which attains over 99% accuracy.
要約:
Large Language Models (LLMs) are increasingly trained to align with human values, primarily focusing on task level, i.e., refusing to execute directly harmful tasks. However, a subtle yet crucial content-level ethical question is often overlooked: when performing a seemingly benign task, will LLMs -- like morally conscious human beings -- refuse to proceed when encountering harmful content in user-provided material? In this study, we aim to understand this content-level ethical question and systematically evaluate its implications for mainstream LLMs. We first construct a harmful knowledge dataset (i.e., non-compliant with OpenAI's usage policy) to serve as the user-supplied harmful content, with 1,357 entries across ten harmful categories. We then design nine harmless tasks (i.e., compliant with OpenAI's usage policy) to simulate the real-world benign tasks, grouped into three categories according to the extent of user-supplied content required: extensive, moderate, and limited. Leveraging the harmful knowledge dataset and the set of harmless tasks, we evaluate how nine LLMs behave when exposed to user-supplied harmful content during the execution of benign tasks, and further examine how the dynamics between harmful knowledge categories and tasks affect different LLMs. Our results show that current LLMs, even the latest GPT-5.2 and Gemini-3-Pro, often fail to uphold human-aligned ethics by continuing to process harmful content in harmless tasks. Furthermore, external knowledge from the ``Violence/Graphic'' category and the ``Translation'' task is more likely to elicit harmful responses from LLMs. We also conduct extensive ablation studies to investigate potential factors affecting this novel misuse vulnerability. We hope that our study could inspire enhanced safety measures among stakeholders to mitigate this overlooked content-level ethical risk.
要約:
Backdoor attacks against pre-trained models (PTMs) have traditionally operated under an ``immediacy assumption,'' where malicious behavior manifests instantly upon trigger occurrence. This work revisits and challenges this paradigm by introducing \textit{\textbf{Delayed Backdoor Attacks (DBA)}}, a new class of threats in which activation is temporally decoupled from trigger exposure. We propose that this \textbf{temporal dimension} is the key to unlocking a previously infeasible class of attacks: those that use common, everyday words as triggers. To examine the feasibility of this paradigm, we design and implement a proof-of-concept prototype, termed \underline{D}elayed Backdoor Attacks Based on \underline{N}onlinear \underline{D}ecay (DND). DND embeds a lightweight, stateful logic module that postpones activation until a configurable threshold is reached, producing a distinct latency phase followed by a controlled outbreak. We derive a formal model to characterize this latency behavior and propose a dual-metric evaluation framework (ASR and ASR$_{delay}$) to empirically measure the delay effect. Extensive experiments on four (natural language processing)NLP benchmarks validate the core capabilities of DND: it remains dormant for a controllable duration, sustains high clean accuracy ($\ge$94\%), and achieves near-perfect post-activation attack success rates ($\approx$99\%, The average of other methods is below 95\%.). Moreover, DND exhibits resilience against several state-of-the-art defenses. This study provides the first empirical evidence that the temporal dimension constitutes a viable yet unprotected attack surface in PTMs, underscoring the need for next-generation, stateful, and time-aware defense mechanisms.
要約:
Rapid progress in generative AI has given rise to Compound AI systems - pipelines comprised of multiple large language models (LLM), software tools and database systems. Compound AI systems are constructed on a layered traditional software stack running on a distributed hardware infrastructure. Many of the diverse software components are vulnerable to traditional security flaws documented in the Common Vulnerabilities and Exposures (CVE) database, while the underlying distributed hardware infrastructure remains exposed to timing attacks, bit-flip faults, and power-based side channels. Today, research targets LLM-specific risks like model extraction, training data leakage, and unsafe generation -- overlooking the impact of traditional system vulnerabilities.
This work investigates how traditional software and hardware vulnerabilities can complement LLM-specific algorithmic attacks to compromise the integrity of a compound AI pipeline. We demonstrate two novel attacks that combine system-level vulnerabilities with algorithmic weaknesses: (1) Exploiting a software code injection flaw along with a guardrail Rowhammer attack to inject an unaltered jailbreak prompt into an LLM, resulting in an AI safety violation, and (2) Manipulating a knowledge database to redirect an LLM agent to transmit sensitive user data to a malicious application, thus breaching confidentiality. These attacks highlight the need to address traditional vulnerabilities; we systematize the attack primitives and analyze their composition by grouping vulnerabilities by their objective and mapping them to distinct stages of an attack lifecycle. This approach enables a rigorous red-teaming exercise and lays the groundwork for future defense strategies.
要約:
The Iridium Low Earth Orbit (LEO) satellite constellation remains a unique provider of global communications for critical industries, governments, and private users, serving over 2.5 million active subscribers despite recent market competition. In contrast to terrestrial wireless standards such as 3GPP, Iridium protocol specifications are proprietary and have not undergone rigorous, public, and systematic security evaluation. In this work, we present the first comprehensive security analysis of Iridium authentication and radio link protocols. We reverse engineer Iridium SIM-based authentication mechanism and demonstrate that the secret key can be extracted from the SIM card, enabling full device cloning and impersonation attacks. Leveraging a month-long dataset of Iridium up- and downlink satellite traffic, we further show that nearly all signaling and radio communication protocols currently in use lack encryption, resulting in the exposure of sensitive information in cleartext over the air such as login credentials and large volumes of personal data. Finally, we develop custom software-defined radio (SDR) tools to carry out spoofing and jamming attacks, revealing that modestly equipped adversaries can inject falsified messages or disrupt the Iridium service locally due to the absence of source authentication. Our findings uncover systemic vulnerabilities in the Iridium radio link and highlight the urgent need for users of critical applications to transition to more secure communication radio links.
要約:
Federated Language Model (FedLM) allows a collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every untrustworthy client may leak the received functional model instance. Current watermarking schemes for FedLM often require white-box access and client-side cooperation, providing only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries. Client-level traceability is realized by injecting unique identity-specific watermarks into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100\%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2\%).
要約:
Differential Privacy (DP) is widely adopted in data management systems to enable data sharing with formal disclosure guarantees. A central systems challenge is understanding how DP noise translates into effective protection against inference attacks, since this directly determines achievable utility. Most existing analyses focus only on membership inference -- capturing only a threat -- or rely on reconstruction robustness (ReRo). However, under realistic assumptions, we show that ReRo can yield misleading risk estimates and violate claimed bounds, limiting their usefulness for principled DP calibration and auditing.
This paper introduces reconstruction advantage, a unified risk metric that consistently captures risk across membership inference, attribute inference, and data reconstruction. We derive tight bounds that relate DP noise to adversarial advantage and characterize optimal adversarial strategies for arbitrary DP mechanisms and attacker knowledge. These results enable risk-driven noise calibration and provide a foundation for systematic DP auditing. We show that reconstruction advantage improves the accuracy and scope of DP auditing and enables more effective utility-privacy trade-offs in DP-enabled data management systems.
要約:
Applications like Enterprise Resource Planning (ERP) systems have become an indispensable part of the corporate digital infrastructure. These systems store sensitive data about customers, suppliers, and employees, and thus companies have to process these data in accordance with applicable regulations like the GDPR (the EU General Data Protection Regulation). This can be challenging due to a variety of reasons. For example, prior research has shown that developers sometimes lack knowledge about privacy.
In this work, we focus on privacy in ERP systems in the context of an international consultancy firm. We investigate the privacy awareness regarding privacy-by-design and data minimization of two important populations: developers of ERP systems and managers and consultants responsible for services related to ERP systems. Applying thematic analysis, we elicit privacy behavioral models of these two populations using Fogg's Behavioral Model (FBM) framework. Our findings provide a means to stimulate more adequate privacy-related behaviors for developers and consultants.
要約:
Online social networks facilitate user engagement and information sharing but are also rife with misinformation and deception. Research on trust modeling in online social networks focuses on developing computational models or algorithms to measure trust relationships, assess the reliability of shared content, and detect spam or malicious activities. However, most existing review papers either briefly mention the concept of trust or focus on a single category of trust models. In this paper, we offer a comprehensive categorization and review of state-of-the-art trust models developed for online social networks. First, we explore theories and models related to trust in psychology and identify several factors that influence the formation and evolution of online trust. Next, state-of-the-art trust models are categorized based on their algorithmic foundations. For each category, the modeling mechanisms are investigated, and their unique contributions to quantitative trust modeling are highlighted. Subsequently, we provide an implementation-centric trust modeling handbook, which summarizes available datasets, trust-related features, promising modeling techniques, and feasible application scenarios. Finally, the findings of the literature review are summarized, and unresolved challenges are discussed.
要約:
Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with attacker effort across methods, model families, and harm types. We initiate a scaling-law framework for jailbreaks by treating each attack as a compute-bounded optimization procedure and measuring progress on a shared FLOPs axis. Our systematic evaluation spans four representative jailbreak paradigms, covering optimization-based attacks, self-refinement prompting, sampling-based selection, and genetic optimization, across multiple model families and scales on a diverse set of harmful goals. We investigate scaling laws that relate attacker budget to attack success score by fitting a simple saturating exponential function to FLOPs--success trajectories, and we derive comparable efficiency summaries from the fitted curves. Empirically, prompting-based paradigms tend to be the most compute-efficient compared to optimization-based methods. To explain this gap, we cast prompt-based updates into an optimization view and show via a same-state comparison that prompt-based attacks more effectively optimize in prompt space. We also show that attacks occupy distinct success--stealthiness operating points with prompting-based methods occupying the high-success, high-stealth region. Finally, we find that vulnerability is strongly goal-dependent: harms involving misinformation are typically easier to elicit than other non-misinformation harms.
要約:
In modern transportation networks, adversaries can manipulate routing algorithms using false data injection attacks, such as simulating heavy traffic with multiple devices running crowdsourced navigation applications, to mislead vehicles toward suboptimal routes and increase congestion. To address these threats, we formulate a strategically zero-sum game between an attacker, who injects such perturbations, and a defender, who detects anomalies based on the observed travel times of network edges. We propose a computational method based on multi-agent reinforcement learning to compute a Nash equilibrium of this game, providing an optimal detection strategy, which ensures that total travel time remains within a worst-case bound, even in the presence of an attack. We present an extensive experimental evaluation that demonstrates the robustness and practical benefits of our approach, providing a powerful framework to improve the resilience of transportation networks against false data injection. In particular, we show that our approach yields approximate equilibrium policies and significantly outperforms baselines for both the attacker and the defender.
要約:
Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and accuracy of Large Language Model (LLM) generations.However,this reliance on external data introduces new attack surfaces.Attackers can inject poisoned texts into databases to manipulate LLMs into producing harmful target responses for attacker-chosen queries.Existing research primarily focuses on attacking conventional RAG systems.However,such methods are ineffective against GraphRAG.This robustness derives from the KG abstraction of GraphRAG,which reorganizes injected text into a graph before retrieval,thereby enabling the LLM to reason based on the restructured context instead of raw poisoned passages.To expose latent security vulnerabilities in GraphRAG,we propose Knowledge Evolution Poison (KEPo),a novel poisoning attack method specifically designed for GraphRAG.For each target query,KEPo first generates a toxic event containing poisoned knowledge based on the target answer.By fabricating event backgrounds and forging knowledge evolution paths from original facts to the toxic event,it then poisons the KG and misleads the LLM into treating the poisoned knowledge as the final result.In multi-target attack scenarios,KEPo further connects multiple attack corpora,enabling their poisoned knowledge to mutually reinforce while expanding the scale of poisoned communities,thereby amplifying attack effectiveness.Experimental results across multiple datasets demonstrate that KEPo achieves state-of-the-art attack success rates for both single-target and multi-target attacks,significantly outperforming previous methods.
要約:
This paper establishes the strict optimality in precision for frequency estimation under local differential privacy (LDP). We prove that a frequency estimator with a symmetric and extremal configuration, and a constant support size equal to an optimized value, is sufficient to achieve maximum precision. Furthermore, we derive that the communication cost of such an optimal estimator can be as low as $\log_2(\frac{d(d-1)}{2}+1)$, where $d$ denotes the dictionary size, and propose an algorithm to generate this optimal estimator.
In addition, we introduce a modified Count-Mean Sketch and demonstrate that it is practically indistinguishable from theoretical optimality with a sufficiently large dictionary size (e.g., $d=100$ for a privacy factor of $\epsilon = 1$). We compare existing methods with our proposed optimal estimator to provide selection guidelines for practical deployment. Finally, the performance of these estimators is evaluated experimentally, showing that the empirical results are consistent with our theoretical derivations.
要約:
Membership inference attacks (MIAs) are becoming standard tools for auditing the privacy of machine learning models. The leading attacks -- LiRA (Carlini et al., 2022) and RMIA (Zarifzadeh et al., 2024) -- appear to use distinct scoring strategies, while the recently proposed BASE (Lassila et al., 2025) was shown to be equivalent to RMIA, making it difficult for practitioners to choose among them. We show that all three are instances of a single exponential-family log-likelihood ratio framework, differing only in their distributional assumptions and the number of parameters estimated per data point. This unification reveals a hierarchy (BASE1-4) that connects RMIA and LiRA as endpoints of a spectrum of increasing model complexity. Within this framework, we identify variance estimation as the key bottleneck at small shadow-model budgets and propose BaVarIA, a Bayesian variance inference attack that replaces threshold-based parameter switching with conjugate normal-inverse-gamma priors. BaVarIA yields a Student-t predictive (BaVarIA-t) or a Gaussian with stabilized variance (BaVarIA-n), providing stable performance without additional hyperparameter tuning. Across 12 datasets and 7 shadow-model budgets, BaVarIA matches or improves upon LiRA and RMIA, with the largest gains in the practically important low-shadow-model and offline regimes.
要約:
The rapid evolution of embodied agents has accelerated the deployment of household robots in real-world environments. However, unlike structured industrial settings, household spaces introduce unpredictable safety risks, where system limitations such as perception latency and lack of common sense knowledge can lead to dangerous errors. Current safety evaluations, often restricted to static images, text, or general hazards, fail to adequately benchmark dynamic unsafe action detection in these specific contexts. To bridge this gap, we introduce \textbf{HomeSafe-Bench}, a challenging benchmark designed to evaluate Vision-Language Models (VLMs) on unsafe action detection in household scenarios. HomeSafe-Bench is contrusted via a hybrid pipeline combining physical simulation with advanced video generation and features 438 diverse cases across six functional areas with fine-grained multidimensional annotations. Beyond benchmarking, we propose \textbf{Hierarchical Dual-Brain Guard for Household Safety (HD-Guard)}, a hierarchical streaming architecture for real-time safety monitoring. HD-Guard coordinates a lightweight FastBrain for continuous high-frequency screening with an asynchronous large-scale SlowBrain for deep multimodal reasoning, effectively balancing inference efficiency with detection accuracy. Evaluations demonstrate that HD-Guard achieves a superior trade-off between latency and performance, while our analysis identifies critical bottlenecks in current VLM-based safety detection.
要約:
Crypto Key Opinion Leaders (KOLs) shape Web3 narratives and retail investment behaviour. In volatile, high-risk markets, their credibility becomes a key determinant of their influence on followers. Yet prior research has focused on lifestyle influencers or generic financial commentary, leaving crypto KOLs' understandings of motivation, credibility, and responsibility underexplored. Drawing on interviews with 13 KOLs and self-determination theory (SDT), we examine how psychological needs are negotiated alongside monetisation and community expectations. Whereas prior work treats finfluencer credibility as a set of static credentials, our findings reveal it to be a self-determined, ethically enacted practice. We identify four community-recognised markers of credibility: self-regulation, bounded epistemic competence, accountability, and reflexive self-correction. This reframes credibility as socio-technical performance, extending SDT into high-risk crypto ecosystems. Methodologically, we employ a hybrid human-LLM thematic analysis. The study surfaces implications for designing credibility signals that prioritise transparency over hype.
要約:
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.
要約:
We present STAMP (Selective Task-Aware Mechanism for Text Privacy), a new framework for task-aware text privatization that achieves an improved privacy-utility trade-off. STAMP selectively allocates privacy budgets across tokens by jointly considering (i) each token's importance to the downstream task (as measured via a task- or query-specific representation), and (ii) its privacy sensitivity (e.g., names, dates, identifiers). This token-level partitioning enables fine-grained, group-wise control over the level of noise applied to different parts of the input, balancing privacy protection with task relevance. To privatize individual token embeddings, we introduce the polar mechanism, which perturbs only the direction of embeddings on the unit sphere while preserving their magnitude. Decoding is performed via cosine nearest-neighbor search, aligning the perturbation geometry with the decoding geometry. Unlike isotropic noise mechanisms, the polar mechanism maintains semantic neighborhoods in the embedding space and better preserves downstream utility. Experimental evaluations on SQuAD, Yelp, and AG News datasets demonstrate that STAMP, when combined with the normalized polar mechanism, consistently achieves superior privacy-utility trade-offs across varying per-token privacy budgets.
要約:
The cumulative distribution function (CDF) is fundamental for characterizing random variables, making it essential in applications that require privacy-preserving data analysis. This paper introduces a novel framework for constructing differentially private CDFs inspired by functional analysis and the functional mechanism. We develop two variants: a polynomial projection method, which projects the empirical CDF into a polynomial space, and a sparse approximation method via matching pursuit, which projects it into arbitrary function spaces constructed from dictionaries. In both cases, the empirical CDF is approximated within the chosen space, and the corresponding coefficients are privatized to guarantee differential privacy. Compared with existing approaches such as histogram queries and adaptive quantiles, our methods achieve comparable or superior performance. Our methods are particularly well-suited to decentralized settings and scenarios where CDFs must be efficiently updated with newly collected or streaming data. In addition, we investigate the influence of parameters such as dictionary size and systematically evaluate different dictionary constructions, including Legendre polynomials, B-splines, and distribution-based functions. Overall, our contributions advance the development of practical and reliable methods for privacy-preserving CDF estimation.
要約:
Ransomware's escalating sophistication necessitates tamper-resistant, off-host detection solutions that capture deep disk activity beyond the reach of a compromised operating system. Existing detection systems use host/kernel signals or rely on coarse block-I/O statistics, which are easy to evade and miss filesystem semantics. The filesystem layer itself remains underexplored as a source of robust indicators for storage-controller-level defense. To address this, we present SHIELD: a Secure Host-Independent Extensible Metric Logging Framework for Tamper-Proof Detection and Real-Time Mitigation of Ransomware Threats. SHIELD parses and logs filesystem-level features that cannot be evaded or obfuscated to expose deep disk activity for real-time ML-based detection and mitigation. We evaluate the efficacy of these metrics through experiments with both binary (benign vs. malicious behavior) and multiclass (ransomware strain identification) classifiers. In evaluations across diverse ransomware families, the best binary classifier achieves 97.29% accuracy in identifying malicious disk behavior. A hardware-only feature set that excludes all transport-layer metrics retains 95.97% accuracy, confirming feasibility for FPGA/ASIC deployment within the storage controller datapath. In a proof-of-concept closed-loop deployment, SHIELD halts disk operations within tens of disk actions, limiting targeted files affected to <0.4% for zero-shot strains at small action-windows, while maintaining low false-positive rates (<3.6%) on unseen benign applications. Results demonstrate that filesystem-aware, off-host telemetry enables accurate, resilient ransomware detection, including intermittent/partial encryption, and is practical for embedded integration in storage controllers or alongside other defense mechanisms.
要約:
Jailbreak attacks pose a serious threat to Large Language Models (LLMs) by bypassing their safety mechanisms. A truly advanced jailbreak is defined not only by its effectiveness but, more critically, by its stealthiness. However, existing methods face a fundamental trade-off between semantic stealth (hiding malicious intent) and linguistic stealth (appearing natural), leaving them vulnerable to detection. To resolve this trade-off, we propose StegoAttack, a framework that leverages steganography. The core insight is to embed a harmful query within a benign, semantically coherent paragraph. This design provides semantic stealth by concealing the existence of malicious content and ensures linguistic stealth by maintaining the natural fluency of the cover paragraph. We evaluate StegoAttack on four state-of-the-art, safety-aligned LLMs, including GPT-5 and Gemini-3, and benchmark it against eight leading jailbreak methods. Our results show that StegoAttack achieves an average attack success rate (ASR) of 95.50%, outperforming existing baselines across all four models. Critically, its ASR drops by less than 27.00% under external detectors, while maintaining natural language distribution. This demonstrates that steganography effectively decouples linguistic and semantic stealth, thereby posing a fully concealed yet highly effective security threat. The code is available at https://github.com/GenggengSvan/StegoAttack
要約:
The advent of quantum computing poses a critical threat to RSA cryptography, as Shor's algorithm can factor integers in polynomial time. While post-quantum cryptography standards offer long-term solutions, their deployment faces significant compatibility and infrastructure challenges. This paper proposes the Constrained R\'enyi Entropy Optimization (CREO) framework, a mathematical approach to potentially enhance the quantum resistance of RSA while maintaining full backward compatibility. By constraining the proximity of RSA primes ($|p-q| < \gamma \sqrt{pq}$), CREO reduces the distinguishability of quantum states in Shor's algorithm, as quantified by R\'enyi entropy. Our analysis demonstrates that for a $k$-bit modulus with $\gamma = k^{-1/2+\epsilon}$, the number of quantum measurements required for reliable period extraction scales as $\Omega(k^{2+\epsilon})$, compared to $\mathcal{O}(k^3)$ for standard RSA under idealized assumptions. This represents a systematic increase in quantum resource requirements. The framework is supported by constructive existence proofs for such primes using prime gap theorems and establishes conceptual security connections to lattice-based problems. CREO provides a new research direction for exploring backward-compatible cryptographic enhancements during the extended transition to post-quantum standards, offering a mathematically grounded pathway to harden widely deployed RSA infrastructure without requiring immediate protocol or infrastructure replacement.
要約:
Software Bills of Materials (SBOMs) have become a regulatory requirement for improving software supply chain security and trust by means of transparency regarding components that make up software artifacts. However, enterprise and regulated software vendors commonly wish to restrict who can view confidential software metadata recorded in their SBOMs due to intellectual property or security vulnerability information. To address this tension between transparency and confidentiality, we propose Petra, an SBOM exchange system that empowers software vendors to interoperably compose and distribute redacted SBOM data using selective encryption. Petra enables software consumers to search redacted SBOMs for answers to specific security questions without revealing information they are not authorized to access. Petra leverages a format-agnostic, tamper-evident SBOM representation to generate efficient and confidentiality-preserving integrity proofs, allowing interested parties to cryptographically audit and establish trust in redacted SBOMs. Exchanging redacted SBOMs in our Petra prototype requires less than 1 extra KB per SBOM, and SBOM decryption accounts for at most 1% of the performance overhead during an SBOM query
要約:
The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and enabling systematic comparisons. Yet, it remains unclear why certain benchmarks gain prominence, and no systematic assessment has been conducted on their academic influence or code quality. This paper fills this gap by presenting the first multi-dimensional evaluation of the influence (based on five metrics) and code quality (based on both automated and human assessment) on LLM safety benchmarks, analyzing 31 benchmarks and 382 non-benchmarks across prompt injection, jailbreak, and hallucination. We find that benchmark papers show no significant advantage in academic influence (e.g., citation count and density) over non-benchmark papers. We uncover a key misalignment: while author prominence correlates with paper influence, neither author prominence nor paper influence shows a significant correlation with code quality. Our results also indicate substantial room for improvement in code and supplementary materials: only 39% of repositories are ready-to-use, 16% include flawless installation guides, and a mere 6% address ethical considerations. Given that the work of prominent researchers tends to attract greater attention, they need to lead the effort in setting higher standards.
要約:
Self-Sovereign Digital Identity (SSDI) enables individuals to control their own identity assertions and data, rather than relying on centralized or federated systems prone to large-scale data breaches. By eliminating centralized databases maintained by service providers and identity brokers, SSDIs offer enhanced security and privacy. However, adoption remains slow, and research in this area lacks systematization and uniformity. To address these gaps, we present a comprehensive systematization of knowledge on self-sovereign digital identities, with a primary focus on identifying the challenges that impede real-world adoption. We survey 80 academic and non-academic sources and identify six major challenges: (i) binding a single identity to one individual or organization, (ii) the absence of mature cryptographic and communication protocols, (iii) significant usability barriers, (iv) regulatory and oversight gaps, (v) bootstrapping to critical-mass adoption, and (vi) dependence on a permissionless, decentralized, yet singular infrastructure that may expose unforeseen vulnerabilities over time. We then analyze 47 scientific publications and find that the vast majority focus on blockchain-based solutions rather than generalized SSDI architectures. Additionally, we catalog 12 real-world, production-grade SSDI applications. Our evaluation of these solutions reveals that self-sovereignty is, in practice, a spectrum rather than a binary property. Finally, we explore the frontiers of SSDI by identifying major trends, open problems, and opportunities for future research. We hope this systematization will help advance the shift from centralized to self-sovereign digital identities in a disciplined and impactful way.
要約:
Within the Strongly Connected Components (SCCs) formed during the temporal evolution of a Cloud permission graph, we use the Burau Lyapunov exponent LE as an algebraic probe to locate the boundary between two risks regimes. We prove that no Abelian statistic (edge counts, net privilege flow, gate-firing rates) can determine LE. The non-commutation advantage is small, but actionable: we show how to leverage it to discriminate the two outstanding risk regimes, that we call dispersed and focused, for automating classification and governing remediation of risky Cloud permission flows.
要約:
As open-weights generative AI rapidly proliferates, the ability to synthesize hyper-realistic media has introduced profound challenges to digital trust. Automated disinformation and AI-generated imagery have made robust digital provenance a critical cybersecurity imperative. Currently, state-of-the-art invisible watermarks operate within one of two primary mathematical manifolds: the spatial domain (post-generation pixel embedding) or the latent domain (pre-generation frequency embedding). While existing literature frequently evaluates these models against isolated, classical distortions, there is a critical lack of rigorous, comparative benchmarking against modern generative AI editing tools. In this study, we empirically evaluate two leading representative paradigms, RivaGAN (Spatial) and Tree-Ring (Latent), utilizing an automated Attack Simulation Engine across 30 intensity intervals of geometric and generative perturbations. We formalize an "Adversarial Evasion Region" (AER) framework to measure cryptographic degradation against semantic visual retention (OpenCLIP > 75.0). Our statistical analysis ($n=100$ per interval, $MOE = \pm 3.92\%$) reveals that these domains possess mutually exclusive, mathematically orthogonal vulnerabilities. Spatial watermarks experience severe cryptographic degradation under algorithmic pixel-rewriting (exhibiting a 67.47% AER evasion rate under Img2Img translation), whereas latent watermarks exhibit profound fragility against geometric misalignment (yielding a 43.20% AER evasion rate under static cropping). By proving that single-domain watermarking is fundamentally insufficient against modern adversarial toolsets, this research exposes a systemic vulnerability in current digital provenance standards and establishes the foundational exigence for future multi-domain cryptographic architectures.
要約:
The expansion of Internet of Things (IoT) devices has increased the attack surface of networks, necessitating a robust and adaptive intrusion detection systems. Machine learning based systems have been considered promising in enhancing the detection performance. Federated learning settings enabled us to train models from network intrusion data collected from clients in a privacy preserving manner. However, the effectiveness of these systems can degrade over time due to concept drift, where patterns in data evolve as attackers develop new techniques. Realistic detection models should be non-stationary, so they can be continuously updated with new intrusion data while maintaining their detection capability for older data. As IoT environments are resource constrained, updates should consume minimal computational resources. This study provides a comprehensive performance analysis of incremental federated learning in enhancing the long term performance of non stationary IDS models in IoT networks. Specifically, we propose LSTM models within a federated learning setting to evaluate incremental learning approaches that utilize data and model-based measures against catastrophic learning under drift conditions. Using the CICIoMT2024 dataset, which includes various attack variants across five major categories, we conduct both binary and multiclass classification to provide a granular analysis of the intrusion detection task. Our results show that cumulative incremental learning and representative learning provide the most stable performance under drift, while retention-based methods offer a strong accuracy and latency trade off. The study offers new insights into the interplay between training strategy performance and latency in dynamic IoT environments, aiming to inform the development of more resilient IDS solutions considering the resource constraints in IoT devices.
要約:
We initiate an investigation of learning tasks in a setting where the learner is given access to two competing provers, only one of which is honest. Specifically, we consider the power of such learners in assessing purported properties of opaque models. Following prior work in complexity theory that considers the power of competing provers in various settings, we call this setting refereed learning.
After formulating a general definition of refereed learning tasks, we show refereed learning protocols that obtain a level of accuracy that far exceeds what is obtainable at comparable cost without provers, or even with a single prover. We concentrate on the task of choosing the better one out of two black-box models, with respect to some ground truth. While we consider a range of parameters, perhaps our most notable result is in the high-precision range: For all $\varepsilon>0$ and ambient dimension $d$, our learner makes only one query to the ground truth function, communicates only $(1+\frac{1}{\varepsilon^2})\cdot\text{poly}(d)$ bits with the provers, and outputs a model whose loss is within a multiplicative factor of $(1+\varepsilon)$ of the best model's loss. Obtaining comparable loss with a single prover would require the learner to access the ground truth at almost all of the points in the domain.
We also present lower bounds that demonstrate the optimality of our protocols in a number of respects, including prover complexity, number of samples, and need for query access.
要約:
The rapid proliferation of pretrained models and open repositories has made model merging a convenient yet risky practice, allowing free-riders to combine fine-tuned models into a new multi-capability model without authorization. Such unauthorized model merging not only violates intellectual property rights but also undermines model ownership and accountability. To address this issue, we present MergeGuard, a proactive dual-stage weight protection framework that disrupts merging compatibility while maintaining task fidelity. In the first stage, we redistribute task-relevant information across layers via L2-regularized optimization, ensuring that important gradients are evenly dispersed. In the second stage, we inject structured perturbations to misalign task subspaces, breaking curvature compatibility in the loss landscape. Together, these stages reshape the model's parameter geometry such that merged models collapse into destructive interference while the protected model remains fully functional. Extensive experiments on both vision (ViT-L-14) and language (Llama2, Gemma2, Mistral) models demonstrate that MergeGuard reduces merged model accuracy by up to 90% with less than 1.5% performance loss on the protected model.
要約:
We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE deployments should be treated as sensitive as the underlying text.