要約:
Domain fine-tuning is a common path to deploy small instruction-tuned language models as customer-support assistants, yet its effects on safety-aligned behavior and privacy are not well understood. In real deployments, such assistants receive a mixture of benign in-domain requests and out-of-domain user queries that are emotional, philosophical, or adversarial. Even when the target domain is benign, specialization may shift model behavior in ways that weaken refusal, increase harmful compliance, and induce privacy leakage.
We present a controlled empirical study of how training data composition (presence vs.\ removal of PII) and fine-tuning configuration (role-swapping (RS)) shape safety and out-of-domain behavior in open-source chat models up to 8B parameters. We fine-tune each model on 5{,}000 real booking-support message pairs under three settings: \textsc{NoPII-NoRS}, \textsc{PII-NoRS}, and \textsc{PII-RS} (role-swapped). We evaluate safety using \textsc{SORRY-Bench}~\cite{xie2024sorry} adversarial prompts and assess out-of-domain behavior using a suite of philosophical questions~\cite{betley2025emergent}.
Across models, domain fine-tuning causes a large distributional shift from high-quality refusals toward harmful compliance on \textsc{SORRY-Bench}, with the most severe degradation when PII is present in the fine-tuning data. For example, macro-averaged strong refusal drops from $42.6\%$ in base models to single digits after fine-tuning, while PII-bearing runs additionally exhibit double-digit rates of harmful responses with PII leakage. On philosophical queries, fine-tuned models frequently exhibit domain anchoring and, when trained with PII, leak sensitive identifiers in irrelevant contexts. Role-swapping partially mitigates PII leakage but does not reliably restore refusal behavior.
要約:
We introduce Reverse CAPTCHA, an evaluation framework that tests whether large language models follow invisible Unicode-encoded instructions embedded in otherwise normal-looking text. Unlike traditional CAPTCHAs that distinguish humans from machines, our benchmark exploits a capability gap: models can perceive Unicode control characters that are invisible to human readers. We evaluate five models from two providers across two encoding schemes (zero-width binary and Unicode Tags), four hint levels, two payload framings, and with tool use enabled or disabled. Across 8,308 model outputs, we find that tool use dramatically amplifies compliance (Cohen's h up to 1.37, a large effect), that models exhibit provider-specific encoding preferences (OpenAI models decode zero-width binary; Anthropic models prefer Unicode Tags), and that explicit decoding instructions increase compliance by up to 95 percentage points within a single model and encoding. All pairwise model differences are statistically significant (p < 0.05, Bonferroni-corrected). These results highlight an underexplored attack surface for prompt injection via invisible Unicode payloads.
要約:
Retrieval-augmented generation (RAG) has emerged as a powerful paradigm for enhancing multimodal large language models by grounding their responses in external, factual knowledge and thus mitigating hallucinations. However, the integration of externally sourced knowledge bases introduces a critical attack surface. Adversaries can inject malicious multimodal content capable of influencing both retrieval and downstream generation. In this work, we present MM-MEPA, a multimodal poisoning attack that targets the metadata components of image-text entries while leaving the associated visual content unaltered. By only manipulating the metadata, MM-MEPA can still steer multimodal retrieval and induce attacker-desired model responses. We evaluate the attack across multiple benchmark settings and demonstrate its severity. MM-MEPA achieves an attack success rate of up to 91\% consistently disrupting system behaviors across four retrievers and two multimodal generators. Additionally, we assess representative defense strategies and find them largely ineffective against this form of metadata-only poisoning. Our findings expose a critical vulnerability in multimodal RAG and underscore the urgent need for more robust, defense-aware retrieval and knowledge integration methods.
要約:
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are increasingly unreliable. We observe that the ordinary typing interface captures rich cognitive signatures, measurable patterns in keystroke timing that reflect the planning, translating, and revising stages of genuine composition. Drawing on large-scale keystroke datasets comprising over 136 million events, we define the Cognitive Load Correlation (CLC) and show it distinguishes genuine composition from mechanical transcription. We present a non-intrusive verification framework that operates within existing writing interfaces, collecting only timing metadata to preserve privacy. Our analytical evaluation estimates 85 to 95 percent discrimination accuracy under stated assumptions, while limiting biometric leakage via evidence quantization. We analyze the adversarial robustness of cognitive signatures, showing they resist timing-forgery attacks that defeat motor-level authentication because the cognitive channel is entangled with semantic content. We conclude that reframing authorship verification as a human-computer interaction problem provides a privacy-preserving alternative to invasive surveillance.
要約:
Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state. These systems face a fundamental dependability challenge: the evidence collection infrastructure must remain available and tamper-resistant even when the attesting party controls the platform. Trusted Execution Environments (TEEs) provide hardware-enforced isolation that can address this challenge, but their integration with continuous process attestation introduces novel resilience requirements not addressed by existing frameworks. We present the first architecture for continuous process attestation evidence collection inside TEEs, providing hardware-backed tamper resistance against trust-inverted adversaries with graduated input assurance from software-channel integrity (Tier 1) through hardware-bound input (Tier 3). We develop a Markov-chain dependability model quantifying Evidence Chain Availability (ECA), Mean Time Between Evidence Gaps (MTBEG), and Recovery Time Objectives (RTO). We introduce a resilient evidence chain protocol maintaining chain integrity across TEE crashes, network partitions, and enclave migration. Our security analysis derives formal bounds under combined threat models including trust inversion and TEE side channels, parameterized by a conjectural side-channel leakage bound esc that requires empirical validation. Evaluation on Intel SGX demonstrates under 25% per-checkpoint CPU overhead (<0.3% of the 30 s checkpoint interval), >99.5% Evidence Chain Availability (ECA) (the fraction of session time with active evidence collection) in Monte Carlo simulation under Poisson failure models, and sealed-state recovery under 200 ms.
要約:
Process attestation verifies human authorship by collecting behavioral biometric evidence, including keystroke dynamics, typing patterns, and editing behavior, during the creative process. However, the very data needed to prove authenticity can reveal intimate details about an author's cognitive state, health conditions, and identity, constituting sensitive biometric data under GDPR Article 9. We resolve this privacy-attestation paradox using zero-knowledge proofs. We present ZK-PoP, a construction that allows a verifier to confirm that (a) sequential work function chains were computed correctly, (b) behavioral feature vectors fall within human population distributions, and (c) content evolution is consistent with incremental human editing, all without learning the underlying behavioral data, exact timing, or intermediate content. Our construction uses Groth16 proofs over arithmetic circuits with Pedersen commitments and Bulletproof range proofs. We prove that ZK-PoP is computationally zero-knowledge, computationally sound, and achieves unlinkability across sessions. Evaluation shows proof generation in under 30 seconds for a 1-hour writing session, with 192-byte proofs verifiable in 8.2 ms, while incurring less than 5% accuracy loss in simulation at practical privacy levels (epsilon >= 1.0) compared to non-private baselines.
要約:
Intrusion detection in IoT and industrial networks requires models that can detect rare attacks at low false-positive rates while remaining reliable under evolving traffic and limited labels. Existing IDS solutions often report strong in-distribution accuracy, but they may degrade when evaluated on future traffic, unseen (zero-day) attack families, or adversarial feature manipulations, and many systems provide limited evidence to support analyst triage. To address these gaps, we propose ThreatFormer- IDS, a Transformer-based sequence modeling framework that converts flow records into time-ordered windows and learns contextual representations for robust intrusion screening. The method combines (i) weighted supervised learning for imbalanced detection, (ii) masked self-supervised learning to improve representation stability under drift and sparse labels, (iii) PGDbased adversarial training with scale-normalized perturbations to strengthen resilience against feature-level evasion, and (iv) Integrated Gradients attribution to highlight influential time steps and features for each alert. On the ToN IoT benchmark with chronological evaluation, ThreatFormer-IDS achieves AUCROC 0.994, AUC-PR 0.956, and Recall@1%FPR 0.910, outperforming strong tree-based and sequence baselines. Under a zero-day protocol with held-out attack families, it maintains superior generalization (AUC-PR 0.721, Recall@1%FPR 0.783). Robustness tests further show slower degradation in AUCPR as the adversarial budget increases, confirming improved stability under bounded perturbations. Overall, ThreatFormer- IDS provides a unified, deployment-oriented IDS pipeline that balances detection quality, zero-day behavior, robustness, and explainability.
要約:
Financial systems run nonstop and must stay reliable even during cyber incidents. Modern attacks move across many services (apps, APIs, identity, payment rails), so defenders must make a sequence of actions under time pressure. Most security tools still use fixed rules or static playbooks, which can be slow to adapt when the attacker changes behavior. Reinforcement learning (RL) is a good fit for sequential decisions, but much of the RL-in-finance literature targets trading and does not model real cyber response limits such as action cost, service disruption, and defender coordination across many assets. This paper proposes RLShield, a practical multi-agent RL pipeline for financial cyber defense. We model the enterprise attack surface as a Markov decision process (MDP) where states summarize alerts, asset exposure, and service health, and actions represent real response steps (e.g., isolate a host, rotate credentials, ratelimit an API, block an account, or trigger recovery). RLShield learns coordinated policies across multiple agents (assets or service groups) and optimizes a risk-sensitive objective that balances containment speed, business disruption, and response cost. We also include a game-aware evaluation that tests policies against adaptive attackers and reports operational outcomes, not only reward. Experiments show that RLShield reduces time-to-containment and residual exposure while keeping disruption within a fixed response budget, outperforming static rule baselines and single-agent RL under the same constraints. These results suggest that multi-agent, cost-aware RL can provide a deployable layer for automated response in financial security operations.
要約:
The rapid proliferation of agentic AI skill ecosystems -- exemplified by OpenClaw (228,000 GitHub stars) and Anthropic Agent Skills (75,600 stars) -- has introduced a critical supply chain attack surface. The ClawHavoc campaign (January-February 2026) infiltrated over 1,200 malicious skills into the OpenClaw marketplace, while MalTool catalogued 6,487 malicious tools that evade conventional detection. In response, twelve reactive security tools emerged, yet all rely on heuristic methods that provide no formal guarantees. We present SkillFortify, the first formal analysis framework for agent skill supply chains, with six contributions: (1) the DY-Skill attacker model, a Dolev-Yao adaptation to the five-phase skill lifecycle with a maximality proof; (2) a sound static analysis framework grounded in abstract interpretation; (3) capability-based sandboxing with a confinement proof; (4) an Agent Dependency Graph with SAT-based resolution and lockfile semantics; (5) a trust score algebra with formal monotonicity; and (6) SkillFortifyBench, a 540-skill benchmark. SkillFortify achieves 96.95% F1 (95% CI: [95.1%, 98.4%]) with 100% precision and 0% false positive rate on 540 skills, while SAT-based resolution handles 1,000-node graphs in under 100 ms.
要約:
The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential privacy breaches by service providers. Existing approaches fail to ensure privacy, maintain model performance, and preserve computational efficiency simultaneously. To address this challenge, we propose Talaria, a confidential inference framework that partitions the LLM pipeline to protect client data without compromising the cloud's model intellectual property or inference quality. Talaria executes sensitive, weight-independent operations within a client-controlled Confidential Virtual Machine (CVM) while offloading weight-dependent computations to the cloud GPUs. The interaction between these environments is secured by our Reversible Masked Outsourcing (ReMO) protocol, which uses a hybrid masking technique to reversibly obscure intermediate data before outsourcing computations. Extensive evaluations show that Talaria can defend against state-of-the-art token inference attacks, reducing token reconstruction accuracy from over 97.5% to an average of 1.34%, all while being a lossless mechanism that guarantees output identical to the original model without significantly decreasing efficiency and scalability. To the best of our knowledge, this is the first work that ensures clients' prompts and responses remain inaccessible to the cloud, while also preserving model privacy, performance, and efficiency.
要約:
The rapid evolution of sophisticated cyberattacks has strained modern Security Operations Centers (SOC), which traditionally rely on rule-based or signature-driven detection systems. These legacy frameworks often generate high volumes of technical alerts that lack organizational context, leading to analyst fatigue and delayed incident responses. This paper presents LiaisonAgent, an autonomous multi-agent system designed to bridge the gap between technical risk detection and business-level risk governance. Built upon the QWQ-32B large reasoning model, LiaisonAgent integrates specialized sub-agents, including human-computer interaction agents, comprehensive judgment agents, and automated disposal agents-to execute end-to-end investigation workflows. The system leverages a hybrid planning architecture that combines deterministic workflows for compliance with autonomous reasoning based on the ReAct paradigm to handle ambiguous operational scenarios. Experimental evaluations across diverse security contexts, such as large-scale data exfiltration and unauthorized account borrowing, achieve an end-to-end tool-calling success rate of 97.8% and a risk judgment accuracy of 95%. Furthermore, the system exhibits significant resilience against out-of-distribution noise and adversarial prompt injections, while achieving a 92.7% reduction in manual investigation overhead.
要約:
The Forensics Investigations Network in Digital Sciences (FINDS) Research Center of Excellence (CoE), funded by the U.S. Army Research Laboratory, advances Digital Forensic Engineering Education (DFEE) through an integrated research education framework for AI enabled cybersecurity workforce development. FINDS combines high performance computing (HPC), secure software engineering, adversarial analytics, and experiential learning to address emerging cyber and synthetic media threats. This paper introduces the Multidependency Capacity Building Skills Graph (MCBSG), a directed acyclic graph based model that encodes hierarchical and cross domain dependencies among competencies in AI-driven forensic programming, statistical inference, digital evidence processing, and threat detection. The MCBSG enables structured modeling of skill acquisition pathways and quantitative capacity assessment. Supervised machine learning methods, including entropy-based Decision Tree Classifiers and regression modeling, are applied to longitudinal multi cohort datasets capturing mentoring interactions, laboratory performance metrics, curriculum artifacts, and workshop participation. Feature importance analysis and cross validation identify key predictors of technical proficiency and research readiness. Three year statistical evaluation demonstrates significant gains in forensic programming accuracy, adversarial reasoning, and HPC-enabled investigative workflows. Results validate the MCBSG as a scalable, interpretable framework for data-driven, inclusive cybersecurity education aligned with national defense workforce priorities.
要約:
As AI agents increasingly perform economic tasks on behalf of humans, a fundamental tension arises between agent autonomy and human control over financial assets. We present the Agent Economic Sovereignty Protocol (AESP), a layered protocol in which agents transact autonomously at machine speed on crypto-native infrastructure while remaining cryptographically bound to human-defined governance boundaries. AESP enforces the invariant that agents are economically capable but never economically sovereign through five mechanisms: (1) a deterministic eight-check policy engine with tiered escalation; (2) human-in-the-loop review with automatic, explicit, and biometric tiers; (3) EIP-712 dual-signed commitments with escrow; (4) HKDF-based context-isolated privacy with batched consolidation; and (5) an ACE-GF-based cryptographic substrate. We formalize two testable hypotheses on security coverage and latency overhead, and specify a complete evaluation methodology with baselines and ablation design. The protocol is implemented as an open-source TypeScript SDK (208 tests, ten modules) with interoperability via MCP and A2A.
要約:
Data valuation methods quantify how individual training examples contribute to a model's behavior, and are increasingly used for dataset curation, auditing, and emerging data markets. As these techniques become operational, they raise serious privacy concerns: valuation scores can reveal whether a person's data was included in training, whether it was unusually influential, or what sensitive patterns exist in proprietary datasets. This motivates the study of privacy-preserving data valuation. However, privacy is fundamentally in tension with valuation utility under differential privacy (DP). DP requires outputs to be insensitive to any single record, while valuation methods are explicitly designed to measure per-record influence. As a result, naive privatization often destroys the fine-grained distinctions needed to rank or attribute value, particularly in heterogeneous datasets where rare examples exert outsized effects. In this work, we analyze the feasibility of DP-compatible data valuation. We identify the core algorithmic primitives across common valuation frameworks that induce prohibitive sensitivity, explaining why straightforward DP mechanisms fail. We further derive design principles for more privacy-amenable valuation procedures and empirically characterize how privacy constraints degrade ranking fidelity across representative methods and datasets. Our results clarify the limits of current approaches and provide a foundation for developing valuation methods that remain useful under rigorous privacy guarantees.
要約:
With the increase in Internet censorship globally, various circumvention tools have been designed and developed. However, the monetary cost of these tools deeply impacts both user choice and the sustainability of provider operations. Recent developments in censorship circumvention research attempted to achieve cost efficiency by utilizing Infrastructure-as-a-Service (IaaS) spot instances as bridges, but still incurred substantial expenses related to network connectivity and instance maintenance. In this work, we present CensorLess, a circumvention proxy built leveraging the unique benefits of a serverless platform. CensorLess comprises three components: a local proxy that handles client-side communication and ensures compliance with serverless functions' security restrictions, a function refresher that periodically regenerates bridges, and a live migration mechanism that maintains continuous connectivity. CensorLess inherits the serverless platform's cost efficiency, ephemerality, scalability, concurrency, and performance. Compared to existing low-cost, state-of-the-art circumvention techniques, CensorLess reduces costs by 97%, while simultaneously enabling robust censorship resistance by employing bridge rotation.
要約:
Colluding language-model agents can hide coordination in messages that remain policy-compliant at the surface level. We present CLBC, a protocol where generation and admission are separated: a message is admitted to transcript state only if a small verifier accepts a proof-bound envelope under a pinned predicate $\Pi$. The predicate binds policy hash, public randomness schedule, transcript chaining, latent schema constraints, canonical metadata/tool fields, and deterministic rejection codes. We show how this protocol yields an upper bound on transcript leakage in terms of latent leakage plus explicit residual channels, derive adaptive composition guarantees, and state a semantic lower bound when policy-valid alternatives remain choosable. We report extensive empirically grounded evidence: aggregate evaluation satisfies all prespecified thresholds; strict lane decoder advantage is bounded at 0.0000 with MI proxy 0.0636; adaptive-colluder stress tests remain below attacker thresholds; and baseline separation shows large gaps between reject-by-default semantics and audit-only controls. We further quantify operational tradeoffs. Strict full-proof mode has median turn latency 27.53s (p95 28.08s), while sampled proving reduces non-proved-turn latency to 0.327ms. The central finding is that bottlenecks alone are insufficient: security claims depend on verifiable admission semantics that are online, deterministic, and fail-closed.
要約:
The growing deployment of Internet of Things (IoT) devices in smart cities and industrial environments increases vulnerability to stealthy, multi-stage advanced persistent threats (APTs) that exploit wireless communication. Detection is challenging due to severe class imbalance in network traffic, which limits the effectiveness of traditional deep learning approaches and their lack of explainability in classification decisions. To address these challenges, this paper proposes a neurosymbolic architecture that integrates an optimized BERT model with logic tensor networks (LTN) for explainable APT detection in wireless IoT networks. The proposed method addresses the challenges of mobile IoT environments through efficient feature encoding that transforms network flow data into BERT-compatible sequences while preserving temporal dependencies critical for APT stage identification. Severe class imbalance is mitigated using focal loss, hierarchical classification that separates normal traffic detection from attack categorization, and adaptive sampling strategies. Evaluation on the SCVIC-APT2021 dataset demonstrates an operationally viable binary classification F1 score of 95.27% with a false positive rate of 0.14%, and a 76.75% macro F1 score for multi-class attack categorization. Furthermore, a novel explainability analysis statistically validates the importance of distinct network features. These results demonstrate that neurosymbolic learning enables high-performance, interpretable, and operationally viable APT detection for IoT network monitoring architectures.
要約:
Unmanned aerial vehicle (UAV) networks are increasingly deployed for complex missions, including disaster response, intelligent logistics, and environmental monitoring. These missions generally require coordinated collaboration among multiple UAVs across distinct administrative domains. To support such cross-domain cooperation, service function chains (SFCs) are constructed, where complex workflows are decomposed into ordered service functions assigned to appropriate UAVs along the mission path. However, it is challenging to ensure secure, trustworthy, and low-latency cross-domain SFC orchestration in identity management, authentication, and resilience to node failures. To address these issues, this paper proposes a consortium blockchain-based trust architecture for cross-domain decentralized identity verification, auditable task execution, and dynamic service-aware orchestrator selection. The framework employs a hierarchical four-phase cross-domain authentication protocol covering the credential pre-verification, intra-domain execution, secure relay, and audit logging. The use case analysis confirms that the proposed framework achieves substantial reductions in authentication latency and significant improvements in system throughput against centralized and static schemes. The open challenges in scalability, adaptive trust assessment, interoperability, and energy efficiency are discussed, thereby providing directions for future researches on secure and efficient cross-domain UAV service orchestration.
要約:
Browser-use agents are widely used for everyday tasks. They enable automated interaction with web pages through structured DOM based interfaces or vision language models operating on page screenshots. However, web pages often change between planning and execution, causing agents to execute actions based on stale assumptions. We view this temporal mismatch as a time of check to time of use (TOCTOU) vulnerability in browser-use agents. Dynamic or adversarial web content can exploit this window to induce unintended actions. We present a large scale empirical study of TOCTOU vulnerabilities in browser-use agents using a benchmark that spans synthesized and real world websites. Using this benchmark, we evaluate 10 popular open source agents and show that TOCTOU vulnerabilities are widespread. We design a lightweight mitigation based on pre-execution validation. It monitors DOM and layout changes during planning and validates the page state immediately before action execution. This approach reduces the risk of insecure execution and mitigates unintended side effects in browser-use agents.
要約:
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings on natural backdoors and the existing training data collection method suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients, even if the clients are benign. This work systematically examine this threat in FIT, demonstrating that existing defenses are ineffective when poisoned data is interspersed among all clients. Addressing this challenge entails two major difficulties: identifying the distinctive characteristics of poisoned samples at each client and enabling collaborative defense when some clients are heavily dominated by poisoned samples. To address these difficulties, we identify gradients in the frequency domain as a robust signal to distinguish poisoned data. We further propose a global secondary clustering mechanism that facilitates collaborative identification of poisoned samples across clients. In summary, this paper introduces ProtegoFed, the first backdoor-free FIT framework that accurately detects, removes, and even purifies interspersed poisoned data across clients during the training. Experimental results on four FL datasets show that ProtegoFed identifies $92.00\% \sim 100.00\%$ of poisoned samples, reduces the attack success rate to almost zero, and maintains utility on the main task. Code is available at https://github.com/dongdongzhaoUP/ProtegoFed.
要約:
Reliable grid operation depends on accurate and timely telemetry, making modern power systems vulnerable to communication layer cyberattacks. This paper evaluates how Denial of Service (DoS), Denial of Data (DoD), and False Data Injection (FDI) attacks disrupt the IEEE 14 bus system using a MATLAB only, time stepped simulation framework built on MATPOWER. The framework emulates a 24 hour operating cycle with sinusoidal load variation, introduces attack specific manipulation of load and voltage data, and performs full AC power flow solves with reactive limit enforcement (PV PQ switching). At each timestep, the system logs true and measured voltages, generator P/Q output, system losses, and voltage limit violations to capture transient cyber physical effects. Results show that DoD causes the largest physical distortions and reactive power stress, DoS masks natural variability and degrades situational awareness, and FDI creates significant discrepancies between true and perceived voltages. The study provides a compact, reproducible benchmark for analyzing cyber induced instability and informing future defense strategies.
要約:
One-time programs (OTPs) aim to let a user evaluate a program on a single input while revealing nothing else. Classical OTPs require hardware assumptions, and even with quantum information, OTPs for deterministic functionalities remain impossible due to gentle-measurement attacks (Broadbent, Gutoski and Stebila, 2013). While recent works achieve positive results for certain randomized functionalities, the fundamental limits and the strongest achievable security notions remain poorly understood.
In this paper, we ask for a "best-possible" OTP that achieves the strongest one-time security achievable by any OTP construction. We first show that a generic best-possible one-time compiler cannot exist, even for classical randomized functionalities (assuming lossy encryption schemes exist).
Given this impossibility, we introduce a natural subclass of one-time compilers called "testable one-time program" compilers, which output quantum states augmented with reflection oracles for these program states. We show that best-possible testable OTP compilers are achievable by (1) formulating a generalized Single-Effective-Query (SEQ) simulation security notion for quantum channels and show that SEQ security implies best-possible testable one-time security, and (2) constructing SEQ-secure OTPs for all quantum functionalities in the classical oracle model. This yields the first OTP for arbitrary quantum channels beyond classical randomized functionalities.
Finally, we propose stateful quantum indistinguishability obfuscation (stateful quantum iO) -- quantum state obfuscation for stateful quantum programs. We show that (1) stateful quantum iO implies best-possible testable OTPs and (2) stateful quantum iO is also achievable in the classical oracle model. These results identify stateful quantum iO as a promising approach towards best-possible testable OTPs.
要約:
Voting is the primary mechanism through which Decentralised Autonomous Organisations (DAO) reach decisions. Although transparent, the voting process can be complex: it can involve many interacting smart contracts. The nexus of the decision-making process can be relocated and the true voter demographic obfuscated. Furthermore, DAOs can govern other DAOs -- metagovernance. We present a method for identifying DAO-to-DAO metagovernance on the Ethereum blockchain. We focus on the links between DAOs and token contracts. We employ a signature-matching algorithm to flexibly handle a variety of DAO frameworks and voting schemes. Once we establish token-to-DAO relationships, we gather and process voting data to produce a list of metagovernance relationships. We apply this algorithm to an initial set of sixteen DAOs and we extend the dataset as more DAOs are identified. We produce a metagovernance network with 61 DAOs and 72 metagovernance relationships. We examine three case studies that show metagovernance of various forms: strategic, decisive, and centralised where a DAO becomes a nexus for metagovernance. We demonstrate that metagovernance obscures voting context and introduces entities driven by self-interest that can significantly influence governance. We highlight instances of metagovernance between DAOs operating on the Ethereum blockchain where current governance tools inadequately reveal such dynamics. To preserve the transparency-centric ethos of DAOs and mitigate risks associated with metagovernance, there is a pressing need for enhanced tools to address such issues.
要約:
Backdoor attacks pose a critical threat to the security of deep neural networks, yet existing efforts on universal backdoors often rely on visually salient patterns, making them easier to detect and less practical at scale. In this work, we introduce a novel imperceptible universal backdoor attack that simultaneously controls all target classes with minimal poisoning while preserving stealth. Our key idea is to leverage graph convolutional networks (GCNs) to model inter-class relationships and generate class-specific perturbations that are both effective and visually invisible. The proposed framework optimizes a dual-objective loss that balances stealthiness (measured by perceptual similarity metrics such as PSNR) and attack success rate (ASR), enabling scalable, multi-target backdoor injection. Extensive experiments on ImageNet-1K with ResNet architectures demonstrate that our method achieves high ASR (up to 91.3%) under poisoning rates as low as 0.16%, while maintaining benign accuracy and evading state-of-the-art defenses. These results highlight the emerging risks of invisible universal backdoors and call for more robust detection and mitigation strategies.
要約:
This article argues that security is not enough to fully capture what is at stake in government exceptional access to encrypted data. A conception of privacy as security has little to say about ``lawful-surveillance protocols'' -- an active research agenda in cryptography that aims to enable government exceptional access without compromising systemic security. But the limitations are not contingent on the success of this agenda. The normative landscape today cannot be explained if security is all there is to privacy. And fundamental objections to Apple's abandoned client-side scanning system gesture beyond security. This article's contribution is modest: to show that there must be more to privacy than the security mold it has taken. A richer understanding is needed both to assess policy and to guide research on lawful-surveillance protocols.
要約:
Machine learning based network intrusion detection systems are vulnerable to adversarial attacks that degrade classification performance under both gradient-based and distribution shift threat models. Existing defenses typically apply uniform detection strategies, which may not account for heterogeneous attack characteristics. This paper proposes an attack-aware multi-stage defense framework that learns attack-specific detection strategies through a weighted combination of ensemble disagreement, predictive uncertainty, and distributional anomaly signals. Empirical analysis across seven adversarial attack types reveals distinct detection signatures, enabling a two-stage adaptive detection mechanism. Experimental evaluation on a benchmark intrusion detection dataset indicates that the proposed system attains 94.2% area under the receiver operating characteristic curve and improves classification accuracy by 4.5 percentage points and F1-score by 9.0 points over adversarially trained ensembles. Under adaptive white-box attacks with full architectural knowledge, the system appears to maintain 94.4% accuracy with a 4.2% attack success rate, though this evaluation is limited to two adaptive variants and does not constitute a formal robustness guarantee. Cross-dataset validation further suggests that defense effectiveness depends on baseline classifier competence and may vary with feature dimensionality. These results suggest that attack-specific optimization combined with multi-signal integration can provide a practical approach to improving adversarial robustness in machine learning-based intrusion detection systems.
要約:
Smart contracts underpin high-value ecosystems such as decentralized finance (DeFi), yet recurring vulnerabilities continue to cause losses worth billions of dollars. Although numerous security analyzers that detect such flaws exist, real-world attacks remain frequent, raising the question of whether these tools are truly effective or simply under-used due to low developer trust. Prior benchmarks have evaluated analyzers on synthetic or vulnerable-only contract datasets, limiting their ability to measure false positives, false negatives, and usability factors that drive adoption. To close this gap, we present a mixed-methods study that combines large-scale benchmarking with practitioner insights. We evaluate six widely used analyzers (i.e., Confuzzius, Dlva, Mythril, Osiris, Oyente, and Slither) on 653 real-world smart contracts that cover three high-impact vulnerability classes from the OWASP Smart Contract Top Ten (i.e., reentrancy, suicidal contract termination, and integer arithmetic errors). Our results show substantial variation in accuracy (F1 = 31.2 to 94.6%), high false-positive rates (up to 32.6%), and runtimes exceeding 700 seconds per contract. We then survey 150 professional developers and auditors to understand how they use and perceive these tools. Our findings reveal that excessive false positives, vague explanations, and long analysis times are the main barriers to trust and adoption in practice. By linking measurable performance gaps to developer perceptions, we provide concrete recommendations for improving the precision, explainability, and usability of smart-contract security analyzers.
要約:
Modern generative agents such as OpenClaw - an open-source, self-hosted personal assistant with a community skill ecosystem, are gaining attention and are used pervasively. However, the openness and rapid growth of these ecosystems often outpace systematic security evaluation. In this paper, we design, implement, and evaluate Clawdrain, a Trojanized skill that induces a multi-turn "Segmented Verification Protocol" via injected SKILL.md instructions and a companion script that returns PROGRESS/REPAIR/TERMINAL signals. We deploy Clawdrain in a production-like OpenClaw instance with real API billing and a production model (Gemini 2.5 Pro), and we measure 6-7x token amplification over a benign baseline, with a costly, failure configuration reaching approximately 9x. We observe a deployment-only phenomenon: the agent autonomously composes general-purpose tools (e.g., shell/Python) to route around brittle protocol steps, reducing amplification and altering attack dynamics. Finally, we identify production vectors enabled by OpenClaw's architecture, including SKILL.md prompt bloat, persistent tool-output pollution, cron/heartbeat frequency amplification, and behavioral instruction injection. Overall, we demonstrate that token-drain attacks remain feasible in real deployments, but their magnitude and observability are shaped by tool composition, recovery behavior, and interface design.
要約:
As quantum computing platforms increasingly adopt cloud-based execution, users submit quantum circuits to remote compilers and backends, trusting that what they submit is exactly what will be run. This shift introduces new trust assumptions in the submission pipeline, which remain largely unexamined. In this paper, we present QSpy, the first proof-of-concept Quantum Remote Access Trojan capable of intercepting quantum circuits in transit. Once deployed on a user's machine, QSpy silently installs a rogue certificate authority and proxies outgoing API traffic, enabling a man-in-the-middle (MITM) attack on submitted quantum circuits. We show that the intercepted quantum circuits may be forwarded to a remote server, which is capable of categorizing, storing, and analyzing them, without disrupting execution or triggering authentication failures. Our prototype targets IBM Qiskit APIs on a Windows system, but the attack model generalizes to other delegated quantum computing workflows. This work highlights the urgent need for submission-layer protections and demonstrates how even classical attack primitives can pose critical threats to quantum workloads.
要約:
Modern web applications are increasingly produced through AI-assisted development and rapid no-code deployment pipelines, widening the gap between accelerating software velocity and the limited adaptability of existing security tooling. Pattern-driven scanners fail to reason about novel contexts, while emerging LLM-based penetration testers rely on unconstrained exploration, yielding high cost, unstable behavior, and poor reproducibility.
We introduce AWE, a memory-augmented multi-agent framework for autonomous web penetration testing that embeds structured, vulnerability-specific analysis pipelines within a lightweight LLM orchestration layer. Unlike general-purpose agents, AWE couples context aware payload mutations and generations with persistent memory and browser-backed verification to produce deterministic, exploitation-driven results.
Evaluated on the 104-challenge XBOW benchmark, AWE achieves substantial gains on injection-class vulnerabilities - 87% XSS success (+30.5% over MAPTA) and 66.7% blind SQL injection success (+33.3%) - while being much faster, cheaper, and more token-efficient than MAPTA, despite using a midtier model (Claude Sonnet 4) versus MAPTA's GPT-5. MAPTA retains higher overall coverage due to broader exploratory capabilities, underscoring the complementary strengths of specialized and general-purpose architectures. Our results demonstrate that architecture matters as much as model reasoning capabilities: integrating LLMs into principled, vulnerability-aware pipelines yields substantial gains in accuracy, efficiency, and determinism for injection-class exploits. The source code for AWE is available at: https://github.com/stuxlabs/AWE
要約:
Self-supervised diffusion models learn high-quality visual representations via latent space denoising. However, their representation layer poses a distinct threat: unlike traditional attacks targeting generative outputs, its unconstrained latent semantic space allows for stealthy backdoors, permitting malicious control upon triggering. In this paper, we propose BadRSSD, the first backdoor attack targeting the representation layer of self-supervised diffusion models. Specifically, it hijacks the semantic representations of poisoned samples with triggers in Principal Component Analysis (PCA) space toward those of a target image, then controls the denoising trajectory during diffusion by applying coordinated constraints across latent, pixel, and feature distribution spaces to steer the model toward generating the specified target. Additionally, we integrate representation dispersion regularization into the constraint framework to maintain feature space uniformity, significantly enhancing attack stealth. This approach preserves normal model functionality (high utility) while achieving precise target generation upon trigger activation (high specificity). Experiments on multiple benchmark datasets demonstrate that BadRSSD substantially outperforms existing attacks in both FID and MSE metrics, reliably establishing backdoors across different architectures and configurations, and effectively resisting state-of-the-art backdoor defenses.
要約:
Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.
要約:
Watermarking has emerged as a key defense against the misuse of machine-generated images (MGIs). Yet the robustness of these protections remains underexplored. To reveal the limits of SOTA proactive image watermarking defenses, we propose HIDE&SEEK (HS), a suite of versatile and cost-effective attacks that reliably remove embedded watermarks while preserving high visual fidelity.
要約:
Harvest-now, decrypt-later (HN-DL) attacks threaten today's encrypted communications by archiving ciphertext until a quantum computer can break the underlying key exchange. This paper reframes HN-DL as an economic problem, quantifying adversary costs across Transport Layer Security (TLS) 1.2, TLS 1.3, QUIC, and Secure Shell (SSH) with an open-source testbed that reproduces the full attack sequence. Our model shows that retaining intercepted traffic is economically trivial, shifting the defensive question from whether an adversary can archive to how much decryption will cost. We evaluate protocol configuration strategies that act along two independent cost axes: storage overhead and quantum workload. Beyond the ongoing migration to post-quantum cryptography, these strategies provide defense in depth with current infrastructure. Encrypted Client Hello forces indiscriminate bulk collection, inflating the archive the adversary must retain, while aggressive rekeying and larger key exchange parameters multiply the quantum computations required to recover plaintext. Because storage inflation penalizes both sides while quantum cost inflation targets the adversary alone, rekeying and key size selection offer the strongest defensive levers.
要約:
Static Application Security Testing (SAST) tools often suffer from high false positive rates, leading to alert fatigue that consumes valuable auditing resources. Recent efforts leveraging Large Language Models (LLMs) as filters offer limited improvements; however, these methods treat LLMs as passive, stateless classifiers, which lack project-wide context and the ability to learn from analyses to discover unknown, similar vulnerabilities.In this paper, we propose vEcho, a novel framework that transforms the LLM from a passive filter into a virtual security expert capable of learning, memory, and reasoning. vEcho equips its core reasoning engine with a robust developer tool suite for deep, context-aware verification. More importantly, we introduce a novel Echoic Vulnerability Propagation (EVP) mechanism. Driven by a Cognitive Memory Module that simulates human learning, EVP enables vEcho to learn from verified vulnerabilities and proactively infer unknown, analogous flaws, achieving a paradigm shift from passive verification to active discovery.Extensive experiments on the CWE-Bench-Java dataset demonstrate vEcho's dual advantages over the state-of-the-art baseline, IRIS. Specifically, vEcho achieves a 65% detection rate, marking a 41.8% relative improvement over IRIS's 45.83%. Crucially, it simultaneously addresses alert fatigue by reducing the false positive rate to 59.78%, a 28.3% relative reduction from IRIS's 84.82%. Furthermore, vEcho proactively identified 37 additional known vulnerabilities beyond the 120 documented in the dataset, and has discovered 51 novel 0-day vulnerabilities in open-source projects.
要約:
This work presents ATLAS, an LLM-driven framework that bridges standardized threat modeling and property-based formal verification for System-on-Chip (SoC) security. Starting from vulnerability knowledge bases such as Common Weakness Enumeration (CWE), ATLAS identifies SoC-specific assets, maps relevant weaknesses, and generates assertion-based security properties and JasperGold scripts for verification. By combining asset-centric analysis with standardized threat model templates and multi-source SoC context, ATLAS automates the transformation from vulnerability reasoning to formal proof. Evaluated on three HACK@DAC benchmarks, ATLAS detected 39/48 CWEs and generated correct properties for 33 of those bugs, advancing automated, knowledge-driven SoC security verification toward a secure-by-design paradigm.
要約:
An Adaptive Cruise Control (ACC) system automatically adjusts the host vehicle's speed to maintain a safe following distance from a lead vehicle. In typical implementations, a feedback controller (e.g., a Proportional-Integral-Derivative (PID) controller) computes the host vehicle's acceleration using a target speed and a spacing error, defined as the difference between the measured inter-vehicle distance and a desired safe distance. ACC is often assumed to be resilient to fault-injection attacks because a Kalman filter (KF) can smooth noisy speed measurements. However, we show--through analytical proofs and simulation results--that a KF can tolerate injected speed values only up to a bounded threshold. When injected values exceed this threshold, the filter can be driven off track, causing the ACC controller to make unsafe acceleration decisions and potentially leading to collisions. Our main contribution is to augment the PID-based controller with Intrusion Detection System (IDS) outputs, yielding Intrusion Detection Systems-Based Adaptive Cruise Control (ACC-IDS). The ACC-IDS controller is simple and implementable: a binary intrusion flag switches the control law to emergency braking. We prove that augmenting ACC with an IDS, under assumed detection-performance and latency constraints, can mitigate these attacks and help preserve ACC's collision-avoidance guarantees.
要約:
Safety alignment in large language models (LLMs), particularly for cybersecurity tasks, primarily focuses on preventing misuse. While this approach reduces direct harm, it obscures a complementary failure mode: denial of assistance to legitimate defenders. We study Defensive Refusal Bias -- the tendency of safety-tuned frontier LLMs to refuse assistance for authorized defensive cybersecurity tasks when those tasks include similar language to an offensive cyber task. Based on 2,390 real-world examples from the National Collegiate Cyber Defense Competition (NCCDC), we find that LLMs refuse defensive requests containing security-sensitive keywords at $2.72\times$ the rate of semantically equivalent neutral requests ($p < 0.001$). The highest refusal rates occur in the most operationally critical tasks: system hardening (43.8%) and malware analysis (34.3%). Interestingly, explicit authorization, where the user directly instructs the model that they have authority to complete the target task, increases refusal rates, suggesting models interpret justifications as adversarial rather than exculpatory. These findings are urgent for interactive use and critical for autonomous defensive agents, which cannot rephrase refused queries or retry. Our findings suggest that current LLM cybersecurity alignment relies on semantic similarity to harmful content rather than reasoning about intent or authorization. We call for mitigations that analyze intent to maximize defensive capabilities while still preventing harmful compliance.
要約:
Large language models (LLMs) have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs, the field lacks a systematic comparison of patching architectures. In this paper, we present a controlled evaluation of four LLM-based patching paradigms -- fixed workflow, single-agent system, multi-agent system, and general-purpose code agents -- using a unified benchmark and evaluation framework. We analyze patch correctness, failure modes, token usage, and execution time across real-world vulnerability tasks. Our results reveal clear architectural trade-offs: fixed workflows are efficient but brittle, single-agent systems balance flexibility and cost, and multi-agent designs improve generalization at the expense of substantially higher overhead and increased risk of reasoning drift on complex tasks. Surprisingly, general-purpose code agents achieve the strongest overall patching performance, benefiting from general-purpose tool interfaces that support effective adaptation across vulnerability types. Overall, we show that architectural design and iteration depth, rather than model capability alone, dominate the reliability and cost of LLM-based automated patching.
要約:
Automating network security analysis, particularly the identification of potential attack paths, presents significant challenges. Due in part to the sequential, interconnected, and evolutionary nature of system events which most artificial intelligence (AI) techniques struggle to model effectively. This paper proposes a Reinforcement Learning (RL) environment generation framework that simulates the sequence of processes executed on a Windows operating system, enabling dynamic modeling of malicious processes on a system. This methodology models operating system state and transitions using a graph representation. This graph is derived from open-source System Monitor (Sysmon) logs. To address the variety in system event types, fields, and log formats, a mechanism was developed to capture and model parent-child processes from Sysmon logs. A Gymnasium environment (SubstratumGraphEnv) was constructed to establish the perceptible basis for an RL environment, and a customized PyTorch interface was also built (SubstratumBridge) to translate Gymnasium graphs into Deep Reinforcement Learning (DRL) observations and discrete actions. Graph Convolutional Networks (GCNs) concretize the graph's local and global state, which feed the distinct policy and critic heads of an Advantage Actor-Critic (A2C) model. This work's central contribution lies in the design of a novel deep graphical RL environment that automates translation of sequential user and system events, furnishing crucial context for cybersecurity analysis. This work provides a foundation for future research into shaping training parameters and advanced reward shaping, while also offering insight into which system events attributes are critical to training autonomous RL agents.
要約:
The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing prominent privacy risks associated with the transmission and processing of private data in remote inference. For privacy-preserving LLM inference technologies to be practically applied in industrial scenarios, three core requirements must be satisfied simultaneously: (1) Accuracy and efficiency losses should be minimized to mitigate degradation in service experience. (2) The inference process can be run on large-scale clusters consist of heterogeneous legacy xPUs. (3) Compatibility with existing LLM infrastructures should be ensured to reuse their engineering optimizations. To the best of our knowledge, none of the existing privacy-preserving LLM inference methods satisfy all the above constraints while delivering meaningful privacy guarantees. In this paper, we propose AloePri, the first privacy-preserving LLM inference method for industrial applications. AloePri protects both the input and output data by covariant obfuscation, which jointly transforms data and model parameters to achieve better accuracy and privacy. We carefully design the transformation for each model component to ensure inference accuracy and data privacy while keeping full compatibility with existing infrastructures of Language Model as a Service. AloePri has been integrated into an industrial system for the evaluation of mainstream LLMs. The evaluation on Deepseek-V3.1-Terminus model (671B parameters) demonstrates that AloePri causes accuracy loss of 0.0%~3.5% and exhibits efficiency equivalent to that of plaintext inference. Meanwhile, AloePri successfully resists state-of-the-art attacks, with less than 5\% of tokens recovered. To the best of our knowledge, AloePri is the first method to exhibit practical applicability to large-scale models in real-world systems.
要約:
Large Language Models (LLMs) are increasingly deployed as agentic systems that plan, memorize, and act in open-world environments. This shift brings new security problems: failures are no longer only unsafe text generation, but can become real harm through tool use, persistent memory, and interaction with untrusted web content. In this survey, we provide a transition-oriented view from Secure Agentic AI to a Secure Agentic Web. We first summarize a component-aligned threat taxonomy covering prompt abuse, environment injection, memory attacks, toolchain abuse, model tampering, and agent network attacks. We then review defense strategies, including prompt hardening, safety-aware decoding, privilege control for tools and APIs, runtime monitoring, continuous red-teaming, and protocol-level security mechanisms. We further discuss how these threats and mitigations escalate in the Agentic Web, where delegation chains, cross-domain interactions, and protocol-mediated ecosystems amplify risks via propagation and composition. Finally, we highlight open challenges for web-scale deployment, such as interoperable identity and authorization, provenance and traceability, ecosystem-level response, and scalable evaluation under adaptive adversaries. Our goal is to connect recent empirical findings with system-level requirements, and to outline practical research directions toward trustworthy agent ecosystems.
要約:
Recent intelligent systems integrate powerful Large Language Models (LLMs) through APIs, but their trustworthiness may be critically undermined by targeted attacks like backdoor and prompt injection attacks, which secretly force LLMs to generate specific malicious sequences. Existing defensive approaches for such threats typically rely on high access rights, impose prohibitive costs, and hinder normal inference, rendering them impractical for real-world scenarios. To solve these limitations, we introduce DualSentinel, a lightweight and unified defense framework that can accurately and promptly detect the activation of targeted attacks alongside the LLM generation process. We first identify a characteristic of compromised LLMs, termed Entropy Lull: when a targeted attack successfully hijacks the generation process, the LLM exhibits a distinct period of abnormally low and stable token probability entropy, indicating it is following a fixed path rather than making creative choices. DualSentinel leverages this pattern by developing an innovative dual-check approach. It first employs a magnitude and trend-aware monitoring method to proactively and sensitively flag an entropy lull pattern at runtime. Upon such flagging, it triggers a lightweight yet powerful secondary verification based on task-flipping. An attack is confirmed only if the entropy lull pattern persists across both the original and the flipped task, proving that the LLM's output is coercively controlled. Extensive evaluations show that DualSentinel is both highly effective (superior detection accuracy with near-zero false positives) and remarkably efficient (negligible additional cost), offering a truly practical path toward securing deployed LLMs. The source code can be accessed at https://doi.org/10.5281/zenodo.18479273.
要約:
Digital twins (DTs) are increasingly used to monitor and secure Industrial Control Systems (ICS), yet detecting stealthy False Data Injection Attacks (FDIAs) that manipulate system states within normal physical bounds remains challenging. Deep learning anomaly detectors often over-generalize such subtle manipulations, while classical fault detection methods do not scale well in highly correlated multivariate systems. We propose a closed-loop Information-Theoretic Digital Twin (IT-DT) framework for real-time anomaly detection. N4SID identification is combined with steady-state Kalman filtering to quantify residual distribution shifts via closed-form KL divergence, capturing both mean deviations and malicious cross-covariance shifts. Evaluations on the SWaT and WADI datasets show that IT-DT achieves F1-scores of 0.832 and 0.615, respectively, with better precision than deep learning baselines such as TranAD. Computational profiling indicates that the analytical approach requires minimal memory and provides approximately a 600x inference speedup over transformer-based methods on CPU hardware. This makes the framework suitable for resource-constrained industrial edge controllers without GPU acceleration.
要約:
Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on static adversarial settings, which fundamentally limit robustness, particularly in multimodal settings with a larger attack surface. In this work, we move beyond static adversarial supervision and introduce co-evolutionary alignment with evolving attacks, instantiated by CEMMA (Co-Evolutionary Multi-Modal Alignment), an automated and adaptive framework for multimodal safety alignment. We introduce an Evolutionary Attacker that decomposes adversarial prompts into method templates and harmful intents. By employing genetic operators, including mutation, crossover, and differential evolution, it enables simple seed attacks to inherit the structural efficacy of sophisticated jailbreaks. The Adaptive Defender is iteratively updated on the synthesized hard negatives, forming a closed-loop process that adapts alignment to evolving attacks. Experiments show that the Evolutionary Attacker substantially increases red-teaming jailbreak attack success rate (ASR), while the Adaptive Defender improves robustness and generalization across benchmarks with higher data efficiency, without inducing excessive benign refusal, and remains compatible with inference-time defenses such as AdaShield.
要約:
This is the Replicated Computational Results (RCR) Report for the paper ``Can LLMs Hack Enterprise Networks?" The paper empirically investigates the efficacy and effectiveness of different LLMs for penetration-testing enterprise networks, i.e., Microsoft Active Directory Assumed-Breach Simulations. This RCR report describes the artifacts used in the paper, how to create an evaluation setup, and highlights the analysis scripts provided within our prototype.
要約:
Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven phishing detectors achieve strong accuracy, their reliance on external knowledge bases, cloud services, and complex multimodal pipelines fundamentally limits practicality, scalability, and reproducibility. In contrast, conventional deep learning approaches often fail to generalize to evolving phishing campaigns. We introduce SpecularNet, a novel lightweight framework for reference-free web phishing detection that demonstrates how carefully designed compact architectures can rival heavyweight systems. SpecularNet operates solely on the domain name and HTML structure, modeling the Document Object Model (DOM) as a tree and leveraging a hierarchical graph autoencoding architecture with directional, level-wise message passing. This design captures higher-order structural invariants of phishing webpages while enabling fast, end-to-end inference on standard CPUs. Extensive evaluation against 13 state of the art phishing detectors, including leading reference-based systems, shows that SpecularNet achieves competitive detection performance with dramatically lower computational cost. On benchmark datasets, it reaches an F1 score of 93.9%, trailing the best reference-based method slightly while reducing inference time from several seconds to approximately 20 milliseconds per webpage. Field and robustness evaluations further validate SpecularNet in real-world deployments, on a newly collected 2026 open-world dataset, and against adversarial attacks.
要約:
The proliferation of IoT and V2X systems generates unprecedented sensitive data at the network edge, demanding privacy-preserving architectures that enable secure sharing without exposing raw information. Contemporary solutions face a fundamental privacy-efficiency-trust trilemma: achieving strong privacy guarantees, computational efficiency for resource-constrained devices, and decentralized trust simultaneously remains intractable with single-paradigm approaches. This survey systematically analyzes 75 technical papers (2007--2025) through a novel three-dimensional taxonomy classifying architectures into Decentralized Computation, Cryptography-based, and Distributed Ledger approaches. Temporal analysis reveals dramatic acceleration during 2024--2025, with 48% of all papers published in this period -- Decentralized Computation dominates at 44% of contributions and 59% of 2025 publications. Comprehensive Security Threat Mapping and Technology Maturity Assessment demonstrate that mature solutions occupy narrow design regions excelling in one or two dimensions while compromising others, conclusively validating the trilemma hypothesis. We identify emerging hybrid architectures combining complementary paradigms as the essential path forward. Critical challenges including security guarantee composition across layers, multi-layer coordination overhead minimization, and post-quantum security integration must be addressed for practical deployment in next-generation intelligent transportation systems and IoT ecosystems.
要約:
Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide access to official model services without regional limitations via indirect access. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit between official LLM APIs and corresponding shadow APIs. We first identify 17 shadow APIs that have been utilized in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deception practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.
要約:
We study the problem of computing a U-statistic with a kernel function f of degree k $\ge$ 2, i.e., the average of some function f over all k-tuples of instances, in a federated learning setting. Ustatistics of degree 2 include several useful statistics such as Kendall's $\tau$ coefficient, the Area under the Receiver-Operator Curve and the Gini mean difference. Existing methods provide solutions only under the lower-utility local differential privacy model and/or scale poorly in the size of the domain discretization. In this work, we propose a protocol that securely computes U-statistics of degree k $\ge$ 2 under central differential privacy by leveraging Multi Party Computation (MPC). Our method substantially improves accuracy when compared to prior solutions. We provide a detailed theoretical analysis of its accuracy, communication and computational properties. We evaluate its performance empirically, obtaining favorable results, e.g., for Kendall's $\tau$ coefficient, our approach reduces the Mean Squared Error by up to four orders of magnitude over existing baselines.
要約:
Federated Learning (FL) was initially proposed as a privacy-preserving machine learning paradigm. However, FL has been shown to be susceptible to a series of privacy attacks. Recently, there has been concern about the Source Inference Attack (SIA), where an honest-but-curious central server attempts to identify exactly which client owns a given data point which was used in the training phase. Alarmingly, standard gradient obfuscation techniques with Differential Privacy have been shown to be ineffective against SIAs, at least without severely diminishing the accuracy.
In this work, we propose a defense against SIAs within the widely studied shuffle model of FL, where an honest shuffler acts as an intermediary between the clients and the server. First, we demonstrate that standard naive shuffling alone is insufficient to prevent SIAs. To effectively defend against SIAs, shuffling needs to be applied at a more granular level; we propose a novel combination of parameter-level shuffling with the residue number system (RNS). Our approach provides robust protection against SIAs without affecting the accuracy of the joint model and can be seamlessly integrated into other privacy protection mechanisms.
We conduct experiments on a series of models and datasets, confirming that standard shuffling approaches fail to prevent SIAs and that, in contrast, our proposed method reduce the attack's accuracy to the level of random guessing.
要約:
Micro-Controller Units (MCUs) are widely used in safety-critical systems, making them attractive targets for attacks. This calls for lightweight defenses that remain effective despite software compromise. Control Flow Auditing (CFAud) is one such mechanism wherein a remote verifier (Vrf) is guaranteed to received evidence about the control flow path taken on a prover (Prv) MCU, even when Prv software is compromised. Despite promising benefits, current CFAud architectures unfortunately require a ``busy-wait'' phase where a hardware-anchored root-of-trust (RoT) in Prv retains execution control to ensure delivery of control flow evidence to Vrf. This drastically reduces the CPU utilization on Prv.
In this work, we addresses this limitation with an architecture for Contention Avoidance in Runtime Auditing with Minimized Execution Latency (CARAMEL). CARAMEL is a hardware-software RoT co-design that enables Prv applications to resume while control flow evidence is transmitted to Vrf. This significantly reduces contention due to transmission delays and improves CPU utilization without giving up on security. Key to CARAMEL is our design of a new RoT with a self-contained (and minimal) dedicated communication interface. CARAMEL's implementation and accompanying evaluation are made open-source. Our results show substantially improved CPU utilization at a modest hardware cost.
要約:
We present a real-world multi-scenario unmanned aerial vehicle (UAV) radio frequency (RF) dataset, namely DRFF-R2, which is collected using a dedicated acquisition platform under diverse operational conditions. All signals are acquired within a unified framework to ensure consistency in hardware configuration and environmental settings. The dataset is systematically organized into seven well-defined subsets corresponding to different operational and signal composition scenarios to facilitate structured experimentation. Each file follows a clearly annotated naming convention to enable convenient data indexing and reproducible analysis. The dataset contains RF recordings from 26 UAV units spanning 8 distinct models, captured across varying flight states, altitudes, speeds, acquisition days, and receiver configurations. By covering diverse acquisition settings and signal compositions, the dataset provides a comprehensive resource for future UAV RF signal research, including RF fingerprinting (RFF) identification, model-level recognition, flight state analysis, time-varying RFF study, and interference-aware signal processing.
要約:
The rise of text-to-video generation models has raised growing concerns over content authenticity, copyright protection, and malicious misuse. Watermarking serves as an effective mechanism for regulating such AI-generated content, where high fidelity and strong robustness are particularly critical. Recent generative image watermarking methods provide a promising foundation by leveraging watermark information and pseudo-random keys to control the initial sampling noise, enabling lossless embedding. However, directly extending these techniques to videos introduces two key limitations: Existing designs implicitly rely on strict alignment between video frames and frame-dependent pseudo-random binary sequences used for watermark encryption. Once this alignment is disrupted, subsequent watermark extraction becomes unreliable; and Video-specific distortions, such as inter-frame compression, significantly degrade watermark reliability. To address these issues, we propose SKeDA, a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe) employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation. This design transforms watermark extraction from synchronization-sensitive sequence decoding into permutation-tolerant set-level aggregation, substantially improving robustness against frame reordering and loss; and (2) Differential Attention (DA), which computes inter-frame differences and dynamically adjusts attention weights during extraction, enhancing robustness against temporal distortions. Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
要約:
This paper studies how well Naturalistic Adversarial Patches (NAPs) transfer to a physical traffic sign setting when the detector is trained on a customized dataset for an autonomous vehicle (AV) environment. We construct a composite dataset, CompGTSRB (which is customized dataset for AV environment), by pasting traffic sign instances from the German Traffic Sign Recognition Benchmark (GTSRB) onto undistorted backgrounds captured from the target platform. CompGTSRB is used to train a YOLOv5 model and generate patches using a Generative Adversarial Network (GAN) with latent space optimization, following existing NAP methods. We carried out a series of experiments on our Quanser QCar testbed utilizing the front CSI camera provided in QCar. Across configurations, NAPs reduce the detector's STOP class confidence. Different configurations include distance, patch sizes, and patch placement. These results along with a detailed step-by-step methodology indicate the utility of CompGTSRB dataset and the proposed systematic physical protocols for credible patch evaluation. The research further motivate researching the defenses that address localized patch corruption in embedded perception pipelines.
要約:
Software engineers use regular expressions (regexes) across a wide range of domains and tasks. To support regexes, software projects must integrate a regex engine, whether provided natively by the language runtime (e.g., Python's re) or included as an external dependency (e.g., PCRE). However, these engines may contain bugs and introduce vulnerabilities. A common strategy for testing regex engines involves differential testing -- comparing outputs across different implementations. However, this approach is concerning because regex syntax and semantics vary significantly between dialects (e.g., POSIX vs. PCRE). Fuzzing is also utilized to ease testing of feature-rich regex implementations to expose defects, but naive byte-level mutations generate syntactically invalid inputs that exercise only parsing logic, not matching internals.
In this work, we describe our progress towards ReTest, a framework that systematically tests regular expression engines by combining grammar-aware fuzzing for high code coverage with metamorphic testing to generate dialect-independent test oracles. So far, we have surveyed testing practices across 22 regex engines, analyzed 1,007 regex engine bugs and 156 CVEs to characterize failure modes, and curated 16 metamorphic relations for regexes derived from Kleene algebra. Our preliminary evaluation on PCRE shows that ReTest achieves 3x higher edge coverage than existing fuzzing approaches and has identified three new memory safety defects. We conclude by describing our next steps toward our ultimate goal: helping regex engine developers identify bugs without depending on a consistent cross-implementation standard.
要約:
Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, in this paper, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual clues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. The proposed MIDAS enforces longer and more structured multi-image chained reasoning, substantially increases the model's reliance on visual cues while delaying the exposure of malicious semantics and significantly reducing the model's security attention, thereby improving the performance of jailbreak against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that the proposed MIDAS outperforms state-of-the-art jailbreak attacks for MLLMs and achieves an average attack success rate of 81.46% across 4 closed-source MLLMs. Our code is available at this [link](https://github.com/Winnie-Lian/MIDAS).
要約:
Agent-native social platforms such as Moltbook are rapidly emerging, yet they inherit and amplify classical influence and abuse attacks, where coordinated agents strategically comment and upvote to manipulate visibility and propagate narratives across communities. However, rigorous measurement and learning-based monitoring remain constrained by the absence of longitudinal, graph-native datasets for agentic social networks that jointly capture heterogeneous interactions, temporal drift, and visibility signals needed to connect coordination behavior to downstream exposure. We introduce MoltGraph as a realistic longitudinal agentic social-network graph dataset for studying how agents behave, coordinate, and evolve in the wild, enabling reproducible measurement on emerging multi-agent social ecosystems.
Using MoltGraph, we provide the first graph-centric characterization of Moltbook as a dynamic network: (i) heavy-tailed connectivity with power-law exponents in the range alpha in [1.86, 2.72], (ii) accelerating hub formation and attention centralization where the top 1% agents account for 29.00% of engagements, (iii) bursty, short-lived coordination episodes, 98.33% last under 24 hours, and (iv) measurable exposure effects across submolts. In matched analyses, posts receiving coordinated engagement exhibit 506.35% higher early interaction rates (within H=5 days) and 242.63% higher downstream exposure in feeds than non-coordinated controls.
要約:
The current state, emerging trends, and practical challenges of optical fiber-based power network SCADA quantum communication must be addressed to fully utilise the technological platform's potential in real-world power system SCADA communications involving massive volumes of real-time data, as well as in managing, encoding, and applications such as quantum cryptography. Quantum key distribution (QKD) is an essential part of the cybersecurity paradigm for quantum communication. Even though quantum computing with individual circuits yields probabilistic outcomes for the problem at hand, real-world datasets are complex and challenging to handle, even with telemetry. When using the cybersecurity triad of availability, confidentiality, and integrity (CIA) in reverse order (AIC), availability is given priority in electric power networks. This research assesses the use of the BB84, E91, B92, and SARG04 cryptographic protocols by applying them to large, multivariate power-system SCADA datasets and comparing the outcomes. By leveraging the variety of QKD protocols available with quantum electronics hardware, this simulation work provides a promising avenue for developing frameworks and deploying SCADA/PMU networks in actual power systems.
要約:
The rapid proliferation of autonomous AI agents is driving a shift toward Machine-to-Machine (M2M) commerce, where software agents are expected to autonomously invoke and pay for Web 2.0 services. While Web 3.0 payments offer a programmable foundation for such interactions, the recently proposed x402 standard fails to enforce end-to-end atomicity across service execution, payment, and result delivery.
In this paper, we present A402, a trust-minimized payment architecture that securely binds Web 3.0 payments to Web 2.0 services. A402 introduces Atomic Service Channels (ASCs), a new channel protocol that integrates service execution into payment channels, enabling real-time, high-frequency micropayments for M2M commerce. Within each ASC, A402 employs an atomic exchange protocol based on TEE-assisted adaptor signatures, ensuring that payments are finalized if and only if the requested service is correctly executed and the corresponding result is delivered. To further ensure privacy, A402 incorporates a TEE-based Liquidity Vault that privately manages the lifecycle of ASCs and aggregates their settlements into a single on-chain transaction, revealing only aggregated balances. We implement A402 and evaluate it against x402 with integrations on both Bitcoin and Ethereum. Our results show that A402 delivers orders-of-magnitude performance and on-chain cost improvements over x402 while providing trust-minimized security guarantees.
要約:
Fine-tuning large language models (LLMs) on custom datasets has become a standard approach for adapting these models to specific domains and applications. However, recent studies have shown that such fine-tuning can lead to significant degradation in the model's safety. Existing defense methods operate at the sample level and often suffer from an unsatisfactory trade-off between safety and utility. To address this limitation, we perform a systematic token-level diagnosis of safety degradation during fine-tuning. Based on this, we propose token-level data selection for safe LLM fine-tuning (TOSS), a novel framework that quantifies the safety risk of each token by measuring the loss difference between a safety-degraded model and a utility-oriented model. This token-level granularity enables accurate identification and removal of unsafe tokens, thereby preserving valuable task-specific information. In addition, we introduce a progressive refinement strategy, TOSS-Pro, which iteratively enhances the safety-degraded model's ability to identify unsafe tokens. Extensive experiments demonstrate that our approach robustly safeguards LLMs during fine-tuning while achieving superior downstream task performance, significantly outperforming existing sample-level defense methods. Our code is available at https://github.com/Polly-LYP/TOSS.
要約:
Hotaru Beam is a logic puzzle which objective is to connect circles placed on a grid by drawing only lines with specified starting points and numbers of bends. A zero-knowledge proof is a communication protocol that allows one player to persuade the other that they are in possession of a certain piece of information without actually revealing it. We show that Hotaru Beam is NP-complete and present a physical zero-knowledge proof (i.e. implementable using physical items) for proving that one knows a solution to the puzzle.
要約:
Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development, yet their limited transparency in security reasoning and brittleness to evolving vulnerability patterns raise critical trustworthiness concerns. Models trained on static datasets cannot readily adapt to newly discovered vulnerabilities or changing security standards without retraining, leading to the repeated generation of unsafe code.
We present a principled approach to trustworthy code generation by design that operates as an inference-time safety mechanism. Our approach employs retrieval-augmented generation to surface relevant security risks in generated code and retrieve related security discussions from a curated Stack Overflow knowledge base, which are then used to guide an LLM during code revision. This design emphasizes three aspects relevant to trustworthiness: (1) interpretability, through transparent safety interventions grounded in expert community explanations; (2) robustness, by allowing adaptation to evolving security practices without model retraining; and (3) safety alignment, through real-time intervention before unsafe code reaches deployment.
Across real-world and benchmark datasets, our approach improves the security of LLM-generated code compared to prompting alone, while introducing no new vulnerabilities as measured by static analysis. These results suggest that principled, retrieval-augmented inference-time interventions can serve as a complementary mechanism for improving the safety of LLM-based code generation, and highlight the ongoing value of community knowledge in supporting trustworthy AI deployment.
要約:
Compliance as code is an emerging idea about automating compliance through programmed compliance controls and checks. Given scant existing research thus far, the paper presents an empirical analysis of a compliance as code project addressing open source software (OSS) projects and products. The dataset examined covers a little over 1,500 unique compliance rules designed and implemented for 14 Linux distribution releases from five vendors. According to the results, (1) the coverage of the rules varies across the five vendors. Then, (2) the brief rationales provided for the rules do not exhibit statistical similarities but the short code snippets for these do show similarities to some extent. Furthermore, (3) as many as 24 controls are covered from over 10 different organizations, among them governmental agencies, standardization organizations, and non-profit associations. Finally, (4) the rules can be mapped to the essential cyber security requirements of the Cyber Resilience Act (CRA), although only modest agreement exists among the three authors regarding individual mappings. This observation supports an argument that the compliance as code project studied could be updated with new compliance checks. Given that also operating systems are in the CRA's scope when used in a network-connected product, such an updating would have also practical relevance in the nearby future.
要約:
Every year, millions of functional systems become e-waste because users are pressured to send their systems to landfills due to a lack of vendor support and difficulty in recycling. Vendors cite ``cybersecurity'' as the driver for short product support periods, leading to a prevalent, but uninterrogated, belief that cybersecurity and environmental sustainability are fundamentally contradictory; i.e., it is difficult, if not impossible, to build products that are secure, long-lasting, and reusable. To understand the nuanced relationship between security and sustainability, we systematically analyze 29 papers and distill 155 sustainability guidelines into 12 sustainability themes. These themes enable us to compare the sustainable HCI and sustainable software engineering guidance with that of cybersecurity, identifying points of alignment and tension. We find little evidence of a fundamental tension between these two domains; the few instances of tension can be mitigated through thoughtful consideration of security and sustainability objectives. We also find that sustainability, like usable security, struggles with the myth of users as the weakest link and the individualization of responsibility. Building on these parallels, we argue that the usable security community is well-positioned to integrate sustainability considerations, as both fields share challenges in shifting responsibility from individuals to systemic design.
要約:
We consider an asynchronous network of $n$ parties connected to each other via secure channels, up to $t$ of which are byzantine. We study common coin tossing, a task where the parties try to agree on an unpredictable random value, with some chance of failure due to the byzantine parties' influence. Coin tossing is a well known and often studied task due to its use in byzantine agreement.
In this work, we present an adaptively secure committee-based method to roughly speaking turn strong but costly common coins into cheaper but lower-quality ones. For all $k > 2$ and $\varepsilon > 0$, we show how to use a strong (very rarely failing) coin that costs $\widetilde{O}(n^k)$ bits of communication to get a cheaper coin that costs $\widetilde{O}(\varepsilon^{-2k}n^{3 - 2/k})$ bits of communication. This latter coin tolerates $\varepsilon n$ fewer byzantine parties than the former, and it fails with an arbitrarily small constant probability.
For any $\varepsilon > 0$, our method allows us to get a perfectly secure binary coin that tolerates $t \leq (\frac{1}{4} - \varepsilon)n$ faults with $O(n^{2.5}(\varepsilon^{-8} + \log n))$ messages of size $O(\log n)$, as well as a setup-free cryptographically secure binary coin that tolerates $t \leq (\frac{1}{3} - \varepsilon)n$ faults with $O(n^{7/3}\varepsilon^{-6}\kappa\log n)$ bits of communication (where $\kappa = \Omega(\log n)$ is a cryptographic security paramater). These coins both have $O(\log n)$ latency. They are to our knowledge the first setup-free coins that cost $o(n^3)$ bits of communication but still succeed with at least constant probability against $t = \Theta(n)$ adaptive byzantine faults. As such, they for the first time enable setup-free (and even perfectly secure) asynchronous byzantine agreement with $o(n^3)$ communication against $\Theta(n)$ adaptive byzantine faults.
要約:
Template engines play a pivotal role in modern web application development by enabling the dynamic rendering of content, products, and user interfaces. Today, they are essential for any website that handles dynamic data, from e-commerce to social media. However, their widespread adoption also makes them attractive targets for attackers seeking to exploit vulnerabilities and gain unauthorized access to web servers.
This paper presents a comprehensive assessment of the risks associated with template engines, with a particular focus on the consequences of Server-Side Template Injection (SSTI) and the ease with which such vulnerabilities can escalate to Remote Code Execution (RCE), a critical security concern in web application development.
要約:
Diffusion model-based generative image steganography (DM-GIS) is an emerging paradigm that leverages the generative power of diffusion models to conceal secret messages without requiring pre-existing cover images. In this paper, we identify a fundamental trade-off between stego image quality, steganographic security, and extraction reliability within the DM-GIS framework. Drawing on this insight, we propose \textbf{PA-B2G}, a \textbf{P}rovable and \textbf{A}djustable \textbf{B}it-to-\textbf{G}aussian mapping. Theoretically, PA-B2G guarantees the reversible encoding of arbitrary-length bit sequences into pure Gaussian noise; practically, it enables fine-grained control over the balance between image fidelity, security, and extraction accuracy. By integrating PA-B2G with probability-flow ordinary differential equations (PF-ODEs), we establish a theoretically invertible mapping between secret bitstreams and stego images. PA-B2G is model-agnostic and can be seamlessly integrated into mainstream diffusion models without additional training or fine-tuning, making it also suitable for diffusion model watermarking. Extensive experiments validate our theoretical analysis of the inherent DM-GIS trade-offs and demonstrate that our method flexibly supports arbitrary payloads while achieving competitive image quality and security. Furthermore, our method exhibits strong resilience to lossy processing in watermarking applications, highlighting its practical utility.
要約:
Cyber-attacks are becoming increasingly sophisticated and frequent, highlighting the importance of network intrusion detection systems. This paper explores the potential and challenges of using deep reinforcement learning (DRL) in network intrusion detection. It begins by introducing key DRL concepts and frameworks, such as deep Q-networks and actor-critic algorithms, and reviews recent research utilizing DRL for intrusion detection. The study evaluates challenges related to model training efficiency, detection of minority and unknown class attacks, feature selection, and handling unbalanced datasets. The performance of DRL models is comprehensively analyzed, showing that while DRL holds promise, many recent technologies remain underexplored. Some DRL models achieve state-of-the-art results on public datasets, occasionally outperforming traditional deep learning methods. The paper concludes with recommendations for enhancing DRL deployment and testing in real-world network scenarios, with a focus on Internet of Things intrusion detection. It discusses recent DRL architectures and suggests future policy functions for DRL-based intrusion detection. Finally, the paper proposes integrating DRL with generative methods to further improve performance, addressing current gaps and supporting more robust and adaptive network intrusion detection systems.
要約:
Trajectory collection is essential for location-based services, yet it can reveal highly sensitive information about users, such as daily routines and activities, raising serious privacy concerns. Local Differential Privacy (LDP) offers strong privacy guarantees for users even when the data collector is untrusted. However, existing trajectory collection methods under LDP are largely confined to discrete location spaces, where the size of the location space affects both privacy guarantees and trajectory utility. Moreover, many real-world applications, such as flying trajectories or wearable-sensor traces, naturally operate in continuous spaces, making these discrete-space methods inadequate.
This paper shifts the focus from discrete to continuous spaces for trajectory collection under LDP. We propose two methods: TraCS-D, which perturbs the direction and distance of locations, and TraCS-C, which perturbs the Cartesian coordinates of locations. Both methods are theoretically and experimentally analyzed for trajectory utility in continuous spaces. TraCS can also be applied to discrete spaces by rounding perturbed locations to any discrete space embedded in the continuous space. In this case, the privacy and utility guarantees of TraCS are independent of the number of locations in the space, and each perturbation requires only $\Theta(1)$ time complexity. Evaluation results on discrete location spaces validate the efficiency advantage and demonstrate that TraCS outperforms state-of-the-art methods with improved trajectory utility, particularly for large privacy parameters.
要約:
Large Vision-Language Models (LVLMs) exhibit impressive potential across various tasks but also face significant privacy risks, limiting their practical applications. Current researches on privacy assessment for LVLMs is limited in scope, with gaps in both assessment dimensions and privacy categories. To bridge this gap, we propose Multi-PA, a comprehensive benchmark for evaluating the privacy preservation capabilities of LVLMs in terms of privacy awareness and leakage. Privacy awareness measures the model's ability to recognize the privacy sensitivity of input data, while privacy leakage assesses the risk of the model unintentionally disclosing privacy information in its output. We design a range of sub-tasks to thoroughly evaluate the model's privacy protection offered by LVLMs. Multi-PA covers 26 categories of personal privacy, 15 categories of trade secrets, and 18 categories of state secrets, totaling 31,962 samples. Based on Multi-PA, we evaluate the privacy preservation capabilities of 21 open-source and 2 closed-source LVLMs. Our results reveal that current LVLMs generally pose a high risk of facilitating privacy breaches, with vulnerabilities varying across personal privacy, trade secret, and state secret.
要約:
Gradient sparsification, while mitigating communication bottlenecks in Federated Learning (FL), fundamentally alters the geometric landscape of model updates. We reveal that the resultant high-dimensional orthogonality renders traditional Euclidean-based robust aggregation metrics mathematically ambiguous, creating a 'sparsity-robustness trade-off' that adversaries exploit to bypass detection. To resolve this structural dissonance, we propose SafeSparse, a consensus restoration framework that decouples defense into topological and semantic dimensions. Unlike prior arts that treat sparsification and security orthogonally, SafeSparse introduces: (1) a Structure-Aware Calibration mechanism utilizing Jaccard similarity to filter topological outliers induced by index poisoning; and (2) a Directional Semantic Alignment module employing density-based clustering on update signs to neutralize magnitude-invariant attacks. Theoretically, we establish convergence guarantees for SafeSparse. Extensive experiments across multiple datasets and attack scenarios demonstrate that SafeSparse recovers up to 25.7% global accuracy under coordinated poisoning, effectively closing the vulnerability gap in communication-efficient FL.
要約:
Matrix factorization mechanisms for differentially private training have emerged as a promising approach to improve model utility under privacy constraints. In practical settings, models are typically trained over multiple epochs, requiring matrix factorizations that account for repeated participation. Existing theoretical upper and lower bounds on multi-epoch factorization error leave a significant gap. In this work, we introduce a new explicit factorization method, Banded Inverse Square Root (BISR), which imposes a banded structure on the inverse correlation matrix. This factorization enables us to derive an explicit and tight characterization of the multi-epoch error. We further prove that BISR achieves asymptotically optimal error by matching the upper and lower bounds. Empirically, BISR performs on par with state-of-the-art factorization methods, while being simpler to implement, computationally efficient, and easier to analyze.
要約:
In the era of rapid generative AI development, interactions with large language models (LLMs) pose increasing risks of misuse. Prior research has primarily focused on attacks using template-based prompts and optimization-oriented methods, while overlooking the fact that LLMs possess strong unconstrained deceptive capabilities to attack other LLMs. This paper introduces a novel jailbreaking method inspired by the Chain-of-Thought mechanism. The attacker employs mission transfer to conceal harmful user intent within dialogue and generates a progressive chain of lure questions without relying on predefined templates, enabling successful jailbreaks. To further improve the attack's strength, we incorporate a helper LLM model that performs randomized narrative optimization over multi-turn interactions, enhancing the attack performance while preserving alignment with the original intent. We also propose a toxicity-based framework using third-party LLMs to evaluate harmful content and its alignment with malicious intent. Extensive experiments demonstrate that our method consistently achieves high attack success rates and elevated toxicity scores across diverse types of LLMs under black-box API settings. These findings reveal the intrinsic potential of LLMs to perform unrestricted attacks in the absence of robust alignment constraints. Our approach offers data-driven insights to inform the design of future alignment mechanisms. Finally, we propose two concrete defense strategies to support the development of safer generative models. Our code is available at https://github.com/ChangWenhan/chain-of-lure-official
要約:
Large Audio Language Models (LALMs) have made significant progress. While increasingly deployed in real-world applications, LALMs face growing safety risks from jailbreak attacks that bypass safety alignment. However, there remains a lack of an adversarial audio dataset and a unified framework specifically designed to evaluate and compare jailbreak attacks against them. To address this gap, we introduce JALMBench, a comprehensive benchmark that assesses LALM safety against jailbreak attacks, comprising 11,316 text samples and 245,355 audio samples (>1,000 hours). JALMBench supports 12 mainstream LALMs, 8 attack methods (4 text-transferred and 4 audio-originated), and 5 defenses. We conduct in-depth analysis on attack efficiency, topic sensitivity, voice diversity, and model architecture. Additionally, we explore mitigation strategies for the attacks at both the prompt and response levels. Our systematic evaluation reveals that LALMs' safety is strongly influenced by modality and architectural choices: text-based safety alignment can partially transfer to audio inputs, and interleaved audio-text strategies enable more robust cross-modal generalization. Existing general-purpose moderation methods only slightly improve security, highlighting the need for defense methods specifically designed for LALMs. We hope our work can shed light on the design principles for building more robust LALMs.
要約:
As Retrieval-Augmented Generation (RAG) evolves into service-oriented platforms (Rag-as-a-Service) with shared knowledge bases, protecting the copyright of contributed data becomes essential. Existing watermarking methods in RAG focus solely on textual knowledge, leaving image knowledge unprotected. In this work, we propose AQUA, the first watermark framework for image knowledge protection in Multimodal RAG systems. AQUA embeds semantic signals into synthetic images using two complementary methods: acronym-based triggers and spatial relationship cues. These techniques ensure watermark signals survive indirect watermark propagation from image retriever to textual generator, being efficient, effective and imperceptible. Experiments across diverse models and datasets show that AQUA enables robust, stealthy, and reliable copyright tracing, filling a key gap in multimodal RAG protection.
要約:
Local differential privacy (LDP) offers rigorous, quantifiable privacy guarantees for personal data by introducing perturbations at the data source. Understanding how these perturbations affect classifier utility is crucial for both designers and users. However, a general theoretical framework for quantifying this impact is lacking and also challenging, especially for complex or black-box classifiers.
This paper presents a unified framework for theoretically quantifying classifier utility under LDP mechanisms. The key insight is that LDP perturbations are concentrated around the original data with a specific probability, allowing utility analysis to be reframed as robustness analysis within this concentrated region. Our framework thus connects the concentration properties of LDP mechanisms with the robustness of classifiers, treating LDP mechanisms as general distributional functions and classifiers as black boxes. This generality enables applicability to any LDP mechanism and classifier. A direct application of our utility quantification is guiding the selection of LDP mechanisms and privacy parameters for a given classifier. Notably, our analysis shows that a piecewise-based mechanism often yields better utility than alternatives in common scenarios.
Beyond the core framework, we introduce two novel refinement techniques that further improve utility quantification. We then present case studies illustrating utility quantification for various combinations of LDP mechanisms and classifiers. Results demonstrate that our theoretical quantification closely matches empirical observations, particularly when classifiers operate in lower-dimensional input spaces.
要約:
The widespread deployment of LLMs across enterprise services has created a critical security blind spot. Organizations operate multiple LLM services handling billions of queries daily, yet regulatory compliance boundaries prevent these services from sharing threat intelligence about prompt injection attacks, the top security risk for LLMs. When an attack is detected in one service, the same threat may persist undetected in others for months, as privacy regulations prohibit sharing user prompts across compliance boundaries.
We present BinaryShield, \emph{the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries.} BinaryShield transforms suspicious prompts through a unique pipeline combining PII redaction, semantic embedding, binary quantization, and randomized response mechanism to potentially generate privacy-preserving fingerprints that preserve attack patterns while providing privacy. Our evaluations demonstrate that BinaryShield achieves an F1-score of 0.94, significantly outperforming SimHash (0.77), the privacy-preserving baseline, while achieving storage reduction and 38x faster similarity search compared to dense embeddings.
要約:
Semantic-level watermarking (SWM) for large language models (LLMs) enhances watermarking robustness against text modifications and paraphrasing attacks by treating the sentence as the fundamental unit. However, existing methods still lack strong theoretical guarantees of robustness, and reject-sampling-based generation often introduces significant distribution distortions compared with unwatermarked outputs. In this work, we introduce a new theoretical framework on SWM through the concept of proxy functions (PFs) $\unicode{x2013}$ functions that map sentences to scalar values. Building on this framework, we propose PMark, a simple yet powerful SWM method that estimates the PF median for the next sentence dynamically through sampling while enforcing multiple PF constraints (which we call channels) to strengthen watermark evidence. Equipped with solid theoretical guarantees, PMark achieves the desired distortion-free property and improves the robustness against paraphrasing-style attacks. We also provide an empirically optimized version that further removes the requirement for dynamical median estimation for better sampling efficiency. Experimental results show that PMark consistently outperforms existing SWM baselines in both text quality and robustness, offering a more effective paradigm for detecting machine-generated text. Our code will be released at [this URL](https://github.com/PMark-repo/PMark).
要約:
Leaderboards for LRMs have turned evaluation into a competition, incentivizing developers to optimize directly on benchmark suites. A shortcut to achieving higher rankings is to incorporate evaluation benchmarks into the training data, thereby yielding inflated performance, known as benchmark contamination. Surprisingly, our studies find that evading contamination detections for LRMs is alarmingly easy. We focus on the two scenarios where contamination may occur in practice: (I) when the base model evolves into LRM via SFT and RL, we find that contamination during SFT can be originally identified by contamination detection methods. Yet, even a brief GRPO training can markedly conceal contamination signals that most detection methods rely on. Further empirical experiments and theoretical analysis indicate that PPO style importance sampling and clipping objectives are the root cause of this detection concealment, indicating that a broad class of RL methods may inherently exhibit similar concealment capability; (II) when SFT contamination with CoT is applied to advanced LRMs as the final stage, most contamination detection methods perform near random guesses. Without exposure to non-members, contaminated LRMs would still have more confidence when responding to those unseen samples that share similar distributions to the training set, and thus, evade existing memorization-based detection methods. Together, our findings reveal the unique vulnerability of LRMs evaluations: Model developers could easily contaminate LRMs to achieve inflated leaderboards performance while leaving minimal traces of contamination, thereby strongly undermining the fairness of evaluation and threatening the integrity of public leaderboards. This underscores the urgent need for advanced contamination detection methods and trustworthy evaluation protocols tailored to LRMs.
要約:
Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with predefined target responses. However, restricting the objective as inducing fixed targets inherently constrains the adversarial search space, limiting the overall attack efficacy. Furthermore, existing methods typically require numerous optimization iterations to fulfill the large gap between the fixed target and the original LLM output, resulting in low attack efficiency. To overcome these limitations, we propose the first gradient-based untargeted jailbreak attack (UJA), which relies on an untargeted objective to maximize the unsafety probability of the LLM output, without enforcing any response patterns. For tractable optimization, we further decompose this objective into two differentiable sub-objectives to search the optimal harmful response and the corresponding adversarial prompt, with a theoretical analysis to validate the decomposition. In contrast to existing attacks, UJA's unrestricted objective significantly expands the search space, enabling more flexible and efficient exploration of LLM vulnerabilities. Extensive evaluations show that UJA achieves over 80\% attack success rates against recent safety-aligned LLMs with only 100 optimization iterations, outperforming the state-of-the-art gradient-based attacks by over 30\%.
要約:
We analyze answers generated by generative engines (GEs) from the perspectives of citation publishers and the content-injection barrier, defined as the difficulty for attackers to manipulate answers to user prompts by placing malicious content on the web. GEs integrate two functions: web search and answer generation that cites web pages using large language models. Because anyone can publish information on the web, GEs are vulnerable to poisoning attacks. Existing studies of citation evaluation focus on how faithfully answer content reflects cited sources, leaving unexamined which web sources should be selected as citations to defend against poisoning attacks. To fill this gap, we introduce evaluation criteria that assess poisoning threats using the citation information contained in answers. Our criteria classify the publisher attributes of citations to estimate the content-injection barrier thereby revealing the threat of poisoning attacks in current GEs. We conduct experiments in political domains in Japan and the United States (U.S.) using our criteria and show that citations from official party websites (primary sources) are approximately \(25\%\)--\(45\%\) in the U.S. and \(60\%\)--\(65\%\) in Japan, indicating that U.S. political answers are at higher risk of poisoning attacks. We also find that sources with low content-injection barriers are frequently cited yet are poorly reflected in answer content. To mitigate this threat, we discuss how publishers of primary sources can increase exposure of their web content in answers and show that well-known techniques are limited by language differences.
要約:
Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites commonly used as measurement targets are unlabeled and lack semantic information about the nature of the sites they include. This limitation makes targeted measurements challenging, as researchers often need to rely on ad-hoc techniques to bias their datasets toward specific categories of interest. In this paper, we investigate the use of Large Language Models (LLMs) as a means to enable targeted web measurement studies through their semantic understanding capabilities. Building on prior literature, we identify key website classification tasks relevant to web measurements and construct datasets to systematically evaluate the performance of different LLMs on these tasks. Our results demonstrate that LLMs may achieve strong performance across multiple classification scenarios. We then conduct LLM-assisted web measurement studies inspired by prior work and rigorously assess the validity of the resulting research inferences. Our results demonstrate that LLMs can serve as a practical tool for analyzing security and privacy trends on the Web.
要約:
The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most widely used methods for addressing class imbalance and generating synthetic data. Despite its popularity, little attention has been paid to its privacy implications; yet, it is used in the wild in many privacy-sensitive applications. In this work, we conduct the first systematic study of privacy leakage in SMOTE: we begin by showing that prevailing evaluation practices, i.e., naive distinguishing and distance-to-closest-record metrics, completely fail to detect any leakage and that membership inference attacks (MIAs) can be instantiated with high accuracy. Then, by exploiting SMOTE's geometric properties, we build two novel attacks with very limited assumptions: DistinSMOTE, which perfectly distinguishes real from synthetic records in augmented datasets, and ReconSMOTE, which reconstructs real minority records from synthetic datasets with perfect precision and recall approaching one under realistic imbalance ratios. We also provide theoretical guarantees for both attacks. Experiments on eight standard imbalanced datasets confirm the practicality and effectiveness of these attacks. Overall, our work reveals that SMOTE is inherently non-private and disproportionately exposes minority records, highlighting the need to reconsider its use in privacy-sensitive applications and as a baseline for assessing the privacy of modern generative models.
要約:
Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point(FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present TAO: a Tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. TAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers and diffusion models on A100, H100, RTX6000, RTX4090, empirical thresholds are $10^2-10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. Together, TAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
要約:
AI agents can autonomously perform tasks and, often without explicit user consent, collect or disclose users' sensitive local data, which raises serious privacy concerns. Although AI agents' privacy policies describe their intended data practices, there remains limited transparency and accountability about whether runtime behavior matches those policies. To bridge this gap, we present AudAgent, a tool that continuously monitors AI agents' data practices in real time and guards compliance with their stated privacy policies.
AudAgent comprises four components for automated privacy auditing of AI agents. (i) Policy formalization: a novel cross-LLM voting mechanism that ensures high-confidence parsing of privacy policies into formal models. (ii) Runtime annotation: a lightweight Presidio-based analyzer that detects sensitive data and annotates data practices based on the AI agent's context and the formalized privacy policy model. (iii) Compliance auditing: ontology graphs and automata-based checking that link the privacy policy model with runtime annotations, enabling on-the-fly compliance verification. (iv) User interface: an infrastructure-independent implementation that visualizes the real-time execution trace of AI agents alongside detected privacy violations, providing user-friendly transparency and accountability.
We evaluate AudAgent on AI agents built with mainstream frameworks, demonstrating its effectiveness in detecting and visualizing privacy policy violations. Using AudAgent, we further find that many privacy policies lack explicit safeguards for highly sensitive data such as SSNs, whose misuse violates legal requirements, and that many agents, including those powered by Claude, Gemini, and DeepSeek,do not refuse to process such data via third-party tools. AudAgent proactively blocks operations on such data, overriding the agents' original privacy policies and behavior.
要約:
A Covert Channel (CC) exploits legitimate communication mechanisms to stealthily transmit information, often bypassing traditional security controls. Among these, a novel paradigm called History Covert Channels (HCC) leverages past network events as reference points to embed covert messages. Unlike traditional timing- or storage-based CCs, which directly manipulate traffic patterns or packet contents, HCCs minimize detectability by encoding information through small pointers to historical data. This approach enables them to amplify the size of transmitted covert data by referring to more bits than are actually embedded. Recent research has explored the feasibility of such methods, demonstrating their potential to evade detection by repurposing naturally occurring network behaviors as a covert transmission medium.
This paper introduces a novel method for establishing and maintaining covert communication links using relative pointers to network timing patterns, which minimizes the reliance of the HCC on centralized timekeeping and reduces the likelihood of being detected by standard network monitoring tools. We also explore the tailoring of HCCs to optimize their robustness and undetectability characteristics. Our experiments reveal a better bitrate compared to previous work.
要約:
Federated Learning (FL) enables collaborative, privacy-preserving model training, but supporting the "Right to be Forgotten" is especially challenging because data influences the model through distributed and interleaved client updates. Existing exact unlearning methods typically require frequent retraining from scratch, resulting in high communication cost and long service downtime. To address this, we propose Federated Sequential Group-based Training (FedSGT), an exact unlearning framework for FL. FedSGT partitions the data into uniform groups, and each client may participate in multiple groups. To control communication overhead, each client can limit the number of groups it contributes to. FedSGT then trains multiple sequences of Parameter-Efficient Fine-Tuning (PEFT) modules, each corresponding to a different group permutation. Since the PEFT modules are lightweight and maintained server-side, FedSGT isolates the influence of different data groups into independent modules without incurring significant storage overhead and communication cost. Exact unlearning is thus achieved instantly by deactivating the modules corresponding to the group containing the unlearned data. Furthermore, using multiple training sequences helps maintain high model utility as deletion requests accumulate. We provide a rigorous theoretical analysis of both the deletion rate -- expected number of deletions before retraining is needed -- and the expected model performance. Experiments on various tasks demonstrate that FedSGT achieves a significantly longer service maintenance under multiple unlearning requests while maintaining comparable learning performance and training efficiency to other exact unlearning baselines. Extensive ablation studies validate the robustness of our method across a wide range of parameter settings.
要約:
Cyber threat intelligence (CTI) encoded in STIX and structured according to the MITRE ATT&CK framework has become a global reference for describing adversary behavior. However, ATT&CK was designed as a descriptive knowledge base rather than a procedural model. We therefore ask whether its structured artifacts contain sufficient behavioral detail to support multi-stage adversary emulation. Through systematic measurements of the ATT&CK Enterprise bundle, we show that campaign objects encode just fragmented slices of behavior. Only 35.6% of techniques appear in at least one campaign, and neither clustering nor sequence analysis reveals any reusable behavioral structure under technique overlap or LCS-based analyses. Intrusion sets cover a broader portion of the technique space, yet omit the procedural semantics required to transform behavioral knowledge into executable chains, including ordering, preconditions, and environmental assumptions. These findings reveal a procedural semantic gap in current CTI standards: they describe what adversaries do, but not exactly how that behavior was operationalized.
To assess how far this gap can be bridged in practice, we introduce a three-stage methodology that translates behavioral information from structured CTI into executable steps and makes the necessary environmental assumptions explicit. We demonstrate its viability by instantiating the resulting steps as operations in the MITRE Caldera framework. Case studies of ShadowRay and Soft Cell show that structured CTI can enable the emulation of multi-stage APT campaigns, but only when analyst-supplied parameters and assumptions are explicitly recorded. This, in turn, exposes the precise points at which current standards fail to support automation. Our results clarify the boundary between descriptive and machine-actionable CTI for adversary emulation.
要約:
The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.
要約:
GPU memory errors are a critical threat to deep learning (DL) frameworks, leading to crashes or even security issues. We introduce GPU-Fuzz, a fuzzer locating these issues efficiently by modeling operator parameters as formal constraints. GPU-Fuzz utilizes a constraint solver to generate test cases that systematically probe error-prone boundary conditions in GPU kernels. Applied to PyTorch, TensorFlow, and PaddlePaddle, we uncovered 13 unknown bugs, demonstrating the effectiveness of GPU-Fuzz in finding memory errors.
要約:
Database-driven cognitive radio networks (DB-CRNs) enable dynamic spectrum sharing through geolocation databases but introduce critical security and privacy challenges, including mandatory location disclosure, susceptibility to location spoofing, and denial-of-service (DoS) attacks on centralized services. Existing approaches address these issues in isolation and lack a unified, regulation-compliant solution under realistic adversarial conditions. In this work, we present a unified security framework for DB-CRNs that simultaneously provides location privacy, user anonymity, verifiable location, and DoS resilience. Our framework, denoted as SLAPX, enables privacy-preserving spectrum queries using delegatable anonymous credentials, supports adaptive location verification without revealing precise user location, and mitigates DoS attacks through verifiable delay functions (VDFs) combined with RLRS-based rate limiting. Extensive cryptographic benchmarking and network simulations demonstrate that SLAPX achieves significantly lower latency and communication overhead than existing solutions while effectively resisting location spoofing and DoS attacks. These results show that SLAPX is practical and well-suited for secure next-generation DB-CRN deployments.
要約:
The prevalence of malicious packages in open-source repositories, such as PyPI, poses a critical threat to the software supply chain. While Large Language Models (LLMs) have emerged as a promising tool for automated security tasks, their effectiveness in detecting malicious packages and indicators remains underexplored. This paper presents a systematic evaluation of 13 LLMs for detecting malicious software packages. Using a curated dataset of 4,070 packages (3,700 benign and 370 malicious), we evaluate model performance across two tasks: binary classification (package detection) and multi-label classification (identification of specific malicious indicators). We further investigate the impact of prompting strategies, temperature settings, and model specifications on detection accuracy. We find a significant "granularity gap" in LLMs' capabilities. While GPT-4.1 achieves near-perfect performance in binary detection (F1 $\approx$ 0.99), performance degrades by approximately 41\% when the task shifts to identifying specific malicious indicators. We observe that general models are best for filtering out the majority of threats, while specialized coder models are better at detecting attacks that follow a strict, predictable code structure. Our correlation analysis indicates that parameter size and context width have negligible explanatory power regarding detection accuracy. We conclude that while LLMs are powerful detectors at the package level, they lack the semantic depth required for precise identification at the granular indicator level.
要約:
Image-generative models are widely deployed across industries. Recent studies show that they can be exploited to produce policy-violating content. Existing mitigation strategies primarily operate at the pre- or mid-generation stages through techniques such as prompt filtering and safety-aware training/fine-tuning. Prior work shows that these approaches can be bypassed and often degrade generative quality. In this work, we propose ReVision, a training-free, prompt-based, post-hoc safety framework for image-generation pipeline. ReVision acts as a last-line defense by analyzing generated images and selectively editing unsafe concepts without altering the underlying generator. It uses the Gemini-2.5-Flash model as a generic policy-violating concept detector, avoiding reliance on multiple category-specific detectors, and performs localized semantic editing to replace unsafe content. Prior post-hoc editing methods often rely on imprecise spatial localization, that undermines usability and limits deployability, particularly in multi-concept scenes. To address this limitation, ReVision introduces a VLM-assisted spatial gating mechanism that enforces instance-consistent localization, enabling precise edits while preserving scene integrity. We evaluate ReVision on a 245-image benchmark covering both single- and multi-concept scenarios. Results show that ReVision (i) improves CLIP-based alignment toward safe prompts by +$0.121$ on average; (ii) significantly improves multi-concept background fidelity (LPIPS $0.166 \rightarrow 0.058$); (iii) achieves near-complete suppression on category-specific detectors (e.g., NudeNet $70.51 \rightarrow 0$); and (iv) reduces policy-violating content recognizability in a human moderation study from $95.99\%$ to $10.16\%$.
要約:
The learning parity with noise (LPN) problem is a well-established computational challenge whose difficulty is critical to the security of several post-quantum cryptographic primitives such as HQC and Classic McEliece. Classically, the best-known attacks involve information set decoding methods which are exponential in complexity for parameterisations of interest. In this paper we investigate whether quantum methods might offer alternative approaches. The line of inquiry is inspired by Regev's relating of certain lattice problems to the hidden dihedral subgroup problem. We use neighbourhoods of binary fields to produce a function close to fulfilling Simon's promise with difference equal to the secret parity vector. Although unlikely to recover the secret parity vector directly, running Simon's algorithm essentially produces new LPN samples. This gives the hope that we might be able to produce enough new samples to ignore one or more variables and iteratively reduce the problem.
We make no claim that these methods will necessarily be competitive with existing approaches, merely that they warrant deeper investigation.
要約:
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUNDRY (JBF), a system that addresses this gap via a multi-agent workflow to translate jailbreak papers into executable modules for immediate evaluation within a unified harness. JBF features three core components: (i) JBF-LIB for shared contracts and reusable utilities; (ii) JBF-FORGE for the multi-agent paper-to-module translation; and (iii) JBF-EVAL for standardizing evaluations. Across 30 reproduced attacks, JBF achieves high fidelity with a mean (reproduced-reported) attack success rate (ASR) deviation of +0.26 percentage points. By leveraging shared infrastructure, JBF reduces attack-specific implementation code by nearly half relative to original repositories and achieves an 82.5% mean reused-code ratio. This system enables a standardized AdvBench evaluation of all 30 attacks across 10 victim models using a consistent GPT-4o judge. By automating both attack integration and standardized evaluation, JBF offers a scalable solution for creating living benchmarks that keep pace with the rapidly shifting security landscape.
要約:
In federated learning (FL), profiling and verifying each client is inherently difficult, which introduces a significant security vulnerability: malicious clients, commonly referred to as Byzantines, can degrade the accuracy of the global model by submitting poisoned updates during training. To mitigate this, the aggregation process at the parameter server must be robust against such adversarial behaviour. Most existing defences approach the Byzantine problem from an outlier detection perspective, treating malicious updates as statistical anomalies and ignoring the internal structure of the trained neural network (NN). Motivated by this, this work highlights the potential of leveraging side information tied to the NN architecture to design stronger, more targeted attacks. In particular, inspired by insights from sparse NNs, we introduce a hybrid sparse Byzantine attack. The attack consists of two coordinated components: (i) A sparse attack component that selectively manipulates parameters with higher sensitivity in the NN, aiming to cause maximum disruption with minimal visibility; (ii) A slow-accumulating attack component that silently poisons parameters over multiple rounds to evade detection. Together, these components create a strong but imperceptible attack strategy that can bypass common defences. We evaluate the proposed attack through extensive simulations and demonstrate its effectiveness against eight state-of-the-art defence mechanisms.
要約:
Large Language Models (LLMs) are widely used for automated code generation. Their reliance on infrequently updated pretraining data leaves them unaware of newly discovered vulnerabilities and evolving security standards, making them prone to producing insecure code. In contrast, developer communities on Stack Overflow (SO) provide an ever-evolving repository of knowledge, where security vulnerabilities are actively discussed and addressed through collective expertise. These community-driven insights remain largely untapped by LLMs. This paper introduces SOSecure, a Retrieval-Augmented Generation (RAG) system that leverages the collective security expertise found in SO discussions to improve the security of LLM-generated code. We build a security-focused knowledge base by extracting SO answers and comments that explicitly identify vulnerabilities. Unlike common uses of RAG, SOSecure triggers after code has been generated to find discussions that identify flaws in similar code. These are used in a prompt to an LLM to consider revising the code. Evaluation across three datasets (SALLM, LLMSecEval, and LMSys) show that SOSecure achieves strong fix rates of 71.7%, 91.3%, and 96.7% respectively, compared to prompting GPT-4 without relevant discussions (49.1%, 56.5%, and 37.5%), and outperforms multiple other baselines. SOSecure operates as a language-agnostic complement to existing LLMs, without requiring retraining or fine-tuning, making it easy to deploy. Our results underscore the importance of maintaining active developer forums, which have dropped substantially in usage with LLM adoptions.
要約:
Harmful fine-tuning attacks pose a major threat to the security of large language models (LLMs), allowing adversaries to compromise safety guardrails with minimal harmful data. While existing defenses attempt to reinforce LLM alignment, they fail to address models' inherent "trainability" on harmful data, leaving them vulnerable to stronger attacks with increased learning rates or larger harmful datasets. To overcome this critical limitation, we introduce SEAM, a novel alignment-enhancing defense that transforms LLMs into self-destructive models with intrinsic resilience to misalignment attempts. Specifically, these models retain their capabilities for legitimate tasks while exhibiting substantial performance degradation when fine-tuned on harmful data. The protection is achieved through a novel loss function that couples the optimization trajectories of benign and harmful data, enhanced with adversarial gradient ascent to amplify the self-destructive effect. To enable practical training, we develop an efficient Hessian-free gradient estimate with theoretical error bounds. Extensive evaluation across LLMs and datasets demonstrates that SEAM creates a no-win situation for adversaries: the self-destructive models achieve state-of-the-art robustness against low-intensity attacks and undergo catastrophic performance collapse under high-intensity attacks, rendering them effectively unusable. The code is available: https://github.com/ZJUWYH/seam. (Warning: this paper contains potentially harmful content generated by LLMs.)
要約:
Computer-Use Agents (CUAs) with full system access enable powerful task automation but pose significant security and privacy risks due to their ability to manipulate files, access user data, and execute arbitrary commands. While prior work has focused on browser-based agents and HTML-level attacks, the vulnerabilities of CUAs remain underexplored. In this paper, we investigate Visual Prompt Injection (VPI) attacks, where malicious instructions are visually embedded within rendered user interfaces, and examine their impact on both CUAs and Browser-Use Agents (BUAs). We propose VPI-Bench, a benchmark of 306 test cases across five widely used platforms, to evaluate agent robustness under VPI threats. Each test case is a variant of a web platform, designed to be interactive, deployed in a realistic environment, and containing a visually embedded malicious prompt. Our empirical study shows that current CUAs and BUAs can be deceived at rates of up to 51% and 100%, respectively, on certain platforms. The experimental results also indicate that system prompt defenses offer only limited improvements. These findings highlight the need for robust, context-aware defenses to ensure the safe deployment of multimodal AI agents in real-world environments. The code and dataset are available at: https://github.com/cua-framework/agents
要約:
With a growing interest in securing user data within the internet-of-things (IoT), embedded encryption has become of paramount importance, requiring light-weight high-quality Random Number Generators (RNGs). Emerging stochastic device technologies produce random numbers from stochastic physical processes at high quality, however, their generated random number streams are adversely affected by process and supply voltage variations, which can lead to bias in the generated streams. In this work, we present an adaptive variation-resilient RNG capable of extracting unbiased encryption-grade random number streams from physically driven entropy sources, for embedded cryptography applications. The system's key feature is its adaptive digitizer with an adaptive reference voltage. As a proof of concept, we employ a stochastic magnetic tunnel junction (sMTJ) device as an entropy source. The impact of variations in the sMTJ is mitigated by the adaptive digitizer, which generates an adaptive short-term average reference voltage that dynamically tracks any stochastic signal drift or deviation, leading to unbiased random bit stream generation. The generated bit streams, due to their higher entropy, then only need to undergo simplified post-processing. A prototype of the adaptive RNG system was experimentally implemented using discrete electronic components and an FPGA for entropy source emulation. Statistical randomness tests based on the National Institute of Standards and Technology (NIST) test suite are conducted on bit streams obtained using the simulations as well as the discrete electronic component implementation, demonstrating that the bit streams consistently pass all 16 tests of the NIST SP 800-22 test suite with a 100% pass rate. Leveraging its simplified post-processing, the adaptive RNG shows consistent operation across a wide range of throughputs from 5 to 182 Mbps.
要約:
Large Language Models (LLMs) are gaining traction as a method to generate consensus statements and aggregate preferences in digital democracy experiments. Yet, LLMs could introduce critical vulnerabilities in these systems. Here, we examine the vulnerability and robustness of off-the-shelf consensus-generating LLMs to prompt-injection attacks, in which texts are injected to amplify particular viewpoints, erase certain opinions, or divert consensus toward unrelated or irrelevant topics. We construct attack-free and adversarial variants of prompts containing public policy questions and opinion texts, classify opinion and consensus valences with a fine-tuned BERT model, and estimate LLM-human majority agreement rates. Across topics, default LLaMA 3.1 8B Instruct, GPT-4.1 Nano, and Apertus 8B exhibit widespread vulnerability, specially when disagreement and disagreement are finely balanced, for attacks that shift consensus toward positions aligned with GB-unionist conservative manifestos relative to pro-independence left manifestos, and for rational, instruction-like rhetorical strategies. A robustness pipeline combining GPT-OSS-SafeGuard injection detection, structured opinion representations, and GSPO-based reinforcement learning substantially reduces directional failures whenever the underlying consensus has a clear positive or negative valence. These findings advance our understanding of both the vulnerabilities and the potential defenses of consensus-generating LLMs in digital democracy applications.
要約:
AI control protocols serve as a defense mechanism to stop untrusted LLM agents from causing harm in autonomous settings. Prior work treats this as a security problem, stress testing with exploits that use the deployment context to subtly complete harmful side tasks, such as backdoor insertion. In practice, most AI control protocols are fundamentally based on LLM monitors, which can become a central point of failure. We study adaptive attacks by an untrusted model that knows the protocol and the monitor model, which is plausible if the untrusted model was trained with a later knowledge cutoff or can search for this information autonomously. We instantiate a simple adaptive attack vector by which the attacker embeds publicly known or zero-shot prompt injections in the model outputs. Using this tactic, frontier models consistently evade diverse monitors and complete malicious tasks on two main AI control benchmarks. The attack works universally against current protocols that rely on a monitor. Furthermore, the recent Defer-to-Resample protocol even backfires, as its resampling amplifies the prompt injection and effectively reframes it as a best-of-$n$ attack. In general, adaptive attacks on monitor models represent a major blind spot in current control protocols and should become a standard component of evaluations for future AI control mechanisms.
要約:
Large Language Models (LLMs) are increasingly being used to autonomously evaluate the quality of content in communication systems, e.g., to assess responses in telecom customer support chatbots. However, the impartiality of these AI "judges" is not guaranteed, and any biases in their evaluation criteria could skew outcomes and undermine user trust. In this paper, we systematically investigate judgment biases across 6 LLM-as-a-judge models spanning both prompt-based and fine-tuned judges under the pointwise scoring setting, encompassing 11 types of biases that cover both implicit and explicit forms. We observed that state-of-the-art LLM judges demonstrate robustness to biased inputs, generally assigning them lower scores than the corresponding clean samples. We further found that fine-tuning an LLM on high-scoring yet biased responses can significantly degrade its performance, highlighting the risk of training on biased data. We also discovered that the judged scores correlate with task difficulty: a challenging dataset like GPQA yields lower average scores, whereas an open-ended reasoning dataset (e.g., JudgeLM-val) sees higher average scores. Finally, we proposed four potential mitigation strategies to ensure fair and reliable AI judging in practical communication scenarios.
要約:
Shuffling is a powerful way to amplify privacy of a local randomizer in private distributed data analysis. Most existing analyses of how shuffling amplifies privacy are based on the pure local differential privacy (DP) parameter $\varepsilon_0$. This paper raises the question of whether $\varepsilon_0$ adequately captures the privacy amplification. For example, since the Gaussian mechanism does not satisfy pure local DP for any finite $\varepsilon_0$, does it follow that shuffling yields weak amplification? To solve this problem, we revisit the privacy blanket bound of Balle et al. (the blanket divergence) and develop a direct asymptotic analysis that bypasses $\varepsilon_0$. Our key finding is that, asymptotically, the blanket divergence depends on the local mechanism only through a single scalar parameter $\chi$ and that this dependence is monotonic. Therefore, this parameter serves as a proxy for shuffling efficiency, which we call the shuffle index. By applying this analysis to both upper and lower bounds of the shuffled mechanism's privacy profile, we obtain a band for its privacy guarantee through shuffle indices. Furthermore, we derive a simple structural, necessary and sufficient condition on the local randomizer under which this band collapses asymptotically. $k$-RR families with $k\ge3$ satisfy this condition, while for generalized Gaussian mechanisms the condition may not hold but the resulting band remains tight. Finally, we complement the asymptotic theory with an FFT-based algorithm for computing the blanket divergence at finite $n$, which offers rigorously controlled relative error and near-linear running time in $n$, providing a practical numerical analysis for shuffle DP.
要約:
Regression models are widely used in industrial processes, engineering, and in natural and physical sciences, yet their robustness to poisoning has received less attention. When it has, studies often assume unrealistic threat models and are thus less useful in practice. In this paper, we propose a novel optimal stealthy attack formulation that considers different degrees of detectability and show that it bypasses state-of-the-art defenses. We further propose a new methodology based on normalization of objectives to evaluate different trade-offs between effectiveness and detectability. Finally, we develop a novel defense (BayesClean) against stealthy attacks. BayesClean improves on previous defenses when attacks are stealthy and the number of poisoning points is significant.
要約:
Recent advancements in large language models (LLMs) have significantly enhanced the capabilities of collaborative multi-agent systems, enabling them to address complex challenges. However, within these multi-agent systems, the susceptibility of agents to collective cognitive biases remains an underexplored issue. A compelling example is the Mandela effect, a phenomenon where groups collectively misremember past events as a result of false details reinforced through social influence and internalized misinformation. This vulnerability limits our understanding of memory bias in multi-agent systems and raises ethical concerns about the potential spread of misinformation. In this paper, we conduct a comprehensive study on the Mandela effect in LLM-based multi-agent systems, focusing on its existence, causing factors, and mitigation strategies. We propose MANBENCH, a novel benchmark designed to evaluate agent behaviors across four common task types that are susceptible to the Mandela effect, using five interaction protocols that vary in agent roles and memory timescales. We evaluate agents powered by several LLMs on MANBENCH to quantify the Mandela effect and analyze how different factors affect it. Moreover, we propose strategies to mitigate this effect, including prompt-level defenses (e.g., cognitive anchoring and source scrutiny) and model-level alignment-based defense, achieving an average 74.40% reduction in the Mandela effect compared to the baseline. Our findings provide valuable insights for developing more resilient and ethically aligned collaborative multi-agent systems. Code and dataset are available at https://github.com/bluedream02/Mandela-Effect.