要約:
This paper presents C8s, a confidential computing architecture for Kubernetes that provides cryptographically rooted confidentiality, integrity, and verifiability guarantees for Kubernetes clusters from infrastructure operators. These guarantees are cryptographically provable to any independent third party verifier. The architecture is built on hardware Trusted Execution Environments (TEEs), specifically AMD SEV-SNP, Intel TDX, and NVIDIA Confidential Computing support, to establish an attestation-rooted trust boundary around confidential VMs. This design is compatible with managed Kubernetes services such as Amazon EKS, Google GKE, and Microsoft AKS, where the control plane cannot be attested. Under this boundary, three groups gain guarantees that are absent from conventional deployments. Data and artifact owners can deploy sensitive workloads and proprietary artifacts on third-party infrastructure without risking exfiltration. Compute providers can offer execution services without revealing workloads to cloud operators. End users can submit requests that remain opaque to all parties except the attested TEE processing them. Representative workloads include AI inference, securing AI model weights, and training or fine-tuning on sensitive data.
要約:
Autonomous AI agent ecosystems require stronger mechanisms for secure discovery, identity verification, capability attestation, and policy governance. Current deployments frequently lack (1) uniform agent discovery, (2) cryptographic agent authentication, (3) capability proofs that protect secrets, and (4) enforceable policy controls. This paper presents an implementation-oriented proof of concept for the Agent Name Service (ANS), a DNS-inspired trust layer for AI agent discovery and interoperability in Kubernetes, grounded in the ANS protocol specification~\cite{huang2025ans}. The implementation uses Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), policy-as-code enforcement with Open Policy Agent (OPA), and Kubernetes-native integration patterns (CRDs, admission controls, service mesh integration). In a demo research environment (3-node cluster, 50-agent workflow simulation), we observe sub-10ms response in demonstrated service paths and full success for scripted demo deployment scenarios. We explicitly scope these findings as proof-of-concept evidence rather than production certification. We further provide a threat model, assumptions, and limitations to separate implemented evidence from protocol-defined and roadmap capabilities. The result is an evidence-grounded pathway from ANS protocol concepts to reproducible engineering practice for secure multi-agent systems.
要約:
Developers and organizations are using Large Language Models (LLMs) to generate security-critical code more frequently than ever, including cryptographic solutions for their products. This study presents an empirical evaluation of cryptographic security in 240 Rust code samples for two crypto algorithms (AES-256-GCM and ChaCha20-Poly1305) generated by three LLMs (Gemini 2.5 Pro, GPT-4o, and DeepSeek Coder) using four different prompt strategies. For each successfully compiled code sample, CodeQL static analysis and our rule-based crypto-specific analyzer were used to detect vulnerabilities, which are also associated with Common Weakness Enumeration (CWE). The evaluation results revealed that only 23.3% of the generated code samples were successfully compiled. Among the compiled code, CodeQL produced only two false positives, while our rule-based crypto-specific analyzer identified vulnerabilities in 57% of the compiled samples with zero false positives. This demonstrates that general-purpose analysis tools are insufficient for code validation for the experimented crypto algorithms. The compilation success of the two algorithms varied significantly (AES-256-GCM 34.2% versus ChaCha20-Poly1305 12.5%), showing a gap in code generation capabilities. While model choice had no significant effect on compilation success, prompt strategy significantly influenced outcomes (P = 0.002), with chain-of-thought prompting performing 5 times worse than zero-shot. All three models exhibit systematic failures, including nonce reuse and API hallucinations.
要約:
Video large language models (VideoLLMs) are increasingly trained or instruction-tuned on large-scale video--text corpora collected from heterogeneous sources, raising an immediate privacy question: can an external auditor determine whether a particular video was used during training? While membership inference attacks (MIAs) have been studied extensively for classifiers and, more recently, for text and image generation models, the VideoLLM setting remains unexplored. This setting is challenging because black-box auditors observe only generated text, whereas the membership signal is entangled with video-specific factors such as motion complexity and temporal span. In this paper, we present a black-box MIA targeting VideoLLMs that couples temperature-perturbed generation with video-aware difficulty features. Our key intuition is that member samples tend to induce sharper, more brittle generation behavior across decoding temperatures, and that this signal should be interpreted jointly with the intrinsic difficulty of the queried video. Concretely, we query the target model at low and high temperatures, measure the semantic drift between the resulting texts. We evaluate the attack against \texttt{LLaVA-Video-7B-Qwen2-Video-Only} and achieve a member inference AUC of 0.68 and accuracy of 0.63. These results demonstrate that Video-LLMs are vulnerable to black-box membership inference attacks, highlighting an urgent need for the community to systematically evaluate and mitigate privacy risks in VideoLLMs.
要約:
Recent research has demonstrated the potential of Large Language Models (LLMs) for autonomous penetration testing, particularly when using cloud-based restricted-weight models. However, reliance on such models introduces security, privacy, and sovereignty concerns, motivating the use of locally hosted open-weight alternatives. Prior work shows that small open-weight models perform poorly on automated Linux privilege escalation, limiting their practical applicability.
In this paper, we present a systematic empirical study of whether targeted system-level and prompting interventions can bridge this performance gap. We analyze failure modes of open-weight models in autonomous privilege escalation, map them to established enhancement techniques, and evaluate five concrete interventions (chain-of-thought prompting, retrieval-augmented generation, structured prompts, history compression, and reflective analysis) implemented as extensions to hackingBuddyGPT.
Our results show that open-weight models can match or outperform cloud-based baselines such as GPT-4o. With our treatments enabled, Llama3.1 70B exploits 83% of tested vulnerabilities, while smaller models including Llama3.1 8B and Qwen2.5 7B achieve 67% when using guidance. A full-factorial ablation study over all treatment combinations reveals that reflection-based treatments contribute most, while also identifying vulnerability discovery as a remaining bottleneck for local models.
要約:
As LLMs are increasingly integrated into systems that browse, retrieve, summarize, and act on web content, webpages have become an untrusted input vector for downstream model behavior. This enables site owners, contributors, and adversaries to embed instructions directly in web resources, i.e., indirect prompt injections. While prior work demonstrates such attacks in controlled settings, their prevalence, deployment, and real-world impact remain unclear.
We present one of the first large-scale empirical analyses of indirect prompt injections in webpages and HTTP responses. Analyzing 1.2B URLs from 24.8M hosts, we identify 15.3K validated instances across 11.7K pages. These are not isolated cases: a small number of recurring templates account for most cases. We characterize their objectives, delivery mechanisms, visibility, persistence, and impact, revealing a heterogeneous ecosystem spanning disruptive prompts, reputation manipulation, content-protection directives, and AI-bot detection, targeting systems such as crawlers, search pipelines, customer-support agents, and hiring workflows. A key finding is that most instructions target machines rather than humans: about 70% appear in non-rendered HTML (e.g., headers, comments, metadata), and many visible cases are hidden via rendering techniques. To assess practical risk, we run 5,200 controlled experiments across 13 models and four webpage representations. Our results show compliance is limited but non-negligible, reaching up to 8% for smaller models on plain-text inputs, while structured representations reduce compliance by preserving structural cues. Overall, prompt-based interference is already present in the web ecosystem and represents a growing source of tension between LLM-driven automation and the sites it consumes.
要約:
As large language models (LLMs) are increasingly fine-tuned for hardware tasks like RTL code generation, the scarcity of high-quality datasets often leads to the use of rapidly assembled or generated training data. These datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality checks. To address this, we present SafeTune, a framework designed to harden LLM-based RTL generation against poisoning, specifically focusing on hardware Trojan (HT) insertion. SafeTune integrates two core components: (i) a Graph Neural Network (GNN) that models structural properties to identify anomalous circuitry patterns during fine-tuning, and (ii) a semantic verification module using text embeddings and an XGBoost classifier to assess prompt security. By coupling structural and semantic knowledge, SafeTune effectively filters poisoned inputs without sacrificing legitimate data. Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning without requiring modifications to the underlying model architecture.
要約:
As large language models are integrated into autonomous robotic systems for task planning and control, compromised inputs or unsafe model outputs can propagate through the planning pipeline to physical-world consequences. Although prior work has studied robotic cybersecurity, adversarial perception attacks, and LLM safety independently, no existing study traces how these threat categories interact and propagate across trust boundaries in a unified architectural model. We address this gap by modeling an LLM-enabled autonomous robot in an edge-cloud architecture as a hierarchical Data Flow Diagram and applying STRIDE-per-interaction analysis across six boundary-crossing interaction points using a three-category taxonomy of Conventional Cyber Threats, Adversarial Threats, and Conversational Threats. The analysis reveals that these categories converge at the same boundary crossings, and we trace three cross-boundary attack chains from external entry points to unsafe physical actuation, each exposing a distinct architectural property: the absence of independent semantic validation between user input and actuator dispatch, cross-modal translation from visual perception to language-model instruction, and unmediated boundary crossing through provider-side tool use. To our knowledge, this is the first DFD-based threat analysis integrating all three threat categories across the full perception-planning-actuation pipeline of an LLM-enabled robotic system.
要約:
Android residential proxy applications represent a growing class of potentially-unwanted programs (PUPs) that covertly route third-party traffic through end-user devices, enabling ad fraud, credential abuse, and evasion of geolocation controls by sophisticated threat actors. Attributing an unknown APK to a specific proxy network remains challenging due to code reuse, SDK embedding, and obfuscation across proxy families.
We present a static-analysis pipeline for automated proxyware family attribution, extracting graph-structured representations (control-flow and function-call graphs) and behavioral signatures from a labeled corpus of 3,365 Android proxy apps spanning four commercial proxy networks. We evaluate Weisfeiler-Lehman graph kernel features alone and fused with binary capability vectors across multiple classifiers. Using 5-fold DEX-grouped cross-validation to prevent data leakage, SGD achieves a macro F1 of 0.985 on the expanded dataset. To support explainability, we map classifier decisions to automatically generated Yara rules, achieving per-family accuracies up to 88.45\% after filtering non-discriminative signatures.
Finally, we discuss these results in the context of the broader ecosystem. We find that from the expanded dataset, the majority of applications (51.4\%) still available through APKPure still contain embedded proxy SDK code. Further analysis of developer accounts reveals that 23 developers are responsible for other applications also containing such functionality, suggesting continuous and ongoing commercial relationships between proxy providers and developers.
要約:
Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are increasingly applied to critical tasks such as function and variable name recovery and type inference. However, despite the rapid growth of research in this area, progress has been hindered by the absence of a standardized dataset. Existing studies rely on disparate datasets, preprocessing pipelines, and evaluation metrics, making fair comparisons between approaches difficult and obscuring a clear understanding of LLM capabilities in binary analysis. To address these challenges, we present REBench, a comprehensive benchmark dataset for evaluating LLMs on binary reverse engineering tasks. REBench consolidates a superset of existing datasets, comprising hundreds of millions of lines of source code and a diverse collection of binaries spanning multiple architectures and optimization levels. REBench adopts a knowledge-base-driven methodology that stores byte-level stack information to generate ground truth, ensuring that task difficulty is preserved while maintaining universal applicability. This design enables fair evaluation across tasks while avoiding simplifications that could bias results. As a use case, we apply REBench to measure the reverse engineering performance of LLMs and the result demonstrates difficulties in complex tasks.
要約:
Security Operations Centers (SOCs) face mounting operational challenges. These challenges come from increasing threat volumes, heterogeneous SIEM platforms, and time-consuming manual triage workflows. We present an end-to-end threat management framework that integrates ensemble-based detection, syntax-constrained query generation, and retrieval-augmented resolution support to automate critical security workflows. Our detection module evaluates both traditional machine learning classifiers and large language models (LLMs), then combines the three best-performing LLMs to create an ensemble model, achieving 82.8% accuracy while maintaining 0.120 false positive rate on SIEM logs. We introduce the SQM (Syntax Query Metadata) architecture for automated evidence collection. It uses platform-specific syntax constraints, metadata-based retrieval, and documentation-grounded prompting to generate executable queries for IBM QRadar and Google SecOps. SQM achieves a BLEU score of 0.384 and a ROUGE-L score of 0.731. These results are more than twice as good as the baseline LLM performance. For incident resolution and recommendation generation, we demonstrate that integrating SQM-derived evidence improves resolution code prediction accuracy from 78.3% to 90.0%, with an overall recommendation quality score of 8.70. In production SOC environments, our framework reduces average incident triage time from hours to under 10 minutes. This work demonstrates that domain-constrained LLM architectures with retrieval augmentation can meet the strict reliability and efficiency requirements of operational security environments at scale.
要約:
Local fine-tuning datasets routinely contain sensitive secrets such as API keys, personal identifiers, and financial records. Although ''local offline fine-tuning'' is often viewed as a privacy boundary, we reveal that compromised model code is sufficient to steal them. Current passive pretrained-weight poisoning attacks, while effective for natural language, fundamentally fail to capture such sparse high-entropy targets due to their reliance on probabilistic semantic prefixes. To bridge this gap, we identify and exploit a practical but overlooked supply-chain vector -- model code camouflaged as standard architectural definitions -- to realize a paradigm shift from passive weight poisoning to active execution hijacking. We introduce a deterministic full-chain memorization mechanism: it locks onto token-level secrets in dynamic computation flows via online tensor-rule matching, and leverages value-gradient decoupling to stealthily inject attack gradients, overcoming gradient drowning to force model memorization. Furthermore, we achieve, for the first time, attacker-verifiable secret stealing through black-box queries that precisely distinguishes true leakage from hallucination. Experiments demonstrate that our method achieves over 98\% Strict ASR without compromising the primary task, and can effectively bypass defense measures including DP-SGD, semantic auditing, and code auditing.
要約:
AI chatbots are becoming a primary interface for seeking information. As their popularity grows, chatbot providers are starting to deploy advertising and analytics. Despite this, tracking on AI chatbots has not been systematically studied. We present a systematic measurement of web tracking on 20 popular AI chatbots. Under controlled settings using a sensitive prompt, we capture and compare network traffic in normal chats and, where supported, private chats. We search for exposure of two categories of information: content, including prompts, prompt-derived titles, chat URLs, and chat identifiers; and identity, including names, emails, account identifiers, first-party cookies, and explicit IP/User-Agent fields in payloads. We find that 17 of 20 chatbots share information with at least one third party. Three chatbots share plaintext conversation text, including both prompt and response snippets, with Microsoft Clarity through session replay. Fifteen chatbots share conversation URLs or chat identifiers with third-party advertising, analytics, or social endpoints. Several chatbots expose user identity through support widgets, analytics, advertising, and session replay tags; in some cases, hashed emails are shared.
要約:
Access to genomic data is highly regulated due to its sensitive nature. While safeguards are essential, cumbersome data access processes pose a significant barrier to the development of AI methods for genomics. Synthetic data generation can mitigate this tension by enabling broader data sharing without exposing sensitive information. Synthetic genomic data are produced by training generative models on real data and subsequently sampling artificial data that preserves relevant statistics while limiting disclosures about the underlying individuals. In some settings, a single data holder may have sufficient data to train such generative models; however, in many applications data must be combined across multiple sites to achieve adequate scale. This need arises, e.g., in rare disease studies, where individual hospitals typically hold data for only a small number of patients. The solution we present in this paper enables multiple data holders to jointly train a synthetic data generator without revealing their raw data. Our approach combines secure multiparty computation (MPC) to ensure input privacy, so that no party ever discloses its data in unencrypted form, with differential privacy (DP) to provide output privacy by mitigating information leakage from the released synthetic data. We empirically demonstrate the effectiveness of the proposed method by generating high-utility synthetic datasets from multiple real RNA-seq cohorts in federated settings, showing that our approach enables privacy-preserving data synthesis even when data are distributed across institutions.
要約:
Autonomous agent frameworks built upon large language models (LLMs) are evolving into complex, tool-integrated, and continuously operating systems, introducing security risks beyond traditional prompt-level vulnerabilities. As this paradigm is still at an early stage of development, a timely and systematic understanding of its security implications is increasingly important. Although a growing body of work has examined different attack surfaces and defense problems in agent systems, existing studies remain scattered across individual aspects of agent security, and there is still a lack of a layered review on this topic. To address this gap, this survey presents a layered review of security risks and defense strategies in autonomous agent frameworks, with OpenClaw as a case study. We organize the analysis into four security-relevant layers: the context and instruction layer, the tool and action layer, the state and persistence layer, and the ecosystem and automation layer. For each layer, we summarize its functional role, representative security risks, and corresponding defense strategies. Based on this layered analysis, we further identify that threats in autonomous agent frameworks may propagate across layers, from manipulated inputs to unsafe actions, persistent state contamination, and broader ecosystem-level impact. Finally, we highlight potential key challenges, including research imbalance across layers, the lack of long-horizon evaluation, and weak ecosystem trust models, and outline future directions toward more systematic and integrated defenses.
要約:
As web browsers increasingly restrict client-side tracking, the web tracking ecosystem is shifting from client-side to server-side tracking (SST). In SST, the browser sends tracking requests to an intermediate endpoint, which then forwards them to the tracker's endpoint, eliminating direct client-to-tracker requests. As a result, existing tracking protections that block requests to known tracker endpoints are rendered ineffective.
In this paper, we investigate server-side implementation of Google Analytics, the most widely deployed third-party tracking service on the web today. We also present SST-Guard, a multi-modal, browser-based system for detecting and blocking server-side Google Analytics (sGA). Our key insight is that even when the tracker's endpoints change, sGA must necessarily still collect and share the same semantic information as client-side Google Analytics (e.g., identifiers, event metadata). Therefore, rather than detecting requests to known Google Analytics endpoints, SST-Guard aims to detect underlying artifacts of collection and sharing of these semantic values to any arbitrary endpoint. Operationalizing this insight is challenging because real-world sGA deployments commonly customize endpoints and obfuscate URLs/payloads. SST-Guard addresses this challenge using a value-template approach that employs regular expressions to match semantic value patterns across multiple modalities: network requests, cookies, and the window object.
We validate SST-Guard on Tranco top-10k websites, detecting 4.02\% (403) sGA domains with over 93\% accuracy across three modalities, with network request classifier demonstrating the highest accuracy (99.8\%). By deploying SST-Guard in the wild, we find 4.21\% (6,314) of Tranco top-150k websites using sGA.
要約:
Boolean circuits form the foundational computational substrate of symmetric cryptography, yet the exploration of their architectural design space has remained largely confined to a handful of canonical paradigms - SPN, Feistel networks, and their immediate variants. This paper takes a deliberately broader perspective by formalizing the design space of cryptographic Boolean systems through six independent binary structural constraints: Stratification, Acyclicity, Regularity, Interleaving, Homogeneity, and Locality. These constraints generate a hypercube of $2^6 = 64$ distinct architectural classes defined over Synchronous Boolean Networks, a general model that subsumes both acyclic combinational circuits and recurrent synchronous systems. We systematically evaluate all 64 classes against three generic cryptanalytic fitness objectives - differential, linear and algebraic resistance - using a five-stage methodology centered on Formal Concept Analysis. The results reveal that the best Boolean networks are governed by the identification of sparse, mutually compatible combinations of constraints - a fundamentally epistatic problem that classical cryptography has barely addressed.
著者: Dawei Huang (Beijing University of Posts,Telecommunications), Hui Li (Beijing University of Posts,Telecommunications), Haonan Feng (Beijing University of Posts,Telecommunications), Jingjing Guan (Beijing University of Posts,Telecommunications), Yueshuang Jiao (Beijing University of Posts,Telecommunications), Bo Jia (Beijing University of Posts,Telecommunications)
要約:
Formal verification provides rigorous guarantees for cryptographic security, yet automating the extraction and formalization of security goals from natural language protocol documents remains a major bottleneck, compounded by the scarcity of expert-annotated resources and integrated frameworks bridging unstructured text and symbolic logic. We introduce SecGoal, the first expert-annotated benchmark covering 15 widely deployed protocol documents, including 5G-AKA and TLS 1.3, and AIFG, an AI-assisted framework that decomposes the task into context-aware goal extraction and retrieval-augmented formalization. We conduct a comprehensive evaluation to assess whether contemporary LLMs are ready to automate this pipeline. Our results reveal a pronounced precision-recall imbalance: frontier models, such as Gemini 2.5-Pro, achieve high recall but precision below 15%, frequently misclassifying operational text as security goals. In contrast, instruction tuning on SecGoal enables compact models with 7B/9B parameters to achieve F1-scores above 80%, substantially outperforming larger general-purpose models. Our work establishes a foundational dataset and reproducible baseline for automated formal protocol analysis.
要約:
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text being revealed, while the provider's result is verifiable. Our comprehensive evaluation shows that VOW is practical for short texts and provides a crucial reassessment of watermark robustness against modern paraphrasing attacks.
要約:
How code representation format shapes false positive behaviour in cross-language LLM vulnerability detection remains poorly understood. We systematically vary training intensity and code representation format, comparing raw source text with pruned Abstract Syntax Trees at both training time and inference time, across two 8B-parameter LLMs (Qwen3-8B and Llama 3.1-8B-Instruct) fine-tuned on C/C++ data from the NIST Juliet Test Suite (v1.3) and evaluated on Java (OWASP Benchmark v1.2) and Python (BenchmarkPython v0.1).
Cross-language FPR reflects the joint effect of training-time and inference-time representation, not either alone. Text fine-tuning drives FPR upward monotonically (Qwen3-8B: 0.763 zero-shot, 0.866 pilot, 1.000 full-scale) while F1 remains stable (0.637-0.688), masking the collapse. We argue surface-cue memorisation is the primary mechanism: text fine-tuning encodes C/C++-specific API names and syntactic idioms as vulnerability triggers that fire indiscriminately on target-language code. A cross-representation probe, applying text-trained weights to AST-encoded input without retraining, isolates this: Qwen3-8B FPR drops from 0.866 to 0.583, and 37.2% of false positives revert to true negatives under AST input alone. Direct AST fine-tuning does not preserve the benefit (FPR at least 0.970), as flat linearisation introduces structural surface cues of its own. The pattern replicates across both model families. On BenchmarkPython the AST probe yields FPR=0.554, within 2.9 percentage points of the Java result, despite maximal surface-syntax differences, substantially weakening a domain-shift explanation. These findings motivate a pre-deployment consistency gate, running alerts through both text and AST paths, as a retraining-free filter for false-positive-sensitive settings, at the cost of reduced recall.
要約:
Lattice reduction smooths the Gram-Schmidt profile, and we use majorization to describe the local swap mechanism behind that smoothing. In this language, each non-degenerate Lov\'asz swap acts as a T-transform on the log-norm profile. As a consequence, every strictly Schur-convex measure of profile spread decreases at such a swap. Two structural consequences follow. First, the worst-case GSA envelope admits a variational interpretation. It is the unique minimum-variance profile compatible with the Lov\'asz gap geometry, so its slope is determined by the LLL parameter alone. Second, the realized swap trajectory satisfies an exact telescoping identity for variance dissipation. The same viewpoint also helps organize deep-insertion heuristics. It suggests a thermal family of Schur-convex scoring rules, motivates adaptive selection within that family, and leads to two concrete selectors: Thermal-Adaptive, which reduces operation counts relative to SS-GG on flat profiles in our benchmarks while recovering SS-GG on $q$-ary inputs, and Geodesic Deep-LLL, which reduces equivalent-swap counts on structured lattices in our benchmarks at higher wall-clock cost.
要約:
Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However, this sparse activation paradigm also introduces new safety challenges. Since only a subset of experts is engaged for each input, model behavior becomes coupled to routing decisions, yielding a difficult-to-control mechanism that can vary across safety-relevant scenarios. At the same time, adapting model behavior through full fine-tuning or retraining is costly, especially when developers need to rapidly configure the same model for different safety objectives. We present MASCing (MoE Activation Steering Configuration), the first framework that enables flexible reconfiguration of MoE behavior across diverse safety scenarios without retraining. MASCing uses an LSTM-based surrogate model to capture cross-layer routing dependencies and map routing logits to downstream behaviors. It then optimizes a steering matrix to identify behavior-relevant expert circuits and, at inference time, applies steering masks to the routing gates to override expert selection. This enables targeted enhancement or suppression of specific behaviors while preserving general language utility. To demonstrate its reconfigurability, we apply MASCing to two different safety-related objectives and observe consistent gains with negligible overhead across seven open-source MoE models. For multi-turn jailbreak defense, it improves the average defense success rate from 52.5% to 83.9%, with gains of up to 89.2%. For adult-content generation, MASCing enables models to comply with such requests that would otherwise be refused, increasing the average generation success rate from 52.6% to 82.0%, with gains of up to 93.0%. These results establish MASCing as a practical, lightweight, and flexible framework for scenario-specific safety reconfiguration in MoE models.
著者: Simon Althaus (Technical University of Darmstadt), Nikolaos Alexopoulos (Athens University of Economics and Business), Max M\"uhlh\"auser (Technical University of Darmstadt), Christian Reuter (Technical University of Darmstadt), Ephraim Zimmer (Technical University of Darmstadt)
要約:
System auditing on Android faces two problems. First, existing syscall tracers lose events under load, silently overwriting entries faster than a user space reader can drain them. Second, security-relevant application behavior is mediated through Binder, Android's kernel IPC mechanism, and is therefore hidden from the syscall layer. The Binder parcels that the kernel does see carry no method names or typed arguments, a disconnect between low-level events and high-level behavior known as the semantic gap. Existing approaches address the semantic gap either by modifying the Android platform, making them difficult to adjust to OS updates, or by instrumenting the traced application in user space, which sophisticated adversaries can evade by bypassing the instrumented framework APIs.
We present WOOTdroid, a design and prototype for on-device tracing on stock Android that addresses both problems without OS modification or application instrumentation. WDSys, an eBPF port of eAudit-style syscall auditing, runs on current Android with at most 3.6% Geekbench overhead and traces 33% more syscalls than ftrace. WDBind captures Binder parcels in the kernel and decodes them out-of-process against a framework signature table extracted via Java reflection. We demonstrate WOOTdroid on Pixel 9 devices running Android 16 with an end-to-end case study reconstructing ten security-relevant Binder transactions.
要約:
Decompositional jailbreaks pose a critical threat to large language models (LLMs) by allowing adversaries to fragment a malicious objective into a sequence of individually benign queries that collectively reconstruct prohibited content. In real-world deployments, LLMs face a continuous, untraceable stream of fully anonymized and arbitrarily interleaved requests, infiltrated by covertly distributed adversarial queries. Under this rigorous threat model, state-of-the-art defensive strategies exhibit fundamental limitations. In the absence of trustworthy user metadata, they are incapable of tracking global historical contexts, while their deployment of generative models for real-time monitoring introduces computationally prohibitive overhead. To address this, we present TwinGate, a stateful dual-encoder defense framework. TwinGate employs Asymmetric Contrastive Learning (ACL) to cluster semantically disparate but intent-matched malicious fragments in a shared latent space, while a parallel frozen encoder suppresses false positives arising from benign topical overlap. Each request requires only a single lightweight forward pass, enabling the defense to execute in parallel with the target model's prefill phase at negligible latency overhead. To evaluate our approach and advance future research, we construct a comprehensive dataset of over 3.62 million instructions spanning 8,600 distinct malicious intents. Evaluated on this large-scale corpus under a strictly causal protocol, TwinGate achieves high malicious intent recall at a remarkably low false positive rate while remaining highly robust against adaptive attacks. Furthermore, our proposal substantially outperforms stateful and stateless baselines, delivering superior throughput and reduced latency.
要約:
Multi-turn prompt injection follows a known attack path -- trust-building, pivoting, escalation but text-level defenses miss covert attacks where individual turns appear benign. We show this attack path leaves an activation-level signature in the model's residual stream: each phase shift moves the activation, producing a total path length far exceeding benign conversations. We call this adversarial restlessness. Five scalar trajectory features capturing this signal lift conversation-level detection from 76.2% to 93.8% on synthetic held-out data. The signal replicates across four model families (24B-70B); probes are model-specific and do not transfer across architectures. Generalization is source-dependent: leave-one-source-out evaluation shows each of synthetic, LMSYS-Chat-1M, and SafeDialBench captures distinct attack distributions, with detection on real-world LMSYS reaching 47-71% when its distribution is represented in training. Combined three-source training achieves 89.4% detection at 2.4% false positive rate on a held-out mixed set. We further show that three-phase turn-level labels(benign/pivoting/adversarial) unique to our synthetic dataset are essential: binary conversation-level labels produce 50-59% false positives. These results establish adversarial restlessness as a reliable activation-level signal and characterize the data requirements for practical deployment.
要約:
Long-context large language models (LLMs)-for example, Gemini-3.1-Pro and Qwen-3.5-are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks faced by LLMs under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic attacks and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially for long context scenarios. The resource-intensive nature poses a major obstacle for the community (especially academic researchers) to systematically evaluate the security risks of long-context LLMs and assess the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve the efficiency (in terms of both computation and memory) for optimization-based prompt injection and knowledge corruption attacks under long-context LLMs. Through extensive evaluations, we find that FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., reducing from 264.1 GB to 65.7 GB GPU memory for a 32K token context) compared to state-of-the-art baseline nanoGCG. FlashRT can be broadly applied to black-box optimization methods, such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool to enable systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT
要約:
Modern software systems are increasingly developed within rapid continuous integration and deployment (CI/CD) pipelines, where ensuring security prior to release presents significant technical and organizational challenges. Traditional static and dynamic analysis tools provide valuable structural and behavioral insights, yet they often operate in non-adaptive workflows and produce large volumes of warnings requiring manual triage. Feedback-driven fuzzing and search-based testing approaches have demonstrated the power of iterative input refinement guided by execution signals, while large language models (LLMs) have shown promise in automated test generation but frequently lack semantic grounding in program structure. This paper presents a systematic survey of adaptive and AI-augmented security testing research across five domains: (1) structural program analysis for vulnerability detection, (2) DevSecOps and continuous security testing, (3) feedback-driven fuzzing and search-based testing, (4) LLM-based automated test generation, and (5) emerging hybrid systems integrating program analysis with adaptive learning. We analyze fifty-five peer-reviewed studies drawn from a systematic search of four major databases yielding 22,088 raw records. Our analysis reveals a persistent disconnect between structural program representations (ASTs, CFGs, and CPGs) and adaptive testing mechanisms. We characterize this as structural-adaptive fragmentation: a systematic separation that neither paradigm individually addresses. No existing system incorporates human triage signals as feedback for refining structural models. We conclude by identifying five open research challenges and outlining a unified agenda for semantically grounded, feedback-driven, polyglot security testing frameworks.
要約:
The scarcity of high-quality annotated medical data, particularly in mental health, poses a significant bottleneck for training robust machine learning models. Privacy regulations restrict data sharing, making synthetic data generation a promising alternative. The use of Large Language Models (LLMs) in a data augmentation pipeline could be leveraged as an alternative in this field. In the proposed methodology, DeepSeek-R1, OpenBioLLM-Llama3 and Qwen 3.5 are used to generate synthetic mental health evaluation reports conditioned on specific International Classification of Diseases, Tenth Revision (ICD-10) codes. Because naive text generation can lead to mode collapse or privacy breaches (memorization), a comprehensive evaluation framework is introduced. The generated diagnostic texts are assessed across three dimensions: semantic fidelity, lexical diversity, and privacy/plagiarism. The results demonstrate that all models can generate clinically coherent, diverse, and privacy-safe synthetic reports, significantly expanding the available training data for clinical natural language processing tasks without compromising patient confidentiality.
要約:
Safety-aligned language models must refuse harmful requests without collapsing into broad over-refusal, but the training-time mechanisms behind this tradeoff remain unclear. Prior work characterizes refusal directions and jailbreak robustness, yet does not explain how dynamic adversarial fine-tuning changes refusal carriers across training. We present a measurement-driven mechanism study, not a new defense, on one 7B backbone under supervised fine-tuning (SFT) and R2D2-style dynamic adversarial fine-tuning. Our protocol aligns fixed-source HarmBench, StrongREJECT, and XSTest with a five-anchor refusal-geometry suite and causal interventions. R2D2 drives fixed-source HarmBench ASR to 0.000 at steps 50 and 100, then partially reopens to 0.035 at step 250 and 0.250 at step 500; SFT remains less robust, with ASR between 0.505 and 0.588 at the same anchors. On XSTest, R2D2 any-refusal is 1.000 early, then falls to 0.664 and 0.228. Geometrically, R2D2 preserves a late-layer admissible carrier through step 100 before relocating to an early-layer carrier, while effective rank remains near 1.23--1.27. Causal interventions indicate low-dimensional but utility-coupled control. These results support a reorganization account rather than a drift-only account, with evidence limited to one backbone and fixed-source attacks.
要約:
While current network intrusion detection systems achieve satisfactory accuracy, they often lack explainability. Subgroup Discovery (SD) addresses this by building interpretable rules that characterize feature interactions associated with attack traffic. With large datasets, classical heuristic beam search methods struggle with exponentially scaling search spaces and can prune critical multi-feature interactions. This paper introduces a quantum-enhanced pipeline for SD applied to network intrusion detection using NSL-KDD, formulating SD as quantum optimization for the first time. By encoding feature selection as a Quadratic Unconstrained Binary Optimization (QUBO) and solving it via the Quantum Approximate Optimization Algorithm (QAOA) on IBM Quantum hardware (ibm_pittsburgh), the pipeline identifies subgroups of network features that discriminate normal from attack traffic. A least-squares regression QUBO formulation fits the Weighted Relative Accuracy (WRAcc) landscape over feature subsets, with surrogate sampling for larger QUBOs. Results are benchmarked against exhaustive enumeration and Beam Search using ratios for Hamiltonian quality and WRAcc. Hardware scaling experiments on ibm_pittsburgh (10-30 qubits) reveal that QAOA at depth p = 1 shows WRAcc ratios of 0.983 at 10 qubits, 0.971 at 15 qubits, 0.855 at 20 qubits, and 0.624 at 25 qubits, degrading to 0.039 at 30 qubits as circuit noise dominates, establishing an empirical NISQ scaling boundary. Results demonstrate that QAOA discovers subgroups competitive with classical heuristics and finds multi-feature interaction patterns that greedy Beam Search prunes, with QAOA-unique subgroups achieving up to 99.6% test precision. This work establishes a framework for quantum combinatorial optimization in cybersecurity and characterizes hardware scaling for NISQ devices.
要約:
Quantum secret sharing schemes are a family of quantum cryptographic protocols which provide secure quantum encodings, mapping one secret to multiple shares of information such that the original secret cannot be accessed without an authorized set of shares present for decoding. In this work, we describe a protocol that enables sender-anonymity during the secret decoding process. By using permutation-invariant QEC codes along with a set of anonymous quantum transmission algorithms, we construct a quantum anonymous secret sharing scheme that achieves sender-anonymity. We quantify information leakage in ramp quantum secret sharing schemes via the quantum conditional min-entropy, justifying it as a valid measure of leaked information by relating it to the Knill-Laflamme quantum error correction conditions. Finally, we evaluate several permutation-invariant codes using this measure to make observations on the information leakage of intermediate shares for each quantum anonymous secret sharing scheme.
要約:
Vision-language models (VLMs) are increasingly used in autonomous driving because they combine visual perception with language-based reasoning, supporting more interpretable decision-making, yet their robustness to physical adversarial attacks, especially whether such attacks transfer across different VLM architectures, is not well understood and poses a practical risk when attackers do not know which model a vehicle uses. We address this gap with a systematic cross-architecture study of adversarial transferability in VLM-based driving, evaluating three representative architectures (Dolphins, OmniDrive, and LeapVAD) using physically realizable patches placed on roadside infrastructure in both crosswalk and highway scenarios. Our transfer-matrix evaluation shows high cross-architecture effectiveness, with transfer rates of 73-91% (mean TR = 0.815 for crosswalk and 0.833 for highway) and sustained frame-level manipulation over 64.7-79.4% of the critical decision window even when patches are not optimized for the target model.
要約:
Federated learning (FL) is a popular distributed learning paradigm in machine learning, which enables multiple clients to collaboratively train models under the guidance of a server without exposing private client data. However, FL's decentralized nature makes it vulnerable to poisoning attacks, where malicious clients can submit corrupted models to manipulate the system. To counter such attacks, although various Byzantine-robust methods have been proposed, these methods struggle to provide balanced defense against multiple types of attacks or rely on possessing the dataset in the server. To deal with these drawbacks, thus, we propose an effective multi-layer defensive adaptive aggregation for Bzantine-robust federated learning (AdaBFL) based on a novel three-layer defensive mechanism, which can adaptively adjust the weights of defense algorithms to counter complex attacks. Moreover, we provide convergence properties of our AdaBFL method under the non-convex setting on non-iid data. Comprehensive experiments across multiple datasets validate the superiority of our AdaBFL over the comparable algorithms.
要約:
Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved the training efficiency of Large Language Models (LLMs) by updating neural network layers using low-rank matrices. Since the generation of adversarial examples is an optimization process analogous to model training, this naturally raises the question: Do adversarial perturbations exhibit a similar low-rank structure?
In this paper, we provide both theoretical analysis and extensive empirical investigation across various attack methods, model architectures, and datasets to show that adversarial perturbations indeed possess an inherently low-rank structure. This insight opens up new opportunities for improving both adversarial attacks and defenses. We mainly focus on leveraging this low-rank property to improve the efficiency and effectiveness of black-box adversarial attacks, which often suffer from excessive query requirements. Our method follows a two-step approach. First, we use a reference model and auxiliary data to guide the projection of gradients into a low-dimensional subspace. Next, we confine the perturbation search in black-box attacks to this low-rank subspace, significantly improving the efficiency and effectiveness of the adversarial attacks.
We evaluated our approach across a range of attack methods, benchmark models, datasets, and threat models. The results demonstrate substantial and consistent improvements in the performance of our low-rank adversarial attacks compared to conventional methods.
要約:
The hubness problem, in which hub embeddings are close to many unrelated examples, occurs often in high-dimensional embedding spaces and may pose a practical threat for purposes such as information retrieval and automatic evaluation metrics. In particular, since cross-modal similarity between text and images cannot be calculated by direct comparisons, such as string matching, cross-modal encoders that project different modalities into a shared space are helpful for various cross-modal applications, and thus, the existence of hubs may pose practical threats. To reveal the vulnerabilities of cross-modal encoders, we propose a method for identifying the hub embedding and its corresponding hub text. Experiments on image captioning evaluation in MSCOCO and nocaps along with image-to-text retrieval tasks in MSCOCO and Flickr30k showed that our method can identify a single hub text that unreasonably achieves comparable or higher similarity scores than human-written reference captions in many images, thereby revealing the vulnerabilities in cross-modal encoders.
要約:
Renewed public attention on the identity of Bitcoin's pseudonymous creator has sharpened focus on the Satoshi overhang, commonly framed as a tail risk for bitcoin. This paper argues that the mechanical downside of a disposition is bounded well below the existential-loss framing, and that the terminal states most consistent with sixteen years of holder behavior are nonbearish for bitcoin's effective supply. The approximately 1.148 million BTC Patoshi position is analyzed on two tracks. For a purely wealth-maximizing holder, a three-scenario quantitative analysis (Appendix A) shows that bitcoin's current market depth is sufficient to absorb a patient multi-year liquidation at a cumulative price impact in the mid-single-digit to mid-double-digit percent range relative to counterfactual, with the central scenario clustering near 10 percent. The paper maps a decision space rather than identifying a unique modal outcome, assuming a holder whose profile is consistent with the sixteen-year record. Preference sets consistent with the record, including ideological non-intervention, privacy above all, satisficing, and myth preservation, favor continued dormancy terminating in a cryptographically enforced nonrecovery or destruction arrangement; preference sets favoring adversarial or wealth-maximizing action are possible but less supported. Across the plausible region of the decision space, the bear case is bounded and the terminal states most consistent with observed behavior are neutral to slightly positive for bitcoin's effective supply.
要約:
The rapid proliferation of image generation models and other artificial intelligence (AI) systems has intensified concerns regarding data privacy and user consent. As the availability of public datasets declines, major technology companies increasingly rely on proprietary or private user data for model training, raising ethical and legal challenges when users request the deletion of their data after it has influenced a trained model. Machine unlearning seeks to address this issue by enabling the removal of specific data from models without complete retraining. This study investigates a modified SISA (Sharded, Isolated, Sliced, and Aggregated) framework designed to achieve class-level unlearning in Convolutional Neural Network (CNN) architectures. The proposed framework incorporates a reinforced replay mechanism and a gating network to enhance selective forgetting efficiency. Experimental evaluations across multiple image datasets and CNN configurations demonstrate that the modified SISA approach enables effective class unlearning while preserving model performance and reducing retraining overhead. The findings highlight the potential of SISA-based unlearning for deployment in privacy-sensitive AI applications. The implementation is publicly available at https://github.com/SiamFS/ sisa-class-unlearning.
要約:
Neural backdoors represent insidious cybersecurity loopholes that render learning machinery vulnerable to unauthorised manipulations, potentially enabling the weaponisation of artificial intelligence with catastrophic consequences. A backdoor attack involves the clandestine infiltration of a trigger during the learning process, metaphorically analogous to hypnopaedia, where ideas are implanted into a subject's subconscious mind under the state of hypnosis or unconsciousness. When activated by a sensory stimulus, the trigger evokes a conditioned reflex that directs a machine to mount a predetermined response. In this study, we propose a cybernetic framework for constant surveillance of backdoor threats, driven by the dynamic nature of untrustworthy data sources. We develop a self-aware unlearning mechanism to autonomously detach a machine's behaviour from the backdoor trigger. Through reverse engineering and statistical inference, we detect deceptive patterns and estimate the likelihood of backdoor infection. We employ model inversion to elicit artificial mental imagery, using stochastic processes to disrupt optimisation pathways and avoid convergent but potentially flawed patterns. This is followed by hypothesis analysis, which estimates the likelihood of each potentially malicious pattern as the true trigger and infers the probability of infection. The primary objective of this study is to maintain a stable state of equilibrium between knowledge fidelity and backdoor vulnerability.
要約:
The task of text-to-image generation has achieved tremendous success in practice, with emerging concept generation models capable of producing highly personalized and customized content. Fervor for concept generation is increasing rapidly among users, and platforms for concept sharing have sprung up. The concept owners may upload malicious concepts and disguise them with non-malicious text descriptions and example images to deceive users into downloading and generating malicious content. The platform needs a quick method to determine whether a concept is malicious to prevent the spread of malicious concepts. However, simply relying on concept image generation to judge whether a concept is malicious requires time and computational resources. Especially, as the number of concepts uploaded and downloaded on the platform continues to increase, this approach becomes impractical and poses a risk of generating malicious content. In this paper, we propose Concept QuickLook, the first systematic work to incorporate malicious concept detection into research, which performs detection based solely on concept files without generating any images. We define malicious concepts and design two operational modes for detection: concept matching and fuzzy detection. Extensive experiments demonstrate that the proposed Concept QuickLook can detect malicious concepts and demonstrate practicality in concept sharing platforms. We also design robustness experiments to further validate the effectiveness of the solution. We hope this work can initiate malicious concept detection tasks and provide some inspiration.
要約:
Boolean functions and binary sequences are main tools used in cryptography. In this work, we introduce a new bijection between the set of Boolean functions and the set of binary sequences with period a power of two. We establish a connection between them which allows us to study some properties of Boolean functions through binary sequences and vice versa. Then, we define a new representation of sequences, based on Boolean functions and derived from the algebraic normal form, named reverse-ANF. Next, we study the relation between such a representation and other representations of Boolean functions as well as between such a representation and the binary sequences. Finally, we analyse the generalized self-shrinking sequences in terms of Boolean functions and some of their properties using the different representations.
要約:
Modern cybersecurity requires systematic ways to evaluate how detection systems respond to evolving and previously unseen attack behaviors. Existing malware repositories largely capture known patterns and provide limited support for stress-testing defenses against novel threats. To address this, we present MalGEN, a modular testbed that models adversarial workflows and generates executable artifacts in a controlled environment. The framework decomposes high-level attack objectives into structured stages, enabling the synthesis of diverse and multi-stage behaviors. We evaluate MalGEN across 1,920 benchmark settings covering multiple platforms and behavioral objectives, resulting in 977 executable samples. Analysis shows that the generated artifacts exhibit a wide range of malicious techniques and multi-stage attack patterns. However, 45.71% of these samples remain undetected by existing detection engines, which reveals notable gaps in current defenses. These findings provide practical insights into the limitations of widely used detection approaches and support the development of more robust security evaluation and testing practices.
要約:
The rise of synthetic media has blurred the boundary between reality and fabrication under the evolving power of artificial intelligence, fueling an infodemic that erodes public trust in cyberspace. For digital imagery, a multitude of editing applications further complicates the forensic analysis, including semantic edits that alter content, photometric adjustments that recalibrate colour characteristics, and geometric projections that reshape viewpoints. Collectively, these transformations manipulate and control perceptual interpretation of digital imagery. This susceptibility calls for forensic enquiry into reconstructing the chain of events, thereby revealing deeper evidential insight into the presence or absence of criminal intent. This study seeks to address an inverse problem of tracing the underlying generation chain that gives rise to the observed synthetic media. A tell-tale watermarking system is developed for explanatory reasoning over the nature and extent of transformations across the lifecycle of synthetic media. Tell-tale watermarks are tailored to different classes of transformations, responding in a manner that is neither strictly robust nor fragile but instead interpretable. These watermarks function as reference clues that evolve under the same transformation dynamics as the carrier media, leaving interpretable traces when subjected to transformations. Explanatory reasoning is then performed to infer the most plausible account across the combinatorial parameter space of composite transformations. Experimental evaluations demonstrate the validity of tell-tale watermarking with respect to fidelity, synchronicity and traceability.
要約:
When AI agents operating with access to sensitive information encounter a conflict between completing an assigned task and following rules or ethical constraints, they can resort to unsanctioned behaviour. Existing inference time safety work addresses this primarily through monitoring and access restriction. We investigate a complementary and under-explored layer: environmental controls that act on the agent's decision context at the point of conflict, making it more likely that the agent takes an authorised alternative path rather than an unsanctioned one. Drawing on Situational Crime Prevention (SCP), a framework used in human insider risk management to make harmful actions less rewarding and compliant actions more viable by design choices in the environment, we design and evaluate escalation channels as a concrete instantiation of this control class. An escalation channel provides an agent with a formal, out-of-band route to surface a conflict to an independent authority. We evaluate two designs: a simple email escalation and an instrumentally credible channel that guarantees a 30-minute pause and independent review, making the authorised path genuinely useful for goal achievement rather than merely nominally available. Across 10 frontier LLMs using the agentic task-rule conflict scenario of Lynch et al. (2025), we find that without any control the harmful action rate is 38.73%. A simple escalation channel reduces this to 5.92%; the instrumentally credible channel reduces it further to 1.21%, a statistically significant improvement observed in all 10 models tested across 24,000 samples. Our results suggest that the instrumental credibility of the authorised alternative matters considerably, and that environmental control design is a productive and largely unexplored addition to the defence-in-depth toolkit for agentic AI systems.
要約:
Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites are unlabeled and lack semantic information about the nature of the included websites, making targeted web measurements challenging, as researchers often rely on ad-hoc techniques to bias datasets toward specific website classes of interest. In this paper, we investigate the use of Large Language Models (LLMs) to enable targeted web measurement studies. Building on prior literature, we identify key website classification tasks relevant to web measurements and highlight limitations in state-of-the-art classification approaches. We construct carefully curated datasets to evaluate different LLMs on these tasks. Our results show that LLMs can achieve strong performance across multiple classification scenarios, but the choice of model and configuration plays a significant role. Motivated by the observed trade-off between classification accuracy and computational efficiency, we propose a practical two-step methodology for scalable targeted web measurements starting from the Tranco list. Finally, we conduct LLM-assisted web measurement studies inspired by prior work using our methodology and assess the validity of the resulting research inferences, showing that LLMs can effectively enable targeted measurements of security and privacy trends on the Web.
著者: Felix Oberhansl, Marc Schink, Nisha Jacob Kabakci, Michael Gruber, Dominik Klein, Sven Freud, Tobias Damm, Michael Hartmeier, Ivan Gavrilan, Silvan Streit, Jonas Stappenbeck, Andreas Zankl
要約:
Smartphones handle sensitive tasks such as messaging and payment and may soon support critical electronic identification through initiatives such as the European Digital Identity (EUDI) wallet, currently under development. Yet the susceptibility of modern smartphones to physical side-channel analysis (SCA) is underexplored, with recent work limited to pre-2019 hardware. Since then, smartphone system on chip (SoC) platforms have grown more complex, with heterogeneous processor clusters, sub 10 nm nodes, and frequencies over 2 GHz, potentially complicating SCA. In this paper, we assess the feasibility of electromagnetic (EM) SCA on a Raspberry Pi 4, featuring a Broadcom BCM2711 SoC and a Fairphone 4 featuring a Snapdragon 750G 5G SoC. Using new attack methodologies tailored to modern SoCs, we recover ECDSA secrets from OpenSSL by mounting the Nonce@Once attack of Alam et al. (Euro S&P 2021) and show that the libgcrypt countermeasure does not fully mitigate it. We present case studies illustrating how hardware and software stacks impact EM SCA feasibility. Motivated by use cases such as the EUDI wallet, we survey Android cryptographic implementations and define representative threat models to assess the attack. Our findings show weaknesses in ECDSA software implementations and underscore the need for independently certified secure elements (SEs) in all smartphones.
要約:
The rapid advancement of deep learning has turned models into highly valuable assets due to their reliance on massive data and costly training processes. However, these models are increasingly vulnerable to leakage and theft, highlighting the critical need for robust intellectual property protection. Model watermarking has emerged as an effective solution, with black-box watermarking gaining significant attention for its practicality and flexibility. Nonetheless, existing black-box methods often fail to better balance covertness (hiding the watermark to prevent detection and forgery) and robustness (ensuring the watermark resists removal)-two essential properties for real-world copyright verification. In this paper, we propose ComMark, a novel black-box model watermarking framework that leverages frequency-domain transformations to generate compressed, covert, and attack-resistant watermark samples by filtering out high-frequency information. To further enhance watermark robustness, our method incorporates simulated attack scenarios and a similarity loss during training. Comprehensive evaluations across diverse datasets and architectures demonstrate that ComMark achieves state-of-the-art performance in both covertness and robustness. Furthermore, we extend its applicability beyond image recognition to tasks including speech recognition, sentiment analysis, image generation, image captioning, and video recognition, underscoring its versatility and broad applicability.
要約:
Machine learning property attestations allow provers (e.g., model providers or owners) to attest properties of their models/datasets to verifiers (e.g., regulators, customers), enabling accountability towards regulations and policies. But, current approaches do not support generative models or large datasets. We present PAL*M, a property attestation framework for large generative models, illustrated using large language models. PAL*M defines properties across training and inference, leverages confidential virtual machines with security-aware GPUs for coverage of CPU-GPU operations, and proposes using incremental multiset hashing over memory-mapped datasets to efficiently track their integrity. We implement PAL*M on Intel TDX+NVIDIA H100 and evaluate it using state-of-the-art models and datasets, showing PAL*M is efficient, incurring < 11% overhead for common operations. Finally, we use the Tamarin Prover symbolic verification tool to formally model PAL*M's property attestation protocol, confirming that its security guarantees are upheld under the defined threat model.
要約:
Eclipse attacks isolate blockchain nodes by monopolizing their peer-to-peer connections. The attacks were extensively studied in Bitcoin (SP'15, SP'20, CCS'21, SP'23) and Monero (NDSS'25), but their practicality against Ethereum nodes remains underexplored, particularly in the post-Merge settings.
We present the first end-to-end implementation of an eclipse attack targeting Ethereum (2.0 version) execution-layer nodes. Our attack exploits the bootstrapping and peer management logic of Ethereum to fully isolate a node upon restart. We introduce a multi-stage strategy that majorly includes (i) poisoning the node's discovery table via unsolicited messages, (ii) infiltrating Ethereum's DNS-based peerlist by identifying and manipulating the official DNS crawler, and (iii) hijacking idle incoming connection slots across the network to block benign connections. Our DNS list poisoning is the first in the cryptocurrency context and requires only 28 IP addresses over 100 days. Slots hijacking raises outgoing redirection success from 45\% to 95\%. We validate our approach through controlled experiments on Ethereum's Sepolia testnet and broad measurements on the mainnet. Our findings demonstrate that over 80\% of public nodes do not leave sufficient idle capacity for effective slots occupation, highlighting the feasibility and severity of the threat. We further propose concrete countermeasures and responsibly disclosed all findings to Ethereum's security team.
要約:
Insider threat detection is difficult because malicious behavior is rare, irregular, and buried in long periods of inactivity. In enterprise audit data, most windows contain little activity, while attacks appear intermittently and range from brief events to sustained campaigns. Standard reconstruction-based models are therefore dominated by inactive regions and tend to learn baseline behavior rather than meaningful deviations. We separate activity presence from magnitude. Each window is decomposed into a binary mask indicating whether activity occurs and a value matrix capturing its intensity. A dual-channel autoencoder reconstructs both, with value loss applied only where activity is present, directing learning toward sparse structure. Using the CERT r5.2 dataset as a controlled setting, we examine how anomaly signal changes with temporal configuration. Short attacks are detected mainly through presence; longer attacks introduce a magnitude component; noise degrades magnitude reliability and shifts detection back toward presence. The balance between channels is not fixed and follows the data. At the campaign level, signal concentrates in a small number of anomalous windows. Simple aggregation that emphasizes extreme scores is sufficient to recover extended activity without explicit sequence modeling. Effective detection depends less on model complexity and more on aligning representation and objective with sparse temporal structure.
要約:
Cybersecurity decision-making increasingly occurs in environments characterized by uncertainty, partial observability, and adversarial manipulation, where heterogeneous signals from multiple sources are often incomplete, ambiguous, or conflicting. Traditional Security Orchestration, Automation, and Response (SOAR) systems rely on deterministic pipelines and threshold-based triggers, limiting their ability to support reliable decision-making under such conditions.
This paper proposes a probabilistic, agentic framework for cybersecurity orchestration that models decision-making as a meta-cognitive process. The framework decomposes cybersecurity functions into interacting agents responsible for detection, hypothesis formation, contextualization, explanation, and governance, coordinated through a meta-cognitive judgement mechanism. This mechanism evaluates uncertainty, agent disagreement, and operational constraints to determine decision readiness, enabling adaptive strategies including automated action, escalation, deferral, and evidence refinement.
Empirical evaluation on benchmark datasets (CICIDS2017 and NSL-KDD), augmented with adversarial and uncertain conditions, demonstrates that the proposed approach improves robustness and decision quality compared to deterministic and single-agent baselines. The framework achieves higher accuracy under noise, reduces false positive rates, and produces better-calibrated confidence estimates, while enabling more adaptive and context-aware decision behavior.
By explicitly modeling meta-cognitive processes - monitoring, evaluation, control, and reflection - the proposed approach reframes cybersecurity as an instance of AI-mediated cognitive problem solving, supporting accountable autonomy and more effective human-AI collaboration in adversarial environments.
要約:
Homomorphic encryption (HE) is a promising technology for confidential cloud computing, as it allows computations on encrypted data. However, HE is computationally expensive and often memory-bound on conventional computer architectures. Processing-in-Memory (PIM) is an alternative hardware architecture that integrates processing units and memory on the same chip or memory module. PIM enables higher memory bandwidth than conventional architectures and could thus be suitable for accelerating HE. We present DRAMatic, which implements operations foundational to HE on UPMEM PIM -- a programmable general-purpose PIM system developed by UPMEM. DRAMatic incorporates many arithmetic optimizations, including residue number system and number-theoretic transform techniques, and can support the large parameters required for secure homomorphic evaluations. It achieves a 334 times speed-up compared to previous HE implementations on UPMEM PIM. We also evaluate DRAMatic against Microsoft SEAL, a popular open-source HE library, regarding both runtime and energy efficiency. The results show that DRAMatic significantly closes the gap between Microsoft SEAL and HE implementations on UPMEM PIM. However, we also show that DRAMatic is currently constrained by data transfer overhead and limited multiplication performance on UPMEM PIM hardware. Finally, we discuss potential hardware extensions to UPMEM PIM.
要約:
Hardware-secured remote attestation is essential to establishing trust in the integrity of confidential virtual machines (cVMs), but is difficult to use in practice because verifying attestation evidence requires the use of hardware-specific cryptographic logic. This increases both maintenance costs and the verifiers' trusted computing base. We introduce the concept of self-verifying remote attestation evidence. Each attestation bundle identifies its verification logic in the form of a WebAssembly component that is downloaded by the verifier and executed. This approach transforms evidence verification into a platform-agnostic functionality that is implemented once for all platforms: the verifier measures the verification logic and then executes it to validate the evidence. As a result, verifiers can validate attestation evidence without any platform-specific code; the verification logic is just another measurement whose reference value can be checked with existing mechanisms. We implement this concept as TrustMee, a platform-agnostic verification driver for the Trustee framework. We demonstrate its functionality with self-verifying evidence for AMD SEV-SNP, Intel TDX, and Intel SGX attestations, producing attestation claims in the standard Entity Attestation Token (EAT) format.
要約:
We study builder-driven MEV arbitrage on BNB Smart Chain (BSC). BSC's Proposer-Builder Separation (PBS) adopts a leaner design: only whitelisted builders can participate, blocks are produced at shorter intervals, and private order flow bypasses the public mempool. These features have long raised community concerns over centralization, which we empirically confirm by tracing the arbitrage activities of the two dominant builders from Apr. 1, 2025 to Feb. 28, 2026 (full observable activity cycle). Within months, the two leading builders, \bd{48Club} and \bd{Blockrazor}, produced over 87\% of blocks and captured about 90\%+ of MEV profits.
We find that profits concentrate in short, low-hop arbitrage routes over wrapped tokens and stablecoins, and that block construction rapidly converges toward monopoly. Beyond concentration alone, our analysis reveals a structural source of inequality: BSC's short block interval and whitelisted PBS collapse the contestable window for MEV competition, amplifying latency advantages and excluding slower builders and searchers. MEV extraction on BSC is not only more centralized than on Ethereum, but also structurally more vulnerable to censorship and fairness erosion.
要約:
Network segmentation is a foundational enterprise security control. Despite its recognized benefits, segmentation initiatives frequently fail in practice, and the field lacks a systematic empirical explanation for why these projects do not achieve their intended outcomes. This paper presents an empirical study of failed segmentation projects based on a survey of 400 U.S.-based\ network security practitioners. The survey was grounded in a two-part failure framework that separately measures general IT project failure factors and segmentation-specific technical and operational barriers. Clustering analysis of the responses reveals four distinct failure archetypes. Surprisingly, practitioners across all four archetypes propose general IT project management fixes over segmentation-specific fixes in the same ratio.
要約:
Mutual TLS (mTLS) provides strong, certificate-based authentication for both clients and servers, yet its adoption for user-facing websites remains rare. This paper presents a longitudinal study of mTLS usability, tracking 46 senior and graduate computer science students who configured client certificates from scratch, used them for routine authentication over a semester-long course, and managed credentials across multiple devices. The results reveal that initial setup is a major bottleneck; while daily use was considered smooth, it did not improve long-term usability perceptions. Most concerningly, only 9% of participants fully understood the security implications of certificate-based authentication. We conclude that in a realistic, tooling-heavy deployment utilizing OpenSSL, a custom CA, and a 3072-bit minimum key requirement, even highly technical students struggled significantly. We argue this provides empirical evidence that today mTLS user experience is fundamentally misaligned with non-PKI specialists, and it is difficult to see a path toward mainstream adoption without substantial platform-level changes.
要約:
We propose CHRONOS, a hardware-assisted framework that decouples the cryptographic setup required for private gradient aggregation from the active training phase. CHRONOS executes a once-per-epoch server-relayed Diffie-Hellman key exchange during a device's idle window. It generates ephemeral keypairs and derives PRG keys entirely within an ARM TrustZone enclave, ensuring private keys never exist in Normal World memory. Pairwise secrets are sealed in the enclave, and Shamir secret shares of the ephemeral private key are distributed to peers. During training, clients mask gradients with a single stream-cipher evaluation and transmit them in one communication round. A hardware-backed round counter enforces single-use freshness. If clients drop out mid-round, the server reconstructs their masks from peer-held Shamir shares, preserving correct aggregation without repeating the round.
Evaluation on Rock Pi 4 devices using OP-TEE demonstrates that CHRONOS achieves OS-level compromise resistance and thwarts state-of-the-art gradient inversion attacks. It reduces active-phase aggregation latency by up to 74% compared to synchronous secure aggregation for 20 clients. The system maintains a persistent Secure World storage footprint of fewer than 700 bytes per device, scaling independently of model dimension.
要約:
The deployment of Large Language Models (LLMs) as assistants in electric grid operations promises to streamline compliance and decision-making but exposes new vulnerabilities to prompt-based adversarial attacks. This paper evaluates the risk of jailbreaking LLMs, i.e., circumventing safety alignments to produce outputs violating regulatory standards, assuming threats from authorized users, such as operators, who craft malicious prompts to elicit non-compliant guidance. Three state-of-the-art LLMs (OpenAI's GPT-4o mini, Google's Gemini 2.0 Flash-Lite, and Anthropic's Claude 3.5 Haiku) were tested against Baseline, BitBypass, and DeepInception jailbreaking methods across scenarios derived from nine NERC Reliability Standards (EOP, TOP, and CIP). In the initial broad experiment, the overall Attack Success Rate (ASR) was 33.1%, with DeepInception proving most effective at 63.17% ASR. Claude 3.5 Haiku exhibited complete resistance (0% ASR), while Gemini 2.0 Flash-Lite was most vulnerable (55.04% ASR) and GPT-4o mini moderately susceptible (44.34% ASR). A follow-up experiment refining malicious wording in Baseline and BitBypass attacks yielded a 30.6% ASR, confirming that subtle prompt adjustments can enhance simpler methods' efficacy.
要約:
Retrieval-Augmented Generation (RAG) is essential for enhancing Large Language Models (LLMs) with external knowledge, but its reliance on cloud environments exposes sensitive data to privacy risks. Existing privacy-preserving solutions often sacrifice retrieval quality due to noise injection or only provide partial encryption. We propose PRAG, an end-to-end privacy-preserving RAG system that achieves end-to-end confidentiality for both documents and queries without sacrificing the scalability of cloud-hosted RAG. PRAG features a dual-mode architecture: a non-interactive PRAG-I utilizes homomorphic-friendly approximations for low-latency retrieval, while an interactive PRAG-II leverages client assistance to match the accuracy of non-private RAG. To ensure robust semantic ordering, we introduce Operation-Error Estimation (OEE), a mechanism that stabilizes ranking against homomorphic noise. Experiments on large-scale datasets demonstrate that PRAG achieves competitive recall (72.45%-74.45%), practical retrieval latency, and strong resilience against graph reconstruction attacks while maintaining end-to-end confidentiality. This work confirms the feasibility of secure, high-performance RAG at scale.
要約:
As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model's general decision logic, and inductive illusion, where the victim model's general decision logic is shaped by specific stimuli. The former exploits the model's decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviours. The multifaceted nature of adversarial illusions calls for a unified defence framework, addressing vulnerabilities across various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalises and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluates the methodology under a variety of attack scenarios. Experimental results demonstrate that the proposed framework consistently neutralises both deductive and inductive adversarial illusions across diverse white-box and black-box attack scenarios.
要約:
Provenance is the chronological history of things, resonating with the fundamental pursuit to uncover origins, trace connections, and situate entities within the flow of space and time. As artificial intelligence advances towards autonomous agents capable of interactive collaboration on complex tasks, the provenance of generated content becomes entangled in the interplay of collective creation, where contributions are continuously revised, extended or overwritten. In a multi-agent generative chain, content undergoes successive transformations, often leaving little, if any, trace of prior contributions. In this study, we investigate the problem of tracking multi-agent provenance across the temporal dimension of generation. We propose a chronological system for post hoc attribution of generative history from content alone, without reliance on internal memory states or external meta-information. At its core lies the notion of symbolic chronicles, representing signed and time-stamped records, in a form analogous to the chain of custody in forensic science. The system operates through a feedback loop, whereby each generative timestep updates the chronicle of prior interactions and synchronises it with the synthetic content in the very act of generation. This research seeks to develop an accountable form of collaborative artificial intelligence within evolving cyber ecosystems.
要約:
The growing reliance on artificial intelligence in safety- and security-critical applications is raising concerns about the robustness of neural networks to erroneous or adversarial input. Certification is a methodology for ensuring model trustworthiness by providing formal guarantees on model behaviour. While most verification methods focus on worst-case analysis by bounding the network output, an alternative approach based on approximating the preimage can complement such analysis by estimating the proportion of inputs that satisfy a given specification. However, existing preimage-based methods, such as the state-of-the-art PREMAP, are limited to fully connected neural networks of moderate dimensionality. In this paper, we introduce PREMAP2, a collection of algorithmic extensions to PREMAP that enhance its scalability and efficiency through improved branching heuristics, adaptive Monte Carlo sampling, and reverse bound propagation. We further endow PREMAP2 with additional functionality such as support for non-uniform priors and confidence intervals. These advances enable the application of PREMAP2 to previously intractable settings, including real-world patch attacks against convolutional neural networks, where adversarial stickers or lighting conditions obscure parts of images. We showcase the effectiveness of our approach across several use cases, including certifying reliability, robustness, interpretability, and fairness, on domains ranging from computer vision to control tasks. Our implementation is available as open-source software.
要約:
This paper presents an exact and explicit tensor-network equation for the search of nontrivial divisors of a composite integer, together with an algorithm for its computation. The proposed method is based on the MeLoCoToN approach, which addresses combinatorial optimization problems through classical tensor networks. The presented tensor network tensorizes a binary multiplication circuit and projects its output onto the target integer to be factorized. Additionally, in order to make the algorithm more efficient, the number and dimension of the tensors and their contraction scheme are optimized, including a reduced auxiliary register that still preserves at least one valid factorization orientation. Finally, a series of tests on the algorithm are conducted, contracting the tensor network both exactly and approximately using tensor train compression, and evaluating its performance.
要約:
Prompt-driven Video Segmentation Foundation Models (VSFMs), such as SAM2, are increasingly used in applications including autonomous driving and digital pathology, yet their security risks remain underexplored. We study backdoor attacks against VSFMs and show that directly applying classic attacks such as BadNet is largely ineffective, yielding attack success rates (ASR) below 5%. Through gradient-similarity and attention-map analyses, we find that traditional backdoor training fails because clean and triggered samples induce aligned image-encoder gradients, while model attention remains focused on the prompt-specified object rather than the trigger. To address this limitation, we propose BadVSFM, the first backdoor attack framework tailored to prompt-driven VSFMs. BadVSFM uses a two-stage strategy that first learns trigger-specific encoder features and then trains the decoder to map triggered frame prompt representations to an attacker-specified target mask while preserving clean segmentation behavior. Experiments on five VSFMs and two datasets show that BadVSFM achieves strong, controllable backdoor effects across triggers and prompt types with limited clean-performance degradation. Ablations and interpretability analyses validate the necessity of the two-stage design, and five representative defenses remain largely ineffective. Our results reveal a practical and underexplored vulnerability of current VSFMs to backdoor threats.
要約:
Large language models (LLMs) are increasingly paired with activation-based monitoring to detect and prevent harmful behaviors that may not be apparent at the surface-text level. However, existing activation safety approaches, trained on broad misuse datasets, struggle with poor precision, limited flexibility, and lack of interpretability. This paper introduces a new paradigm: rule-based activation safety, inspired by rule-sharing practices in cybersecurity. We propose modeling activations as cognitive elements (CEs), fine-grained, interpretable factors such as 'making a threat' and 'payment processing', that can be composed to capture nuanced, domain-specific behaviors with higher precision. Building on this representation, we present a practical framework that defines predicate rules over CEs and detects violations in real time. This enables practitioners to configure and update safeguards without retraining models or detectors, while supporting transparency and auditability. Our results show that compositional rule-based activation safety improves precision, supports domain customization, and lays the groundwork for scalable, interpretable, and auditable AI governance. We open source GAVEL and introduce GAVEL Studio, an interactive rule authoring and management tool. Code and datasets are available at github.com/Offensive-AI-Lab/gavel.
要約:
Commit signing is a principal mechanism for verifying the origin of code in software supply chains. Security frameworks treat it as a core trust signal, assuming developers sign their commits consistently with keys they control and keep those keys in good standing over time. Whether this assumption holds in practice has not been evaluated at ecosystem scale. This study addresses this gap. We present the first developer-centric, ecosystem-scale measurement of commit signing on GitHub, covering the platform's full history, spanning 71,694 active developers, 16.1 million commits, and 874,198 repositories. To summarize our findings: (1) overall signing adoption rates are misleading, as most signed commits come from automatic platform signing rather than deliberate developer action; (2) developers who do sign locally rarely keep it up consistently across repositories or over time; (3) signing lapse rates rise alongside account age rather than falling, making sustained coverage structurally unlikely; and (4) developers manage their signing keys poorly, leaving expired keys unrevoked and letting credential debt grow over time. Our findings show that the assumptions underlying supply-chain security frameworks do not hold in practice, establishing that progress requires either signing credentials redesigned to travel with developer identity while remaining outside platform control, or frameworks built on defenses that do not depend on every developer managing their own keys.