要約:
The emergence of open data portals necessitates more attention to protecting sensitive data before datasets get published and exchanged. While an abundance of methods for suppressing sensitive data exist, the conceptualization of sensitive data and methods to detect it, focus particularly on personal data that, if disclosed, may be harmful or violate privacy. We observe the need for refining and broadening our definitions of sensitive data, and argue that the sensitivity of data depends on its context. Based on this definition, we introduce two mechanisms for contextual sensitive data detection that con- sider the broader context of a dataset at hand. First, we introduce type contextualization, which first detects the semantic type of particular data values, then considers the overall context of the data values within the dataset or document. Second, we introduce domain contextualization which determines sensitivity of a given dataset in the broader context based on the retrieval of relevant rules from documents that specify data sensitivity (e.g., data topic and geographic origin). Experiments with these mechanisms, assisted by large language models (LLMs), confirm that: 1) type-contextualization significantly reduces the number of false positives for type-based sensitive data detection and reaches a recall of 94% compared to 63% with commercial tools, and 2) domain-contextualization leveraging sensitivity rule retrieval is effective for context-grounded sensitive data detection in non-standard data domains such as humanitarian datasets. Evaluation with humanitarian data experts also reveals that context-grounded LLM explanations provide useful guidance in manual data auditing processes, improving consistency. We open-source mechanisms and annotated datasets for contextual sensitive data detection at https://github.com/trl-lab/sensitive-data-detection.
要約:
LLM-based multi-agent systems (MASs) have reshaped the digital landscape with their emergent coordination and problem-solving capabilities. However, current security evaluations of MASs are still confined to limited attack scenarios, leaving their security issues unclear and likely underestimated. To fill this gap, we propose TOMA, a topology-aware multi-hop attack scheme targeting MASs. By optimizing the propagation of contamination within the MAS topology and controlling the multi-hop diffusion of adversarial payloads originating from the environment, TOMA unveils new and effective attack vectors without requiring privileged access or direct agent manipulation. Experiments demonstrate attack success rates ranging from 40% to 78% across three state-of-the-art MAS architectures: \textsc{Magentic-One}, \textsc{LangManus}, and \textsc{OWL}, and five representative topologies, revealing intrinsic MAS vulnerabilities that may be overlooked by existing research. Inspired by these findings, we propose a conceptual defense framework based on topology trust, and prototype experiments show its effectiveness in blocking 94.8% of adaptive and composite attacks.
要約:
This work introduces the Primitive Vector Cipher (PVC), a novel hybrid encryption scheme integrating matrix-based cryptography with advanced Diffie-Hellman key exchange. PVC's security is grounded on the established hardness of the Vector Computational Diffie- Hellman (V-CDH) problem. The two-layered design uses HKDF to mask plaintext via a DH-authenticated shared primitive vector and randomize cipher blocks with a per-block offset. This approach eliminates deterministic repetitions and provides strong resistance against linear and known-plaintext attacks. PVC's block-wise structure allows for massive parallelism and excellent linear scaling. Security is formally analyzed, demonstrating INDCPA security under V-CDH. STS protocol integration elevates security toward IND-CCA guarantees.
要約:
The use of large language models in digital forensics has been widely explored. Beyond identifying potential applications, research has also focused on optimizing model performance for forensic tasks through fine-tuning. However, limited result explainability reduces their operational and legal usability. Recently, a new class of reasoning language models has emerged, designed to handle logic-based tasks through an `internal reasoning' mechanism. Yet, users typically see only the final answer, not the underlying reasoning. One of these reasoning models is gpt-oss, which can be deployed locally, providing full access to its underlying reasoning process. This article presents the first investigation into the potential of reasoning language models for digital forensics. Four test use cases are examined to assess the usability of the reasoning component in supporting result explainability. The evaluation combines a new quantitative metric with qualitative analysis. Findings show that the reasoning component aids in explaining and validating language model outputs in digital forensics at medium reasoning levels, but this support is often limited, and higher reasoning levels do not enhance response quality.
要約:
LLM models are increasingly used to generate code, but the quality and security of this code are often uncertain. Several recent studies have raised alarm bells, indicating that such AI-generated code may be particularly vulnerable to cyberattacks. However, most of these studies rely on code that is generated specifically for the study, which raises questions about the realism of such experiments. In this study, we perform a large-scale empirical analysis of real-life code generated by ChatGPT. We evaluate code generated by ChatGPT both with respect to correctness and security and delve into the intentions of users who request code from the model. Our research confirms previous studies that used synthetic queries and yielded evidence that LLM-generated code is often inadequate with respect to security. We also find that users exhibit little curiosity about the security features of the code they ask LLMs to generate, as evidenced by their lack of queries on this topic.
要約:
Hypervisors are under threat by critical memory safety vulnerabilities, with pointer corruption being one of the most prevalent and severe forms. Existing exploitation frameworks depend on identifying highly-constrained structures in the host machine and accurately determining their runtime addresses, which is ineffective in hypervisor environments where such structures are rare and further obfuscated by Address Space Layout Randomization (ASLR). We instead observe that modern virtualization environments exhibit weak memory isolation -- guest memory is fully attacker-controlled yet accessible from the host, providing a reliable primitive for exploitation. Based on this observation, we present the first systematic characterization and taxonomy of Cross-Domain Attacks (CDA), a class of exploitation techniques that enable capability escalation through guest memory reuse. To automate this process, we develop a system that identifies cross-domain gadgets, matches them with corrupted pointers, synthesizes triggering inputs, and assembles complete exploit chains. Our evaluation on 15 real-world vulnerabilities across QEMU and VirtualBox shows that CDA is widely applicable and effective.
要約:
The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical challenges: robustness against adversarial source code transformations and adaptability to the varying false positive rate (FPR) requirements of different actors, from repository maintainers (requiring low FPR) to enterprise security teams (higher FPR tolerance).
We introduce a robust detector capable of seamless integration into both public repositories like PyPI and enterprise ecosystems. To ensure robustness, we propose a novel methodology for generating adversarial packages using fine-grained code obfuscation. Combining these with adversarial training (AT) enhances detector robustness by 2.5x. We comprehensively evaluate AT effectiveness by testing our detector against 122,398 packages collected daily from PyPI over 80 days, showing that AT needs careful application: it makes the detector more robust to obfuscations and allows finding 10% more obfuscated packages, but slightly decreases performance on non-obfuscated packages.
We demonstrate production adaptability of our detector via two case studies: (i) one for PyPI maintainers (tuned at 0.1% FPR) and (ii) one for enterprise teams (tuned at 10% FPR). In the former, we analyze 91,949 packages collected from PyPI over 37 days, achieving a daily detection rate of 2.48 malicious packages with only 2.18 false positives. In the latter, we analyze 1,596 packages adopted by a multinational software company, obtaining only 1.24 false positives daily. These results show that our detector can be seamlessly integrated into both public repositories like PyPI and enterprise ecosystems, ensuring a very low time budget of a few minutes to review the false positives.
Overall, we uncovered 346 malicious packages, now reported to the community.
要約:
Contemporary DevSecOps pipelines have to deal with the evolution of security in an ever-continuously integrated and deployed environment. Existing methods,such as rule-based intrusion detection and static vulnerability scanning, are inadequate and unreceptive to changes in the system, causing longer response times and organization needs exposure to emerging attack vectors. In light of the previous constraints, we introduce AutoGuard to the DevSecOps ecosystem, a reinforcement learning (RL)-powered self-healing security framework built to pre-emptively protect DevSecOps environments. AutoGuard is a self-securing security environment that continuously observes pipeline activities for potential anomalies while preemptively remediating the environment. The model observes and reacts based on a policy that is continually learned dynamically over time. The RL agent improves each action over time through reward-based learning aimed at improving the agent's ability to prevent, detect and respond to a security incident in real-time. Testing using simulated ContinuousIntegration / Continuous Deployment (CI/CD) environments showed AutoGuard to successfully improve threat detection accuracy by 22%, reduce mean time torecovery (MTTR) for incidents by 38% and increase overall resilience to incidents as compared to traditional methods.
Keywords- DevSecOps, Reinforcement Learning, Self- Healing Security, Continuous Integration, Automated Threat Mitigation
要約:
Processor designs rely on iterative modifications and reuse well-established designs. However, this reuse of prior designs also leads to similar vulnerabilities across multiple processors. As processors grow increasingly complex with iterative modifications, efficiently detecting vulnerabilities from modern processors is critical. Inspired by software fuzzing, hardware fuzzing has recently demonstrated its effectiveness in detecting processor vulnerabilities. Yet, to our best knowledge, existing processor fuzzers fuzz each design individually, lacking the capability to understand known vulnerabilities in prior processors to fine-tune fuzzing to identify similar or new variants of vulnerabilities.
To address this gap, we present ReFuzz, an adaptive fuzzing framework that leverages contextual bandit to reuse highly effective tests from prior processors to fuzz a processor-under-test (PUT) within a given ISA. By intelligently mutating tests that trigger vulnerabilities in prior processors, ReFuzz effectively detects similar and new variants of vulnerabilities in PUTs. ReFuzz uncovered three new security vulnerabilities and two new functional bugs. ReFuzz detected one vulnerability by reusing a test that triggers a known vulnerability in a prior processor. One functional bug exists across three processors that share design modules. The second bug has two variants. Additionally, ReFuzz reuses highly effective tests to enhance efficiency in coverage, achieving an average 511.23x coverage speedup and up to 9.33% more total coverage, compared to existing fuzzers.
要約:
To enhance the performance of large language models (LLMs) in various domain-specific applications, sensitive data such as healthcare, law, and finance are being used to privately customize or fine-tune these models. Such privately adapted LLMs are regarded as either personal privacy assets or corporate intellectual property. Therefore, protecting model weights and maintaining strict confidentiality during deployment and distribution have become critically important. However, existing model formats and deployment frameworks provide little to no built-in support for confidentiality, access control, or secure integration with trusted hardware. Current methods for securing model deployment either rely on computationally expensive cryptographic techniques or tightly controlled private infrastructure. Although these approaches can be effective in specific scenarios, they are difficult and costly for widespread deployment.
In this paper, we introduce CryptoTensors, a secure and format-compatible file structure for confidential LLM distribution. Built as an extension to the widely adopted Safetensors format, CryptoTensors incorporates tensor-level encryption and embedded access control policies, while preserving critical features such as lazy loading and partial deserialization. It enables transparent decryption and automated key management, supporting flexible licensing and secure model execution with minimal overhead. We implement a proof-of-concept library, benchmark its performance across serialization and runtime scenarios, and validate its compatibility with existing inference frameworks, including Hugging Face Transformers and vLLM. Our results highlight CryptoTensors as a light-weight, efficient, and developer-friendly solution for safeguarding LLM weights in real-world and widespread deployments.
要約:
Proof-of-Vulnerability (PoV) input generation is a critical task in software security and supports downstream applications such as path generation and validation. Generating a PoV input requires solving two sets of constraints: (1) reachability constraints for reaching vulnerable code locations, and (2) triggering constraints for activating the target vulnerability. Existing approaches, including directed greybox fuzzing and LLM-assisted fuzzing, struggle to efficiently satisfy these constraints. This work presents an agentic method that mimics human experts. Human analysts iteratively study code to extract semantic reachability and triggering constraints, form hypotheses about PoV triggering strategies, encode them as test inputs, and refine their understanding using debugging feedback. We automate this process with an agentic directed fuzzing framework called PBFuzz. PBFuzz tackles four challenges in agentic PoV generation: autonomous code reasoning for semantic constraint extraction, custom program-analysis tools for targeted inference, persistent memory to avoid hypothesis drift, and property-based testing for efficient constraint solving while preserving input structure. Experiments on the Magma benchmark show strong results. PBFuzz triggered 57 vulnerabilities, surpassing all baselines, and uniquely triggered 17 vulnerabilities not exposed by existing fuzzers. PBFuzz achieved this within a 30-minute budget per target, while conventional approaches use 24 hours. Median time-to-exposure was 339 seconds for PBFuzz versus 8680 seconds for AFL++ with CmpLog, giving a 25.6x efficiency improvement with an API cost of 1.83 USD per vulnerability.
要約:
Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a framework that measures how network structure shapes leakage. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent's memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over up to 10 interaction rounds, we quantify leakage as the fraction of ground-truth PII recovered from attacking agent outputs via exact matching. We systematically evaluate six common network topologies (fully connected, ring, chain, binary tree, star, and star-ring), varying agent counts $n\in\{4,5,6\}$, attacker-target placements, and base models. Our findings reveal consistent patterns: fully connected graphs exhibit maximum leakage while chains provide strongest protection; shorter attacker-target graph distance and higher target centrality significantly increase vulnerability; leakage rises sharply in early rounds before plateauing; model choice shifts absolute leakage rates but preserves topology rankings; temporal/locational PII attributes leak more readily than identity credentials or regulated identifiers. These results provide the first systematic mapping from architectural choices to measurable privacy risk, yielding actionable guidance: prefer sparse or hierarchical connectivity, maximize attacker-target separation, limit node degree and network radius, avoid shortcuts bypassing hubs, and implement topology-aware access controls.
要約:
Gleeok is a family of low latency keyed pseudorandom functions (PRFs) consisting of three parallel SPN based permutations whose outputs are XORed to form the final value. Both Gleeok-128 and Gleeok-256 use a 256 bit key, with block sizes of 128 and 256 bits, respectively. Owing to its multi branch structure, evaluating security margins and mounting effective key recovery attacks present nontrivial challenges. This paper provides the first comprehensive third party cryptanalysis of Gleeok-128. We introduce a two stage MILP based framework for constructing branch wise and full cipher differential linear (DL) distinguishers, together with an integral based key recovery framework tailored to multi branch designs. Our DL analysis yields 7, 7, 8, and 4 round distinguishers for Branch 1, Branch 2, Branch 3, and Gleeok-128, respectively, with squared correlations approximately 2 to the power minus 88.12, 2 to the power minus 88.12, 2 to the power minus 38.73, and 2 to the power minus 49.04, outperforming those in the design document except for the full PRF case. By tightening algebraic degree bounds, we further derive 9, 9, and 7 round integral distinguishers for the three branches and a 7 round distinguisher for the full PRF, extending the designers results by 3, 3, and 2 rounds and by 2 rounds, respectively. These integral properties enable 7 round and 8 round key recovery attacks in the non full codebook and full codebook settings. In addition, we identify a flaw in the original linear security evaluation of Branch 3, showing that it can be distinguished over all 12 rounds with data complexity about 2 to the power 48. We also propose optimized linear layer parameters that significantly improve linear resistance without sacrificing diffusion. Our results advance the understanding of Gleeok-128 and provide general methods for analyzing multi branch symmetric designs.
要約:
Large Language Models (LLMs) exhibit remarkable capabilities but remain vulnerable to adversarial manipulations such as jailbreaking, where crafted prompts bypass safety mechanisms. Understanding the causal factors behind such vulnerabilities is essential for building reliable defenses.
In this work, we introduce a unified causality analysis framework that systematically supports all levels of causal investigation in LLMs, ranging from token-level, neuron-level, and layer-level interventions to representation-level analysis. The framework enables consistent experimentation and comparison across diverse causality-based attack and defense methods. Accompanying this implementation, we provide the first comprehensive survey of causality-driven jailbreak studies and empirically evaluate the framework on multiple open-weight models and safety-critical benchmarks including jailbreaks, hallucination detection, backdoor identification, and fairness evaluation. Our results reveal that: (1) targeted interventions on causally critical components can reliably modify safety behavior; (2) safety-related mechanisms are highly localized (i.e., concentrated in early-to-middle layers with only 1--2\% of neurons exhibiting causal influence); and (3) causal features extracted from our framework achieve over 95\% detection accuracy across multiple threat types.
By bridging theoretical causality analysis and practical model safety, our framework establishes a reproducible foundation for research on causality-based attacks, interpretability, and robust attack detection and mitigation in LLMs. Code is available at https://github.com/Amadeuszhao/SOK_Casuality.
要約:
As the frequency and complexity of Distributed Denial-of-Service (DDoS) attacks continue to increase, the level of threats posed to Smart Internet of Things (SIoT) business environments have also increased. These environments generally have several interconnected SIoT systems and devices that are integral to daily operations, usually depending on cloud infrastructure and real-time data analytics, which require continuous availability and secure data exchange. Conventional detection mechanisms, while useful in static or traditional network environments, often are inadequate in responding to the needs of these dynamic and diverse SIoT networks. In this paper, we introduce a novel trust-based DDoS detection model tailored to meet the unique requirements of smart business environments. The proposed model incorporates a trust evaluation engine that continuously monitors node behaviour, calculating trust scores based on packet delivery ratio, response time, and anomaly detection. These trust metrics are then aggregated by a central trust-based repository that uses inherent trust values to identify traffic patterns indicative of DDoS attacks. By integrating both trust scores and central trust-based outputs, the trust calculation is enhanced, ensuring that threats are accurately identified and addressed in real-time. The model demonstrated a significant improvement in detection accuracy, and a low false-positive rate with enhanced scalability and adaptability under TCP SYN, Ping Flood, and UDP Flood attacks. The results show that a trust-based approach provides an effective, lightweight alternative for securing resource-constrained business IoT environments.
要約:
This study explores the application of Answer Set Programming (ASP) for detecting anomalies in system logs, addressing the challenges posed by evolving cyber threats. We propose a novel framework that leverages ASP's declarative nature and logical reasoning capabilities to encode complex security rules as logical predicates. Our ASP-based system was applied to a real-world Linux system log dataset, demonstrating its effectiveness in identifying various anomalies such as potential brute-force attacks, privilege escalations, frequent network connections from specific IPs, and various system-level issues. Key findings highlight ASP's strengths in handling structured log data, rule flexibility, and event correlation. The approach shows promise in providing explainable alerts from real-world data. This research contributes to computer forensics by demonstrating a logic-based paradigm for log analysis on a practical dataset, opening avenues for more nuanced and adaptive cyber intelligence systems.
要約:
Cyber-physical systems can be subject to information leakage; in the presence of continuous variables such as time and energy, these leaks can be subtle to detect. We study here the verification of opacity problems over systems with observation over both timing and energy information. We introduce guarded multi-energy timed automata as an extension of timed automata with multiple energy variables and guards over such variables. Despite undecidability of this general formalism, we establish positive results over a number of subclasses, notably when the attacker observes the final energy and/or the execution time, but also when they have access to the value of the energy variables every time unit.
要約:
Personal language model-based agents are becoming more widespread for completing tasks on behalf of users; however, this raises serious privacy questions regarding whether these models will appropriately disclose user data. While prior work has evaluated language models on data-sharing scenarios based on general privacy norms, we focus on personalizing language models' privacy decisions, grounding their judgments directly in prior user privacy decisions. Our findings suggest that general privacy norms are insufficient for effective personalization of privacy decisions. Furthermore, we find that eliciting privacy judgments from the model through In-context Learning (ICL) is unreliable to due misalignment with the user's prior privacy judgments and opaque reasoning traces, which make it difficult for the user to interpret the reasoning behind the model's decisions. To address these limitations, we propose ARIEL (Agentic Reasoning with Individualized Entailment Logic), a framework that jointly leverages a language model and rule-based logic for structured data-sharing reasoning. ARIEL is based on formulating personalization of data sharing as an entailment, whether a prior user judgment on a data-sharing request implies the same judgment for an incoming request. Our experimental evaluations on advanced models and publicly-available datasets demonstrate that ARIEL can reduce the F1 score error by $\textbf{39.1%}$ over language model-based reasoning (ICL), demonstrating that ARIEL is effective at correctly judging requests where the user would approve data sharing. Overall, our findings suggest that combining LLMs with strict logical entailment is a highly effective strategy for enabling personalized privacy judgments for agents.
要約:
Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in specialized tasks. However, its effectiveness depends heavily on the selection and quality of in-context examples, particularly in complex domains. In this work, we examine retrieval-augmented prompting as a strategy to improve few-shot performance in code vulnerability detection, where the goal is to identify one or more security-relevant weaknesses present in a given code snippet from a predefined set of vulnerability categories. We perform a systematic evaluation using the Gemini-1.5-Flash model across three approaches: (1) standard few-shot prompting with randomly selected examples, (2) retrieval-augmented prompting using semantically similar examples, and (3) retrieval-based labeling, which assigns labels based on retrieved examples without model inference. Our results show that retrieval-augmented prompting consistently outperforms the other prompting strategies. At 20 shots, it achieves an F1 score of 74.05% and a partial match accuracy of 83.90%. We further compare this approach against zero-shot prompting and several fine-tuned models, including Gemini-1.5-Flash and smaller open-source models such as DistilBERT, DistilGPT2, and CodeBERT. Retrieval-augmented prompting outperforms both zero-shot (F1 score: 36.35%, partial match accuracy: 20.30%) and fine-tuned Gemini (F1 score: 59.31%, partial match accuracy: 53.10%), while avoiding the training time and cost associated with model fine-tuning. On the other hand, fine-tuning CodeBERT yields higher performance (F1 score: 91.22%, partial match accuracy: 91.30%) but requires additional training, maintenance effort, and resources.
要約:
This paper proposes using the Linux kernel ftrace framework, particularly the function graph tracer, to generate informative system level data for machine learning (ML) applications. Experiments on a real world encryption detection task demonstrate the efficacy of the proposed features across several learning algorithms. The learner faces the problem of detecting encryption activities across a large dataset of files, using function call traces and graph based features. Empirical results highlight an outstanding accuracy of 99.28 on the task at hand, underscoring the efficacy of features derived from the function graph tracer. The results were further validated in an additional experiment targeting a multilabel classification problem, in which running programs were identified from trace data. This work provides comprehensive methodologies for preprocessing raw trace data and extracting graph based features, offering significant advancements in applying ML to system behavior analysis, program identification, and anomaly detection. By bridging the gap between system tracing and ML, this paper paves the way for innovative solutions in performance monitoring and security analytics.
要約:
AI agent-based systems are becoming increasingly integral to modern software architectures, enabling autonomous decision-making, dynamic task execution, and multimodal interactions through large language models (LLMs). However, these systems introduce novel and evolving security challenges, including prompt injection attacks, context poisoning, model manipulation, and opaque agent-to-agent communication, that are not effectively captured by traditional threat modeling frameworks. In this paper, we introduce ASTRIDE, an automated threat modeling platform purpose-built for AI agent-based systems. ASTRIDE extends the classical STRIDE framework by introducing a new threat category, A for AI Agent-Specific Attacks, which encompasses emerging vulnerabilities such as prompt injection, unsafe tool invocation, and reasoning subversion, unique to agent-based applications. To automate threat modeling, ASTRIDE combines a consortium of fine-tuned vision-language models (VLMs) with the OpenAI-gpt-oss reasoning LLM to perform end-to-end analysis directly from visual agent architecture diagrams, such as data flow diagrams(DFDs). LLM agents orchestrate the end-to-end threat modeling automation process by coordinating interactions between the VLM consortium and the reasoning LLM. Our evaluations demonstrate that ASTRIDE provides accurate, scalable, and explainable threat modeling for next-generation intelligent systems. To the best of our knowledge, ASTRIDE is the first framework to both extend STRIDE with AI-specific threats and integrate fine-tuned VLMs with a reasoning LLM to fully automate diagram-driven threat modeling in AI agent-based applications.
要約:
Unsupervised anomaly-based intrusion detection requires models that can generalize to attack patterns not observed during training. This work presents the first large-scale evaluation of hybrid quantum-classical (HQC) autoencoders for this task. We construct a unified experimental framework that iterates over key quantum design choices, including quantum-layer placement, measurement approach, variational and non-variational formulations, and latent-space regularization. Experiments across three benchmark NIDS datasets show that HQC autoencoders can match or exceed classical performance in their best configurations, although they exhibit higher sensitivity to architectural decisions. Under zero-day evaluation, well-configured HQC models provide stronger and more stable generalization than classical and supervised baselines. Simulated gate-noise experiments reveal early performance degradation, indicating the need for noise-aware HQC designs. These results provide the first data-driven characterization of HQC autoencoder behavior for network intrusion detection and outline key factors that govern their practical viability. All experiment code and configurations are available at https://github.com/arasyi/hqcae-network-intrusion-detection.
要約:
Backdoor attacks are emerging threats to deep neural networks, which typically embed malicious behaviors into a victim model by injecting poisoned samples. Adversaries can activate the injected backdoor during inference by presenting the trigger on input images. Prior defensive methods have achieved remarkable success in countering dirty-label backdoor attacks where the labels of poisoned samples are often mislabeled. However, these approaches do not work for a recent new type of backdoor -- clean-label backdoor attacks that imperceptibly modify poisoned data and hold consistent labels. More complex and powerful algorithms are demanded to defend against such stealthy attacks. In this paper, we propose UltraClean, a general framework that simplifies the identification of poisoned samples and defends against both dirty-label and clean-label backdoor attacks. Given the fact that backdoor triggers introduce adversarial noise that intensifies in feed-forward propagation, UltraClean first generates two variants of training samples using off-the-shelf denoising functions. It then measures the susceptibility of training samples leveraging the error amplification effect in DNNs, which dilates the noise difference between the original image and denoised variants. Lastly, it filters out poisoned samples based on the susceptibility to thwart the backdoor implantation. Despite its simplicity, UltraClean achieves a superior detection rate across various datasets and significantly reduces the backdoor attack success rate while maintaining a decent model accuracy on clean data, outperforming existing defensive methods by a large margin. Code is available at https://github.com/bxz9200/UltraClean.
要約:
The advent of large-scale quantum computers implies that our existing public-key cryptography infrastructure has become insecure. That means that the privacy of many mobile applications involving dynamic peer groups, such as multicast messaging or pay-per-view, could be compromised. In this work we propose a generalization of the well known group key exchange protocol proposed by Burmester and Desmedt to the non-abelian case by the use of finite group actions and we prove that the presented protocol is secure in Katz and Yung's model.
要約:
The Operating System (OS) kernel is foundational in modern computing, especially with the proliferation of diverse computing devices. However, its development also comes with vulnerabilities that can lead to severe security breaches. Kernel fuzzing, a technique used to uncover these vulnerabilities, poses distinct challenges when compared to user-space fuzzing. These include the complexity of configuring the testing environment and addressing the statefulness inherent to both the kernel and the fuzzing process. Despite the significant interest from the community, a comprehensive understanding of kernel fuzzing remains lacking, hindering further progress in the field. In this paper, we present the first systematic study focused specifically on OS kernel fuzzing. We begin by outlining the unique challenges of kernel fuzzing, which distinguish it from those in user space. Following this, we summarize the progress of 107 academic studies from top-tier venues between 2017 and 2025. To structure this analysis, we introduce a stage-based fuzzing model and a novel fuzzing taxonomy that highlights nine core functionalities unique to kernel fuzzing. Each of these functionalities is examined in conjunction with the methodological approaches employed to address them. Finally, we identify remaining gaps in addressing challenges and outline promising directions to guide forthcoming research in kernel security.
要約:
AI agents can autonomously perform tasks and, often without explicit user consent, collect or disclose users' sensitive local data, which raises serious privacy concerns. Although AI agents' privacy policies describe their intended data practices, there remains limited transparency and accountability about whether runtime behavior matches those policies. To close this gap, we introduce AudAgent, a visual tool that continuously monitors AI agents' data practices in real time and guards compliance with stated privacy policies.
AudAgent consists of four components for automated privacy auditing of AI agents. (i) Policy formalization: a novel cross-LLM voting mechanism to guarantee confidence of the parsed privacy policy model. (ii) Runtime annotation: a lightweight Presidio-based analyzer detects sensitive data and annotates data practices based on the AI agent's context and the privacy policy model. (iii) Compliance auditing: ontology graphs and automata-based checking connect the privacy policy model with runtime annotations, enabling on-the-fly compliance checking. (iv) User interface: an infrastructure-independent implementation visualizes the real-time execution trace of AI agents along with potential privacy policy violations, providing user-friendly transparency and accountability.
We evaluate AudAgent with AI agents built using mainstream frameworks, demonstrating its effectiveness in detecting and visualizing privacy policy violations in real time. Using AudAgent, we also find that most privacy policies omit explicit safeguards for highly sensitive data such as SSNs, whose misuse violates legal requirements, and that many agents do not refuse handling such data via third-party tools, including those controlled by Claude, Gemini, and DeepSeek. AudAgent proactively blocks operations on such data, overriding the agents' original privacy policy and behavior.
要約:
Recently, web-based Large Language Model (LLM) agents autonomously perform increasingly complex tasks, thereby bringing significant convenience. However, they also amplify the risks of malicious misuse cases such as unauthorized collection of personally identifiable information (PII), generation of socially divisive content, and even automated web hacking. To address these threats, we propose an AI Kill Switch technique that can immediately halt the operation of malicious web-based LLM agents. To achieve this, we introduce AutoGuard - the key idea is generating defensive prompts that trigger the safety mechanisms of malicious LLM agents. In particular, generated defense prompts are transparently embedded into the website's DOM so that they remain invisible to human users but can be detected by the crawling process of malicious agents, triggering its internal safety mechanisms to abort malicious actions once read. To evaluate our approach, we constructed a dedicated benchmark consisting of three representative malicious scenarios. Experimental results show that AutoGuard achieves over 80% Defense Success Rate (DSR) across diverse malicious agents, including GPT-4o, Claude-4.5-Sonnet and generalizes well to advanced models like GPT-5.1, Gemini-2.5-flash, and Gemini-3-pro. Also, our approach demonstrates robust defense performance in real-world website environments without significant performance degradation for benign agents. Through this research, we demonstrate the controllability of web-based LLM agents, thereby contributing to the broader effort of AI control and safety.
要約:
Secure Delegated Quantum Computation (SDQC) protocols are a vital piece of the future quantum information processing global architecture since they allow end-users to perform their valuable computations on remote quantum servers without fear that a malicious quantum service provider or an eavesdropper might acquire some information about their data or algorithm. They also allow end-users to check that their computation has been performed as they have specified it.
However, existing protocols all have drawbacks that limit their usage in the real world. Most require the client to either operate a single-qubit source or perform single-qubit measurements, thus requiring them to still have some quantum technological capabilities albeit restricted, or require the server to perform operations which are hard to implement on real hardware (e.g isolate single photons from laser pulses and polarisation-preserving photon-number quantum non-demolition measurements). Others remove the need for quantum communications entirely but this comes at a cost in terms of security guarantees and memory overhead on the server's side.
We present an SDQC protocol which drastically reduces the technological requirements of both the client and the server while providing information-theoretic composable security. More precisely, the client only manipulates an attenuated laser pulse, while the server only handles interacting quantum emitters with a structure capable of generating spin-photon entanglement. The quantum emitter acts as both a converter from coherent laser pulses to polarisation-encoded qubits and an entanglement generator. Such devices have recently been used to demonstrate the largest entangled photonic state to date, thus hinting at the readiness of our protocol for experimental implementations.
要約:
Static analysis is a classical technique for improving software security and software quality in general. Fairly recently, a new static analyzer was implemented in the GNU Compiler Collection (GCC). The present paper uses the GCC's analyzer to empirically examine popular Linux packages. The dataset used is based on those packages in the Gentoo Linux distribution that are either written in C or contain C code. In total, 3,538 such packages are covered. According to the results, uninitialized variables and NULL pointer dereference issues are the most common problems according to the analyzer. Classical memory management issues are relatively rare. The warnings also follow a long-tailed probability distribution across the packages; a few packages are highly warning-prone, whereas no warnings are present for as much as 89% of the packages. Furthermore, the warnings do not vary across different application domains. With these results, the paper contributes to the domain of large-scale empirical research on software quality and security. In addition, a discussion is presented about practical implications of the results.
要約:
Centralization enhances the efficiency of Artificial Intelligence (AI) but also introduces critical challenges, including single points of failure, inherent biases, data privacy risks, and scalability limitations. To address these issues, blockchain-based Decentralized Artificial Intelligence (DeAI) has emerged as a promising paradigm that leverages decentralization and transparency to improve the trustworthiness of AI systems. Despite rapid adoption in industry, the academic community lacks a systematic analysis of DeAI's technical foundations, opportunities, and challenges. This work presents the first Systematization of Knowledge (SoK) on DeAI, offering a formal definition, a taxonomy of existing solutions based on the AI lifecycle, and an in-depth investigation of the roles of blockchain in enabling secure and incentive-compatible collaboration. We further review security risks across the DeAI lifecycle and empirically evaluate representative mitigation techniques. Finally, we highlight open research challenges and future directions for advancing blockchain-based DeAI.
要約:
Computing matchings in graphs is a foundational algorithmic task. Despite extensive interest in differentially private (DP) graph analysis, work on privately computing matching solutions, rather than just their size, has been sparse. The sole prior work in the standard model of pure $\varepsilon$-differential privacy, by Hsu, Huang, Roth, Roughgarden, and Wu [HHR+14, STOC'14], focused on allocations and was thus restricted to bipartite graphs. This paper presents a comprehensive study of differentially private algorithms for maximum matching and b-matching in general graphs, which also yields techniques that directly improve upon prior work in the bipartite setting. En route to solving these matching problems, we develop a set of novel techniques with broad applicability, including a new symmetry argument for DP lower bounds, the first private arboricity-based sparsifiers for node-DP, and the novel Public Vertex Subset Mechanism. We demonstrate the versatility of these tools by applying them to other DP problems, such as vertex cover [GLM+10, SODA'10], and beyond DP, such as low-sensitivity algorithms [VY23, SODA'21, SICOMP'23]. [...]
要約:
This study explores the use of Generative Adversarial Networks (GANs) to detect AI deepfakes and fraudulent activities in online payment systems. With the growing prevalence of deepfake technology, which can manipulate facial features in images and videos, the potential for fraud in online transactions has escalated. Traditional security systems struggle to identify these sophisticated forms of fraud. This research proposes a novel GAN-based model that enhances online payment security by identifying subtle manipulations in payment images. The model is trained on a dataset consisting of real-world online payment images and deepfake images generated using advanced GAN architectures, such as StyleGAN and DeepFake. The results demonstrate that the proposed model can accurately distinguish between legitimate transactions and deepfakes, achieving a high detection rate above 95%. This approach significantly improves the robustness of payment systems against AI-driven fraud. The paper contributes to the growing field of digital security, offering insights into the application of GANs for fraud detection in financial services. Keywords- Payment Security, Image Recognition, Generative Adversarial Networks, AI Deepfake, Fraudulent Activities
要約:
We introduce $\textbf{Doublespeak}$, a simple in-context representation hijacking attack against large language models (LLMs). The attack works by systematically replacing a harmful keyword (e.g., bomb) with a benign token (e.g., carrot) across multiple in-context examples, provided a prefix to a harmful request. We demonstrate that this substitution leads to the internal representation of the benign token converging toward that of the harmful one, effectively embedding the harmful semantics under a euphemism. As a result, superficially innocuous prompts (e.g., "How to build a carrot?") are internally interpreted as disallowed instructions (e.g., "How to build a bomb?"), thereby bypassing the model's safety alignment. We use interpretability tools to show that this semantic overwrite emerges layer by layer, with benign meanings in early layers converging into harmful semantics in later ones. Doublespeak is optimization-free, broadly transferable across model families, and achieves strong success rates on closed-source and open-source systems, reaching 74% ASR on Llama-3.3-70B-Instruct with a single-sentence context override. Our findings highlight a new attack surface in the latent space of LLMs, revealing that current alignment strategies are insufficient and should instead operate at the representation level.