要約:
Fact-checking systems with search-enabled large language models (LLMs) have shown strong potential for verifying claims by dynamically retrieving external evidence. However, the robustness of such systems against adversarial attack remains insufficiently understood. In this work, we study adversarial claim attacks against search-enabled LLM-based fact-checking systems under a realistic input-only threat model. We propose DECEIVE-AFC, an agent-based adversarial attack framework that integrates novel claim-level attack strategies and adversarial claim validity evaluation principles. DECEIVE-AFC systematically explores adversarial attack trajectories that disrupt search behavior, evidence retrieval, and LLM-based reasoning without relying on access to evidence sources or model internals. Extensive evaluations on benchmark datasets and real-world systems demonstrate that our attacks substantially degrade verification performance, reducing accuracy from 78.7% to 53.7%, and significantly outperform existing claim-based attack baselines with strong cross-system transferability.
要約:
For over a decade, cybersecurity has relied on human labor scarcity to limit attackers to high-value targets manually or generic automated attacks at scale. Building sophisticated exploits requires deep expertise and manual effort, leading defenders to assume adversaries cannot afford tailored attacks at scale. AI agents break this balance by automating vulnerability discovery and exploitation across thousands of targets, needing only small success rates to remain profitable. Current developers focus on preventing misuse through data filtering, safety alignment, and output guardrails. Such protections fail against adversaries who control open-weight models, bypass safety controls, or develop offensive capabilities independently. We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy. In this position paper, we identify why existing defenses cannot stop adaptive adversaries and demonstrate that defenders must develop offensive security intelligence. We propose three actions for building frontier offensive AI capabilities responsibly. First, construct comprehensive benchmarks covering the full attack lifecycle. Second, advance from workflow-based to trained agents for discovering in-wild vulnerabilities at scale. Third, implement governance restricting offensive agents to audited cyber ranges, staging release by capability tier, and distilling findings into safe defensive-only agents. We strongly recommend treating offensive AI capabilities as essential defensive infrastructure, as containing cybersecurity risks requires mastering them in controlled settings before adversaries do.
要約:
3D content acquisition and creation are expanding rapidly in the new era of machine learning and AI. 3D Gaussian Splatting (3DGS) has become a promising high-fidelity and real-time representation for 3D content. Similar to the initial wave of digital audio-visual content at the turn of the millennium, the demand for intellectual property protection is also increasing, since explicit and editable 3D parameterization makes unauthorized use and dissemination easier. In this position paper, we argue that effective progress in watermarking 3D assets requires articulated security objectives and realistic threat models, incorporating the lessons learned from digital audio-visual asset protection over the past decades. To address this gap in security specification and evaluation, we advocate a scenario-driven formulation, in which adversarial capabilities are formalized through a security model. Based on this formulation, we construct a reference framework that organizes existing methods and clarifies how specific design choices map to corresponding adversarial assumptions. Within this framework, we also examine a legacy spread-spectrum embedding scheme, characterizing its advantages and limitations and highlighting the important trade-offs it entails. Overall, this work aims to foster effective intellectual property protection for 3D assets.
要約:
Consent is an ethical cornerstone of clinical research and healthcare in general. Although the ethical principles of consent - providing information, ensuring comprehension, and ensuring voluntariness - are well-defined, the technological infrastructure remains outdated. Clinicians are responsible for obtaining informed consent from research subjects or patients, and for managing it before, during, and after clinical trials or care, which is a burden for them. The voluntary nature of participating in clinical research or undergoing medical treatment implies the need for a participant-centric consent management system. However, this is not reflected in most established systems. Not only do most healthcare information systems not follow a user-centric model, but they also create data silos, which significantly reduce the mobility of patient data between different healthcare institutions and impact personalized medicine. Furthermore, consent management tools are outdated. We propose ClinConNet (Clinical Consent Network), a platform that connects researchers and participants based on clinical research projects. ClinConNet is powered by a dynamic consent model based on blockchain and take advantage of dynamic consent interfaces, as well as blockchain and Self-Sovereign Identity systems. ClinConNet is user-centric and provides important privacy features for patients, such as unlinkability, confidentiality, and ownership of identity data. It is also compatible with the right to be forgotten, as defined in many personal data protection regulations, such as the GDPR. We provide a detailed privacy and security analysis in an adversarial model, as well as a Proof of Concept implementation with detailed performance measures that demonstrate the feasibility of our blockchain-based consent management system with a median end-to-end consent establishment time of under 200ms and a throughput of 250TPS.
要約:
Existing Byzantine robust aggregation mechanisms typically rely on fulldimensional gradi ent comparisons or pairwise distance computations, resulting in computational overhead that limits applicability in large scale and resource constrained federated systems. This paper proposes TinyGuard, a lightweight Byzantine defense that augments the standard FedAvg algorithm via statistical update f ingerprinting. Instead of operating directly on high-dimensional gradients, TinyGuard extracts compact statistical fingerprints cap turing key behavioral properties of client updates, including norm statistics, layer-wise ratios, sparsity measures, and low-order mo ments. Byzantine clients are identified by measuring robust sta tistical deviations in this low-dimensional fingerprint space with nd complexity, without modifying the underlying optimization procedure. Extensive experiments on MNIST, Fashion-MNIST, ViT-Lite, and ViT-Small with LoRA adapters demonstrate that TinyGuard pre serves FedAvg convergence in benign settings and achieves up to 95 percent accuracy under multiple Byzantine attack scenarios, including sign-flipping, scaling, noise injection, and label poisoning. Against adaptive white-box adversaries, Pareto frontier analysis across four orders of magnitude confirms that attackers cannot simultaneously evade detection and achieve effective poisoning, features we term statistical handcuffs. Ablation studies validate stable detection precision 0.8 across varying client counts (50-150), threshold parameters and extreme data heterogeneity . The proposed framework is architecture-agnostic and well-suited for federated fine-tuning of foundation models where traditional Byzantine defenses become impractical
要約:
The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. However, stringent privacy regulations such as GDPR and HIPAA have created data silos that prevent centralized training. Federated Learning (FL) has emerged as a promising solution that enables collaborative model training without sharing raw patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent approaches integrate Blockchain technology for auditability, they predominantly rely on probabilistic reputation systems rather than robust cryptographic identity verification. This paper proposes a Trustworthy Blockchain-based Federated Learning (TBFL) framework integrating Self-Sovereign Identity (SSI) standards. By leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), our architecture ensures only authenticated healthcare entities contribute to the global model. Through comprehensive evaluation using the MIMIC-IV dataset, we demonstrate that anchoring trust in cryptographic identity verification rather than behavioral patterns significantly mitigates security risks while maintaining clinical utility. Our results show the framework successfully neutralizes 100% of Sybil attacks, achieves robust predictive performance (AUC = 0.954, Recall = 0.890), and introduces negligible computational overhead (<0.12%). The approach provides a secure, scalable, and economically viable ecosystem for inter-institutional health data collaboration, with total operational costs of approximately $18 for 100 training rounds across multiple institutions.
要約:
The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited, lacking future-proof mechanisms for security, trust, or resilience against fraud and abuse, despite the introduction of reactive protections like HTTPS during the cybersecurity era. In the current AI-first threatscape, deceptive URLs have reached unprecedented sophistication due to the widespread use of generative AI by cybercriminals and the AI-vs-AI arms race to produce context-aware phishing websites and URLs that are virtually indistinguishable to both users and traditional detection tools. Although AI-generated phishing accounted for a small fraction of filter-bypassing attacks in 2024, phishing volume has escalated over 4,000% since 2022, with nearly 50% more attacks evading detection. At the rate the threatscape is escalating, and phishing tactics are emerging faster than labeled data can be produced, zero-shot and few-shot learning with large language models (LLMs) offers a timely and adaptable solution, enabling generalization with minimal supervision. Given the critical importance of phishing URL detection in large-scale cybersecurity defense systems, we present a comprehensive benchmark of LLMs under a unified zero-shot and few-shot prompting framework and reveal operational trade-offs. Our evaluation uses a balanced dataset with consistent prompts, offering detailed analysis of performance, generalization, and model efficacy, quantified by accuracy, precision, recall, F1 score, AUROC, and AUPRC, to reflect both classification quality and practical utility in threat detection settings. We conclude few-shot prompting improves performance across multiple LLMs.
要約:
We propose Eidolon, a practical post-quantum signature scheme based on the NP-complete k-colorability problem. Our construction generalizes the Goldreich-Micali-Wigderson zero-knowledge protocol to arbitrary k >= 3, applies the Fiat-Shamir transform, and uses Merkle-tree commitments to compress signatures from O(tn) to O(t log n). Crucially, we generate hard instances via planted "quiet" colorings that preserve the statistical profile of random graphs. We present the first empirical security analysis of such a scheme against both classical solvers (ILP, DSatur) and a custom graph neural network (GNN) attacker. Experiments show that for n >= 60, neither approach recovers the secret coloring, demonstrating that well-engineered k-coloring instances can resist modern cryptanalysis, including machine learning. This revives combinatorial hardness as a credible foundation for post-quantum signatures.
要約:
Many Intelligent Transportation Systems (ITS) applications require strong privacy guarantees for both users and their data. Homomorphic encryption (HE) enables computation directly on encrypted messages and thus offers a compelling approach to privacy-preserving data processing in ITS. However, practical HE schemes incur substantial ciphertext expansion and communication overhead, which limits their suitability for time-critical transportation systems. Hybrid homomorphic encryption (HHE) addresses this challenge by combining a homomorphic encryption scheme with a symmetric cipher, enabling efficient encrypted computation while dramatically reducing communication cost. In this paper, we develop theoretical models of representative ITS applications that integrate HHE to protect sensitive vehicular data. We then perform a parameter-based evaluation of the HHE scheme Rubato to estimate ciphertext sizes and communication overhead under realistic ITS workloads. Our results show that HHE achieves orders-of-magnitude reductions in ciphertext size compared with conventional HE while maintaining cryptographic security, making it significantly more practical for latency-constrained ITS communication.
要約:
When creating public data products out of confidential datasets, inferential/posterior-based privacy definitions, such as Pufferfish, provide compelling privacy semantics for data with correlations. However, such privacy definitions are rarely used in practice because they do not always compose. For example, it is possible to design algorithms for these privacy definitions that have no leakage when run once but reveal the entire dataset when run more than once. We prove necessary and sufficient conditions that must be added to ensure linear composition for Pufferfish mechanisms, hence avoiding such privacy collapse. These extra conditions turn out to be differential privacy-style inequalities, indicating that achieving both the interpretable semantics of Pufferfish for correlated data and composition benefits requires adopting differentially private mechanisms to Pufferfish. We show that such translation is possible through a concept called the $(a,b)$-influence curve, and many existing differentially private algorithms can be translated with our framework into a composable Pufferfish algorithm. We illustrate the benefit of our new framework by designing composable Pufferfish algorithms for Markov chains that significantly outperform prior work.
要約:
Modern vehicles rely on electronic control units (ECUs) interconnected through the Controller Area Network (CAN), making in-vehicle communication a critical security concern. Machine learning (ML)-based intrusion detection systems (IDS) are increasingly deployed to protect CAN traffic, yet their robustness against adversarial manipulation remains largely unexplored. We present a systematic adversarial evaluation of CAN IDS using the ROAD dataset, comparing four shallow learning models with a deep neural network-based detector. Using protocol-compliant, payload-level perturbations generated via FGSM, BIM and PGD, we evaluate adversarial effects on both benign and malicious CAN frames. While all models achieve strong baseline performance under benign conditions, adversarial perturbations reveal substantial vulnerabilities. Although shallow and deep models are robust to false-alarm induction, with the deep neural network (DNN) performing best on benign traffic, all architectures suffer significant increases in missed attacks. Notably, under gradient-based attacks, the shallow model extra trees (ET) demonstrates improved robustness to missed-attack induction compared to the other models. Our results demonstrate that adversarial manipulation can simultaneously trigger false alarms and evade detection, underscoring the need for adversarial robustness evaluation in safety-critical automotive IDS.
要約:
Evaluating and improving the security capabilities of code agents requires high-quality, executable vulnerability tasks. However, existing works rely on costly, unscalable manual reproduction and suffer from outdated data distributions. To address these, we present CVE-Factory, the first multi-agent framework to achieve expert-level quality in automatically transforming sparse CVE metadata into fully executable agentic tasks. Cross-validation against human expert reproductions shows that CVE-Factory achieves 95\% solution correctness and 96\% environment fidelity, confirming its expert-level quality. It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2\% verified success. This automation enables two downstream contributions. First, we construct LiveCVEBench, a continuously updated benchmark of 190 tasks spanning 14 languages and 153 repositories that captures emerging threats including AI-tooling vulnerabilities. Second, we synthesize over 1,000 executable training environments, the first large-scale scaling of agentic tasks in code security. Fine-tuned Qwen3-32B improves from 5.3\% to 35.8\% on LiveCVEBench, surpassing Claude 4.5 Sonnet, with gains generalizing to Terminal Bench (12.5\% to 31.3\%). We open-source CVE-Factory, LiveCVEBench, Abacus-cve (fine-tuned model), training dataset, and leaderboard. All resources are available at https://github.com/livecvebench/CVE-Factory .
要約:
Deep neural networks (DNNs) have achieved remarkable success in radio frequency (RF) fingerprinting for wireless device authentication. However, their practical deployment faces two major limitations: domain shift, where models trained in one environment struggle to generalize to others, and the black-box nature of DNNs, which limits interpretability. To address these issues, we propose a novel framework that integrates a group of variable-length two-dimensional (2D) shapelets with a pre-trained large language model (LLM) to achieve efficient, interpretable, and generalizable RF fingerprinting. The 2D shapelets explicitly capture diverse local temporal patterns across the in-phase and quadrature (I/Q) components, providing compact and interpretable representations. Complementarily, the pre-trained LLM captures more long-range dependencies and global contextual information, enabling strong generalization with minimal training overhead. Moreover, our framework also supports prototype generation for few-shot inference, enhancing cross-domain performance without additional retraining. To evaluate the effectiveness of our proposed method, we conduct extensive experiments on six datasets across various protocols and domains. The results show that our method achieves superior standard and few-shot performance across both source and unseen domains.
要約:
The widespread adoption of Vision Transformers (ViTs) elevates supply-chain risk on third-party model hubs, where an adversary can implant backdoors into released checkpoints. Existing ViT backdoor attacks largely rely on poisoned-data training, while prior data-free attempts typically require synthetic-data fine-tuning or extra model components. This paper introduces Data-Free Logic-Gated Backdoor Attacks (DF-LoGiT), a truly data-free backdoor attack on ViTs via direct weight editing. DF-LoGiT exploits ViT's native multi-head architecture to realize a logic-gated compositional trigger, enabling a stealthy and effective backdoor. We validate its effectiveness through theoretical analysis and extensive experiments, showing that DF-LoGiT achieves near-100% attack success with negligible degradation in benign accuracy and remains robust against representative classical and ViT-specific defenses.
要約:
Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques. Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input. Guided by these observations, we develop a scalable backdoor scanning methodology that assumes no prior knowledge of the trigger or target behavior and requires only inference operations. Our scanner integrates naturally into broader defensive strategies and does not alter model performance. We show that our method recovers working triggers across multiple backdoor scenarios and a broad range of models and fine-tuning methods.
要約:
AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external data which agent consumes also leads to the risk of indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behavior. Guided by benchmarks, such as AgentDojo, there has been significant amount of progress in developing defense against the said attacks. As the technology continues to mature, and that agents are increasingly being relied upon for more complex tasks, there is increasing pressing need to also evolve the benchmark to reflect threat landscape faced by emerging agentic systems. In this work, we reveal three fundamental flaws in current benchmarks and push the frontier along these dimensions: (i) lack of dynamic open-ended tasks, (ii) lack of helpful instructions, and (iii) simplistic user tasks. To bridge this gap, we introduce AgentDyn, a manually designed benchmark featuring 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense, revealing that existing defenses are still far from real-world deployment. Our benchmark is available at https://github.com/leolee99/AgentDyn.
要約:
Cyber insurance, which protects insured organizations against financial losses from cyberattacks and data breaches, can be difficult and expensive to obtain for many organizations. These difficulties stem from insurers difficulty in understanding and accurately assessing the risks that they are undertaking. Cybersecurity audits, which are already implemented in many organizations for compliance and other purposes, present a potential solution to this challenge. This paper provides a structured review and analysis of prior work in this area, analysis of the challenges and potential benefits that cyber audits provide and recommendations for the use of cyber audits to reduce cyber insurance costs and improve its availability.
要約:
Business logic vulnerabilities have become one of the most damaging yet least understood classes of smart contract vulnerabilities. Unlike traditional bugs such as reentrancy or arithmetic errors, these vulnerabilities arise from missing or incorrectly enforced business invariants and are tightly coupled with protocol semantics. Existing static analysis techniques struggle to capture such high-level logic, while recent large language model based approaches often suffer from unstable outputs and low accuracy due to hallucination and limited verification.
In this paper, we propose LogicScan, an automated contrastive auditing framework for detecting business logic vulnerabilities in smart contracts. The key insight behind LogicScan is that mature, widely deployed on-chain protocols implicitly encode well-tested and consensus-driven business invariants. LogicScan systematically mines these invariants from large-scale on-chain contracts and reuses them as reference constraints to audit target contracts. To achieve this, LogicScan introduces a Business Specification Language (BSL) to normalize diverse implementation patterns into structured, verifiable logic representations. It further combines noise-aware logic aggregation with contrastive auditing to identify missing or weakly enforced invariants while mitigating LLM-induced false positives.
We evaluate LogicScan on three real-world datasets, including DeFiHacks, Web3Bugs, and a set of top-200 audited contracts. The results show that LogicScan achieves an F1 score of 85.2%, significantly outperforming state-of-the-art tools while maintaining a low false-positive rate on production-grade contracts. Additional experiments demonstrate that LogicScan maintains consistent performance across different LLMs and is cost-effective, and that its false-positive suppression mechanisms substantially improve robustness.
要約:
Spiking neural networks (SNNs) compute with discrete spikes and exploit temporal structure, yet most adversarial attacks change intensities or event counts instead of timing. We study a timing-only adversary that retimes existing spikes while preserving spike counts and amplitudes in event-driven SNNs, thus remaining rate-preserving. We formalize a capacity-1 spike-retiming threat model with a unified trio of budgets: per-spike jitter $\mathcal{B}_{\infty}$, total delay $\mathcal{B}_{1}$, and tamper count $\mathcal{B}_{0}$. Feasible adversarial examples must satisfy timeline consistency and non-overlap, which makes the search space discrete and constrained. To optimize such retimings at scale, we use projected-in-the-loop (PIL) optimization: shift-probability logits yield a differentiable soft retiming for backpropagation, and a strict projection in the forward pass produces a feasible discrete schedule that satisfies capacity-1, non-overlap, and the chosen budget at every step. The objective maximizes task loss on the projected input and adds a capacity regularizer together with budget-aware penalties, which stabilizes gradients and aligns optimization with evaluation. Across event-driven benchmarks (CIFAR10-DVS, DVS-Gesture, N-MNIST) and diverse SNN architectures, we evaluate under binary and integer event grids and a range of retiming budgets, and also test models trained with timing-aware adversarial training designed to counter timing-only attacks. For example, on DVS-Gesture the attack attains high success (over $90\%$) while touching fewer than $2\%$ of spikes under $\mathcal{B}_{0}$. Taken together, our results show that spike retiming is a practical and stealthy attack surface that current defenses struggle to counter, providing a clear reference for temporal robustness in event-driven SNNs. Code is available at https://github.com/yuyi-sd/Spike-Retiming-Attacks.
要約:
We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus comprising 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting SFT to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing RL, incorporating an error-driven exploration reward to incentivize deeper reasoning on hard samples. We release a suite of models scaled at 2B and 4B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks. Notably, GuardReasoner-Omni (2B) significantly surpasses the runner-up by 5.3% F1 score.
要約:
To ensure the responsible distribution and use of open-source deep neural networks (DNNs), DNN watermarking has become a crucial technique to trace and verify unauthorized model replication or misuse. In practice, black-box watermarks manifest as specific predictive behaviors for specially crafted samples. However, due to the generalization nature of DNNs, the keys to extracting the watermark message are not unique, which would provide attackers with more opportunities. Advanced attack techniques can reverse-engineer approximate replacements for the original watermark keys, enabling subsequent watermark removal. In this paper, we explore black-box DNN watermarking specificity, which refers to the accuracy of a watermark's response to a key. Using this concept, we introduce Specificity-Enhanced Watermarking (SEW), a new method that improves specificity by reducing the association between the watermark and approximate keys. Through extensive evaluation using three popular watermarking benchmarks, we validate that enhancing specificity significantly contributes to strengthening robustness against removal attacks. SEW effectively defends against six state-of-the-art removal attacks, while maintaining model usability and watermark verification performance.
要約:
The proliferation of generative AI poses challenges for information integrity assurance, requiring systems that connect model governance with end-user verification. We present Origin Lens, a privacy-first mobile framework that targets visual disinformation through a layered verification architecture. Unlike server-side detection systems, Origin Lens performs cryptographic image provenance verification and AI detection locally on the device via a Rust/Flutter hybrid architecture. Our system integrates multiple signals - including cryptographic provenance, generative model fingerprints, and optional retrieval-augmented verification - to provide users with graded confidence indicators at the point of consumption. We discuss the framework's alignment with regulatory requirements (EU AI Act, DSA) and its role in verification infrastructure that complements platform-level mechanisms.
要約:
Static Analysis Tools (SATs) are central to security engineering activities, as they enable early identification of code weaknesses without requiring execution. However, their effectiveness is often limited by high false-positive rates and incomplete coverage of vulnerability classes. At the same time, developers frequently document security-related shortcuts and compromises as Self-Admitted Technical Debt (SATD) in software artifacts, such as code comments. While prior work has recognized SATD as a rich source of security information, it remains unclear whether -and in what ways- it is utilized during SAT-aided security analysis. OBJECTIVE: This work investigates the extent to which security-related SATD complements the output produced by SATs and helps bridge some of their well-known limitations. METHOD: We followed a mixed-methods approach consisting of (i) the analysis of a SATD-annotated vulnerability dataset using three state-of-the-art SATs and (ii) an online survey with 72 security practitioners. RESULTS: The combined use of all SATs flagged 114 of the 135 security-related SATD instances, spanning 24 distinct Common Weakness Enumeration (CWE) identifiers. A manual mapping of the SATD comments revealed 33 unique CWE types, 6 of which correspond to categories that SATs commonly overlook or struggle to detect (e.g., race conditions). Survey responses further suggest that developers frequently pair SAT outputs with SATD insights to better understand the impact and root causes of security weaknesses and to identify suitable fixes. IMPLICATIONS: Our findings show that such SATD-encoded information can be a meaningful complement to SAT-driven security analysis, while helping to overcome some of SATs' practical shortcomings.
要約:
Malware detection and classification into families are critical tasks in cybersecurity, complicated by the continual evolution of malware to evade detection. This evolution introduces concept drift, in which the statistical properties of malware features change over time, reducing the effectiveness of static machine learning models. Understanding and explaining this drift is essential for maintaining robust and trustworthy malware detectors. In this paper, we propose an interpretable approach to concept drift detection. Our method uses a rule-based classifier to generate human-readable descriptions of both original and evolved malware samples belonging to the same malware family. By comparing the resulting rule sets using a similarity function, we can detect and quantify concept drift. Crucially, this comparison also identifies the specific features and feature values that have changed, providing clear explanations of how malware has evolved to bypass detection. Experimental results demonstrate that the proposed method not only accurately detects drift but also provides actionable insights into the behavior of evolving malware families, supporting both detection and threat analysis.
要約:
The Model Context Protocol (MCP) enables large language models to invoke external tools through natural-language descriptions, forming the foundation of many AI agent applications. However, MCP does not enforce consistency between documented tool behavior and actual code execution, even though MCP Servers often run with broad system privileges. This gap introduces a largely unexplored security risk. We study how mismatches between externally presented tool descriptions and underlying implementations systematically shape the mental models and decision-making behavior of intelligent agents. Specifically, we present the first large-scale study of description-code inconsistency in the MCP ecosystem. We design an automated static analysis framework and apply it to 10,240 real-world MCP Servers across 36 categories. Our results show that while most servers are highly consistent, approximately 13% exhibit substantial mismatches that can enable undocumented privileged operations, hidden state mutations, or unauthorized financial actions. We further observe systematic differences across application categories, popularity levels, and MCP marketplaces. Our findings demonstrate that description-code inconsistency is a concrete and prevalent attack surface in MCP-based AI agents, and motivate the need for systematic auditing and stronger transparency guarantees in future agent ecosystems.
要約:
We investigated the capabilities of GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code (IaC) development. For security smell detection, on the Stack Overflow dataset, which primarily contains small, simplified code snippets, the models detected at least 71% of security smells when prompted to analyze code from a security perspective (general prompt). With a guided prompt (adding clear, step-by-step instructions), this increased to 78%.In GitHub repositories, which contain complete, real-world project scripts, a general prompt was less effective, leaving more than half of the smells undetected. However, with the guided prompt, the models uncovered at least 67% of the smells. For secure code generation, we prompted LLMs with 89 vulnerable synthetic scenarios and observed that only 7% of the generated scripts were secure. Adding an explicit instruction to generate secure code increased GPT secure output rate to 17%, while Gemini changed little (8%). These results highlight the need for further research to improve LLMs' capabilities in assisting developers with secure IaC development.
要約:
Hardware Trojans (HTs) threaten the trust and reliability of integrated circuits (ICs), particularly when triggered HTs remain dormant during standard testing and activate only under rare conditions. Existing electromagnetic (EM) side-channel-based detection techniques often rely on golden references or labeled data, which are infeasible in modern distributed manufacturing. This paper introduces a reference-free, design-agnostic framework for detecting triggered HTs directly from post-silicon EM emissions. The proposed flow converts each EM trace into a time-frequency scalogram using Continuous Wavelet Transform (CWT), extracts discriminative features through a convolutional neural network (CNN), reduces dimensionality with principal component analysis (PCA), and applies Bayesian Gaussian Mixture Modeling (BGMM) for unsupervised probabilistic clustering. The framework quantifies detection confidence using posterior-based metrics (alpha_{post}, beta_{post}), Bayesian information criterion (Delta BIC), and Mahalanobis cluster separation (D), enabling interpretable anomaly decisions without golden data. Experimental validation on AES-128 designs embedded with four different HTs demonstrates high separability between HT-free and HT-activated conditions and robustness to PCA variance thresholds. The results highlight the method's scalability, statistical interpretability, and potential for extension to runtime and in-field HT monitoring in trusted microelectronics.
要約:
Everyday services of society increasingly rely on mobile applications, resulting in a conflicting situation between the possibility of participation on the one side and user privacy and digital freedom on the other. In order to protect users' rights to informational self-determination, regulatory approaches for the collection and processing of personal data have been developed, such as the EU's GDPR. However, inspecting the compliance of mobile apps with privacy regulations remains difficult. Thus, in order to enable end users and enforcement bodies to verify and enforce data protection compliance, we propose mopri, a conceptual framework designed for analyzing the behavior of mobile apps through a comprehensive, adaptable, and user-centered approach. Recognizing the gaps in existing frameworks, mopri serves as a foundation for integrating various analysis tools into a streamlined, modular pipeline that employs static and dynamic analysis methods. Building on this concept, a prototype has been developed which effectively extracts permissions and tracking libraries while employing robust methods for dynamic traffic recording and decryption. Additionally, it incorporates result enrichment and reporting features that enhance the clarity and usability of the analysis outcomes. The prototype showcases the feasibility of a holistic and modular approach to privacy analysis, emphasizing the importance of continuous adaptation to the evolving challenges presented by the mobile app ecosystem.
要約:
Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited effectiveness, as their underlying assumptions often do not hold in the web-agent setting. In this work, we propose WebSentinel, a two-step approach for detecting and localizing prompt injection attacks in webpages. Given a webpage, Step I extracts \emph{segments of interest} that may be contaminated, and Step II evaluates each segment by checking its consistency with the webpage content as context. We show that WebSentinel is highly effective, substantially outperforming baseline methods across multiple datasets of both contaminated and clean webpages that we collected. Our code is available at: https://github.com/wxl-lxw/WebSentinel.
要約:
Training AI models in cybersecurity with help of vast datasets offers significant opportunities to mimic real-world behaviors effectively. However, challenges like data drift and scarcity of labelled data lead to frequent updates of models and the risk of overfitting. To address these challenges, we used parameter-efficient fine-tuning techniques for pre-trained language models wherein we combine compacters with various layer freezing strategies. To enhance the capabilities of these pre-trained language models, in this work we introduce two strategies that use large language models. In the first strategy, we utilize large language models as data-labelling tools wherein they generate labels for unlabeled data. In the second strategy, large language modes are utilized as fallback mechanisms for predictions having low confidence scores. We perform comprehensive experimental analysis on the proposed strategies on different downstream tasks specific to cybersecurity domain. We empirically demonstrate that by combining parameter-efficient pre-trained models with large language models, we can improve the reliability and robustness of models, making them more suitable for real-world cybersecurity applications.
要約:
The proliferation of AI-assisted "vibe coding" enables rapid software development but introduces significant security risks, as Large Language Models (LLMs) prioritize functional correctness over security. We present Constitutional Spec-Driven Development, a methodology that embeds non-negotiable security principles into the specification layer, ensuring AI-generated code adheres to security requirements by construction rather than inspection. Our approach introduces a Constitution: a versioned, machine-readable document encoding security constraints derived from Common Weakness Enumeration (CWE)/MITRE Top 25 vulnerabilities and regulatory frameworks. We demonstrate the methodology through a banking microservices application, selected as a representative example domain due to its stringent regulatory and security requirements, implementing customer management, account operations, and transaction processing. The methodology itself is domain-agnostic. The implementation addresses 10 critical CWE vulnerabilities through constitutional constraints with full traceability from principles to code locations. Our case study shows that constitutional constraints reduce security defects by 73% compared to unconstrained AI generation while maintaining developer velocity. We contribute a formal framework for constitutional security, a complete development methodology, and empirical evidence that proactive security specification outperforms reactive security verification in AI-assisted development workflows.
要約:
Storage systems are fundamental to modern computing infrastructures, yet ensuring their correctness remains challenging in practice. Despite decades of research on system testing, many storage-system failures (including durability, ordering, recovery, and consistency violations) remain difficult to expose systematically. This difficulty stems not primarily from insufficient testing tooling, but from intrinsic properties of storage-system execution, including nondeterministic interleavings, long-horizon state evolution, and correctness semantics that span multiple layers and execution phases.
This survey adopts a storage-centric view of system testing and organizes existing techniques according to the execution properties and failure mechanisms they target. We review a broad spectrum of approaches, ranging from concurrency testing and long-running workloads to crash-consistency analysis, hardware-level semantic validation, and distributed fault injection, and analyze their fundamental strengths and limitations. Within this framework, we examine fuzzing as an automated testing paradigm, highlighting systematic mismatches between conventional fuzzing assumptions and storage-system semantics, and discuss how recent artificial intelligence advances may complement fuzzing through state-aware and semantic guidance. Overall, this survey provides a unified perspective on storage-system correctness testing and outlines key challenges
要約:
Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment and fine-tuning. This fragility reflects a broader challenge of modern neural language models: small, carefully structured perturbations in high-dimensional input spaces can induce large and unpredictable changes in internal semantic representations and output.
We investigate monotonicity as an architectural inductive bias for improving the robustness of Transformer-based language models. Monotonicity constrains semantic transformations so that strengthening information, evidence, or constraints cannot lead to regressions in the corresponding internal representations. Such order-preserving behavior has long been exploited in control and safety-critical systems to simplify reasoning and improve robustness, but has traditionally been viewed as incompatible with the expressivity required by neural language models.
We show that this trade-off is not inherent. By enforcing monotonicity selectively in the feed-forward sublayers of sequence-to-sequence Transformers -- while leaving attention mechanisms unconstrained -- we obtain monotone language models that preserve the performance of their pretrained counterparts. This architectural separation allows negation, contradiction, and contextual interactions to be introduced explicitly through attention, while ensuring that subsequent semantic refinement is order-preserving. Empirically, monotonicity substantially improves robustness: adversarial attack success rates drop from approximately 69% to 19%, while standard summarization performance degrades only marginally.
要約:
The design of protocols for local differential privacy (or LDP) has been a topic of considerable research interest in recent years. LDP protocols utilise the randomised encoding of outcomes of an experiment using a transition probability matrix (TPM). Several authors have observed that balanced incomplete block designs (BIBDs) provide nice examples of TPMs for LDP protocols. Indeed, it has been shown that such BIBD-based LDP protocols provide optimal estimators.
In this primarily expository paper, we give a detailed introduction to LDP protocols and their connections with block designs. We prove that a subclass of LDP protocols known as pure LDP protocols are equivalent to $(r,\lambda)$-designs (which contain balanced incomplete block designs as a special case). An unbiased estimator for an LDP scheme is a left inverse of the transition probability matrix. We show that the optimal estimators for BIBD-based TPMs are precisely those obtained from the Moore-Penrose inverse of the corresponding TPM. We also review some existing work on optimal LDP protocols in the context of pure protocols.
要約:
Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows where each record corresponds to a unique individual. This perspective neglects the temporal complexity in longitudinal datasets, such as electronic health records, where a user contributes an entire (sub) table of sequential events. While practitioners might attempt to model such data by flattening user histories into high-dimensional vectors for use with standard marginal-based mechanisms, we demonstrate that this strategy is insufficient. Flattening fails to preserve temporal coherence even when it maintains valid marginal distributions. We introduce PATH, a novel generative framework that treats the full table as the unit of synthesis and leverages the autoregressive capabilities of privately fine-tuned large language models. Extensive evaluations show that PATH effectively captures long-range dependencies that traditional methods miss. Empirically, our method reduces the distributional distance to real trajectories by over 60% and reduces state transition errors by nearly 50% compared to leading marginal mechanisms while achieving similar marginal fidelity.
要約:
Detecting rare and diverse anomalies in highly imbalanced datasets-such as Advanced Persistent Threats (APTs) in cybersecurity-remains a fundamental challenge for machine learning systems. Active learning offers a promising direction by strategically querying an oracle to minimize labeling effort, yet conventional approaches often fail to exploit the intrinsic geometric structure of the feature space for model refinement. In this paper, we introduce SDA2E, a Sparse Dual Adversarial Attention-based AutoEncoder designed to learn compact and discriminative latent representations from imbalanced, high-dimensional data. We further propose a similarity-guided active learning framework that integrates three novel strategies to refine decision boundaries efficiently: mormal-like expansion, which enriches the training set with points similar to labeled normals to improve reconstruction fidelity; anomaly-like prioritization, which boosts ranking accuracy by focusing on points resembling known anomalies; and a hybrid strategy that combines both for balanced model refinement and ranking. A key component of our framework is a new similarity measure, Normalized Matching 1s (SIM_NM1), tailored for sparse binary embeddings. We evaluate SDA2E extensively across 52 imbalanced datasets, including multiple DARPA Transparent Computing scenarios, and benchmark it against 15 state-of-the-art anomaly detection methods. Results demonstrate that SDA2E consistently achieves superior ranking performance (nDCG up to 1.0 in several cases) while reducing the required labeled data by up to 80% compared to passive training. Statistical tests confirm the significance of these improvements. Our work establishes a robust, efficient, and statistically validated framework for anomaly detection that is particularly suited to cybersecurity applications such as APT detection.
要約:
Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder (GAE) with rare pattern mining to identify APT-like activities in system-level provenance data. Our approach first constructs a process behavioral graph using k-Nearest Neighbors based on feature similarity, then learns normal relational structure using a Graph Autoencoder. Anomaly candidates are identified through deviations between observed and reconstructed graph structure. To further improve detection, we integrate an rare pattern mining module that discovers infrequent behavioral co-occurrences and uses them to boost anomaly scores for processes exhibiting rare signatures. We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality over the baseline GAE. Compared with existing unsupervised approaches on the same benchmark, our single unified model consistently outperforms individual context-based detectors and achieves performance competitive with ensemble aggregation methods that require multiple separate detectors. These results highlight the value of coupling graph-based representation learning with classical pattern mining to improve both effectiveness and interpretability in provenance-based security anomaly detection.
要約:
Quantum Machine Learning (QML) promises significant computational advantages, but preserving training data privacy remains challenging. Classical approaches like differentially private stochastic gradient descent (DP-SGD) add noise to gradients but fail to exploit the unique properties of quantum gradient estimation. In this work, we introduce the Differentially Private Parameter-Shift Rule (Q-ShiftDP), the first privacy mechanism tailored to QML. By leveraging the inherent boundedness and stochasticity of quantum gradients computed via the parameter-shift rule, Q-ShiftDP enables tighter sensitivity analysis and reduces noise requirements. We combine carefully calibrated Gaussian noise with intrinsic quantum noise to provide formal privacy and utility guarantees, and show that harnessing quantum noise further improves the privacy-utility trade-off. Experiments on benchmark datasets demonstrate that Q-ShiftDP consistently outperforms classical DP methods in QML.
要約:
This work investigates the fundamental limits of implementing network oblivious transfer via noisy multiple access channels and broadcast channels between honest-but-curious parties when the parties have access to general tripartite non-signaling correlations. By modeling the shared resource as an arbitrary tripartite non-signaling box, we obtain a unified perspective on both the channel behavior and the resulting correlations. Our main result demonstrates that perfect oblivious transfer is impossible. In the asymptotic regime, we further show that even negligible leakage cannot be achieved, as repeated use of the resource amplifies the receiver(s)'s ability to distinguish messages that were not intended for him/them. In contrast, the receiver(s)'s own privacy is not subject to a universal impossibility limitation.
要約:
Decentralized Pliable Index Coding (DPIC) problem addresses efficient information exchange in distributed systems where clients communicate among themselves without a central server. An important consideration in DPIC is the heterogeneity of side-information and demand sizes. Although many prior works assume homogeneous settings with identical side-information cardinality and single message demands, these assumptions limit real-world applicability where clients typically possess unequal amounts of prior information. In this paper, we study DPIC problem under heterogeneous side-information cardinalities. We propose a transmission scheme that coordinates client broadcasts to maximize coding efficiency while ensuring that each client achieves a common target level $T$. In addition, we impose a strict security constraint that no client acquires more than the target $T$ number of messages, guaranteeing that each client ends up with exactly $T$ messages. We analyze the communication cost incurred by the proposed scheme under this security constraint.
要約:
The imminent realization of fault-tolerant quantum computing precipitates a systemic collapse of classical public-key infrastructure and necessitates an urgent transition to post-quantum cryptography. However, current standardization efforts predominantly rely on structured mathematical problems that may remain vulnerable to unforeseen algorithmic breakthroughs, highlighting a critical need for fundamentally orthogonal security paradigms. Here, we introduce \emph{Crypto-ncRNA} as a biophysically inspired cryptographic primitive that exploits the thermodynamic complexity of non-coding RNA folding as a computational work-factor amplifier. By leveraging the rugged energy landscape inherent to RNA secondary structure prediction, a problem intractable to rapid inversion, we establish a security foundation independent of conventional number-theoretic assumptions. We validate this approach by mapping the folding problem to a Quadratic Unconstrained Binary Optimization model and demonstrate theoretical resilience against quantum optimization attacks including the Quantum Approximate Optimization Algorithm. Functioning as a symmetric key encapsulation and derivation primitive dependent on pre-shared seeds, Crypto-ncRNA achieves throughputs competitive with software-based Advanced Encryption Standard implementations. By utilizing the generated high-entropy keys within a standard stream cipher framework, it exhibits ciphertext entropy that satisfies rigorous NIST SP 800-22 statistical standards. These findings not only articulate a novel bio-computational pathway for cryptographic defense but also provide a rigorous algorithmic blueprint for future physical realization, demonstrating that the thermodynamic complexity of biological systems offers a robust and physically grounded frontier for securing digital infrastructure in the post-quantum era.
要約:
Jailbreak attacks to Large audio-language models (LALMs) are studied recently, but they exclusively focused on the attack scenario where the adversary can fully manipulate user prompts (named strong adversary) and limited in effectiveness, applicability, and practicability. In this work, we first conduct an extensive evaluation showing that advanced text jailbreak attacks cannot be easily ported to end-to-end LALMs via text-to-speech (TTS) techniques. We then propose AUDIOJAILBREAK, a novel audio jailbreak attack, featuring (1) asynchrony: the jailbreak audios do not need to align with user prompts in the time axis by crafting suffixal jailbreak audios; (2) universality: a single jailbreak perturbation is effective for different prompts by incorporating multiple prompts into the perturbation generation; (3) stealthiness: the malicious intent of jailbreak audios is concealed by proposing various intent concealment strategies; and (4) over-the-air robustness: the jailbreak audios remain effective when being played over the air by incorporating reverberation into the perturbation generation. In contrast, all prior audio jailbreak attacks cannot offer asynchrony, universality, stealthiness, and/or over-the-air robustness. Moreover, AUDIOJAILBREAK is also applicable to a more practical and broader attack scenario where the adversary cannot fully manipulate user prompts (named weak adversary). Extensive experiments with thus far the most LALMs demonstrate the high effectiveness of AUDIOJAILBREAK, in particular, it can jailbreak openAI's GPT-4o-Audio and bypass Meta's Llama-Guard-3 safeguard, in the weak adversary scenario. We highlight that our work peeks into the security implications of audio jailbreak attacks against LALMs, and realistically fosters improving their robustness, especially for the newly proposed weak adversary.
要約:
Large language models (LLMs) have demonstrated remarkable capabilities across a variety of domains. However, their applications in cryptography, which serve as a foundational pillar of cybersecurity, remain largely unexplored. To address this gap, we build \textbf{AICrypto}, a comprehensive benchmark designed to evaluate the cryptography capabilities of LLMs. The benchmark comprises 135 multiple-choice questions, 150 capture-the-flag challenges, and 30 proof problems, covering a broad range of skills from knowledge memorization to vulnerability exploitation and formal reasoning. All tasks are carefully reviewed or constructed by cryptography experts to improve correctness and rigor. For each proof problem, we provide detailed scoring rubrics and reference solutions that enable automated grading, achieving high correlation with human expert evaluations. We introduce strong human expert performance baselines for comparison across all task types. Our evaluation of 17 leading LLMs reveals that state-of-the-art models match or even surpass human experts in memorizing cryptographic concepts, exploiting common vulnerabilities, and routine proofs. However, our analysis reveals that they still lack a deep understanding of abstract mathematical concepts and struggle with tasks that require multi-step reasoning and dynamic analysis. We hope this work could provide insights for future research on LLMs in cryptographic applications. Our code and dataset are available at https://aicryptobench.github.io/.
要約:
We evaluate the performance of two architectures for network-wide quantum key distribution (QKD): Relayed QKD, which relays keys over multi-link QKD paths for non-adjacent nodes, and Switched QKD, which uses optical switches to dynamically connect arbitrary QKD modules to form direct QKD links between them. An advantage of Switched QKD is that it distributes quantum keys end-to-end, whereas Relayed relies on trusted nodes. However, Switched depends on arbitrary matching of QKD modules. We first experimentally evaluate the performance of commercial DV-QKD modules; for each of three vendors we benchmark the performance in standard/matched module pairs and in unmatched pairs to emulate configurations in the Switched QKD network architecture. The analysis reveals that in some cases a notable variation in the generated secret key rate (SKR) between the matched and unmatched pairs is observed. Driven by these experimental findings, we conduct a comprehensive theoretical analysis that evaluates the network-wide performance of the two architectures. Our analysis is based on uniform ring networks, where we derive optimal key management configurations and analytical formulas for the achievable consumed SKR. We compare network performance under varying ring sizes, QKD link losses, QKD receivers' sensitivity and performance penalties of unmatched modules. Our findings indicate that Switched QKD performs better in dense rings (short distances, large node counts), while Relayed QKD is more effective in longer distances and large node counts. Moreover, we confirm that unmatched QKD modules penalties significantly impact the efficiency of Switched QKD architecture.
要約:
The proliferation of powerful large language models (LLMs) has necessitated robust safety alignment, yet these models remain vulnerable to evolving adversarial attacks, including multi-turn jailbreaks that iteratively search for successful queries. Current defenses, which are primarily reactive and static, often fail to handle these iterative attacks. In this paper, we introduce ProAct, a novel proactive defense framework designed to disrupt and mislead these iterative search jailbreak methods. Our core idea is to intentionally mislead these jailbreak methods into thinking that the model has been jailbroken with "spurious responses". These misleading responses provide false signals to the attacker's internal optimization loop, causing the adversarial search to terminate prematurely and effectively jailbreaking the jailbreak. By conducting extensive experiments across state-of-the-art LLMs, jailbreaking frameworks, and safety benchmarks, we demonstrate that our method consistently and significantly reduces attack success rates by up to 94% without affecting utility. When combined with other defense fraeworks, it further reduces the latest attack strategies' success rate to 0%. ProActrepresents an orthogonal defense strategy that serves as an additional guardrail to enhance LLM safety against the most effective jailbreaking attacks.
著者: V\'ictor Mayoral-Vilches, Stefan Rass, Martin Pinzger, Endika Gil-Uriarte, Unai Ayucar-Carbajo, Jon Ander Ruiz-Alcalde, Maite del Mundo de Torres, Luis Javier Navarrete-Lozano, Mar\'ia Sanz-G\'omez, Francesco Balassone, Crist\'obal R. J. Veas-Chavez, Vanesa Turiel, Alfonso Glera-Pic\'on, Daniel S\'anchez-Prieto, Yuri Salvatierra, Paul Zabalegui-Landa, Ruffino Reydel Cabrera-\'Alvarez, Patxi Mayoral-Pizarroso
要約:
Cybersecurity superintelligence -- artificial intelligence exceeding the best human capability in both speed and strategic reasoning -- represents the next frontier in security. This paper documents the emergence of such capability through three major contributions that have pioneered the field of AI Security. First, PentestGPT (2023) established LLM-guided penetration testing, achieving 228.6% improvement over baseline models through an architecture that externalizes security expertise into natural language guidance. Second, Cybersecurity AI (CAI, 2025) demonstrated automated expert-level performance, operating 3,600x faster than humans while reducing costs 156-fold, validated through #1 rankings at international competitions including the $50,000 Neurogrid CTF prize. Third, Generative Cut-the-Rope (G-CTR, 2026) introduces a neurosymbolic architecture embedding game-theoretic reasoning into LLM-based agents: symbolic equilibrium computation augments neural inference, doubling success rates while reducing behavioral variance 5.2x and achieving 2:1 advantage over non-strategic AI in Attack & Defense scenarios. Together, these advances establish a clear progression from AI-guided humans to human-guided game-theoretic cybersecurity superintelligence.
要約:
Many safety-critical systems require timely processing of sensor inputs to avoid potential safety hazards. Additionally, to support useful application features, such systems increasingly have a large rich operating system (OS) at the cost of potential security bugs. Thus, if a malicious party gains supervisor privileges, they could cause real-world damage by denying service to time-sensitive programs. Many past approaches to this problem completely isolate time-sensitive programs with a hypervisor; however, this prevents the programs from accessing useful OS services. We introduce Ringmaster, a novel framework that enables enclaves or TEEs (Trusted Execution Environments) to asynchronously access rich, but potentially untrusted, OS services via Linux's io_uring. When service is denied by the untrusted OS, enclaves continue to operate on Ringmaster's minimal ARM TrustZone kernel with access to small, critical device drivers. This approach balances the need for secure, time-sensitive processing with the convenience of rich OS services. Additionally, Ringmaster supports large unmodified programs as enclaves, offering lower overhead compared to existing systems. We demonstrate how Ringmaster helps us build a working highly-secure system with minimal engineering. In our experiments with an unmanned aerial vehicle, Ringmaster achieved nearly 1GiB/sec of data into enclave on a Raspberry Pi4b, 0-3% throughput overhead compared to non-enclave tasks.
要約:
Large Vision-Language Models (VLMs) have achieved remarkable performance across a wide range of tasks. However, their deployment in safety-critical domains poses significant challenges. Existing safety fine-tuning methods, which focus on textual or multimodal content, fall short in addressing challenging cases or disrupt the balance between helpfulness and harmlessness. Our evaluation highlights a safety reasoning gap: these methods lack safety visual reasoning ability, leading to such bottlenecks. To address this limitation and enhance both visual perception and reasoning in safety-critical contexts, we propose a novel dataset that integrates multi-image inputs with safety Chain-of-Thought (CoT) labels as fine-grained reasoning logic to improve model performance. Specifically, we introduce the Multi-Image Safety (MIS) dataset, an instruction-following dataset tailored for multi-image safety scenarios, consisting of training and test splits. Our experiments demonstrate that fine-tuning InternVL2.5-8B with MIS significantly outperforms both powerful open-source models and API-based models in challenging multi-image tasks requiring safety-related visual reasoning. This approach not only delivers exceptional safety performance but also preserves general capabilities without any trade-offs. Specifically, fine-tuning with MIS increases average accuracy by 0.83% across five general benchmarks and reduces the Attack Success Rate (ASR) on multiple safety benchmarks by a large margin.
要約:
Most flagship language models generate explicit reasoning chains, enabling inference-time scaling. However, producing these reasoning chains increases token usage (i.e., reasoning tokens), which in turn increases latency and costs. Our OverThink attack increases overhead for applications that rely on reasoning language models (RLMs) and external context by forcing them to spend substantially more reasoning tokens while still producing contextually correct answers. An adversary mounts an attack by injecting decoy reasoning problems into public content that is consumed by RLM at inference time. Because our decoys (e.g., Markov decision processes, Sudokus, etc.) are benign, they evade safety filters. We evaluate OverThink on both closed-source and open-source reasoning models across the FreshQA, SQuAD, and MuSR datasets. We also explore the attack in multi-modal settings by creating images that cause excessive reasoning. We show that the resulting slowdown transfers across models. Finally, we explore both LLM-based and systems-level defenses, and discuss the societal, financial, and energy implications of the OverThink attacks.
要約:
Learning with relational and network-structured data is increasingly vital in sensitive domains where protecting the privacy of individual entities is paramount. Differential Privacy (DP) offers a principled approach for quantifying privacy risks, with DP-SGD emerging as a standard mechanism for private model training. However, directly applying DP-SGD to relational learning is challenging due to two key factors: (i) entities often participate in multiple relations, resulting in high and difficult-to-control sensitivity; and (ii) relational learning typically involves multi-stage, potentially coupled (interdependent) sampling procedures that make standard privacy amplification analyses inapplicable. This work presents a principled framework for relational learning with formal entity-level DP guarantees. We provide a rigorous sensitivity analysis and introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency. We also extend the privacy amplification results to a tractable subclass of coupled sampling, where the dependence arises only through sample sizes. These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees. Experiments on fine-tuning text encoders over text-attributed network-structured relational data demonstrate the strong utility-privacy trade-offs of our approach. Our code is available at https://github.com/Graph-COM/Node_DP.
要約:
We show that continual pretraining on plausible misinformation can overwrite specific factual knowledge in large language models without degrading overall performance. Unlike prior poisoning work under static pretraining, we study repeated exposure to counterfactual claims during continual updates. Using paired fact-counterfact items with graded poisoning ratios, we track how internal preferences between competing facts evolve across checkpoints, layers, and model scales. Even moderate poisoning (50-100%) flips over 55% of responses from correct to counterfactual while leaving ambiguity nearly unchanged. These belief flips emerge abruptly, concentrate in late layers (e.g., Layers 29-36 in 3B models), and are partially reversible via patching (up to 56.8%). The corrupted beliefs generalize beyond poisoned prompts, selectively degrading commonsense reasoning while leaving alignment benchmarks largely intact and transferring imperfectly across languages. These results expose a failure mode of continual pre-training in which targeted misinformation replaces internal factual representations without triggering broad performance collapse, motivating representation-level monitoring of factual integrity during model updates.
要約:
We demonstrate that charge teleportation serves as a superior observable for Quantum Energy Teleportation (QET)-based cryptographic primitives. While following the LOCC protocol structure of earlier proposals, we show that decoding key bits via local charge rather than energy provides exact bit symmetry and enhanced robustness: by Local Operations and Classical Communication (LOCC) on an entangled many-body ground state, Alice's one-bit choice steers the sign of a local charge shift at Bob, which directly encodes the key bit. Relative to energy teleportation schemes, the charge signal is bit-symmetric, measured in a single basis, and markedly more robust to realistic noise and model imperfections. We instantiate the protocol on transverse-field Ising models, star-coupled and one-dimensional chain, obtain closed-form results for two qubits, and for larger systems confirm performance via exact diagonalization, circuit-level simulations, and a proof-of-principle hardware run. We quantify resilience to classical bit flips and local quantum noise, identifying regimes where sign integrity, and hence key correctness, is preserved. These results position charge teleportation as a practical, low-rate QKD primitive compatible with near-term platforms.