要約:
As large language models (LLMs) are increasingly deployed in high-stakes domains, ensuring their security and alignment has become a critical challenge. Existing red-teaming practices depend heavily on manual testing, which limits scalability and fails to comprehensively cover the vast space of potential adversarial behaviors. This paper introduces an automated red-teaming framework that systematically generates, executes, and evaluates adversarial prompts to uncover security vulnerabilities in LLMs. Our framework integrates meta-prompting-based attack synthesis, multi-modal vulnerability detection, and standardized evaluation protocols spanning six major threat categories -- reward hacking, deceptive alignment, data exfiltration, sandbagging, inappropriate tool use, and chain-of-thought manipulation. Experiments on the GPT-OSS-20B model reveal 47 distinct vulnerabilities, including 21 high-severity and 12 novel attack patterns, achieving a $3.9\times$ improvement in vulnerability discovery rate over manual expert testing while maintaining 89\% detection accuracy. These results demonstrate the framework's effectiveness in enabling scalable, systematic, and reproducible AI safety evaluations. By providing actionable insights for improving alignment robustness, this work advances the state of automated LLM red-teaming and contributes to the broader goal of building secure and trustworthy AI systems.
要約:
Detecting business logic vulnerabilities is a critical challenge in software security. These flaws come from mistakes in an application's design or implementation and allow attackers to trigger unintended application behavior. Traditional fuzzing sanitizers for dynamic analysis excel at finding vulnerabilities related to memory safety violations but largely fail to detect business logic vulnerabilities, as these flaws require understanding application-specific semantic context. Recent attempts to infer this context, due to their reliance on heuristics and non-portable language features, are inherently brittle and incomplete. As business logic vulnerabilities constitute a majority (27/40) of the most dangerous software weaknesses in practice, this is a worrying blind spot of existing tools. In this paper, we tackle this challenge with ANOTA, a novel human-in-the-loop sanitizer framework. ANOTA introduces a lightweight, user-friendly annotation system that enables users to directly encode their domain-specific knowledge as lightweight annotations that define an application's intended behavior. A runtime execution monitor then observes program behavior, comparing it against the policies defined by the annotations, thereby identifying deviations that indicate vulnerabilities. To evaluate the effectiveness of ANOTA, we combine ANOTA with a state-of-the-art fuzzer and compare it against other popular bug finding methods compatible with the same targets. The results show that ANOTA+FUZZER outperforms them in terms of effectiveness. More specifically, ANOTA+FUZZER can successfully reproduce 43 known vulnerabilities, and discovered 22 previously unknown vulnerabilities (17 CVEs assigned) during the evaluation. These results demonstrate that ANOTA provides a practical and effective approach for uncovering complex business logic flaws often missed by traditional security testing techniques.
要約:
Radio frequency (RF) based systems are increasingly used to detect drones by analyzing their RF signal patterns, converting them into spectrogram images which are processed by object detection models. Existing RF attacks against image based models alter digital features, making over-the-air (OTA) implementation difficult due to the challenge of converting digital perturbations to transmittable waveforms that may introduce synchronization errors and interference, and encounter hardware limitations. We present the first physical attack on RF image based drone detectors, optimizing class-specific universal complex baseband (I/Q) perturbation waveforms that are transmitted alongside legitimate communications. We evaluated the attack using RF recordings and OTA experiments with four types of drones. Our results show that modest, structured I/Q perturbations are compatible with standard RF chains and reliably reduce target drone detection while preserving detection of legitimate drones.
要約:
While Ethereum has successfully achieved dynamic availability together with safety, a fundamental delay remains between transaction execution and immutable finality. In Ethereum's current Gasper protocol, this latency is on the order of 15 minutes, exposing the network to ex ante reorganization attacks, enabling MEV extraction, and limiting the efficiency of economic settlement. These limitations have motivated a growing body of work on Speedy Secure Finality (SSF), which aims to minimize confirmation latency without weakening formal security guarantees.
This paper surveys the state of the art in fast finality protocol design. We introduce the core theoretical primitives underlying this space, including reorganization resilience and the generalized sleepy model, and trace their development from Goldfish to RLMD-GHOST. We then analyze the communication and aggregation bottlenecks faced by single-slot finality protocols in large validator settings. Finally, we survey the 3-slot finality (3SF) protocol as a practical synthesis that balances fast finality with the engineering constraints of the Ethereum network.
要約:
6G networks promise to be the proper technology to support a wide deployment of highly demanding services, satisfying key users-related aspects such as extremely high quality, and persistent communications. However, there is no service to support if the network is not reliable enough. In this direction, it is with no doubt that security guarantees become a must. Traditional security approaches have focused on providing specific and attack-tailored solutions that will not properly meet the uncertainties driven by a technology yet under development and showing an attack surface not completely identified either. In this positioning paper we propose a softwarized solution, defining a Security Plane built on a top of programmable and adaptable set of live Security Functions under a proactive strategy. In addition, in order to address the inaccuracies driven by the predictive models a pre-assessment scenario is also considered ensuring that no action will be deployed if not previously verified. Although more efforts are required to develop this initiative, we think that such a shift paradigm is the only way to face security provisioning challenges in 6G ecosystems.
要約:
In this paper, we introduce Sark, a reference architecture implementing the Unforgeable, Stateful, and Oblivious (USO) asset system as described by Goodell, Toliver, and Nakib. We describe the motivation, design, and implementation of Sloop, a permissioned, crash fault-tolerant (CFT) blockchain that forms a subsystem of Sark, and the other core subsystems, Porters, which accumulate and roll-up commitments from Clients. We analyse the operation of the system using the 'CIA Triad': Confidentiality, Availability, and Integrity. We then introduce the concept of Integrity Locus and use it to address design trade-offs related to decentralization. Finally, we point to future work on Byzantine fault-tolerance (BFT), and mitigating the local centrality of Porters.
要約:
Dynamic malware analysis requires executing untrusted binaries inside strongly isolated, rapidly resettable environments. In practice, many detonation workflows remain tied to heavyweight hypervisors or dedicated bare-metal labs, limiting portability and automation. This challenge has intensified with the adoption of ARM64 developer hardware (e.g., Apple Silicon), where common open-source sandbox recipes and pre-built environments frequently assume x86_64 hosts and do not translate cleanly across architectures. This paper presents pokiSEC, a lightweight, ephemeral malware detonation sandbox that packages the full virtualization and access stack inside a Docker container. pokiSEC integrates QEMU with hardware acceleration (KVM when available) and exposes a browser-based workflow that supports bring-your-own Windows disk images. The key contribution is a Universal Entrypoint that performs runtime host-architecture detection and selects validated hypervisor configurations (machine types, acceleration modes, and device profiles), enabling a single container image and codebase to launch Windows guests on both ARM64 and x86_64 hosts. We validate pokiSEC on Apple Silicon (ARM64) and Ubuntu (AMD64), demonstrating interactive performance suitable for analyst workflows and consistent teardown semantics via ephemeral container lifecycles.
要約:
Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in traditional program analysis has been well established, enabling effective classification and analysis of malicious software. In the mobile domain, especially in the Android ecosystem, FCG-based malware classification is particularly critical due to the platform's widespread adoption and the complex, component-based structure of Android apps. However, progress in this direction is hindered by the lack of large-scale, high-quality Android-specific FCG datasets. Existing datasets are often outdated, dominated by small or redundant graphs resulting from app repackaging, and fail to reflect the diversity of real-world malware. These limitations lead to overfitting and unreliable evaluation of graph-based classification methods. To address this gap, we introduce Better Call Graphs (BCG), a comprehensive dataset of large and unique FCGs extracted from recent Android application packages (APKs). BCG includes both benign and malicious samples spanning various families and types, along with graph-level features for each APK. Through extensive experiments using baseline classifiers, we demonstrate the necessity and value of BCG compared to existing datasets. BCG is publicly available at https://erdemub.github.io/BCG-dataset.
要約:
Autonomous Vehicles (AVs) refer to systems capable of perceiving their states and moving without human intervention. Among the factors required for autonomous decision-making in mobility, positional awareness of the vehicle itself is the most critical. Accordingly, extensive research has been conducted on defense mechanisms against GPS spoofing attacks, which threaten AVs by disrupting position recognition. Among these, detection methods based on internal IMU sensors are regarded as some of the most effective. In this paper, we propose a spoofing attack system designed to neutralize IMU sensor-based detection. First, we present an attack modeling approach for bypassing such detection. Then, based on EKF sensor fusion, we experimentally analyze both the impact of GPS spoofing values on the internal target system and how our proposed methodology reduces anomaly detection within the target system. To this end, this paper proposes an attack model that performs GPS spoofing by stealing internal dynamic state information using an external IMU sensor, and the experimental results demonstrate that attack values can be injected without being detected.
要約:
The integration of Large Language Models (LLMs) into wearable sensing is creating a new class of mobile applications capable of nuanced human activity understanding. However, the reliability of these systems is critically undermined by their vulnerability to prompt injection attacks, where attackers deliberately input deceptive instructions into LLMs. Traditional defenses, based on static filters and rigid rules, are insufficient to address the semantic complexity of these new attacks. We argue that a paradigm shift is needed -- from passive filtering to active protection and autonomous reasoning. We introduce AegisAgent, an autonomous agent system designed to ensure the security of LLM-driven HAR systems. Instead of merely blocking threats, AegisAgent functions as a cognitive guardian. It autonomously perceives potential semantic inconsistencies, reasons about the user's true intent by consulting a dynamic memory of past interactions, and acts by generating and executing a multi-step verification and repair plan. We implement AegisAgent as a lightweight, full-stack prototype and conduct a systematic evaluation on 15 common attacks with five state-of-the-art LLM-based HAR systems on three public datasets. Results show it reduces attack success rate by 30\% on average while incurring only 78.6 ms of latency overhead on a GPU workstation. Our work makes the first step towards building secure and trustworthy LLM-driven HAR systems.
要約:
Mixture-of-Experts (MoE) architectures have advanced the scaling of Large Language Models (LLMs) by activating only a sparse subset of parameters per input, enabling state-of-the-art performance with reduced computational cost. As these models are increasingly deployed in critical domains, understanding and strengthening their alignment mechanisms is essential to prevent harmful outputs. However, existing LLM safety research has focused almost exclusively on dense architectures, leaving the unique safety properties of MoEs largely unexamined. The modular, sparsely-activated design of MoEs suggests that safety mechanisms may operate differently than in dense models, raising questions about their robustness.
In this paper, we present GateBreaker, the first training-free, lightweight, and architecture-agnostic attack framework that compromises the safety alignment of modern MoE LLMs at inference time. GateBreaker operates in three stages: (i) gate-level profiling, which identifies safety experts disproportionately routed on harmful inputs, (ii) expert-level localization, which localizes the safety structure within safety experts, and (iii) targeted safety removal, which disables the identified safety structure to compromise the safety alignment. Our study shows that MoE safety concentrates within a small subset of neurons coordinated by sparse routing. Selective disabling of these neurons, approximately 3% of neurons in the targeted expert layers, significantly increases the averaged attack success rate (ASR) from 7.4% to 64.9% against the eight latest aligned MoE LLMs with limited utility degradation. These safety neurons transfer across models within the same family, raising ASR from 17.9% to 67.7% with one-shot transfer attack. Furthermore, GateBreaker generalizes to five MoE vision language models (VLMs) with 60.9% ASR on unsafe image inputs.
要約:
Healthcare AI needs large, diverse datasets, yet strict privacy and governance constraints prevent raw data sharing across institutions. Federated learning (FL) mitigates this by training where data reside and exchanging only model updates, but practical deployments still face two core risks: (1) privacy leakage via gradients or updates (membership inference, gradient inversion) and (2) trust in the aggregator, a single point of failure that can drop, alter, or inject contributions undetected. We present zkFL-Health, an architecture that combines FL with zero-knowledge proofs (ZKPs) and Trusted Execution Environments (TEEs) to deliver privacy-preserving, verifiably correct collaborative training for medical AI. Clients locally train and commit their updates; the aggregator operates within a TEE to compute the global update and produces a succinct ZK proof (via Halo2/Nova) that it used exactly the committed inputs and the correct aggregation rule, without revealing any client update to the host. Verifier nodes validate the proof and record cryptographic commitments on-chain, providing an immutable audit trail and removing the need to trust any single party. We outline system and threat models tailored to healthcare, the zkFL-Health protocol, security/privacy guarantees, and a performance evaluation plan spanning accuracy, privacy risk, latency, and cost. This framework enables multi-institutional medical AI with strong confidentiality, integrity, and auditability, key properties for clinical adoption and regulatory compliance.
要約:
As LLMs see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work has demonstrated that security is often overlooked, exposing that LLMs are prone to generating code with security vulnerabilities. These insights were enabled by specialized benchmarks, crafted through significant manual effort by security experts. However, relying on manually-crafted benchmarks is insufficient in the long term, because benchmarks (i) naturally end up contaminating training data, (ii) must extend to new tasks to provide a more complete picture, and (iii) must increase in difficulty to challenge more capable LLMs. In this work, we address these challenges and present AutoBaxBuilder, a framework that generates tasks and tests for code security benchmarking from scratch. We introduce a robust pipeline with fine-grained plausibility checks, leveraging the code understanding capabilities of LLMs to construct functionality tests and end-to-end security-probing exploits. To confirm the quality of the generated benchmark, we conduct both a qualitative analysis and perform quantitative experiments, comparing it against tasks constructed by human experts. We use AutoBaxBuilder to construct entirely new tasks and release them to the public as AutoBaxBench, together with a thorough evaluation of the security capabilities of LLMs on these tasks. We find that a new task can be generated in under 2 hours, costing less than USD 10.
要約:
Large language models (LLMs) have revolutionized software development through AI-assisted coding tools, enabling developers with limited programming expertise to create sophisticated applications. However, this accessibility extends to malicious actors who may exploit these powerful tools to generate harmful software. Existing jailbreaking research primarily focuses on general attack scenarios against LLMs, with limited exploration of malicious code generation as a jailbreak target. To address this gap, we propose SPELL, a comprehensive testing framework specifically designed to evaluate the weakness of security alignment in malicious code generation. Our framework employs a time-division selection strategy that systematically constructs jailbreaking prompts by intelligently combining sentences from a prior knowledge dataset, balancing exploration of novel attack patterns with exploitation of successful techniques. Extensive evaluation across three advanced code models (GPT-4.1, Claude-3.5, and Qwen2.5-Coder) demonstrates SPELL's effectiveness, achieving attack success rates of 83.75%, 19.38%, and 68.12% respectively across eight malicious code categories. The generated prompts successfully produce malicious code in real-world AI development tools such as Cursor, with outputs confirmed as malicious by state-of-the-art detection systems at rates exceeding 73%. These findings reveal significant security gaps in current LLM implementations and provide valuable insights for improving AI safety alignment in code generation applications.
要約:
Lateral movement is a tactic that adversaries employ most frequently in enterprise IT environments to traverse between assets. In operational technology (OT) environments, however, few methods exist for lateral movement between domain-specific devices, particularly programmable logic controllers (PLCs). Existing techniques often rely on complex chains of vulnerabilities, which are noisy and can be patched. This paper describes the first PLC-centric lateral movement technique that relies exclusively on the native functionality of the victim environment. This OT-specific form of `living off the land' is herein distinguished as `living off the plant' (LOTP). The described technique also facilitates escape from IP networks onto legacy serial networks via dual-homed PLCs. Furthermore, this technique is covert, leveraging common network communication functions that are challenging to detect. This serves as a reminder of the risks posed by LOTP techniques within OT, highlighting the need for a fundamental reconsideration of traditional OT defensive practices.
要約:
LLM-based code agents(e.g., ChatGPT Codex) are increasingly deployed as detector for code review and security auditing tasks. Although CoT-enhanced LLM vulnerability detectors are believed to provide improved robustness against obfuscated malicious code, we find that their reasoning chains and semantic abstraction processes exhibit exploitable systematic weaknesses.This allows attackers to covertly embed malicious logic, bypass code review, and propagate backdoored components throughout real-world software supply chains.To investigate this issue, we present CoTDeceptor, the first adversarial code obfuscation framework targeting CoT-enhanced LLM detectors. CoTDeceptor autonomously constructs evolving, hard-to-reverse multi-stage obfuscation strategy chains that effectively disrupt CoT-driven detection logic.We obtained malicious code provided by security enterprise, experimental results demonstrate that CoTDeceptor achieves stable and transferable evasion performance against state-of-the-art LLMs and vulnerability detection agents. CoTDeceptor bypasses 14 out of 15 vulnerability categories, compared to only 2 bypassed by prior methods. Our findings highlight potential risks in real-world software supply chains and underscore the need for more robust and interpretable LLM-powered security analysis systems.
要約:
My main worry, and the core of my research, is that our cybersecurity ecosystem is slowly but surely aging and getting old and that aging is becoming an operational risk. This is happening not only because of growing complexity, but more importantly because of accumulation of controls and measures whose effectiveness are uncertain. I introduce a new term for this aging phenomenon: cyber senescence. I will begin my lecture with a short historical overview in which I sketch a development over time that led to this worry for the future of cybersecurity. It is this worry that determined my research agenda and its central theme of the role of uncertainty in cybersecurity. My worry is that waste is accumulating in cyberspace. This waste consists of a multitude of overlapping controls whose risk reductions are uncertain. Unless we start pruning these control frameworks, this waste accumulation causes aging of cyberspace and could ultimately lead to a system collapse.
要約:
Privacy concerns around 5G, the latest generation of mobile networks, are growing, with fears that its deployment may increase exposure to privacy risks. This perception is largely driven by the use of denser deployments of small antenna systems, which enable highly accurate data collection at higher speeds and closer proximity to mobile users. At the same time, 5G's unique radio communication features can make the reproduction of known network attacks more challenging. In particular, passive network attacks, which do not involve direct interaction with the target network and are therefore nearly impossible to detect, remain a pressing concern. Such attacks can reveal sensitive information about users, their devices, and active applications, which may then be exploited through known vulnerabilities or spear-phishing schemes. This survey examines the feasibility of passive network attacks in 5G and beyond (B5G/6G) networks, with emphasis on two major categories: information extraction (system identification, website and application fingerprinting) and geolocation (user identification and position tracking). These attacks are well documented and reproducible in existing wireless and mobile systems, including short-range networks (IEEE 802.11) and, to a lesser extent, LTE. Current evidence suggests that while such attacks remain theoretically possible in 5G, their practical execution is significantly constrained by directional beamforming, high-frequency propagation characteristics, and encryption mechanisms. For B5G and early 6G networks, the lack of public tools and high hardware cost currently renders these attacks infeasible in practice, which highlights a critical gap in our understanding of future network threat models.
要約:
Leveraging a validated set of reconstructed Lightning Network topology snapshots spanning five years (2019-2023), we computed 47 computationally intensive metrics and network attributes, enabling a comprehensive analysis of the network's structure and temporal dynamics. Our results corroborate prior topology studies while offering deeper insight into the network's structural evolution. In particular, we quantify the network's topological stability over time, yielding implications for the design of heuristic-based pathfinding and routing protocols. More broadly, this work provides a detailed characterization of publicly available Lightning Network snapshots, supporting future research in Payment Channel Network analysis and network science.
要約:
The rapid expansion of AI-driven applications powered by large language models has led to a surge in AI interaction data, raising urgent challenges in security, accountability, and risk traceability. This paper presents AiAuditTrack (AAT), a blockchain-based framework for AI usage traffic recording and governance. AAT leverages decentralized identity (DID) and verifiable credentials (VC) to establish trusted and identifiable AI entities, and records inter-entity interaction trajectories on-chain to enable cross-system supervision and auditing. AI entities are modeled as nodes in a dynamic interaction graph, where edges represent time-specific behavioral trajectories. Based on this model, a risk diffusion algorithm is proposed to trace the origin of risky behaviors and propagate early warnings across involved entities. System performance is evaluated using blockchain Transactions Per Second (TPS) metrics, demonstrating the feasibility and stability of AAT under large-scale interaction recording. AAT provides a scalable and verifiable solution for AI auditing, risk management, and responsibility attribution in complex multi-agent environments.
要約:
Divisor methods are well known to satisfy house monotonicity, which allows representative seats to be allocated sequentially. We focus on stationary divisor methods defined by a rounding cut point $c \in [0,1]$. For such methods with integer-valued votes, the resulting apportionment sequences are periodic. Restricting attention to two-party allocations, we characterize the set of possible sequences and establish a connection between the lexicographical ordering of these sequences and the parameter $c$. We then show how sequences for all pairs of parties can be systematically extended to the $n$-party setting. Further, we determine the number of distinct sequences in the $n$-party problem for all $c$. Our approach offers a refined perspective on large-party bias: rather than viewing large parties as simply receiving more seats, we show that they instead obtain their seats earlier in the apportionment sequence. Of particular interest is a new relationship we uncover between the sequences generated by the smallest divisors (Adams) and greatest divisors (d'Hondt or Jefferson) methods.
要約:
Anonymity is a fundamental cryptographic primitive that hides the identities of both senders and receivers during message transmission over a network. Classical protocols cannot provide information-theoretic security for such task, and existing quantum approaches typically depend on classical subroutines and multiple private channels, thereby weakening their security in fully adversarial settings. In this work, we introduce the first fully quantum protocol for anonymous communication in realistic quantum networks with a device-independent security proof.
要約:
Current Large Language Models (LLMs) safety approaches focus on explicitly harmful content while overlooking a critical vulnerability: the inability to understand context and recognize user intent. This creates exploitable vulnerabilities that malicious users can systematically leverage to circumvent safety mechanisms. We empirically evaluate multiple state-of-the-art LLMs, including ChatGPT, Claude, Gemini, and DeepSeek. Our analysis demonstrates the circumvention of reliable safety mechanisms through emotional framing, progressive revelation, and academic justification techniques. Notably, reasoning-enabled configurations amplified rather than mitigated the effectiveness of exploitation, increasing factual precision while failing to interrogate the underlying intent. The exception was Claude Opus 4.1, which prioritized intent detection over information provision in some use cases. This pattern reveals that current architectural designs create systematic vulnerabilities. These limitations require paradigmatic shifts toward contextual understanding and intent recognition as core safety capabilities rather than post-hoc protective mechanisms.
要約:
Large language models (LLMs) are increasingly used in software development, but their level of software security expertise remains unclear. This work systematically evaluates the security comprehension of five leading LLMs: GPT-4o-Mini, GPT-5-Mini, Gemini-2.5-Flash, Llama-3.1, and Qwen-2.5, using Blooms Taxonomy as a framework. We assess six cognitive dimensions: remembering, understanding, applying, analyzing, evaluating, and creating. Our methodology integrates diverse datasets, including curated multiple-choice questions, vulnerable code snippets (SALLM), course assessments from an Introduction to Software Security course, real-world case studies (XBOW), and project-based creation tasks from a Secure Software Engineering course. Results show that while LLMs perform well on lower-level cognitive tasks such as recalling facts and identifying known vulnerabilities, their performance degrades significantly on higher-order tasks that require reasoning, architectural evaluation, and secure system creation. Beyond reporting aggregate accuracy, we introduce a software security knowledge boundary that identifies the highest cognitive level at which a model consistently maintains reliable performance. In addition, we identify 51 recurring misconception patterns exhibited by LLMs across Blooms levels.
要約:
In hard-label black-box adversarial attacks, where only the top-1 predicted label is accessible, the prohibitive query complexity poses a major obstacle to practical deployment. In this paper, we focus on optimizing a representative class of attacks that search for the optimal ray direction yielding the minimum $\ell_2$-norm perturbation required to move a benign image into the adversarial region. Inspired by Nesterov's Accelerated Gradient (NAG), we propose a momentum-based algorithm, ARS-OPT, which proactively estimates the gradient with respect to a future ray direction inferred from accumulated momentum. We provide a theoretical analysis of its convergence behavior, showing that ARS-OPT enables more accurate directional updates and achieves faster, more stable optimization. To further accelerate convergence, we incorporate surrogate-model priors into ARS-OPT's gradient estimation, resulting in PARS-OPT with enhanced performance. The superiority of our approach is supported by theoretical guarantees under standard assumptions. Extensive experiments on ImageNet and CIFAR-10 demonstrate that our method surpasses 13 state-of-the-art approaches in query efficiency.
要約:
In this work we present a publicly verifiable quantum money protocol which assumes close to no quantum computational capabilities. We rely on one-time memories which in turn can be built from quantum conjugate coding and hardware-based assumptions. Specifically, our scheme allows for a limited number of verifications and also allows for quantum tokens for digital signatures. Double spending is prevented by the no-cloning principle of conjugate coding states. An implementation of the concepts presented in this work can be found at https://github.com/neverlocal/otm_billz.
要約:
Conventional double-spending attack models ignore the revenue losses stemming from the orphan blocks. On the other hand, selfish mining literature usually ignores the chance of the attacker to double-spend at no-cost in each attack cycle. In this paper, we give a rigorous stochastic analysis of an attack where the goal of the adversary is to double-spend while mining selfishly. To do so, we first combine stubborn and selfish mining attacks, i.e., construct a strategy where the attacker acts stubborn until its private branch reaches a certain length and then switches to act selfish. We provide the optimal stubbornness for each parameter regime. Next, we provide the maximum stubbornness that is still more profitable than honest mining and argue a connection between the level of stubbornness and the $k$-confirmation rule. We show that, at each attack cycle, if the level of stubbornness is higher than $k$, the adversary gets a free shot at double-spending. At each cycle, for a given stubbornness level, we rigorously formulate how great the probability of double-spending is. We further modify the attack in the stubborn regime in order to conceal the attack and increase the double-spending probability.
要約:
Large Language Models (LLMs) are increasingly popular, powering a wide range of applications. Their widespread use has sparked concerns, especially through jailbreak attacks that bypass safety measures to produce harmful content.
In this paper, we present a comprehensive security analysis of large language models (LLMs), addressing critical research questions on the evolution and determinants of model safety.
Specifically, we begin by identifying the most effective techniques for detecting jailbreak attacks. Next, we investigate whether newer versions of LLMs offer improved security compared to their predecessors. We also assess the impact of model size on overall security and explore the potential benefits of integrating multiple defense strategies to enhance the security.
Our study evaluates both open-source (e.g., LLaMA and Mistral) and closed-source models (e.g., GPT-4) by employing four state-of-the-art attack techniques and assessing the efficacy of three new defensive approaches.
要約:
Graph Neural Networks (GNNs) are increasingly deployed in real-world applications, making ownership verification critical to protect their intellectual property against model theft. Fingerprinting and black-box watermarking are two main methods. However, the former relies on determining model similarity, which is computationally expensive and prone to ownership collisions after model post-processing. The latter embeds backdoors, exposing watermarked models to the risk of backdoor attacks. Moreover, both previous methods enable ownership verification but do not convey additional information about the copy model. If the owner has multiple models, each model requires a distinct trigger graph.
To address these challenges, this paper proposes WGLE, a novel black-box watermarking paradigm for GNNs that enables embedding the multi-bit string in GNN models without using backdoors. WGLE builds on a key insight we term Layer-wise Distance Difference on an Edge (LDDE), which quantifies the difference between the feature distance and the prediction distance of two connected nodes in a graph. By assigning unique LDDE values to the edges and employing the LDDE sequence as the watermark, WGLE supports multi-bit capacity without relying on backdoor mechanisms. We evaluate WGLE on six public datasets across six mainstream GNN architectures, and compare WGLE with state-of-the-art GNN watermarking and fingerprinting methods. WGLE achieves 100% ownership verification accuracy, with an average fidelity degradation of only 1.41%. Additionally, WGLE exhibits robust resilience against potential attacks. The code is available in the repository.
要約:
Large Language Models (LLMs) continue to exhibit vulnerabilities to jailbreaking attacks: carefully crafted malicious inputs intended to circumvent safety guardrails and elicit harmful responses. As such, we present AutoAdv, a novel framework that automates adversarial prompt generation to systematically evaluate and expose vulnerabilities in LLM safety mechanisms. Our approach leverages a parametric attacker LLM to produce semantically disguised malicious prompts through strategic rewriting techniques, specialized system prompts, and optimized hyperparameter configurations. The primary contribution of our work is a dynamic, multi-turn attack methodology that analyzes failed jailbreak attempts and iteratively generates refined follow-up prompts, leveraging techniques such as roleplaying, misdirection, and contextual manipulation. We quantitatively evaluate attack success rate (ASR) using the StrongREJECT (arXiv:2402.10260 [cs.CL]) framework across sequential interaction turns. Through extensive empirical evaluation of state-of-the-art models--including ChatGPT, Llama, and DeepSeek--we reveal significant vulnerabilities, with our automated attacks achieving jailbreak success rates of up to 86% for harmful content generation. Our findings reveal that current safety mechanisms remain susceptible to sophisticated multi-turn attacks, emphasizing the urgent need for more robust defense strategies.
要約:
Autonomous vehicles (AVs) rely on pervasive connectivity to enable cooperative and safety-critical applications, but this connectivity also exposes them to a wide range of cybersecurity threats. Existing perimeter-based security and centralized identity management approaches are inadequate for highly dynamic V2X environments, as they depend on implicit trust and suffer from scalability and single-point-of-failure limitations. This paper proposes D-IM, a Zero Trust-based decentralized identity management and authentication framework for secure V2X communication. D-IM integrates continuous verification with a permissioned blockchain to eliminate centralized trust assumptions and enforce explicit, verifiable identity relationships among vehicles and infrastructure. The framework is designed around clear Zero Trust-aligned goals, including mutual authentication, decentralization, privacy protection, non-repudiation, and traceability, and addresses a comprehensive attacker model covering identity, data integrity, collusion, availability, and accountability threats. We present the D-IM system architecture and identification and authorization protocol, and validate its security properties through both qualitative analysis and a formal BAN logic-based verification. Simulation results in urban and highway scenarios using DSRC and C-V2X demonstrate that D-IM introduces limited overhead while preserving network performance, supporting its practicality for real-world AV deployments.
要約:
The 5G protocol lacks a robust base station (BS) authentication mechanism during the initial bootstrapping phase, leaving it susceptible to threats such as fake BSs, spoofed broadcasts, and large-scale manipulation of System Information Blocks (SIBs). Despite real-world 5G deployments increasingly relying on multi-BS communication and user multi-connectivity, existing solutions incur high communication overheads, rely on centralized trust, and lack accountability and long-term breach resiliency. Given the inevitability of BS compromise and the severe impact of forged SIBs as the root of trust (e.g., fake alerts, tracking, false roaming), distributed trust, verifiable forgery detection, and audit logging are essential, yet remain largely unexplored in 5G authentication. These challenges are further amplified by the emergence of quantum-capable adversaries. While integration of NIST PQC standards is widely viewed as a path toward long-term security and future-proofing 5G authentication, their feasibility under strict packet size, latency, and broadcast constraints has not been systematically studied. This work presents, to our knowledge, the first comprehensive network-level performance characterization of integrating NIST-PQC standards and conventional digital signatures into 5G BS authentication, showing that direct PQC adoption is impractical due to protocol constraints, delays, and large signature sizes. To address these challenges, we propose BORG, a future-proof authentication framework based on a hierarchical identity-based threshold signature with fail-stop properties. BORG distributes trust across multiple BSs, enables post-mortem forgery detection, and provides tamper-evident, post-quantum secure audit logging, while maintaining compact signatures, avoiding fragmentation, and incurring minimal UE overhead, as shown in our 5G testbed implementation.
要約:
To date, traffic obfuscation techniques have been widely adopted to protect network data privacy and security by obscuring the true patterns of traffic. Nevertheless, as the pre-trained models emerge, especially transformer-based classifiers, existing traffic obfuscation methods become increasingly vulnerable, as witnessed by current studies reporting the traffic classification accuracy up to 99\% or higher. To counter such high-performance transformer-based classification models, we in this paper propose a novel and effective \underline{adv}ersarial \underline{traffic}-generating approach (AdvTraffic\footnote{The code and data are available at: https://anonymous.4open.science/r/TrafficD-C461}). Our approach has two key innovations: (i) a pre-padding strategy is proposed to modify packets, which effectively overcomes the limitations of existing research against transformer-based models for network traffic classification; and (ii) a reinforcement learning model is employed to optimize network traffic perturbations, aiming to maximize adversarial effectiveness against transformer-based classification models. To the best of our knowledge, this is the first attempt to apply adversarial perturbation techniques to defend against transformer-based traffic classifiers. Furthermore, our method can be easily deployed into practical network environments. Finally, multi-faceted experiments are conducted across several real-world datasets, and the experimental results demonstrate that our proposed method can effectively undermine transformer-based classifiers, significantly reducing classification accuracy from 99\% to as low as 25.68\%.
要約:
In this paper, we make the first attempt towards defining cost function of steganography with large language models (LLMs), which is totally different from previous works that rely heavily on expert knowledge or require large-scale datasets for cost learning. To achieve this goal, a two-stage strategy combining LLM-guided program synthesis with evolutionary search is applied in the proposed method. In the first stage, a certain number of cost functions in the form of computer programs are synthesized from LLM responses to structured prompts. These cost functions are then evaluated with pretrained steganalysis models so that candidate cost functions suited to steganography can be collected. In the second stage, by retraining a steganalysis model for each candidate cost function, the optimal cost function(s) can be determined according to the detection accuracy. This two-stage strategy is performed by an iterative fashion so that the best cost function can be collected at the last iteration. Experiments show that the proposed method enables LLMs to design new cost functions of steganography that significantly outperform existing works in terms of resisting steganalysis tools, which verifies the superiority of the proposed method. To the best knowledge of the authors, this is the first work applying LLMs to the design of advanced cost function of steganography, which presents a novel perspective for steganography design and may shed light on further research.
要約:
Detecting the source of a gossip is a critical issue, related to identifying patient zero in an epidemic, or the origin of a rumor in a social network. Although it is widely acknowledged that random and local gossip communications make source identification difficult, there exists no general quantification of the level of anonymity provided to the source. This paper presents a principled method based on $\varepsilon$-differential privacy to analyze the inherent source anonymity of gossiping for a large class of graphs. First, we quantify the fundamental limit of source anonymity any gossip protocol can guarantee in an arbitrary communication graph. In particular, our result indicates that when the graph has poor connectivity, no gossip protocol can guarantee any meaningful level of differential privacy. This prompted us to further analyze graphs with controlled connectivity. We prove on these graphs that a large class of gossip protocols, namely cobra walks, offers tangible differential privacy guarantees to the source. In doing so, we introduce an original proof technique based on the reduction of a gossip protocol to what we call a random walk with probabilistic die out. This proof technique is of independent interest to the gossip community and readily extends to other protocols inherited from the security community, such as the Dandelion protocol. Interestingly, our tight analysis precisely captures the trade-off between dissemination time of a gossip protocol and its source anonymity.
要約:
We present an algorithm that releases a pure differentially private (under the replacement neighboring relation) sparse histogram for $n$ participants over a domain of size $d \gg n$. Our method achieves the optimal $\ell_\infty$-estimation error and runs in strictly $O(n)$ time in the Word-RAM model, improving upon the previous best deterministic-time bound of $\tilde{O}(n^2)$ and resolving the open problem of breaking this quadratic barrier (Balcer and Vadhan, 2019). Moreover, the algorithm admits an efficient circuit implementation, enabling the first near-linear communication and computation cost pure DP histogram MPC protocol with optimal $\ell_\infty$-estimation error. Central to our algorithm is a novel **private item blanket** technique with target-length padding, which hides differences in the supports of neighboring histograms while remaining efficiently implementable.