cs.CR updates on arXiv.org

更新日時: Thu, 13 Nov 2025 05:00:32 +0000
論文数: 46件
0件選択中

📋 論文タイトル一覧

1. QOC DAO - Stepwise Development Towards an AI Driven Decentralized Autonomous Organization
2. Binary and Multiclass Cyberattack Classification on GeNIS Dataset
3. Automated Hardware Trojan Insertion in Industrial-Scale Designs
4. Channel-Robust RFF for Low-Latency 5G Device Identification in SIMO Scenarios
5. iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
6. DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks intellectual property
7. MedHE: Communication-Efficient Privacy-Preserving Federated Learning with Adaptive Gradient Sparsification for Healthcare privacy
8. Attack-Centric by Design: A Program-Structure Taxonomy of Smart Contract Vulnerabilities
9. Toward an Intrusion Detection System for a Virtualization Framework in Edge Computing
10. Improving Sustainability of Adversarial Examples in Class-Incremental Learning
11. Differentially Private Rankings via Outranking Methods and Performance Data Aggregation privacy
12. One Signature, Multiple Payments: Demystifying and Detecting Signature Replay Vulnerabilities in Smart Contracts
13. Unveiling Hidden Threats: Using Fractal Triggers to Boost Stealthiness of Distributed Backdoor Attacks in Federated Learning backdoor
14. SecTracer: A Framework for Uncovering the Root Causes of Network Intrusions via Security Provenance
15. Quantum Meet-in-the-Middle Attacks on Key-Length Extension Constructions
16. Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests
17. Intelligent Carrier Allocation: A Cross-Modal Reasoning Framework for Adaptive Multimodal Steganography
18. How do data owners say no? A case study of data consent mechanisms in web-scraped vision-language AI training datasets
19. FAIRPLAI: A Human-in-the-Loop Approach to Fair and Private Machine Learning privacy
20. Enhancing DPSGD via Per-Sample Momentum and Low-Pass Filtering
21. 3D Guard-Layer: An Integrated Agentic AI Safety System for Edge Artificial Intelligence agent
22. Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation backdoor
23. Towards a Generalisable Cyber Defence Agent for Real-World Computer Networks agent
24. Slaying the Dragon: The Quest for Democracy in Decentralized Autonomous Organizations (DAOs)
25. AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness
26. Formalizing and Benchmarking Prompt Injection Attacks and Defenses
27. Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy privacy
28. UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning agent
29. Large Language Models Are Unreliable for Cyber Threat Intelligence
30. DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
31. The Feasibility of Topic-Based Watermarking on Academic Peer Reviews intellectual property
32. Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems backdoor
33. EFPIX: A zero-trust encrypted flood protocol
34. Removal Attack and Defense on AI-generated Content Latent-based Watermarking intellectual property
35. Towards Adapting Federated & Quantum Machine Learning for Network Intrusion Detection: A Survey
36. SecInfer: Preventing Prompt Injection via Inference-time Scaling
37. Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models
38. Exploiting Data Structures for Bypassing and Crashing Anti-Malware Solutions via Telemetry Complexity Attacks
39. MCP-RiskCue: Can LLM Infer Risk Information From MCP Server System Logs?
40. Privacy on the Fly: A Predictive Adversarial Transformation Network for Mobile Sensor Data privacy
41. QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities
42. Improving Adversarial Transferability with Neighbourhood Gradient Information
43. GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm
44. How Well Can Differential Privacy Be Audited in One Run? privacy
45. When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation
46. HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection
📄 論文詳細
著者: Marc Jansen, Christophe Verdot
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
This paper introduces a structured approach to improving decision making in Decentralized Autonomous Organizations (DAO) through the integration of the Question-Option-Criteria (QOC) model and AI agents. We outline a stepwise governance framework that evolves from human led evaluations to fully autonomous, AI-driven processes. By decomposing decisions into weighted, criterion based evaluations, the QOC model enhances transparency, fairness, and explainability in DAO voting. We demonstrate how large language models (LLMs) and stakeholder aligned AI agents can support or automate evaluations, while statistical safeguards help detect manipulation. The proposed framework lays the foundation for scalable and trustworthy governance in the Web3 ecosystem.
著者: Miguel Silva, Daniela Pinto, Jo\~ao Vitorino, Eva Maia, Isabel Pra\c{c}a, Ivone Amorim, Maria Jo\~ao Viamonte
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their generalization capability to detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods, Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling a more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than DL models. Overall, the obtained results indicate that the GeNIS dataset supports intelligent intrusion detection and cyberattack classification with time-based and quantity-based behavioral features.
著者: Yaroslav Popryho, Debjit Pal, Inna Partin-Vaisband
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Industrial Systems-on-Chips (SoCs) often comprise hundreds of thousands to millions of nets and millions to tens of millions of connectivity edges, making empirical evaluation of hardware-Trojan (HT) detectors on realistic designs both necessary and difficult. Public benchmarks remain significantly smaller and hand-crafted, while releasing truly malicious RTL raises ethical and operational risks. This work presents an automated and scalable methodology for generating HT-like patterns in industry-scale netlists whose purpose is to stress-test detection tools without altering user-visible functionality. The pipeline (i) parses large gate-level designs into connectivity graphs, (ii) explores rare regions using SCOAP testability metrics, and (iii) applies parameterized, function-preserving graph transformations to synthesize trigger-payload pairs that mimic the statistical footprint of stealthy HTs. When evaluated on the benchmarks generated in this work, representative state-of-the-art graph-learning models fail to detect Trojans. The framework closes the evaluation gap between academic circuits and modern SoCs by providing reproducible challenge instances that advance security research without sharing step-by-step attack instructions.
著者: Yingjie Sun, Guyue Li, Hongfu Chou, Aiqun Hu
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Ultra-low latency, the hallmark of fifth-generation mobile communications (5G), imposes exacting timing demands on identification as well. Current cryptographic solutions introduce additional computational overhead, which results in heightened identification delays. Radio frequency fingerprint (RFF) identifies devices at the physical layer, blocking impersonation attacks while significantly reducing latency. Unfortunately, multipath channels compromise RFF accuracy, and existing channel-resilient methods demand feedback or processing across multiple time points, incurring extra signaling latency. To address this problem, the paper introduces a new RFF extraction technique that employs signals from multiple receiving antennas to address multipath issues without adding latency. Unlike single-domain methods, the Log-Linear Delta Ratio (LLDR) of co-temporal channel frequency responses (CFRs) from multiple antennas is employed to preserve discriminative RFF features, eliminating multi-time sampling and reducing acquisition time. To overcome the challenge of the reliance on minimal channel variation, the frequency band is segmented into sub-bands, and the LLDR is computed within each sub-band individually. Simulation results indicate that the proposed scheme attains a 96.13% identification accuracy for 30 user equipments (UEs) within a 20-path channel under a signal-to-noise ratio (SNR) of 20 dB. Furthermore, we evaluate the theoretical latency using the Roofline model, resulting in the air interface latency of 0.491 ms, which satisfies ultra-reliable and low-latency communications (URLLC) latency requirements.
著者: Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. These components are resistant to verification-time attacks, including collusion-based fingerprint unlearning and response manipulation, backed by both theoretical analysis and empirical results. iSeal achieves 100 percent Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
intellectual property
著者: Yunfei Yang, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao, Xin Zhao, He Li
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Model watermarking techniques can embed watermark information into the protected model for ownership declaration by constructing specific input-output pairs. However, existing watermarks are easily removed when facing model stealing attacks, and make it difficult for model owners to effectively verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods under model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark samples construction method and a same-class coupling loss constraint. DeepTracer can incur a high-coupling model between watermark task and primary task that makes adversaries inevitably learn the hidden watermark task when stealing the primary task functionality. Furthermore, we propose an effective watermark samples filtering mechanism that elaborately select watermark key samples used in model ownership verification to enhance the reliability of watermarks. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks, as well as watermark attacks, and achieves new state-of-the-art effectiveness and robustness.
privacy
著者: Farjana Yesmin
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Healthcare federated learning requires strong privacy guarantees while maintaining computational efficiency across resource-constrained medical institutions. This paper presents MedHE, a novel framework combining adaptive gradient sparsification with CKKS homomorphic encryption to enable privacy-preserving collaborative learning on sensitive medical data. Our approach introduces a dynamic threshold mechanism with error compensation for top-k gradient selection, achieving 97.5 percent communication reduction while preserving model utility. We provide formal security analysis under Ring Learning with Errors assumptions and demonstrate differential privacy guarantees with epsilon less than or equal to 1.0. Statistical testing across 5 independent trials shows MedHE achieves 89.5 percent plus or minus 0.8 percent accuracy, maintaining comparable performance to standard federated learning (p=0.32) while reducing communication from 1277 MB to 32 MB per training round. Comprehensive evaluation demonstrates practical feasibility for real-world medical deployments with HIPAA compliance and scalability to 100 plus institutions.
著者: Parsa Hedayatnia, Tina Tavakkoli, Hadi Amini, Mohammad Allahbakhsh, Haleh Amintoosi
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Smart contracts concentrate high value assets and complex logic in small, immutable programs, where even minor bugs can cause major losses. Existing taxonomies and tools remain fragmented, organized around symptoms such as reentrancy rather than structural causes. This paper introduces an attack-centric, program-structure taxonomy that unifies Solidity vulnerabilities into eight root-cause families covering control flow, external calls, state integrity, arithmetic safety, environmental dependencies, access control, input validation, and cross-domain protocol assumptions. Each family is illustrated through concise Solidity examples, exploit mechanics, and mitigations, and linked to the detection signals observable by static, dynamic, and learning-based tools. We further cross-map legacy datasets (SmartBugs, SolidiFI) to this taxonomy to reveal label drift and coverage gaps. The taxonomy provides a consistent vocabulary and practical checklist that enable more interpretable detection, reproducible audits, and structured security education for both researchers and practitioners.
著者: Everton de Matos, Hazaa Alameri, Willian Tessaro Lunardi, Martin Andreoni, Eduardo Viegas
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Edge computing pushes computation closer to data sources, but it also expands the attack surface on resource-constrained devices. This work explores the deployment of the Lightweight Deep Anomaly Detection for Network Traffic (LDPI) integrated as an isolated service within a virtualization framework that provides security by separation. LDPI, adopting a Deep Learning approach, achieved strong training performance, reaching AUC 0.999 (5-fold mean) across the evaluated packet-window settings (n, l), with high F1 at conservative operating points. We deploy LDPI on a laptop-class edge node and evaluate its overhead and performance in two scenarios: (i) comparing it with representative signature-based IDSes (Suricata and Snort) deployed on the same framework under identical workloads, and (ii) while detecting network flooding attacks.
著者: Taifeng Liu, Xinjing Liu, Liangqiu Dong, Yang Liu, Yilong Yang, Zhuo Ma
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Current adversarial examples (AEs) are typically designed for static models. However, with the wide application of Class-Incremental Learning (CIL), models are no longer static and need to be updated with new data distributed and labeled differently from the old ones. As a result, existing AEs often fail after CIL updates due to significant domain drift. In this paper, we propose SAE to enhance the sustainability of AEs against CIL. The core idea of SAE is to enhance the robustness of AE semantics against domain drift by making them more similar to the target class while distinguishing them from all other classes. Achieving this is challenging, as relying solely on the initial CIL model to optimize AE semantics often leads to overfitting. To resolve the problem, we propose a Semantic Correction Module. This module encourages the AE semantics to be generalized, based on a visual-language model capable of producing universal semantics. Additionally, it incorporates the CIL model to correct the optimization direction of the AE semantics, guiding them closer to the target class. To further reduce fluctuations in AE semantics, we propose a Filtering-and-Augmentation Module, which first identifies non-target examples with target-class semantics in the latent space and then augments them to foster more stable semantics. Comprehensive experiments demonstrate that SAE outperforms baselines by an average of 31.28% when updated with a 9-fold increase in the number of classes.
privacy
著者: Luis Del Vasto-Terrientes
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Multiple-Criteria Decision Making (MCDM) is a sub-discipline of Operations Research that helps decision-makers in choosing, ranking, or sorting alternatives based on conflicting criteria. Over time, its application has been expanded into dynamic and data-driven domains, such as recommender systems. In these contexts, the availability and handling of personal and sensitive data can play a critical role in the decision-making process. Despite this increased reliance on sensitive data, the integration of privacy mechanisms with MCDM methods is underdeveloped. This paper introduces an integrated approach that combines MCDM outranking methods with Differential Privacy (DP), safeguarding individual contributions' privacy in ranking problems. This approach relies on a pre-processing step to aggregate multiple user evaluations into a comprehensive performance matrix. The evaluation results show a strong to very strong statistical correlation between the true rankings and their anonymized counterparts, ensuring robust privacy parameter guarantees.
著者: Zexu Wang, Jiachi Chen, Zewei Lin, Wenqing Chen, Kaiwen Ning, Jianxing Yu, Yuming Feng, Yu Zhang, Weizhe Zhang, Zibin Zheng
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Smart contracts have significantly advanced blockchain technology, and digital signatures are crucial for reliable verification of contract authority. Through signature verification, smart contracts can ensure that signers possess the required permissions, thus enhancing security and scalability. However, lacking checks on signature usage conditions can lead to repeated verifications, increasing the risk of permission abuse and threatening contract assets. We define this issue as the Signature Replay Vulnerability (SRV). In this paper, we conducted the first empirical study to investigate the causes and characteristics of the SRVs. From 1,419 audit reports across 37 blockchain security companies, we identified 108 with detailed SRV descriptions and classified five types of SRVs. To detect these vulnerabilities automatically, we designed LASiR, which utilizes the general semantic understanding ability of Large Language Models (LLMs) to assist in the static taint analysis of the signature state and identify the signature reuse behavior. It also employs path reachability verification via symbolic execution to ensure effective and reliable detection. To evaluate the performance of LASiR, we conducted large-scale experiments on 15,383 contracts involving signature verification, selected from the initial dataset of 918,964 contracts across four blockchains: Ethereum, Binance Smart Chain, Polygon, and Arbitrum. The results indicate that SRVs are widespread, with affected contracts holding $4.76 million in active assets. Among these, 19.63% of contracts that use signatures on Ethereum contain SRVs. Furthermore, manual verification demonstrates that LASiR achieves an F1-score of 87.90% for detection. Ablation studies and comparative experiments reveal that the semantic information provided by LLMs aids static taint analysis, significantly enhancing LASiR's detection performance.
backdoor
著者: Jian Wang, Hong Shen, Chan-Tong Lam
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Traditional distributed backdoor attacks (DBA) in federated learning improve stealthiness by decomposing global triggers into sub-triggers, which however requires more poisoned data to maintian the attck strength and hence increases the exposure risk. To overcome this defect, This paper proposes a novel method, namely Fractal-Triggerred Distributed Backdoor Attack (FTDBA), which leverages the self-similarity of fractals to enhance the feature strength of sub-triggers and hence significantly reduce the required poisoning volume for the same attack strength. To address the detectability of fractal structures in the frequency and gradient domains, we introduce a dynamic angular perturbation mechanism that adaptively adjusts perturbation intensity across the training phases to balance efficiency and stealthiness. Experiments show that FTDBA achieves a 92.3\% attack success rate with only 62.4\% of the poisoning volume required by traditional DBA methods, while reducing the detection rate by 22.8\% and KL divergence by 41.2\%. This study presents a low-exposure, high-efficiency paradigm for federated backdoor attacks and expands the application of fractal features in adversarial sample generation.
著者: Seunghyeon Lee, Hyunmin Seo, Hwanjo Heo, Anduo Wang, Seungwon Shin, Jinwoo Kim
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Modern enterprise networks comprise diverse and heterogeneous systems that support a wide range of services, making it challenging for administrators to track and analyze sophisticated attacks such as advanced persistent threats (APTs), which often exploit multiple vectors. To address this challenge, we introduce the concept of network-level security provenance, which enables the systematic establishment of causal relationships across hosts at the network level, facilitating the accurate identification of the root causes of security incidents. Building on this concept, we present SecTracer as a framework for a network-wide provenance analysis. SecTracer offers three main contributions: (i) comprehensive and efficient forensic data collection in enterprise networks via software-defined networking (SDN), (ii) reconstruction of attack histories through provenance graphs to provide a clear and interpretable view of intrusions, and (iii) proactive attack prediction using probabilistic models. We evaluated the effectiveness and efficiency of SecTracer through a real-world APT simulation, demonstrating its capability to enhance threat mitigation while introducing less than 1% network throughput overhead and negligible latency impact.
著者: Min Liang, Ruihao Gao, Jiali Wu
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Key-length extension (KLE) techniques provide a general approach to enhancing the security of block ciphers by using longer keys. There are mainly two classes of KLE techniques, cascade encryption and XOR-cascade encryption. This paper presents several quantum meet-in-the-middle (MITM) attacks against two specific KLE constructions. For the two-key triple encryption (2kTE), we propose two quantum MITM attacks under the Q2 model. The first attack, leveraging the quantum claw-finding (QCF) algorithm, achieves a time complexity of $O(2^{2\kappa/3})$ with $O(2^{2\kappa/3})$ quantum random access memory (QRAM). The second attack, based on Grover's algorithm, achieves a time complexity of $O(2^{\kappa/2})$ with $O(2^\kappa)$ QRAM. The latter complexity is nearly identical to Grover-based brute-force attack on the underlying block cipher, indicating that 2kTE does not enhance security under the Q2 model when sufficient QRAM resources are available. For the 3XOR-cascade encryption (3XCE), we propose a quantum MITM attack applicable to the Q1 model. This attack requires no QRAM and has a time complexity of $O(2^{(\kappa+n)/2})$ ($\kappa$ and $n$ are the key length and block length of the underlying block cipher, respectively.), achieving a quadratic speedup over classical MITM attack. Furthermore, we extend the quantum MITM attack to quantum sieve-in-the-middle (SITM) attack, which is applicable for more constructions. We present a general quantum SITM framework for the construction $ELE=E^2\circ L\circ E^1$ and provide specific attack schemes for three different forms of the middle layer $L$. The quantum SITM attack technique can be further applied to a broader range of quantum cryptanalysis scenarios.
著者: Muhammed El Mustaqeem Mazelan, Noor Hazlina Abdul, Nouar AlDahoul
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Password security plays a crucial role in cybersecurity, yet traditional password strength meters, which rely on static rules like character-type requirements, often fail. Such methods are easily bypassed by common password patterns (e.g., 'P@ssw0rd1!'), giving users a false sense of security. To address this, we implement and evaluate a password strength scoring system by comparing four machine learning models: Random Forest (RF), Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and Logistic Regression with a dataset of over 660,000 real-world passwords. Our primary contribution is a novel hybrid feature engineering approach that captures nuanced vulnerabilities missed by standard metrics. We introduce features like leetspeak-normalized Shannon entropy to assess true randomness, pattern detection for keyboard walks and sequences, and character-level TF-IDF n-grams to identify frequently reused substrings from breached password datasets. our RF model achieved superior performance, achieving 99.12% accuracy on a held-out test set. Crucially, the interpretability of the Random Forest model allows for feature importance analysis, providing a clear pathway to developing security tools that offer specific, actionable feedback to users. This study bridges the gap between predictive accuracy and practical usability, resulting in a high-performance scoring system that not only reduces password-based vulnerabilities but also empowers users to make more informed security decisions.
著者: Abhirup Das, Pranav Dudani, Shruti Sharma, Ravi Kumar C. V
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
In today's digital world, which has many different types of media, steganography, the art of secret communication, has a lot of problems to deal with. Traditional methods are often fixed and only work with one type of carrier media. This means they don't work well with all the different types of media that are out there. This system doesn't send data to "weak" or easily detectable carriers because it can't adapt. This makes the system less safe and less secret in general. This paper proposes a novel Intelligent Carrier Allocation framework founded on a Cross-Modal Reasoning (CMR) Engine. This engine looks at a wide range of carriers, such as images, audio, and text, to see if they are good for steganography. It uses important measurements like entropy, signal complexity, and vocabulary richness to come up with a single reliability score for each modality. The framework uses these scores to fairly and intelligently share the secret bitstream, giving more data to carriers that are thought to be stronger and more complex. This adaptive allocation strategy makes the system as hard to find as possible and as strong as possible against steganalysis. We demonstrate that this reasoning-based approach is more secure and superior in data protection compared to static, non-adaptive multimodal techniques. This makes it possible to build stronger and smarter secret communication systems.
著者: Chung Peng Lee, Rachel Hong, Harry Jiang, Aster Plotnik, William Agnew, Jamie Morgenstern
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
The internet has become the main source of data to train modern text-to-image or vision-language models, yet it is increasingly unclear whether web-scale data collection practices for training AI systems adequately respect data owners' wishes. Ignoring the owner's indication of consent around data usage not only raises ethical concerns but also has recently been elevated into lawsuits around copyright infringement cases. In this work, we aim to reveal information about data owners' consent to AI scraping and training, and study how it's expressed in DataComp, a popular dataset of 12.8 billion text-image pairs. We examine both the sample-level information, including the copyright notice, watermarking, and metadata, and the web-domain-level information, such as a site's Terms of Service (ToS) and Robots Exclusion Protocol. We estimate at least 122M of samples exhibit some indication of copyright notice in CommonPool, and find that 60\% of the samples in the top 50 domains come from websites with ToS that prohibit scraping. Furthermore, we estimate 9-13\% with 95\% confidence interval of samples from CommonPool to contain watermarks, where existing watermark detection methods fail to capture them in high fidelity. Our holistic methods and findings show that data owners rely on various channels to convey data consent, of which current AI data collection pipelines do not entirely respect. These findings highlight the limitations of the current dataset curation/release practice and the need for a unified data consent framework taking AI purposes into consideration.
privacy
著者: David Sanchez Jr., Holly Lopez, Michelle Buraczyk, Anantaa Kotal
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
As machine learning systems move from theory to practice, they are increasingly tasked with decisions that affect healthcare access, financial opportunities, hiring, and public services. In these contexts, accuracy is only one piece of the puzzle - models must also be fair to different groups, protect individual privacy, and remain accountable to stakeholders. Achieving all three is difficult: differential privacy can unintentionally worsen disparities, fairness interventions often rely on sensitive data that privacy restricts, and automated pipelines ignore that fairness is ultimately a human and contextual judgment. We introduce FAIRPLAI (Fair and Private Learning with Active Human Influence), a practical framework that integrates human oversight into the design and deployment of machine learning systems. FAIRPLAI works in three ways: (1) it constructs privacy-fairness frontiers that make trade-offs between accuracy, privacy guarantees, and group outcomes transparent; (2) it enables interactive stakeholder input, allowing decision-makers to select fairness criteria and operating points that reflect their domain needs; and (3) it embeds a differentially private auditing loop, giving humans the ability to review explanations and edge cases without compromising individual data security. Applied to benchmark datasets, FAIRPLAI consistently preserves strong privacy protections while reducing fairness disparities relative to automated baselines. More importantly, it provides a straightforward, interpretable process for practitioners to manage competing demands of accuracy, privacy, and fairness in socially impactful applications. By embedding human judgment where it matters most, FAIRPLAI offers a pathway to machine learning systems that are effective, responsible, and trustworthy in practice. GitHub: https://github.com/Li1Davey/Fairplai
著者: Xincheng Xu, Thilina Ranbaduge, Qing Wang, Thierry Rakotoarivelo, David Smith
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to train deep neural networks with formal privacy guarantees. However, the addition of differential privacy (DP) often degrades model accuracy by introducing both noise and bias. Existing techniques typically address only one of these issues, as reducing DP noise can exacerbate clipping bias and vice-versa. In this paper, we propose a novel method, \emph{DP-PMLF}, which integrates per-sample momentum with a low-pass filtering strategy to simultaneously mitigate DP noise and clipping bias. Our approach uses per-sample momentum to smooth gradient estimates prior to clipping, thereby reducing sampling variance. It further employs a post-processing low-pass filter to attenuate high-frequency DP noise without consuming additional privacy budget. We provide a theoretical analysis demonstrating an improved convergence rate under rigorous DP guarantees, and our empirical evaluations reveal that DP-PMLF significantly enhances the privacy-utility trade-off compared to several state-of-the-art DPSGD variants.
agent
著者: Eren Kurshan, Yuan Xie, Paul Franzon
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
AI systems have found a wide range of real-world applications in recent years. The adoption of edge artificial intelligence, embedding AI directly into edge devices, is rapidly growing. Despite the implementation of guardrails and safety mechanisms, security vulnerabilities and challenges have become increasingly prevalent in this domain, posing a significant barrier to the practical deployment and safety of AI systems. This paper proposes an agentic AI safety architecture that leverages 3D to integrate a dedicated safety layer. It introduces an adaptive AI safety infrastructure capable of dynamically learning and mitigating attacks against the AI system. The system leverages the inherent advantages of co-location with the edge computing hardware to continuously monitor, detect and proactively mitigate threats to the AI system. The integration of local processing and learning capabilities enhances resilience against emerging network-based attacks while simultaneously improving system reliability, modularity, and performance, all with minimal cost and 3D integration overhead.
backdoor
著者: Kazuki Iwahana, Yusuke Yamasaki, Akira Ito, Takayuki Miura, Toshiki Shibahara
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into a poisoned class. Existing defenses often attempt to identify and remove backdoor neurons based on Trigger-Activated Changes (TAC) which is the activation differences between clean and poisoned data. These methods suffer from low precision in identifying true backdoor neurons due to inaccurate estimation of TAC values. In this work, we propose a novel backdoor removal method by accurately reconstructing TAC values in the latent representation. Specifically, we formulate the minimal perturbation that forces clean data to be classified into a specific class as a convex quadratic optimization problem, whose optimal solution serves as a surrogate for TAC. We then identify the poisoned class by detecting statistically small $L^2$ norms of perturbations and leverage the perturbation of the poisoned class in fine-tuning to remove backdoors. Experiments on CIFAR-10, GTSRB, and TinyImageNet demonstrated that our approach consistently achieves superior backdoor suppression with high clean accuracy across different attack types, datasets, and architectures, outperforming existing defense methods.
agent
著者: Tim Dudman, Martyn Bull
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Recent advances in deep reinforcement learning for autonomous cyber defence have resulted in agents that can successfully defend simulated computer networks against cyber-attacks. However, many of these agents would need retraining to defend networks with differing topology or size, making them poorly suited to real-world networks where topology and size can vary over time. In this research we introduce a novel set of Topological Extensions for Reinforcement Learning Agents (TERLA) that provide generalisability for the defence of networks with differing topology and size, without the need for retraining. Our approach involves the use of heterogeneous graph neural network layers to produce a fixed-size latent embedding representing the observed network state. This representation learning stage is coupled with a reduced, fixed-size, semantically meaningful and interpretable action space. We apply TERLA to a standard deep reinforcement learning Proximal Policy Optimisation (PPO) agent model, and to reduce the sim-to-real gap, conduct our research using Cyber Autonomy Gym for Experimentation (CAGE) Challenge 4. This Cyber Operations Research Gym environment has many of the features of a real-world network, such as realistic Intrusion Detection System (IDS) events and multiple agents defending network segments of differing topology and size. TERLA agents retain the defensive performance of vanilla PPO agents whilst showing improved action efficiency. Generalisability has been demonstrated by showing that all TERLA agents have the same network-agnostic neural network architecture, and by deploying a single TERLA agent multiple times to defend network segments with differing topology and size, showing improved defensive performance and efficiency.
著者: Stefano Balietti, Pietro Saggese, Stefan Kitzler, Bernhard Haslhofer
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
This chapter explores how Decentralized Autonomous Organizations (DAOs), a novel institutional form based on blockchain technology, challenge traditional centralized governance structures. DAOs govern projects ranging from finance to science and digital communities. They aim to redistribute decision-making power through programmable, transparent, and participatory mechanisms. This chapter outlines both the opportunities DAOs present, such as incentive alignment, rapid coordination, and censorship resistance, and the challenges they face, including token concentration, low participation, and the risk of de facto centralization. It further discusses the emerging intersection of DAOs and artificial intelligence, highlighting the potential for increased automation alongside the dangers of diminished human oversight and algorithmic opacity. Ultimately, we discuss under what circumstances DAOs can fulfill their democratic promise or risk replicating the very power asymmetries they seek to overcome.
著者: Zhuoqun Huang, Neil G. Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
We consider the problem of certified robustness for sequence classification against edit distance perturbations. Naturally occurring inputs of varying lengths (e.g., sentences in natural language processing tasks) present a challenge to current methods that employ fixed-rate deletion mechanisms and lead to suboptimal performance. To this end, we introduce AdaptDel methods with adaptable deletion rates that dynamically adjust based on input properties. We extend the theoretical framework of randomized smoothing to variable-rate deletion, ensuring sound certification with respect to edit distance. We achieve strong empirical results in natural language tasks, observing up to 30 orders of magnitude improvement to median cardinality of the certified region, over state-of-the-art certifications.
著者: Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.
privacy
著者: Tatsuki Koga, Ruihan Wu, Zhiyuan Zhang, Kamalika Chaudhuri
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
With the recent remarkable advancement of large language models (LLMs), there has been a growing interest in utilizing them in the domains with highly sensitive data that lies outside their training data. For this purpose, retrieval-augmented generation (RAG) is particularly effective -- it assists LLMs by directly providing relevant information from the external knowledge sources. However, without extra privacy safeguards, RAG outputs risk leaking sensitive information from the external data source. In this work, we explore RAG under differential privacy (DP), a formal guarantee of data privacy. The main challenge with differentially private RAG is how to generate long accurate answers within a moderate privacy budget. We address this by proposing an algorithm that smartly spends privacy budget only for the tokens that require the sensitive information and uses the non-private LLM for other tokens. Our extensive empirical evaluations reveal that our algorithm outperforms the non-RAG baseline under a reasonable privacy budget of $\epsilon\approx 10$ across different models and datasets.
agent
著者: Jiawei Zhang, Shuang Yang, Bo Li
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Large Language Model (LLM) agents equipped with external tools have become increasingly powerful for complex tasks such as web shopping, automated email replies, and financial trading. However, these advancements amplify the risks of adversarial attacks, especially when agents can access sensitive external functionalities. Nevertheless, manipulating LLM agents into performing targeted malicious actions or invoking specific tools remains challenging, as these agents extensively reason or plan before executing final actions. In this work, we present UDora, a unified red teaming framework designed for LLM agents that dynamically hijacks the agent's reasoning processes to compel malicious behavior. Specifically, UDora first generates the model's reasoning trace for the given task, then automatically identifies optimal points within this trace to insert targeted perturbations. The resulting perturbed reasoning is then used as a surrogate response for optimization. By iteratively applying this process, the LLM agent will then be induced to undertake designated malicious actions or to invoke specific malicious tools. Our approach demonstrates superior effectiveness compared to existing methods across three LLM agent datasets. The code is available at https://github.com/AI-secure/UDora.
著者: Emanuele Mezzi, Fabio Massacci, Katja Tuma
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Several recent works have argued that Large Language Models (LLMs) can be used to tame the data deluge in the cybersecurity field, by improving the automation of Cyber Threat Intelligence (CTI) tasks. This work presents an evaluation methodology that other than allowing to test LLMs on CTI tasks when using zero-shot learning, few-shot learning and fine-tuning, also allows to quantify their consistency and their confidence level. We run experiments with three state-of-the-art LLMs and a dataset of 350 threat intelligence reports and present new evidence of potential security risks in relying on LLMs for CTI. We show how LLMs cannot guarantee sufficient performance on real-size reports while also being inconsistent and overconfident. Few-shot learning and fine-tuning only partially improve the results, thus posing doubts about the possibility of using LLMs for CTI scenarios, where labelled datasets are lacking and where confidence is a fundamental factor.
著者: Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
LLM-integrated applications and agents are vulnerable to prompt injection attacks, where an attacker injects prompts into their inputs to induce attacker-desired outputs. A detection method aims to determine whether a given input is contaminated by an injected prompt. However, existing detection methods have limited effectiveness against state-of-the-art attacks, let alone adaptive ones. In this work, we propose DataSentinel, a game-theoretic method to detect prompt injection attacks. Specifically, DataSentinel fine-tunes an LLM to detect inputs contaminated with injected prompts that are strategically adapted to evade detection. We formulate this as a minimax optimization problem, with the objective of fine-tuning the LLM to detect strong adaptive attacks. Furthermore, we propose a gradient-based method to solve the minimax optimization problem by alternating between the inner max and outer min problems. Our evaluation results on multiple benchmark datasets and LLMs show that DataSentinel effectively detects both existing and adaptive prompt injection attacks.
intellectual property
著者: Alexander Nemecek, Yuzhou Jiang, Erman Ayday
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Large language models (LLMs) are increasingly integrated into academic workflows, with many conferences and journals permitting their use for tasks such as language refinement and literature summarization. However, their use in peer review remains prohibited due to concerns around confidentiality breaches, hallucinated content, and inconsistent evaluations. As LLM-generated text becomes more indistinguishable from human writing, there is a growing need for reliable attribution mechanisms to preserve the integrity of the review process. In this work, we evaluate topic-based watermarking (TBW), a semantic-aware technique designed to embed detectable signals into LLM-generated text. We conduct a systematic assessment across multiple LLM configurations, including base, few-shot, and fine-tuned variants, using authentic peer review data from academic conferences. Our results show that TBW maintains review quality relative to non-watermarked outputs, while demonstrating robust detection performance under paraphrasing. These findings highlight the viability of TBW as a minimally intrusive and practical solution for LLM attribution in peer review settings.
backdoor
著者: Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by retrieving relevant documents from external corpora before generating responses. This approach significantly expands LLM capabilities by leveraging vast, up-to-date external knowledge. However, this reliance on external knowledge makes RAG systems vulnerable to corpus poisoning attacks that manipulate generated outputs via poisoned document injection. Existing poisoning attack strategies typically treat the retrieval and generation stages as disjointed, limiting their effectiveness. We propose Joint-GCG, the first framework to unify gradient-based attacks across both retriever and generator models through three innovations: (1) Cross-Vocabulary Projection for aligning embedding spaces, (2) Gradient Tokenization Alignment for synchronizing token-level gradient signals, and (3) Adaptive Weighted Fusion for dynamically balancing attacking objectives. Evaluations demonstrate that Joint-GCG achieves at most 25% and an average of 5% higher attack success rate than previous methods across multiple retrievers and generators. While optimized under a white-box assumption, the generated poisons show unprecedented transferability to unseen models. Joint-GCG's innovative unification of gradient-based attacks across retrieval and generation stages fundamentally reshapes our understanding of vulnerabilities within RAG systems. Our code is available at https://github.com/NicerWang/Joint-GCG.
著者: Arin Upadhyay
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
We propose EFPIX (Encrypted Flood Protocol for Information eXchange), a flood-based relay communication protocol that achieves end-to-end encryption, plausible deniability for users, and untraceable messages while hiding metadata, such as sender and receiver, from those not involved. It also has built-in spam resistance and multiple optional enhancements. It can be used in privacy-critical communication, infrastructure-loss scenarios, space/research/military communication, where central servers are infeasible, or general-purpose messaging.
intellectual property
著者: De Zhang Lee, Han Fang, Hanyi Wang, Ee-Chien Chang
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Digital watermarks can be embedded into AI-generated content (AIGC) by initializing the generation process with starting points sampled from a secret distribution. When combined with pseudorandom error-correcting codes, such watermarked outputs can remain indistinguishable from unwatermarked objects, while maintaining robustness under whitenoise. In this paper, we go beyond indistinguishability and investigate security under removal attacks. We demonstrate that indistinguishability alone does not necessarily guarantee resistance to adversarial removal. Specifically, we propose a novel attack that exploits boundary information leaked by the locations of watermarked objects. This attack significantly reduces the distortion required to remove watermarks -- by up to a factor of $15 \times$ compared to a baseline whitenoise attack under certain settings. To mitigate such attacks, we introduce a defense mechanism that applies a secret transformation to hide the boundary, and prove that the secret transformation effectively rendering any attacker's perturbations equivalent to those of a naive whitenoise adversary. Our empirical evaluations, conducted on multiple versions of Stable Diffusion, validate the effectiveness of both the attack and the proposed defense, highlighting the importance of addressing boundary leakage in latent-based watermarking schemes.
著者: Devashish Chaudhary, Sutharshan Rajasegarar, Shiva Raj Pokhrel
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
This survey explores the integration of Federated Learning (FL) with Network Intrusion Detection Systems (NIDS), with particular emphasis on deep learning and quantum machine learning approaches. FL enables collaborative model training across distributed devices while preserving data privacy-a critical requirement in network security contexts where sensitive traffic data cannot be centralized. Our comprehensive analysis systematically examines the full spectrum of FL architectures, deployment strategies, communication protocols, and aggregation methods specifically tailored for intrusion detection. We provide an in-depth investigation of privacy-preserving techniques, model compression approaches, and attack-specific federated solutions for threats including DDoS, MITM, and botnet attacks. The survey further delivers a pioneering exploration of Quantum FL (QFL), discussing quantum feature encoding, quantum machine learning algorithms, and quantum-specific aggregation methods that promise exponential speedups for complex pattern recognition in network traffic. Through rigorous comparative analysis of classical and quantum approaches, identification of research gaps, and evaluation of real-world deployments, we outline a concrete roadmap for industrial adoption and future research directions. This work serves as an authoritative reference for researchers and practitioners seeking to enhance privacy, efficiency, and robustness of federated intrusion detection systems in increasingly complex network environments, while preparing for the quantum-enhanced cybersecurity landscape of tomorrow.
著者: Yupei Liu, Yanting Wang, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses typically rely on fine-tuning an LLM to enhance its security, but they achieve limited effectiveness against strong attacks. In this work, we propose \emph{SecInfer}, a novel defense against prompt injection attacks built on \emph{inference-time scaling}, an emerging paradigm that boosts LLM capability by allocating more compute resources for reasoning during inference. SecInfer consists of two key steps: \emph{system-prompt-guided sampling}, which generates multiple responses for a given input by exploring diverse reasoning paths through a varied set of system prompts, and \emph{target-task-guided aggregation}, which selects the response most likely to accomplish the intended task. Extensive experiments show that, by leveraging additional compute at inference, SecInfer effectively mitigates both existing and adaptive prompt injection attacks, outperforming state-of-the-art defenses as well as existing inference-time scaling approaches.
著者: Daniyal Ganiuly, Assel Smaiyl
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Large Language Models (LLMs) are increasingly used in intelligent systems that perform reasoning, summarization, and code generation. Their ability to follow natural-language instructions, while powerful, also makes them vulnerable to a new class of attacks known as prompt injection. In these attacks, hidden or malicious instructions are inserted into user inputs or external content, causing the model to ignore its intended task or produce unsafe responses. This study proposes a unified framework for evaluating how resistant Large Language Models (LLMs) are to prompt injection attacks. The framework defines three complementary metrics such as the Resilience Degradation Index (RDI), Safety Compliance Coefficient (SCC), and Instructional Integrity Metric (IIM) to jointly measure robustness, safety, and semantic stability. We evaluated four instruction-tuned models (GPT-4, GPT-4o, LLaMA-3 8B Instruct, and Flan-T5-Large) on five common language tasks: question answering, summarization, translation, reasoning, and code generation. Results show that GPT-4 performs best overall, while open-weight models remain more vulnerable. The findings highlight that strong alignment and safety tuning are more important for resilience than model size alone. Results show that all models remain partially vulnerable, especially to indirect and direct-override attacks. GPT-4 achieved the best overall resilience (RDR = 9.8 %, SCR = 96.4 %), while open-source models exhibited higher performance degradation and lower safety scores. The findings demonstrate that alignment strength and safety tuning play a greater role in resilience than model size alone. The proposed framework offers a structured, reproducible approach for assessing model robustness and provides practical insights for improving LLM safety and reliability.
著者: Evgenios Gkritsis, Constantinos Patsakis, George Stergiopoulos
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Anti-malware systems rely on sandboxes, hooks, and telemetry pipelines, including collection agents, serializers, and database backends, to monitor program and system behavior. We show that these data-handling components constitute an exploitable attack surface that can lead to denial-of-analysis (DoA) states without disabling sensors or requiring elevated privileges. As a result, we present Telemetry Complexity Attacks (TCAs), a new class of vulnerabilities that exploit fundamental mismatches between unbounded collection mechanisms and bounded processing capabilities. Our method recursively spawns child processes to generate specially crafted, deeply nested, and oversized telemetry that stresses serialization and storage boundaries, as well as visualization layers, for example, JSON/BSON depth and size limits. Depending on the product, this leads to various inconsistent results, such as truncated or missing behavioral reports, rejected database inserts, serializer recursion and size errors, and unresponsive dashboards. In the latter cases, depending on the solution, the malware under test is either not recorded and/or not presented to the analysts. Therefore, instead of evading sensors, we break the pipeline that stores the data captured by the sensors. We evaluate our technique against twelve commercial and open-source malware analysis platforms and endpoint detection and response (EDR) solutions. Seven products fail in different stages of the telemetry pipeline; two vendors assigned CVE identifiers (CVE-2025-61301 and CVE-2025-61303), and others issued patches or configuration changes. We discuss root causes and propose mitigation strategies to prevent DoA attacks triggered by adversarial telemetry.
著者: Jiayi Fu, Qiyao Sun
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Large language models (LLMs) demonstrate strong capabilities in solving complex tasks when integrated with external tools. The Model Context Protocol (MCP) has become a standard interface for enabling such tool-based interactions. However, these interactions introduce substantial security concerns, particularly when the MCP server is compromised or untrustworthy. While prior benchmarks primarily focus on prompt injection attacks or analyze the vulnerabilities of LLM MCP interaction trajectories, limited attention has been given to the underlying system logs associated with malicious MCP servers. To address this gap, we present the first synthetic benchmark for evaluating LLMs ability to identify security risks from system logs. We define nine categories of MCP server risks and generate 1,800 synthetic system logs using ten state-of-the-art LLMs. These logs are embedded in the return values of 243 curated MCP servers, yielding a dataset of 2,421 chat histories for training and 471 queries for evaluation. Our pilot experiments reveal that smaller models often fail to detect risky system logs, leading to high false negatives. While models trained with supervised fine-tuning (SFT) tend to over-flag benign logs, resulting in elevated false positives, Reinforcement Learning from Verifiable Reward (RLVR) offers a better precision-recall balance. In particular, after training with Group Relative Policy Optimization (GRPO), Llama3.1-8B-Instruct achieves 83% accuracy, surpassing the best-performing large remote model by 9 percentage points. Fine-grained, per-category analysis further underscores the effectiveness of reinforcement learning in enhancing LLM safety within the MCP framework. Code and data are available at: https://github.com/PorUna-byte/MCP-RiskCue
privacy
著者: Tianle Song, Chenhao Lin, Yang Cao, Zhengyu Zhao, Jiahao Sun, Chong Zhang, Le Yang, Chao Shen
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Mobile motion sensors such as accelerometers and gyroscopes are now ubiquitously accessible by third-party apps via standard APIs. While enabling rich functionalities like activity recognition and step counting, this openness has also enabled unregulated inference of sensitive user traits, such as gender, age, and even identity, without user consent. Existing privacy-preserving techniques, such as GAN-based obfuscation or differential privacy, typically require access to the full input sequence, introducing latency that is incompatible with real-time scenarios. Worse, they tend to distort temporal and semantic patterns, degrading the utility of the data for benign tasks like activity recognition. To address these limitations, we propose the Predictive Adversarial Transformation Network (PATN), a real-time privacy-preserving framework that leverages historical signals to generate adversarial perturbations proactively. The perturbations are applied immediately upon data acquisition, enabling continuous protection without disrupting application functionality. Experiments on two datasets demonstrate that PATN substantially degrades the performance of privacy inference models, achieving Attack Success Rate (ASR) of 40.11% and 44.65% (reducing inference accuracy to near-random) and increasing the Equal Error Rate (EER) from 8.30% and 7.56% to 41.65% and 46.22%. On ASR, PATN outperforms baseline methods by 16.16% and 31.96%, respectively.
著者: Claire Wang, Ziyang Li, Saikat Dutta, Mayur Naik
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Static analysis tools provide a powerful means to detect security vulnerabilities by specifying queries that encode vulnerable code patterns. However, writing such queries is challenging and requires diverse expertise in security and program analysis. To address this challenge, we present QLCoder - an agentic framework that automatically synthesizes queries in CodeQL, a powerful static analysis engine, directly from a given CVE metadata. QLCode embeds an LLM in a synthesis loop with execution feedback, while constraining its reasoning using a custom MCP interface that allows structured interaction with a Language Server Protocol (for syntax guidance) and a RAG database (for semantic retrieval of queries and documentation). This approach allows QLCoder to generate syntactically and semantically valid security queries. We evaluate QLCode on 176 existing CVEs across 111 Java projects. Building upon the Claude Code agent framework, QLCoder synthesizes correct queries that detect the CVE in the vulnerable but not in the patched versions for 53.4% of CVEs. In comparison, using only Claude Code synthesizes 10% correct queries.
著者: Haijing Guo, Jiafeng Wang, Zhaoyu Chen, Kaixun Jiang, Lingyi Hong, Pinxue Guo, Jinglun Li, Wenqiang Zhang
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Deep neural networks (DNNs) are known to be susceptible to adversarial examples, leading to significant performance degradation. In black-box attack scenarios, a considerable attack performance gap between the surrogate model and the target model persists. This work focuses on enhancing the transferability of adversarial examples to narrow this performance gap. We observe that the gradient information around the clean image, i.e., Neighbourhood Gradient Information (NGI), can offer high transferability.Based on this insight, we introduce NGI-Attack, incorporating Example Backtracking and Multiplex Mask strategies to exploit this gradient information and enhance transferability. Specifically, we first adopt Example Backtracking to accumulate Neighbourhood Gradient Information as the initial momentum term. Then, we utilize Multiplex Mask to form a multi-way attack strategy that forces the network to focus on non-discriminative regions, which can obtain richer gradient information during only a few iterations. Extensive experiments demonstrate that our approach significantly enhances adversarial transferability. Especially, when attacking numerous defense models, we achieve an average attack success rate of 95.2%. Notably, our method can seamlessly integrate with any off-the-shelf algorithm, enhancing their attack performance without incurring extra time costs.
著者: Hanrui Wang, Ching-Chun Chang, Chun-Shien Lu, Christopher Leckie, Isao Echizen
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Deep neural networks are highly vulnerable to adversarial examples, which are inputs with small, carefully crafted perturbations that cause misclassification--making adversarial attacks a critical tool for evaluating robustness. Existing black-box methods typically entail a trade-off between precision and flexibility: pixel-sparse attacks (e.g., single- or few-pixel attacks) provide fine-grained control but lack adaptability, whereas patch- or frequency-based attacks improve efficiency or transferability, but at the cost of producing larger and less precise perturbations. We present GreedyPixel, a fine-grained black-box attack method that performs brute-force-style, per-pixel greedy optimization guided by a surrogate-derived priority map and refined by means of query feedback. It evaluates each coordinate directly without any gradient information, guaranteeing monotonic loss reduction and convergence to a coordinate-wise optimum, while also yielding near white-box-level precision and pixel-wise sparsity and perceptual quality. On the CIFAR-10 and ImageNet datasets, spanning convolutional neural networks (CNNs) and Transformer models, GreedyPixel achieved state-of-the-art success rates with visually imperceptible perturbations, effectively bridging the gap between black-box practicality and white-box performance. The implementation is available at https://github.com/azrealwang/greedypixel.
privacy
著者: Amit Keinan, Moshe Shenfeld, Katrina Ligett
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
Recent methods for auditing the privacy of machine learning algorithms have improved computational efficiency by simultaneously intervening on multiple training examples in a single training run. Steinke et al. (2024) prove that one-run auditing indeed lower bounds the true privacy parameter of the audited algorithm, and give impressive empirical results. Their work leaves open the question of how precisely one-run auditing can uncover the true privacy parameter of an algorithm, and how that precision depends on the audited algorithm. In this work, we characterize the maximum achievable efficacy of one-run auditing and show that the key barrier to its efficacy is interference between the observable effects of different data elements. We present new conceptual approaches to minimize this barrier, towards improving the performance of one-run auditing of real machine learning algorithms.
著者: Yue Liu, Zhenchang Xing, Shidong Pan, Chakkrit Tantithamthavorn
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
In recent years, the AI wave has grown rapidly in software development. Even novice developers can now design and generate complex framework-constrained software systems based on their high-level requirements with the help of Large Language Models (LLMs). However, when LLMs gradually "take the wheel" of software development, developers may only check whether the program works. They often miss security problems hidden in how the generated programs are implemented. In this work, we investigate the security properties of framework-constrained programs generated by state-of-the-art LLMs. We focus specifically on Chrome extensions due to their complex security model involving multiple privilege boundaries and isolated components. To achieve this, we built ChromeSecBench, a dataset with 140 prompts based on known vulnerable extensions. We used these prompts to instruct nine state-of-the-art LLMs to generate complete Chrome extensions, and then analyzed them for vulnerabilities across three dimensions: scenario types, model differences, and vulnerability categories. Our results show that LLMs produced vulnerable programs at alarmingly high rates (18%-50%), particularly in Authentication & Identity and Cookie Management scenarios (up to 83% and 78% respectively). Most vulnerabilities exposed sensitive browser data like cookies, history, or bookmarks to untrusted code. Interestingly, we found that advanced reasoning models performed worse, generating more vulnerabilities than simpler models. These findings highlight a critical gap between LLMs' coding skills and their ability to write secure framework-constrained programs.
著者: Fangqi Dai, Xingjian Jiang, Zizhuang Deng
公開日: Thu, 13 Nov 2025 00:00:00 -0500
要約:
To prevent misinformation and social issues arising from trustworthy-looking content generated by LLMs, it is crucial to develop efficient and reliable methods for identifying the source of texts. Previous approaches have demonstrated exceptional performance in detecting texts fully generated by LLMs. However, these methods struggle when confronting more advanced LLM output or text with adversarial multi-task machine revision, especially in the black-box setting, where the generating model is unknown. To address this challenge, grounded in the hypothesis that human writing possesses distinctive stylistic patterns, we propose Human Language Preference Detection (HLPD). HLPD employs a reward-based alignment process, Human Language Preference Optimization (HLPO), to shift the scoring model's token distribution toward human-like writing, making the model more sensitive to human writing, therefore enhancing the identification of machine-revised text. We test HLPD in an adversarial multi-task evaluation framework that leverages a five-dimensional prompt generator and multiple advanced LLMs to create diverse revision scenarios. When detecting texts revised by GPT-series models, HLPD achieves a 15.11% relative improvement in AUROC over ImBD, surpassing Fast-DetectGPT by 45.56%. When evaluated on texts generated by advanced LLMs, HLPD achieves the highest average AUROC, exceeding ImBD by 5.53% and Fast-DetectGPT by 34.14%. Code will be made available at https://github.com/dfq2021/HLPD.
生成日時: 2025-11-13 18:00:02