arXiv論文一覧 - cs.CR updates on arXiv.org

#1 AutoVulnPHP: LLM-Powered Two-Stage PHP Vulnerability Detection and Automated Localization

著者: Zhiqiang Wang, Yizhong Ding, Zilong Xiao, Jinyu Lu, Yan Jia, Yanjun Li

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06177

要約:
PHP's dominance in web development is undermined by security challenges: static analysis lacks semantic depth, causing high false positives; dynamic analysis is computationally expensive; and automated vulnerability localization suffers from coarse granularity and imprecise context. Additionally, the absence of large-scale PHP vulnerability datasets and fragmented toolchains hinder real-world deployment. We present AutoVulnPHP, an end-to-end framework coupling two-stage vulnerability detection with fine-grained automated localization. SIFT-VulMiner (Structural Inference for Flaw Triage Vulnerability Miner) generates vulnerability hypotheses using AST structures enhanced with data flow. SAFE-VulMiner (Semantic Analysis for Flaw Evaluation Vulnerability Miner) verifies candidates through pretrained code encoder embeddings, eliminating false positives. ISAL (Incremental Sequence Analysis for Localization) pinpoints root causes via syntax-guided tracing, chain-of-thought LLM inference, and causal consistency checks to ensure precision. We contribute PHPVD, the first large-scale PHP vulnerability dataset with 26,614 files (5.2M LOC) across seven vulnerability types. On public benchmarks and PHPVD, AutoVulnPHP achieves 99.7% detection accuracy, 99.5% F1 score, and 81.0% localization rate. Deployed on real-world repositories, it discovered 429 previously unknown vulnerabilities, 351 assigned CVE identifiers, validating its practical effectiveness.

#2 Leveraging Membership Inference Attacks for Privacy Measurement in Federated Learning for Remote Sensing Images

privacy

著者: Anh-Kiet Duong, Petra Gomez-Kr\"amer, Ho\`ang-\^An L\^e, Minh-Tan Pham

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06200

要約:
Federated Learning (FL) enables collaborative model training while keeping training data localized, allowing us to preserve privacy in various domains including remote sensing. However, recent studies show that FL models may still leak sensitive information through their outputs, motivating the need for rigorous privacy evaluation. In this paper, we leverage membership inference attacks (MIA) as a quantitative privacy measurement framework for FL applied to remote sensing image classification. We evaluate multiple black-box MIA techniques, including entropy-based attacks, modified entropy attacks, and the likelihood ratio attack, across different FL algorithms and communication strategies. Experiments conducted on two public scene classification datasets demonstrate that MIA effectively reveals privacy leakage not captured by accuracy alone. Our results show that communication-efficient FL strategies reduce MIA success rates while maintaining competitive performance. These findings confirm MIA as a practical metric and highlight the importance of integrating privacy measurement into FL system design for remote sensing applications.

#3 Cyber Threat Detection and Vulnerability Assessment System using Generative AI and Large Language Model

著者: Keerthi Kumar. M, Swarun Kumar Joginpelly, Sunil Khemka, Lakshmi. S R, Navin Chhibber

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06213

要約:
Background: Cyber-attacks have evolved rapidly in recent years, many individuals and business owners have been affected by cyber-attacks in various ways. Cyber-attacks include various threats such as ransomware, malware, phishing, and Denial of Service (DoS)-related attacks. Challenges: Traditional models such as Generative Artificial Intelligence (AI) and Security Bidirectional Encoder Representations from Transformers (BERT) were implemented to detect cyber threats. However, the existing Security BERT model has a limited contextual understanding of text data, which has less impact on detecting cyber-attacks. Proposed Methodology: To overcome the above-mentioned challenges, Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach (RoBERTa) model is proposed which consists of diverse words of vocabulary understanding. Initially, data are extracted from a Packet Capture (PCAP) file and encrypted using Fully Harmonic Encryption (FHE). Subsequently, a Byte-level and Byte Pair Encoding (BBPE) tokenizer was used to generate tokens and help maintain the vocabulary for the encrypted values. Then, these values are applied to the RoBERTa model of the transformer with extensive training. Finally, Softmax is used for the detection and classification of attacks. The proposed RoBERTa model achieved better results than the existing BERT model in terms of accuracy (0.99), recall (0.91), and precision (0.89) respectively.

#4 AI-Powered Algorithms for the Prevention and Detection of Computer Malware Infections

著者: Rakesh Keshava, Sathish Kuppan Pandurangan, M. Sakthivanitha, Sankaranainar Parmsivan, Goutham Sunkara, R. Maruthi

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06219

要約:
The rise in frequency and complexity of malware attacks are viewed as a major threat to modern digital infrastructure, which means that traditional signature-based detection methods are becoming less effective. As cyber threats continue to evolve, there is a growing need for intelligent systems to accurately and proactively identify and prevent malware infections. This study presents a new hybrid context-aware malware detection framework(HCAMDF) based on artificial intelligence (AI), which combines static file analysis, dynamic behavioural analysis, and contextual metadata to provide more accurate and timely detection. HCADMF has a multi-layer architecture, which consists of lightweight static classifiers such as Long Short Term Memory (LSTM) for real-time behavioral analysis, and an ensemble risk scoring through the integration of multiple layers of prediction. Experimental evaluations of the new/methodology with benchmark datasets, EMBER and CIC-MalMem2022, showed that the new approach provides superior performances with an accuracy of 97.3%, only a 1.5% false positive rate and minimal detection delay compared to several existing machine learning(ML) and deep learning(DL) established methods in the same fields. The results show strong evidence that hybrid AI can detect both existing and novel malware variants, and lay the foundation on intelligent security systems that can enable real-time detection and adapt to a rapidly evolving threat landscape.

#5 Multi-Agent Framework for Controllable and Protected Generative Content Creation: Addressing Copyright and Provenance in AI-Generated Media

intellectual propertyagent

著者: Haris Khan, Sadia Asif, Shumaila Asif

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06232

要約:
The proliferation of generative AI systems creates unprecedented opportunities for content creation while raising critical concerns about controllability, copyright infringement, and content provenance. Current generative models operate as "black boxes" with limited user control and lack built-in mechanisms to protect intellectual property or trace content origin. We propose a novel multi-agent framework that addresses these challenges through specialized agent roles and integrated watermarking. Our system orchestrates Director, Generator, Reviewer, Integration, and Protection agents to ensure user intent alignment while embedding digital provenance markers. We demonstrate feasibility through two case studies: creative content generation with iterative refinement and copyright protection for AI-generated art in commercial contexts. Preliminary feasibility evidence from prior work indicates up to 23\% improvement in semantic alignment and 95\% watermark recovery rates. This work contributes to responsible generative AI deployment, positioning multi-agent systems as a solution for trustworthy creative workflows in legal and commercial applications.

#6 Agentic AI Microservice Framework for Deepfake and Document Fraud Detection in KYC Pipelines

agent

著者: Chandra Sekhar Kubam

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06241

要約:
The rapid proliferation of synthetic media, presentation attacks, and document forgeries has created significant vulnerabilities in Know Your Customer (KYC) workflows across financial services, telecommunications, and digital-identity ecosystems. Traditional monolithic KYC systems lack the scalability and agility required to counter adaptive fraud. This paper proposes an Agentic AI Microservice Framework that integrates modular vision models, liveness assessment, deepfake detection, OCR-based document forensics, multimodal identity linking, and a policy driven risk engine. The system leverages autonomous micro-agents for task decomposition, pipeline orchestration, dynamic retries, and human-in-the-loop escalation. Experimental evaluations demonstrate improved detection accuracy, reduced latency, and enhanced resilience against adversarial inputs. The framework offers a scalable blueprint for regulated industries seeking robust, real-time, and privacy-preserving KYC verification.

#7 Automated Generation of Accurate Privacy Captions From Android Source Code Using Large Language Models

privacy

著者: Vijayanta Jain, Sepideh Ghanavati, Sai Teja Peddinti, Collin McMillan

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06276

要約:
Privacy captions are short sentences that succinctly describe what personal information is used, how it is used, and why, within an app. These captions can be utilized in various notice formats, such as privacy policies, app rationales, and app store descriptions. However, inaccurate captions may mislead users and expose developers to regulatory fines. Existing approaches to generating privacy notices or just privacy captions include using questionnaires, templates, static analysis, or machine learning. However, these approaches either rely heavily on developers' inputs and thus strain their efforts, use limited source code context, leading to the incomplete capture of app privacy behaviors, or depend on potentially inaccurate privacy policies as a source for creating notices. In this work, we address these limitations by developing Privacy Caption Generator (PCapGen), an approach that - i) automatically identifies and extracts large and precise source code context that implements privacy behaviors in an app, ii) uses a Large Language Model (LLM) to describe coarse- and fine-grained privacy behaviors, and iii) generates accurate, concise, and complete privacy captions to describe the privacy behaviors of the app. Our evaluation shows PCapGen generates concise, complete, and accurate privacy captions as compared to the baseline approach. Furthermore, privacy experts choose PCapGen captions at least 71\% of the time, whereas LLMs-as-judge prefer PCapGen captions at least 76\% of the time, indicating strong performance of our approach.

#8 Beyond BeautifulSoup: Benchmarking LLM-Powered Web Scraping for Everyday Users

著者: Arth Bhardwaj, Nirav Diwan, Gang Wang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06301

要約:
Web scraping has historically required technical expertise in HTML parsing, session management, and authentication circumvention, which limited large-scale data extraction to skilled developers. We argue that large language models (LLMs) have democratized web scraping, enabling low-skill users to execute sophisticated operations through simple natural language prompts. While extensive benchmarks evaluate these tools under optimal expert conditions, we show that without extensive manual effort, current LLM-based workflows allow novice users to scrape complex websites that would otherwise be inaccessible. We systematically benchmark what everyday users can do with off-the-shelf LLM tools across 35 sites spanning five security tiers, including authentication, anti-bot, and CAPTCHA controls. We devise and evaluate two distinct workflows: (a) LLM-assisted scripting, where users prompt LLMs to generate traditional scraping code but maintain manual execution control, and (b) end-to-end LLM agents, which autonomously navigate and extract data through integrated tool use. Our results demonstrate that end-to-end agents have made complex scraping accessible - requiring as little as a single prompt with minimal refinement (less than 5 changes) to complete workflows. We also highlight scenarios where LLM-assisted scripting may be simpler and faster for static sites. In light of these findings, we provide simple procedures for novices to use these workflows and gauge what adversaries could achieve using these.

#9 Smart Privacy Policy Assistant: An LLM-Powered System for Transparent and Actionable Privacy Notices

privacy

著者: Sriharshini Kalvakuntla, Luoxi Tang, Yuqiao Meng, Zhaohan Xi

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06357

要約:
Most users agree to online privacy policies without reading or understanding them, even though these documents govern how personal data is collected, shared, and monetized. Privacy policies are typically long, legally complex, and difficult for non-experts to interpret. This paper presents the Smart Privacy Policy Assistant, an LLM-powered system that automatically ingests privacy policies, extracts and categorizes key clauses, assigns human-interpretable risk levels, and generates clear, concise explanations. The system is designed for real-time use through browser extensions or mobile interfaces, surfacing contextual warnings before users disclose sensitive information or grant risky permissions. We describe the end-to-end pipeline, including policy ingestion, clause categorization, risk scoring, and explanation generation, and propose an evaluation framework based on clause-level accuracy, policy-level risk agreement, and user comprehension.

#10 SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use

著者: Pratyush Desai, Luoxi Tang, Yuqiao Meng, Zhaohan Xi

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06366

要約:
Large Language Models (LLMs) are transforming enterprise workflows but introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system preventing sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection/redaction, output-side moderation/reframing, and human-in-the-loop feedback. Experiments demonstrate SafeGPT effectively reduces data leakage risk and biased outputs while maintaining satisfaction.

#11 From Easy to Hard++: Promoting Differentially Private Image Synthesis Through Spatial-Frequency Curriculum

privacy

著者: Chen Gong, Kecen Li, Zinan Lin, Tianhao Wang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06368

要約:
To improve the quality of Differentially private (DP) synthetic images, most studies have focused on improving the core optimization techniques (e.g., DP-SGD). Recently, we have witnessed a paradigm shift that takes these techniques off the shelf and studies how to use them together to achieve the best results. One notable work is DP-FETA, which proposes using `central images' for `warming up' the DP training and then using traditional DP-SGD. Inspired by DP-FETA, we are curious whether there are other such tools we can use together with DP-SGD. We first observe that using `central images' mainly works for datasets where there are many samples that look similar. To handle scenarios where images could vary significantly, we propose FETA-Pro, which introduces frequency features as `training shortcuts.' The complexity of frequency features lies between that of spatial features (captured by `central images') and full images, allowing for a finer-grained curriculum for DP training. To incorporate these two types of shortcuts together, one challenge is to handle the training discrepancy between spatial and frequency features. To address it, we leverage the pipeline generation property of generative models (instead of having one model trained with multiple features/objectives, we can have multiple models working on different features, then feed the generated results from one model into another) and use a more flexible design. Specifically, FETA-Pro introduces an auxiliary generator to produce images aligned with noisy frequency features. Then, another model is trained with these images, together with spatial features and DP-SGD. Evaluated across five sensitive image datasets, FETA-Pro shows an average of 25.7% higher fidelity and 4.1% greater utility than the best-performing baseline, under a privacy budget $\epsilon = 1$.

#12 Noise Reduction for Pufferfish Privacy: A Practical Noise Calibration Method

privacy

著者: Wenjin Yang, Ni Ding, Zijian Zhang, Jing Sun, Zhen Li, Yan Wu, Jiahang Sun, Haotian Lin, Yong Liu, Jincheng An, Liehuang Zhu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06385

要約:
This paper introduces a relaxed noise calibration method to enhance data utility while attaining pufferfish privacy. This work builds on the existing $1$-Wasserstein (Kantorovich) mechanism by alleviating the existing overly strict condition that leads to excessive noise, and proposes a practical mechanism design algorithm as a general solution. We prove that a strict noise reduction by our approach always exists compared to $1$-Wasserstein mechanism for all privacy budgets $\epsilon$ and prior beliefs, and the noise reduction (also represents improvement on data utility) gains increase significantly for low privacy budget situations--which are commonly seen in real-world deployments. We also analyze the variation and optimality of the noise reduction with different prior distributions. Moreover, all the properties of the noise reduction still exist in the worst-case $1$-Wasserstein mechanism we introduced, when the additive noise is largest. We further show that the worst-case $1$-Wasserstein mechanism is equivalent to the $\ell_1$-sensitivity method. Experimental results on three real-world datasets demonstrate $47\%$ to $87\%$ improvement in data utility.

#13 Lightweight Yet Secure: Secure Scripting Language Generation via Lightweight LLMs

著者: Keyang Zhang, Zeyu Chen, Xuan Feng, Dongliang Fang, Yaowen Zheng, Zhi Li, Limin Sun

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06419

要約:
The security of scripting languages such as PowerShell is critical given their powerful automation and administration capabilities, often exercised with elevated privileges. Today, securing these languages still demands substantial human effort to craft and enforce rules, imposing heavy burdens on typical administrators and creating critical production risks (e.g., misoperations that shut down servers).Large language models (LLMs) have demonstrated strong capabilities in code generation, vulnerability detection, and automated repair for languages like Python and JavaScript. However, their ability to assist with generating secure scripting-language code remains largely underexplored. In this paper, we present SecGenEval-PS, a benchmark designed to systematically evaluate LLMs on secure scripting generation, security analysis, and automated repair. Our results show that both proprietary and open-source models fall short in these areas. For instance, over 60% of PowerShell scripts produced by GPT-4o and o3-mini are insecure without structured guidance.To bridge this gap, we propose PSSec, a framework that combines data synthesis with fine-tuning to enhance model security capabilities. We develop a self-debugging agent that integrates static analyzers with the reasoning abilities of advanced LLMs to synthesize large-scale structured triplets of insecure scripts, violation analyses, and corresponding repairs. We then fine-tune lightweight LLMs (as small as 1.7B parameters) using supervised fine-tuning (SFT) and reinforcement learning (RL), enabling security-aware reasoning and the generation of secure PowerShell code.Across multiple LLM families, including GPT and Qwen, \textit{PSSec}-trained models match or surpass general-purpose large models on PowerShell security tasks while reducing inference cost by more than an order of magnitude.

#14 VIPER Strike: Defeating Visual Reasoning CAPTCHAs via Structured Vision-Language Inference

著者: Minfeng Qi, Dongyang He, Qin Wang, Lefeng Zhang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06461

要約:
Visual Reasoning CAPTCHAs (VRCs) combine visual scenes with natural-language queries that demand compositional inference over objects, attributes, and spatial relations. They are increasingly deployed as a primary defense against automated bots. Existing solvers fall into two paradigms: vision-centric, which rely on template-specific detectors but fail on novel layouts, and reasoning-centric, which leverage LLMs but struggle with fine-grained visual perception. Both lack the generality needed to handle heterogeneous VRC deployments. We present ViPer, a unified attack framework that integrates structured multi-object visual perception with adaptive LLM-based reasoning. ViPer parses visual layouts, grounds attributes to question semantics, and infers target coordinates within a modular pipeline. Evaluated on six major VRC providers (VTT, Geetest, NetEase, Dingxiang, Shumei, Xiaodun), ViPer achieves up to 93.2% success, approaching human-level performance across multiple benchmarks. Compared to prior solvers, GraphNet (83.2%), Oedipus (65.8%), and the Holistic approach (89.5%), ViPer consistently outperforms all baselines. The framework further maintains robustness across alternative LLM backbones (GPT, Grok, DeepSeek, Kimi), sustaining accuracy above 90%. To anticipate defense, we further introduce Template-Space Randomization (TSR), a lightweight strategy that perturbs linguistic templates without altering task semantics. TSR measurably reduces solver (i.e., attacker) performance. Our proposed design suggests directions for human-solvable but machine-resistant CAPTCHAs.

#15 SecureDyn-FL: A Robust Privacy-Preserving Federated Learning Framework for Intrusion Detection in IoT Networks

privacy

著者: Imtiaz Ali Soomro, Hamood Ur Rehman, S. Jawad Hussain ID, Adeel Iqbal, Waqas Khalid, Heejung Yu ID

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06466

要約:
The rapid proliferation of Internet of Things (IoT) devices across domains such as smart homes, industrial control systems, and healthcare networks has significantly expanded the attack surface for cyber threats, including botnet-driven distributed denial-of-service (DDoS), malware injection, and data exfiltration. Conventional intrusion detec- tion systems (IDS) face critical challenges like privacy, scala- bility, and robustness when applied in such heterogeneous IoT environments. To address these issues, we propose SecureDyn- FL, a comprehensive and robust privacy-preserving federated learning (FL) framework tailored for intrusion detection in IoT networks. SecureDyn-FL is designed to simultaneously address multiple security dimensions in FL-based IDS: (1) poisoning detection through dynamic temporal gradient auditing, (2) privacy protection against inference and eavesdrop- ping attacks through secure aggregation, and (3) adaptation to heterogeneous non-IID data via personalized learning. The framework introduces three core contributions: (i) a dynamic temporal gradient auditing mechanism that leverages Gaussian mixture models (GMMs) and Mahalanobis distance (MD) to detect stealthy and adaptive poisoning attacks, (ii) an optimized privacy-preserving aggregation scheme based on transformed additive ElGamal encryption with adaptive pruning and quantization for secure and efficient communication, and (iii) a dual-objective personalized learning strategy that improves model adaptation under non-IID data using logit-adjusted loss. Extensive experiments on the N-BaIoT dataset under both IID and non-IID settings, including scenarios with up to 50% adversarial clients, demonstrate that SecureDyn- FL consistently outperforms state-of-the-art FL-based IDS defenses.

#16 A Bayesian Network-Driven Zero Trust Model for Cyber Risk Quantification in Small-Medium Businesses

著者: Ahmed M. Abdelmagid, Barry C. Ezell, Michael McShane

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06553

要約:
Small-Medium Businesses (SMBs) are essential to global economies yet remain highly vulnerable to cyberattacks due to limited budgets, inadequate cybersecurity expertise, and underestimation of cyber risks. Their increasing reliance on digital infrastructures has expanded their attack surfaces, exposing them to sophisticated and evolving threats. Consequently, implementing proactive, adaptive security measures has become imperative. This research investigates the effectiveness of Zero Trust Architecture (ZTA) as a sustainable cybersecurity solution tailored to SMBs. While ZTA adoption has been examined broadly, the specific financial, organizational, and capability constraints of SMBs remain underexplored. This study develops an integrated predictive model to assess both the feasibility and risk-mitigation potential of ZTA implementation. The model consists of two sub-models. The first sub-model evaluates the probability of successful ZTA adoption considering implied barriers, and the second tests the effectiveness of ZTA in responding to prevalent cyberattacks. The integrated model predicts the risk level in the presence of ZTA and quantifies the uncertainty of the extent to which ZTA can enhance SMBs' cyber resilience, contributing novel insights for practitioners and stakeholders seeking to enhance compliance with policies, risk, and governance activities in SMBs.

#17 QES-Backed Virtual FIDO2 Authenticators: Architectural Options for Secure, Synchronizable WebAuthn Credentials

著者: Kemal Bicakci, Fatih Mehmet Varli, Muhammet Emir Korkmaz, Yusuf Uzunay

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06554

要約:
FIDO2 and the WebAuthn standard offer phishing-resistant, public-key based authentication but traditionally rely on device-bound cryptographic keys that are not naturally portable across user devices. Recent passkey deployments address this limitation by enabling multi-device credentials synchronized via platform-specific cloud ecosystems. However, these approaches require users and organizations to trust the corresponding cloud or phone providers with the protection and availability of their authentication material. In parallel, qualified electronic signature (QES) tokens and smart-card--based PKCS#11 modules provide high-assurance, hardware-rooted identity, yet they are not directly compatible with WebAuthn flows. This paper explores architectural options for bridging these technologies by securing a virtual FIDO2 authenticator with a QES-grade PKCS#11 key and enabling encrypted cloud synchronization of FIDO2 private keys. We first present and implement a baseline architecture in which the cloud stores only ciphertext and the decryption capability remains anchored exclusively in the user's hardware token. We then propose a hardened variant that introduces an Oblivious Pseudorandom Function (OPRF)-based mechanism bound to a local user-verification factor, thereby mitigating cross-protocol misuse and ensuring that synchronization keys cannot be repurposed outside the intended FIDO2 semantics; this enhanced design is analyzed but not implemented. Both architectures preserve a pure WebAuthn/FIDO2 interface to relying parties while offering different trust and deployment trade-offs. We provide the system model, threat analysis, implementation of the baseline architecture, and experimental evaluation, followed by a discussion of the hardened variant's security implications for high-assurance authentication deployments.

#18 Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

著者: Hongjun An, Yiliang Song, Jiangan Chen, Jiawei Shao, Chi Zhang, Xuelong Li

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06596

要約:
Large Language Model (LLM) training often optimizes for preference alignment, rewarding outputs that are perceived as helpful and interaction-friendly. However, this preference-oriented objective can be exploited: manipulative prompts can steer responses toward user-appeasing agreement and away from truth-oriented correction. In this work, we investigate whether aligned models are vulnerable to Preference-Undermining Attacks (PUA), a class of manipulative prompting strategies designed to exploit the model's desire to please user preferences at the expense of truthfulness. We propose a diagnostic methodology that provides a finer-grained and more directive analysis than aggregate benchmark scores, using a factorial evaluation framework to decompose prompt-induced shifts into interpretable effects of system objectives (truth- vs. preference-oriented) and PUA-style dialogue factors (directive control, personal derogation, conditional approval, reality denial) within a controlled $2 \times 2^4$ design. Surprisingly, more advanced models are sometimes more susceptible to manipulative prompts. Beyond the dominant reality-denial factor, we observe model-specific sign reversals and interactions with PUA-style factors, suggesting tailored defenses rather than uniform robustness. These findings offer a novel, reproducible factorial evaluation methodology that provides finer-grained diagnostics for post-training processes like RLHF, enabling better trade-offs in the product iteration of LLMs by offering a more nuanced understanding of preference alignment risks and the impact of manipulative prompts.

#19 Cross-Border Data Security and Privacy Risks in Large Language Models and IoT Systems

privacy

著者: Chalitha Handapangoda

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06612

要約:
The reliance of Large Language Models and Internet of Things systems on massive, globally distributed data flows creates systemic security and privacy challenges. When data traverses borders, it becomes subject to conflicting legal regimes, such as the EU's General Data Protection Regulation and China's Personal Information Protection Law, compounded by technical vulnerabilities like model memorization. Current static encryption and data localization methods are fragmented and reactive, failing to provide adequate, policy-aligned safeguards. This research proposes a Jurisdiction-Aware, Privacy-by-Design architecture that dynamically integrates localized encryption, adaptive differential privacy, and real-time compliance assertion via cryptographic proofs. Empirical validation in a multi-jurisdictional simulation demonstrates this architecture reduced unauthorized data exposure to below five percent and achieved zero compliance violations. These security gains were realized while maintaining model utility retention above ninety percent and limiting computational overhead. This establishes that proactive, integrated controls are feasible for secure and globally compliant AI deployment.

#20 Burn-After-Use for Preventing Data Leakage through a Secure Multi-Tenant Architecture in Enterprise LLM

著者: Qiang Zhang, Elena Emma Wang, Jiaming Li, Xichun Wang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06627

要約:
This study presents a Secure Multi-Tenant Architecture (SMTA) combined with a novel concept Burn-After-Use (BAU) mechanism for enterprise LLM environments to effectively prevent data leakage. As institutions increasingly adopt LLMs across departments, the risks of data leakage have become a critical security and compliance concern. The proposed SMTA isolates LLM instances across departments and enforces rigorous context ownership boundaries within an internally deployed infrastructure. The BAU mechanism introduces data confidentiality by enforcing ephemeral conversational contexts that are automatically destroyed after use, preventing cross-session or cross-user inference. The evaluation to SMTA and BAU is through two sets of realistic and reproducible experiments comprising of 127 test iterations. One aspect of this experiment is to assess prompt-based and semantic leakage attacks in a multi-tenant architecture (Appendix A) across 55 infrastructure-level attack tests, including vector-database credential compromise and shared logging pipeline exposure. SMTA achieves 92% defense success rate, demonstrating strong semantic isolation while highlighting residual risks from credential misconfiguration and observability pipelines. Another aspect is to evaluate the robustness of BAU under realistic failure scenarios (Appendix B) using four empirical metrics: Local Residual Persistence Rate (LRPR), Remote Residual Persistence Rate (RRPR), Image Frame Exposure Rate (IFER), and Burn Timer Persistence Rate (BTPR). Across 72 test iterations, BAU achieves a 76.75% success rate in mitigating post-session leakage threats across the client, server, application, infrastructure, and cache layers. These results show that SMTA and BAU together enforce strict isolation, complete session ephemerality, strong confidentiality guarantees, non-persistence, and policy-aligned behavior for enterprise LLMs.

#21 Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection

intellectual propertydiffusion

著者: Qingyu Liu, Yitao Zhang, Zhongjie Ba, Chao Shuai, Peng Cheng, Tianhang Zheng, Zhibo Wang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06639

要約:
Protecting the copyright of user-generated AI images is an emerging challenge as AIGC becomes pervasive in creative workflows. Existing watermarking methods (1) remain vulnerable to real-world adversarial threats, often forced to trade off between defenses against spoofing and removal attacks; and (2) cannot support semantic-level tamper localization. We introduce PAI, a training-free inherent watermarking framework for AIGC copyright protection, plug-and-play with diffusion-based AIGC services. PAI simultaneously provides three key functionalities: robust ownership verification, attack detection, and semantic-level tampering localization. Unlike existing inherent watermark methods that only embed watermarks at noise initialization of diffusion models, we design a novel key-conditioned deflection mechanism that subtly steers the denoising trajectory according to the user key. Such trajectory-level coupling further strengthens the semantic entanglement of identity and content, thereby further enhancing robustness against real-world threats. Moreover, we also provide a theoretical analysis proving that only the valid key can pass verification. Experiments across 12 attack methods show that PAI achieves 98.43\% verification accuracy, improving over SOTA methods by 37.25\% on average, and retains strong tampering localization performance even against advanced AIGC edits. Our code is available at https://github.com/QingyuLiu/PAI.

#22 zkRansomware: Proof-of-Data Recoverability and Multi-round Game Theoretic Modeling of Ransomware Decisions

著者: Xinyu Hou, Yang Lu, Rabimba Karanjai, Lei Xu, Weidong Shi

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06667

要約:
Ransomware is still one of the most serious cybersecurity threats. Victims often pay but fail to regain access to their data, while also facing the danger of losing data privacy. These uncertainties heavily shape the attacker-victim dynamics in decision-making. In this paper, we introduce and analyze zkRansomware. This new ransomware model integrates zero-knowledge proofs to enable verifiable data recovery and uses smart contracts to enforce multi-round payments while mitigating the risk of data disclosure and privacy loss. We show that zkRansomware is technically feasible using existing cryptographic and blockchain tools and, perhaps counterintuitively, can align incentives between the attacker and the victim. Finally, we develop a theoretical decision-making frame- work for zkRansomware that distinguishes it from known ransomware decision models and discusses its implications for ransomware risk anal- ysis and response decision support.

#23 S-DAPT-2026: A Stage-Aware Synthetic Dataset for Advanced Persistent Threat Detection

synthetic data

著者: Saleem Ishaq Tijjani, Bogdan Ghita, Nathan Clarke, Matthew Craven

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06690

要約:
The detection of advanced persistent threats (APTs) remains a crucial challenge due to their stealthy, multistage nature and the limited availability of realistic, labeled datasets for systematic evaluation. Synthetic dataset generation has emerged as a practical approach for modeling APT campaigns; however, existing methods often rely on computationally expensive alert correlation mechanisms that limit scalability. Motivated by these limitations, this paper presents a near realistic synthetic APT dataset and an efficient alert correlation framework. The proposed approach introduces a machine learning based correlation module that employs K Nearest Neighbors (KNN) clustering with a cosine similarity metric to group semantically related alerts within a temporal context. The dataset emulates multistage APT campaigns across campus and organizational network environments and captures a diverse set of fourteen distinct alert types, exceeding the coverage of commonly used synthetic APT datasets. In addition, explicit APT campaign states and alert to stage mappings are defined to enable flexible integration of new alert types and support stage aware analysis. A comprehensive statistical characterization of the dataset is provided to facilitate reproducibility and support APT stage predictions.

#24 Incentive Mechanism Design for Privacy-Preserving Decentralized Blockchain Relayers

privacy

著者: Boutaina Jebari, Khalil Ibrahimi, Hamidou Tembine, Mounir Ghogho

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06699

要約:
Public blockchains, though renowned for their transparency and immutability, suffer from significant privacy concerns. Network-level analysis and long-term observation of publicly available transactions can often be used to infer user identities. To mitigate this, several blockchain applications rely on relayers, which serve as intermediary nodes between users and smart contracts deployed on the blockchain. However, dependence on a single relayer not only creates a single point of failure but also introduces exploitable vulnerabilities that weaken the system's privacy guarantees. This paper proposes a decentralized relayer architecture that enhances privacy and reliability through game-theoretic incentive design. We model the interaction among relayers as a non-cooperative game and design an incentive mechanism in which probabilistic uploading emerges as a unique mixed Nash equilibrium. Using evolutionary game analysis, we demonstrate the equilibrium's stability against perturbations and coordinated deviations. Through numerical evaluations, we analyze how equilibrium strategies and system behavior evolve with key parameters such as the number of relayers, upload costs, rewards, and penalties. In particular, we show that even with high transaction costs, the system maintains reliability with an outage probability below 0.05 . Furthermore, our results highlight a fundamental trade-off between privacy, reliability, robustness, and cost in decentralized relayer systems.

#25 Behavioral Analytics for Continuous Insider Threat Detection in Zero-Trust Architectures

著者: Gaurav Sarraf

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06708

要約:
Insider threats are a particularly tricky cybersecurity issue, especially in zero-trust architectures (ZTA) where implicit trust is removed. Although the rule of thumb is never trust, always verify, attackers can still use legitimate credentials and impersonate the standard user activity. In response, behavioral analytics with machine learning (ML) can help monitor the user activity continuously and identify the presence of anomalies. This introductory framework makes use of the CERT Insider Threat Dataset for data cleaning, normalization, and class balance using the Synthetic Minority Oversampling Technique (SMOTE). It also employs Principal Component Analysis (PCA) for dimensionality reduction. Several benchmark models, including Support Vector Machine (SVM), Artificial Neural Network (ANN), and Bayesian Network (Bayes Net), were used to develop and evaluate the AdaBoost classifier. Compared to SVM (90.1%), ANN (94.7%), and Bayes Net (94.9), AdaBoost achieved higher performance with a 98.0% ACC, 98.3% PRE, 98.0% REC, and F1-score (F1). The Receiver Operating Characteristic (ROC) study, which provided further confirmation of its strength, yielded an Area Under the Curve (AUC) of 0.98. These results prove the effectiveness and dependability of AdaBoost-based behavioral analytics as a solution to reinforcing continuous insider threat detection in zero-trust settings.

#26 Privacy-Preserving Data Processing in Cloud : From Homomorphic Encryption to Federated Analytics

privacy

著者: Gaurav Sarraf, Vibhor Pal

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06710

要約:
Privacy-preserving data processing refers to the methods and models that allow computing and analyzing sensitive data with a guarantee of confidentiality. As cloud computing and applications that rely on data continue to expand, there is an increasing need to protect personal, financial and healthcare information. Conventional centralized data processing methods expose sensitive data to risk of breaches, compelling the need to use decentralized and secure data methods. This paper gives a detailed review of privacy-saving mechanisms in the cloud platform, such as statistical approaches like differential privacy and cryptographic solutions like homomorphic encryption. Federated analytics and federated learning, two distributed learning frameworks, are also discussed. Their principles, applications, benefits, and limitations are reviewed, with roles of use in the fields of healthcare, finance, IoT, and industrial cases. Comparative analyses measure trade-offs in security, efficiency, scalability, and accuracy, and investigations are done of emerging hybrid frameworks to provide better privacy protection. Critical issues, including computational overhead, privacy-utility trade-offs, standardization, adversarial threats, and cloud integration are also addressed. This review examines in detail the recent privacy-protecting approaches in cloud computation and offers scholars and practitioners crucial information on secure and effective solutions to data processing.

#27 Deep Recurrent Hidden Markov Learning Framework for Multi-Stage Advanced Persistent Threat Prediction

著者: Saleem Ishaq Tijjani, Bogdan Ghita, Nathan Clarke, Matthew Craven

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06734

要約:
Advanced Persistent Threats (APTs) represent hidden, multi\-stage cyberattacks whose long term persistence and adaptive behavior challenge conventional intrusion detection systems (IDS). Although recent advances in machine learning and probabilistic modeling have improved APT detection performance, most existing approaches remain reactive and alert\-centric, providing limited capability for stage-aware prediction and principled inference under uncertainty, particularly when observations are sparse or incomplete. This paper proposes E\-HiDNet, a unified hybrid deep probabilistic learning framework that integrates convolutional and recurrent neural networks with a Hidden Markov Model (HMM) to allow accurate prediction of the progression of the APT campaign. The deep learning component extracts hierarchical spatio\-temporal representations from correlated alert sequences, while the HMM models latent attack stages and their stochastic transitions, allowing principled inference under uncertainty and partial observability. A modified Viterbi algorithm is introduced to handle incomplete observations, ensuring robust decoding under uncertainty. The framework is evaluated using a synthetically generated yet structurally realistic APT dataset (S\-DAPT\-2026). Simulation results show that E\-HiDNet achieves up to 98.8\-100\% accuracy in stage prediction and significantly outperforms standalone HMMs when four or more observations are available, even under reduced training data scenarios. These findings highlight that combining deep semantic feature learning with probabilistic state\-space modeling enhances predictive APT stage performance and situational awareness for proactive APT defense.

#28 ALFA: A Safe-by-Design Approach to Mitigate Quishing Attacks Launched via Fancy QR Codes

著者: Muhammad Wahid Akram, Keshav Sood, Muneeb Ul Hassan, Dhananjay Thiruvady

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06768

要約:
Phishing with Quick Response (QR) codes is termed as Quishing. The attackers exploit this method to manipulate individuals into revealing their confidential data. Recently, we see the colorful and fancy representations of QR codes, the 2D matrix of QR codes which does not reflect a typical mixture of black-white modules anymore. Instead, they become more tempting as an attack vector for adversaries which can evade the state-of-the-art deep learning visual-based and other prevailing countermeasures. We introduce "ALFA", a safe-by-design approach, to mitigate Quishing and prevent everyone from accessing the post-scan harmful payload of fancy QR codes. Our method first converts a fancy QR code into the replica of binary grid and then identify the erroneous representation of modules in that grid. Following that, we present "FAST" method which can conveniently recover erroneous modules from that binary grid. Afterwards, using this binary grid, our solution extracts the structural features of fancy QR code and predicts its legitimacy using a pre-trained model. The effectiveness of our proposal is demonstrated by the experimental evaluation on a synthetic dataset (containing diverse variations of fancy QR codes) and achieve a FNR of 0.06% only. We also develop the mobile app to test the practical feasibility of our solution and provide a performance comparison of the app with the real-world QR readers. This comparison further highlights the classification reliability and detection accuracy of this solution in real-world environments.

#29 CyberLLM-FINDS 2025: Instruction-Tuned Fine-tuning of Domain-Specific LLMs with Retrieval-Augmented Generation and Graph Integration for MITRE Evaluation

著者: Vasanth Iyer, Leonardo Bobadilla, S. S. Iyengar

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06779

要約:
Large Language Models (LLMs) such as Gemma-2B have shown strong performance in various natural language processing tasks. However, general-purpose models often lack the domain expertise required for cybersecurity applications. This work presents a methodology to fine-tune the Gemma-2B model into a domain-specific cybersecurity LLM. We detail the processes of dataset preparation, fine-tuning, and synthetic data generation, along with implications for real-world applications in threat detection, forensic investigation, and attack analysis. Experiments highlight challenges in prompt length distribution during domain-specific fine-tuning. Uneven prompt lengths limit the model's effective use of the context window, constraining local inference to 200-400 tokens despite hardware support for longer sequences. Chain-of-thought styled prompts, paired with quantized weights, yielded the best performance under these constraints. To address context limitations, we employed a hybrid strategy using cloud LLMs for synthetic data generation and local fine-tuning for deployment efficiency. To extend the evaluation, we introduce a Retrieval-Augmented Generation (RAG) pipeline and graph-based reasoning framework. This approach enables structured alignment with MITRE ATT&CK techniques through STIX-based threat intelligence, enhancing recall in multi-hop and long-context scenarios. Graph modules encode entity-neighborhood context and tactic chains, helping mitigate the constraints of short prompt windows. Results demonstrate improved model alignment with tactic, technique, and procedure (TTP) coverage, validating the utility of graph-augmented LLMs in cybersecurity threat intelligence applications.

#30 SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute

著者: Bowen Shen, Yuyue Chen, Peng Yang, Bin Zhang, Xi Zhang, Zoe L. Jiang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06790

要約:
Privacy-preserving Transformer inference has gained attention due to the potential leakage of private information. Despite recent progress, existing frameworks still fall short of practical model scales, with gaps up to a hundredfold. A possible way to close this gap is the Mixture of Experts (MoE) architecture, which has emerged as a promising technique to scale up model capacity with minimal overhead. However, given that the current secure two-party (2-PC) protocols allow the server to homomorphically compute the FFN layer with its plaintext model weight, under the MoE setting, this could reveal which expert is activated to the server, exposing token-level privacy about the client's input. While naively evaluating all the experts before selection could protect privacy, it nullifies MoE sparsity and incurs the heavy computational overhead that sparse MoE seeks to avoid. To address the privacy and efficiency limitations above, we propose a 2-PC privacy-preserving inference framework, \SecMoE. Unifying per-entry circuits in both the MoE layer and piecewise polynomial functions, \SecMoE obliviously selects the extracted parameters from circuits and only computes one encrypted entry, which we refer to as Select-Then-Compute. This makes the model for private inference scale to 63$\times$ larger while only having a 15.2$\times$ increase in end-to-end runtime. Extensive experiments show that, under 5 expert settings, \SecMoE lowers the end-to-end private inference communication by 1.8$\sim$7.1$\times$ and achieves 1.3$\sim$3.8$\times$ speedup compared to the state-of-the-art (SOTA) protocols.

#31 CHASE: LLM Agents for Dissecting Malicious PyPI Packages

agent

著者: Takaaki Toda, Tatsuya Mori

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06838

要約:
Modern software package registries like PyPI have become critical infrastructure for software development, but are increasingly exploited by threat actors distributing malicious packages with sophisticated multi-stage attack chains. While Large Language Models (LLMs) offer promising capabilities for automated code analysis, their application to security-critical malware detection faces fundamental challenges, including hallucination and context confusion, which can lead to missed detections or false alarms. We present CHASE (Collaborative Hierarchical Agents for Security Exploration), a high-reliability multi-agent architecture that addresses these limitations through a Plan-and-Execute coordination model, specialized Worker Agents focused on specific analysis aspects, and integration with deterministic security tools for critical operations. Our key insight is that reliability in LLM-based security analysis emerges not from improving individual model capabilities but from architecting systems that compensate for LLM weaknesses while leveraging their semantic understanding strengths. Evaluation on a dataset of 3,000 packages (500 malicious, 2,500 benign) demonstrates that CHASE achieves 98.4% recall with only 0.08% false positive rate, while maintaining a practical median analysis time of 4.5 minutes per package, making it suitable for operational deployment in automated package screening. Furthermore, we conducted a survey with cybersecurity professionals to evaluate the generated analysis reports, identifying their key strengths and areas for improvement. This work provides a blueprint for building reliable AI-powered security tools that can scale with the growing complexity of modern software supply chains. Our project page is available at https://t0d4.github.io/CHASE-AIware25/

#32 qAttCNN - Self Attention Mechanism for Video QoE Prediction in Encrypted Traffic

著者: Michael Sidorov, Ofer Hadar

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06862

要約:
The rapid growth of multimedia consumption, driven by major advances in mobile devices since the mid-2000s, has led to widespread use of video conferencing applications (VCAs) such as Zoom and Google Meet, as well as instant messaging applications (IMAs) like WhatsApp and Telegram, which increasingly support video conferencing as a core feature. Many of these systems rely on the Web Real-Time Communication (WebRTC) protocol, enabling direct peer-to-peer media streaming without requiring a third-party server to relay data, reducing the latency and facilitating a real-time communication. Despite WebRTC's potential, adverse network conditions can degrade streaming quality and consequently reduce users' Quality of Experience (QoE). Maintaining high QoE therefore requires continuous monitoring and timely intervention when QoE begins to deteriorate. While content providers can often estimate QoE by directly comparing transmitted and received media, this task is significantly more challenging for internet service providers (ISPs). End-to-end encryption, commonly used by modern VCAs and IMAs, prevent ISPs from accessing the original media stream, leaving only Quality of Service (QoS) and routing information available. To address this limitation, we propose the QoE Attention Convolutional Neural Network (qAttCNN), a model that leverages packet size parameter of the traffic to infer two no-reference QoE metrics viz. BRISQUE and frames per second (FPS). We evaluate qAttCNN on a custom dataset collected from WhatsApp video calls and compare it against existing QoE models. Using mean absolute error percentage (MAEP), our approach achieves 2.14% error for BRISQUE and 7.39% for FPS prediction.

#33 United We Defend: Collaborative Membership Inference Defenses in Federated Learning

privacy

著者: Li Bai, Junxu Liu, Sen Zhang, Xinwei Zhang, Qingqing Ye, Haibo Hu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06866

要約:
Membership inference attacks (MIAs), which determine whether a specific data point was included in the training set of a target model, have posed severe threats in federated learning (FL). Unfortunately, existing MIA defenses, typically applied independently to each client in FL, are ineffective against powerful trajectory-based MIAs that exploit temporal information throughout the training process to infer membership status. In this paper, we investigate a new FL defense scenario driven by heterogeneous privacy needs and privacy-utility trade-offs, where only a subset of clients are defended, as well as a collaborative defense mode where clients cooperate to mitigate membership privacy leakage. To this end, we introduce CoFedMID, a collaborative defense framework against MIAs in FL, which limits local model memorization of training samples and, through a defender coalition, enhances privacy protection and model utility. Specifically, CoFedMID consists of three modules: a class-guided partition module for selective local training samples, a utility-aware compensation module to recycle contributive samples and prevent their overconfidence, and an aggregation-neutral perturbation module that injects noise for cancellation at the coalition level into client updates. Extensive experiments on three datasets show that our defense framework significantly reduces the performance of seven MIAs while incurring only a small utility loss. These results are consistently verified across various defense settings.

#34 Towards Compositional Generalization in LLMs for Smart Contract Security: A Case Study on Reentrancy Vulnerabilities

著者: Ying Zhou, Jiacheng Wei, Yu Qi, Faguo Wu, Xiao Zhang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06914

要約:
Large language models (LLMs) demonstrate remarkable capabilities in natural language understanding and generation. Despite being trained on large-scale, high-quality data, LLMs still fail to outperform traditional static analysis tools in specialized domains like smart contract vulnerability detection. To address this issue, this paper proposes a post-training algorithm based on atomic task decomposition and fusion. This algorithm aims to achieve combinatorial generalization under limited data by decomposing complex reasoning tasks. Specifically, we decompose the reentrancy vulnerability detection task into four linearly independent atomic tasks: identifying external calls, identifying state updates, identifying data dependencies between external calls and state updates, and determining their data flow order. These tasks form the core components of our approach. By training on synthetic datasets, we generate three compiler-verified datasets. We then employ the Slither tool to extract structural information from the control flow graph and data flow graph, which is used to fine-tune the LLM's adapter. Experimental results demonstrate that low-rank normalization fusion with the LoRA adapter improves the LLM's reentrancy vulnerability detection accuracy to 98.2%, surpassing state-of-the-art methods. On 31 real-world contracts, the algorithm achieves a 20% higher recall than traditional analysis tools.

#35 Operational Runtime Behavior Mining for Open-Source Supply Chain Security

著者: Zhuoran Tan, Ke Xiao, Jeremy Singer, Christos Anagnostopoulos

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06948

要約:
Open-source software (OSS) is a critical component of modern software systems, yet supply chain security remains challenging in practice due to unavailable or obfuscated source code. Consequently, security teams often rely on runtime observations collected from sandboxed executions to investigate suspicious third-party components. We present HeteroGAT-Rank, an industry-oriented runtime behavior mining system that supports analyst-in-the-loop supply chain threat investigation. The system models execution-time behaviors of OSS packages as lightweight heterogeneous graphs and applies attention-based graph learning to rank behavioral patterns that are most relevant for security analysis. Rather than aiming for fully automated detection, HeteroGAT-Rank surfaces actionable runtime signals - such as file, network, and command activities - to guide manual investigation and threat hunting. To operate at ecosystem scale, the system decouples offline behavior mining from online analysis and integrates parallel graph construction for efficient processing across multiple ecosystems. An evaluation on a large-scale OSS execution dataset shows that HeteroGAT-Rank effectively highlights meaningful and interpretable behavioral indicators aligned with real-world vulnerability and attack trends, supporting practical security workflows under realistic operational constraints.

#36 MemTrust: A Zero-Trust Architecture for Unified AI Memory System

著者: Xing Zhou, Dmitrii Ustiugov, Haoxin Shang, Kisson Lin

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07004

要約:
AI memory systems are evolving toward unified context layers that enable efficient cross-agent collaboration and multi-tool workflows, facilitating better accumulation of personal data and learning of user preferences. However, centralization creates a trust crisis where users must entrust cloud providers with sensitive digital memory data. We identify a core tension between personalization demands and data sovereignty: centralized memory systems enable efficient cross-agent collaboration but expose users' sensitive data to cloud provider risks, while private deployments provide security but limit collaboration. To resolve this tension, we aim to achieve local-equivalent security while enabling superior maintenance efficiency and collaborative capabilities. We propose a five-layer architecture abstracting common functional components of AI memory systems: Storage, Extraction, Learning, Retrieval, and Governance. By applying TEE protection to each layer, we establish a trustworthy framework. Based on this, we design MemTrust, a hardware-backed zero-trust architecture that provides cryptographic guarantees across all layers. Our contributions include the five-layer abstraction, "Context from MemTrust" protocol for cross-application sharing, side-channel hardened retrieval with obfuscated access patterns, and comprehensive security analysis. The architecture enables third-party developers to port existing systems with acceptable development costs, achieving system-wide trustworthiness. We believe that AI memory plays a crucial role in enhancing the efficiency and collaboration of agents and AI tools. AI memory will become the foundational infrastructure for AI agents, and MemTrust serves as a universal trusted framework for AI memory systems, with the goal of becoming the infrastructure of memory infrastructure.

#37 Zer0n: An AI-Assisted Vulnerability Discovery and Blockchain-Backed Integrity Framework

著者: Harshil Parmar, Pushti Vyas, Prayers Khristi, Priyank Panchal

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07019

要約:
As vulnerability research increasingly adopts generative AI, a critical reliance on opaque model outputs has emerged, creating a "trust gap" in security automation. We address this by introducing Zer0n, a framework that anchors the reasoning capabilities of Large Language Models (LLMs) to the immutable audit trails of blockchain technology. Specifically, we integrate Gemini 2.0 Pro for logic-based vulnerability detection with the Avalanche C-Chain for tamper-evident artifact logging. Unlike fully decentralized solutions that suffer from high latency, Zer0n employs a hybrid architecture: execution remains off-chain for performance, while integrity proofs are finalized on-chain. Our evaluation on a dataset of 500 endpoints reveals that this approach achieves 80% detection accuracy with only a marginal 22.9% overhead, effectively demonstrating that decentralized integrity can coexist with high-speed security workflows.

#38 LINEture: novel signature cryptosystem

著者: Gennady Khalimov, Yevgen Kotukh

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07071

要約:
We propose a novel digital signature cryptosystem that exploits the concept of the brute-force problem. To ensure the security of the cryptosystem, we employed several mechanisms: sharing a common secret for factorable permutations, associating permutations with the message being signed, and confirming knowledge of the shared secret using a zero-knowledge proof. We developed a secret-sharing theory based on homomorphic matrix transformations for factorized permutations. The inverse matrix transformation for computing the shared secret is determined by secret parameters, which results in incompletely defined functionality and gives rise to a brute-force cryptanalysis problem. Randomization of session keys using a message hash and random parameters guarantees the uniqueness of each signature, even for identical messages. We employed a zero-knowledge authentication protocol to confirm knowledge of the shared secret, thereby protecting the verifier against unauthorized signature imposition. The LINEture cryptosystem is built on linear matrix algebra and does not rely on a computationally hard problem. High security is achieved through the appropriate selection of matrix transformation dimensions. Matrix computations potentially offer low operational costs for signature generation and verification.

#39 Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

著者: Hongyan Chang, Ergute Bao, Xinjian Luo, Ting Yu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07072

要約:
Large language models (LLMs) increasingly rely on retrieving information from external corpora. This creates a new attack surface: indirect prompt injection (IPI), where hidden instructions are planted in the corpora and hijack model behavior once retrieved. Previous studies have highlighted this risk but often avoid the hardest step: ensuring that malicious content is actually retrieved. In practice, unoptimized IPI is rarely retrieved under natural queries, which leaves its real-world impact unclear. We address this challenge by decomposing the malicious content into a trigger fragment that guarantees retrieval and an attack fragment that encodes arbitrary attack objectives. Based on this idea, we design an efficient and effective black-box attack algorithm that constructs a compact trigger fragment to guarantee retrieval for any attack fragment. Our attack requires only API access to embedding models, is cost-efficient (as little as $0.21 per target user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this attack, we present the first end-to-end IPI exploits under natural queries and realistic external corpora, spanning both RAG and agentic systems with diverse attack objectives. These results establish IPI as a practical and severe threat: when a user issued a natural query to summarize emails on frequently asked topics, a single poisoned email was sufficient to coerce GPT-4o into exfiltrating SSH keys with over 80% success in a multi-agent workflow. We further evaluate several defenses and find that they are insufficient to prevent the retrieval of malicious text, highlighting retrieval as a critical open vulnerability.

#40 How Secure is Secure Code Generation? Adversarial Prompts Put LLM Defenses to the Test

著者: Melissa Tessa, Iyiola E. Olatunji, Aicha War, Jacques Klein, Tegawend\'e F. Bissyand\'e

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07084

要約:
Recent secure code generation methods, using vulnerability-aware fine-tuning, prefix-tuning, and prompt optimization, claim to prevent LLMs from producing insecure code. However, their robustness under adversarial conditions remains untested, and current evaluations decouple security from functionality, potentially inflating reported gains. We present the first systematic adversarial audit of state-of-the-art secure code generation methods (SVEN, SafeCoder, PromSec). We subject them to realistic prompt perturbations such as paraphrasing, cue inversion, and context manipulation that developers might inadvertently introduce or adversaries deliberately exploit. To enable fair comparison, we evaluate all methods under consistent conditions, jointly assessing security and functionality using multiple analyzers and executable tests. Our findings reveal critical robustness gaps: static analyzers overestimate security by 7 to 21 times, with 37 to 60% of ``secure'' outputs being non-functional. Under adversarial conditions, true secure-and-functional rates collapse to 3 to 17%. Based on these findings, we propose best practices for building and evaluating robust secure code generation methods. Our code is available.

#41 Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework

agent

著者: Yixiao Peng, Hao Hu, Feiyang Li, Xinye Cao, Yingchang Jiang, Jipeng Tang, Guoshun Nan, Yuling Liu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07122

要約:
While virtualization and resource pooling empower cloud networks with structural flexibility and elastic scalability, they inevitably expand the attack surface and challenge cyber resilience. Reinforcement Learning (RL)-based defense strategies have been developed to optimize resource deployment and isolation policies under adversarial conditions, aiming to enhance system resilience by maintaining and restoring network availability. However, existing approaches lack robustness as they require retraining to adapt to dynamic changes in network structure, node scale, attack strategies, and attack intensity. Furthermore, the lack of Human-in-the-Loop (HITL) support limits interpretability and flexibility. To address these limitations, we propose CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework empowered by Large Language Models (LLMs). Inspired by MITRE ATT&CK's Tactics-Techniques model, CyberOps-Bots features a two-layer architecture: (1) An upper-level LLM agent with four modules--ReAct planning, IPDRR-based perception, long-short term memory, and action/tool integration--performs global awareness, human intent recognition, and tactical planning; (2) Lower-level RL agents, developed via heterogeneous separated pre-training, execute atomic defense actions within localized network regions. This synergy preserves LLM adaptability and interpretability while ensuring reliable RL execution. Experiments on real cloud datasets show that, compared to state-of-the-art algorithms, CyberOps-Bots maintains network availability 68.5% higher and achieves a 34.7% jumpstart performance gain when shifting the scenarios without retraining. To our knowledge, this is the first study to establish a robust LLM-RL framework with HITL support for cloud defense. We will release our framework to the community, facilitating the advancement of robust and autonomous defense in cloud networks.

#42 Proof of Reasoning for Privacy Enhanced Federated Blockchain Learning at the Edge

privacy

著者: James Calo, Benny Lo

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07134

要約:
Consensus mechanisms are the core of any blockchain system. However, the majority of these mechanisms do not target federated learning directly nor do they aid in the aggregation step. This paper introduces Proof of Reasoning (PoR), a novel consensus mechanism specifically designed for federated learning using blockchain, aimed at preserving data privacy, defending against malicious attacks, and enhancing the validation of participating networks. Unlike generic blockchain consensus mechanisms commonly found in the literature, PoR integrates three distinct processes tailored for federated learning. Firstly, a masked autoencoder (MAE) is trained to generate an encoder that functions as a feature map and obfuscates input data, rendering it resistant to human reconstruction and model inversion attacks. Secondly, a downstream classifier is trained at the edge, receiving input from the trained encoder. The downstream network's weights, a single encoded datapoint, the network's output and the ground truth are then added to a block for federated aggregation. Lastly, this data facilitates the aggregation of all participating networks, enabling more complex and verifiable aggregation methods than previously possible. This three-stage process results in more robust networks with significantly reduced computational complexity, maintaining high accuracy by training only the downstream classifier at the edge. PoR scales to large IoT networks with low latency and storage growth, and adapts to evolving data, regulations, and network conditions.

#43 MacPrompt: Maraconic-guided Jailbreak against Text-to-Image Models

著者: Xi Ye, Yiwen Liu, Lina Wang, Run Wang, Geying Yang, Yufei Hou, Jiayi Yu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07141

要約:
Text-to-image (T2I) models have raised increasing safety concerns due to their capacity to generate NSFW and other banned objects. To mitigate these risks, safety filters and concept removal techniques have been introduced to block inappropriate prompts or erase sensitive concepts from the models. However, all the existing defense methods are not well prepared to handle diverse adversarial prompts. In this work, we introduce MacPrompt, a novel black-box and cross-lingual attack that reveals previously overlooked vulnerabilities in T2I safety mechanisms. Unlike existing attacks that rely on synonym substitution or prompt obfuscation, MacPrompt constructs macaronic adversarial prompts by performing cross-lingual character-level recombination of harmful terms, enabling fine-grained control over both semantics and appearance. By leveraging this design, MacPrompt crafts prompts with high semantic similarity to the original harmful inputs (up to 0.96) while bypassing major safety filters (up to 100%). More critically, it achieves attack success rates as high as 92% for sex-related content and 90% for violence, effectively breaking even state-of-the-art concept removal defenses. These results underscore the pressing need to reassess the robustness of existing T2I safety mechanisms against linguistically diverse and fine-grained adversarial strategies.

#44 Safe-FedLLM: Delving into the Safety of Federated Large Language Models

著者: Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07177

要約:
Federated learning (FL) addresses data privacy and silo issues in large language models (LLMs). Most prior work focuses on improving the training efficiency of federated LLMs. However, security in open environments is overlooked, particularly defenses against malicious clients. To investigate the safety of LLMs during FL, we conduct preliminary experiments to analyze potential attack surfaces and defensible characteristics from the perspective of Low-Rank Adaptation (LoRA) weights. We find two key properties of FL: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA weights exhibit distinct behavioral patterns that can be filtered through simple classifiers. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for federated LLMs, constructing defenses across three dimensions: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on the LoRA weights locally trained by each client during FL, treating them as high-dimensional behavioral features and using lightweight classification models to determine whether they possess malicious attributes. Extensive experiments demonstrate that Safe-FedLLM effectively enhances the defense capability of federated LLMs without compromising performance on benign data. Notably, our method effectively suppresses malicious data impact without significant impact on training speed, and remains effective even with many malicious clients. Our code is available at: https://github.com/dmqx/Safe-FedLLM.

#45 Defenses Against Prompt Attacks Learn Surface Heuristics

著者: Shawn Li, Chenxiao Yu, Zhiyu Ni, Hao Li, Charith Peris, Chaowei Xiao, Yue Zhao

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07185

要約:
Large language models (LLMs) are increasingly deployed in security-sensitive applications, where they must follow system- or developer-specified instructions that define the intended task behavior, while completing benign user requests. When adversarial instructions appear in user queries or externally retrieved content, models may override intended logic. Recent defenses rely on supervised fine-tuning with benign and malicious labels. Although these methods achieve high attack rejection rates, we find that they rely on narrow correlations in defense data rather than harmful intent, leading to systematic rejection of safe inputs. We analyze three recurring shortcut behaviors induced by defense fine-tuning. \emph{Position bias} arises when benign content placed later in a prompt is rejected at much higher rates; across reasoning benchmarks, suffix-task rejection rises from below \textbf{10\%} to as high as \textbf{90\%}. \emph{Token trigger bias} occurs when strings common in attack data raise rejection probability even in benign contexts; inserting a single trigger token increases false refusals by up to \textbf{50\%}. \emph{Topic generalization bias} reflects poor generalization beyond the defense data distribution, with defended models suffering test-time accuracy drops of up to \textbf{40\%}. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent. We introduce controlled diagnostic datasets and a systematic evaluation across two base models and multiple defense pipelines, highlighting limitations of supervised fine-tuning for reliable LLM security.

#46 BlindU: Blind Machine Unlearning without Revealing Erasing Data

privacy

著者: Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07214

要約:
Machine unlearning enables data holders to remove the contribution of their specified samples from trained models to protect their privacy. However, it is paradoxical that most unlearning methods require the unlearning requesters to firstly upload their data to the server as a prerequisite for unlearning. These methods are infeasible in many privacy-preserving scenarios where servers are prohibited from accessing users' data, such as federated learning (FL). In this paper, we explore how to implement unlearning under the condition of not uncovering the erasing data to the server. We propose \textbf{Blind Unlearning (BlindU)}, which carries out unlearning using compressed representations instead of original inputs. BlindU only involves the server and the unlearning user: the user locally generates privacy-preserving representations, and the server performs unlearning solely on these representations and their labels. For the FL model training, we employ the information bottleneck (IB) mechanism. The encoder of the IB-based FL model learns representations that distort maximum task-irrelevant information from inputs, allowing FL users to generate compressed representations locally. For effective unlearning using compressed representation, BlindU integrates two dedicated unlearning modules tailored explicitly for IB-based models and uses a multiple gradient descent algorithm to balance forgetting and utility retaining. While IB compression already provides protection for task-irrelevant information of inputs, to further enhance the privacy protection, we introduce a noise-free differential privacy (DP) masking method to deal with the raw erasing data before compressing. Theoretical analysis and extensive experimental results illustrate the superiority of BlindU in privacy protection and unlearning effectiveness compared with the best existing privacy-preserving unlearning benchmarks.

#47 When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

agent

著者: Xinyi Wu, Geng Hong, Yueyue Chen, MingXuan Liu, Feier Jin, Xudong Pan, Jiarun Dai, Baojun Liu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07263

要約:
Web agents, powered by large language models (LLMs), are increasingly deployed to automate complex web interactions. The rise of open-source frameworks (e.g., Browser Use, Skyvern-AI) has accelerated adoption, but also broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first systematic study of social engineering attacks against web automation agents and design a pluggable runtime mitigation solution. On the attack side, we introduce the AgentBait paradigm, which exploits intrinsic weaknesses in agent execution: inducement contexts can distort the agent's reasoning and steer it toward malicious objectives misaligned with the intended task. On the defense side, we propose SUPERVISOR, a lightweight runtime module that enforces environment and intention consistency alignment between webpage context and intended goals to mitigate unsafe operations before execution. Empirical results show that mainstream frameworks are highly vulnerable to AgentBait, with an average attack success rate of 67.5% and peaks above 80% under specific strategies (e.g., trusted identity forgery). Compared with existing lightweight defenses, our module can be seamlessly integrated across different web automation frameworks and reduces attack success rates by up to 78.1% on average while incurring only a 7.7% runtime overhead and preserving usability. This work reveals AgentBait as a critical new threat surface for web agents and establishes a practical, generalizable defense, advancing the security of this rapidly emerging ecosystem. We reported the details of this attack to the framework developers and received acknowledgment before submission.

#48 A High-Recall Cost-Sensitive Machine Learning Framework for Real-Time Online Banking Transaction Fraud Detection

著者: Karthikeyan V. R., Premnath S., Kavinraaj S., J. Sangeetha

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07276

要約:
Fraudulent activities on digital banking services are becoming more intricate by the day, challenging existing defenses. While older rule driven methods struggle to keep pace, even precision focused algorithms fall short when new scams are introduced. These tools typically overlook subtle shifts in criminal behavior, missing crucial signals. Because silent breaches cost institutions far more than flagged but legitimate actions, catching every possible case is crucial. High sensitivity to actual threats becomes essential when oversight leads to heavy losses. One key aim here involves reducing missed fraud cases without spiking incorrect alerts too much. This study builds a system using group learning methods adjusted through smart threshold choices. Using real world transaction records shared openly, where cheating acts rarely appear among normal activities, tests are run under practical skewed distributions. The outcomes reveal that approximately 91 percent of actual fraud is detected, outperforming standard setups that rely on unchanging rules when dealing with uneven examples across classes. When tested in live settings, the fraud detection system connects directly to an online banking transaction flow, stopping questionable activities before they are completed. Alongside this setup, a browser add on built for Chrome is designed to flag deceptive web links and reduce threats from harmful sites. These results show that adjusting decisions by cost impact and validating across entire systems makes deployment more stable and realistic for today's digital banking platforms.

#49 Memory-Based Malware Detection under Limited Data Conditions: A Comparative Evaluation of TabPFN and Ensemble Models

著者: Valentin Leroy, Shuvalaxmi Dass, Sharif Ullah

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07305

要約:
Artificial intelligence and machine learning have significantly advanced malware research by enabling automated threat detection and behavior analysis. However, the availability of exploitable data is limited, due to the absence of large datasets with real-world data. Despite the progress of AI in cybersecurity, malware analysis still suffers from this data scarcity, which limits model generalization. In order to tackle this difficulty, this workinvestigates TabPFN, a learning-free model designed for low-data regimes. We evaluate its performance against established baselines such as Random Forest, LightGBM and XGBoost, across multiple class configurations. Our experimental results indicate that TabPFN surpasses all other models in low-data regimes, with a 2% to 6% improvement observed across multiple performance metrics. However, this increase in performance has an impact on its computation time in a particular case. These findings highlight both the promise and the practical limitations of integrating TabPFN into cybersecurity workflows.

#50 Examining the Effectiveness of Transformer-Based Smart Contract Vulnerability Scan

著者: Emre Balci, Timucin Aydede, Gorkem Yilmaz, Ece Gelal Soyak

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07334

要約:
Smart contract technology facilitates self-executing agreements on the blockchain, eliminating dependency on an external trusted authority. However, smart contracts may expose vulnerabilities that can lead to financial losses and disruptions in decentralized applications. In this work, we evaluate deep learning-based approaches for vulnerability scanning of Ethereum smart contracts. We propose VASCOT, a Vulnerability Analyzer for Smart COntracts using Transformers, which performs sequential analysis of Ethereum Virtual Machine (EVM) bytecode and incorporates a sliding window mechanism to overcome input length constraints. To assess VASCOT's detection efficacy, we construct a dataset of 16,469 verified Ethereum contracts deployed in 2022, and annotate it using trace analysis with concrete validation to mitigate false positives. VASCOT's performance is then compared against a state-of-the-art LSTM-based vulnerability detection model on both our dataset and an older public dataset. Our findings highlight the strengths and limitations of each model, providing insights into their detection capabilities and generalizability.

#51 MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP

backdoor

著者: Ruiqi Li, Zhiqiang Wang, Yunhao Yao, Xiang-Yang Li

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07395

要約:
To standardize interactions between LLM-based agents and their environments, the Model Context Protocol (MCP) was proposed and has since been widely adopted. However, integrating external tools expands the attack surface, exposing agents to tool poisoning attacks. In such attacks, malicious instructions embedded in tool metadata are injected into the agent context during MCP registration phase, thereby manipulating agent behavior. Prior work primarily focuses on explicit tool poisoning or relied on manually crafted poisoned tools. In contrast, we focus on a particularly stealthy variant: implicit tool poisoning, where the poisoned tool itself remains uninvoked. Instead, the instructions embedded in the tool metadata induce the agent to invoke a legitimate but high-privilege tool to perform malicious operations. We propose MCP-ITP, the first automated and adaptive framework for implicit tool poisoning within the MCP ecosystem. MCP-ITP formulates poisoned tool generation as a black-box optimization problem and employs an iterative optimization strategy that leverages feedback from both an evaluation LLM and a detection LLM to maximize Attack Success Rate (ASR) while evading current detection mechanisms. Experimental results on the MCPTox dataset across 12 LLM agents demonstrate that MCP-ITP consistently outperforms the manually crafted baseline, achieving up to 84.2% ASR while suppressing the Malicious Tool Detection Rate (MDR) to as low as 0.3%.

#52 Peacock: UEFI Firmware Runtime Observability Layer for Detection and Response

著者: Hadar Cochavi Gorelik, Orel Fadlon, Denis Klimov, Oleg Brodt, Asaf Shabtai, Yuval Elovici

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07402

要約:
Modern computing platforms rely on the Unified Extensible Firmware Interface (UEFI) to initialize hardware and coordinate the transition to the operating system. Because this execution environment operates with high privileges and persists across reboots, it has increasingly become a target for advanced threats, including bootkits documented in real systems. Existing protections, including Secure Boot and static signature verification, are insufficient against adversaries who exploit runtime behavior or manipulate firmware components after signature checks have completed. In contrast to operating system (OS) environments, where mature tools provide dynamic inspection and incident response, the pre-OS stage lacks practical mechanisms for real-time visibility and threat detection. We present Peacock, a modular framework that introduces integrity-assured monitoring and remote verification for the UEFI boot process. Peacock consists of three components: (i) a UEFI-based agent that records Boot and Runtime Service activity with cryptographic protection against tampering; (ii) a cross-platform OS Agent that extracts the recorded measurements and produces a verifiable attestation bundle using hardware-backed guarantees from the platform's trusted module; and (iii) a Peacock Server that verifies attestation results and exports structured telemetry for enterprise detection. Our evaluation shows that Peacock reliably detects multiple real-world UEFI bootkits, including Glupteba, BlackLotus, LoJax, and MosaicRegressor. Taken together, these results indicate that Peacock provides practical visibility and verification capabilities within the firmware layer, addressing threats that bypass traditional OS-level security mechanisms.

#53 Principal ideal problem and ideal shortest vector over rational primes in power-of-two cyclotomic fields

著者: Gaohao Cui, Jianing Li, Jincheng Zhuang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07511

要約:
The shortest vector problem (SVP) over ideal lattices is closely related to the Ring-LWE problem, which is widely used to build post-quantum cryptosystems. Power-of-two cyclotomic fields are frequently adopted to instantiate Ring-LWE. Pan et al. (EUROCRYPT~2021) explored the SVP over ideal lattices via the decomposition fields and, in particular determined the length of ideal lattices over rational primes $p\equiv3,5\pmod{8}$ in power-of-two cyclotomic fields via explicit construction of reduced lattice bases. In this work, we first provide a new method (different from analyzing lattice bases) to analyze the length of the shortest vector in prime ideals in $\mathbb{Z}[\zeta_{2^{n+1}}]$ when $p\equiv3,5\pmod{8}$. Then we precisely characterize the length of the shortest vector on the cases of $p\equiv7,9\pmod{16}$. Furthermore, we derive a new upper bound for this length, which is tighter than the bound obtained from Minkowski's theorem. Our key technique is to investigate whether a generator of a principal ideal can achieve the shortest length after embedding as a vector. If this holds for the ideal, finding the shortest vector in this ideal can be reduced to finding its shortest generator.

#54 A Protocol-Aware P4 Pipeline for MQTT Security and Anomaly Mitigation in Edge IoT Systems

著者: Bui Ngoc Thanh Binh, Pham Hoai Luan, Le Vu Trung Duong, Vu Tuan Hai, Yasuhiko Nakashima

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07536

要約:
MQTT is the dominant lightweight publish--subscribe protocol for IoT deployments, yet edge security remains inadequate. Cloud-based intrusion detection systems add latency that is unsuitable for real-time control, while CPU-bound firewalls and generic SDN controllers lack MQTT awareness to enforce session validation, topic-based authorization, and behavioral anomaly detection. We propose a P4-based data-plane enforcement scheme for protocol-aware MQTT security and anomaly detection at the network edge. The design combines parser-safe MQTT header extraction with session-order validation, byte-level topic-prefix authorization with per-client rate limiting and soft-cap enforcement, and lightweight anomaly detection based on KeepAlive and Remaining Length screening with clone-to-CPU diagnostics. The scheme leverages stateful primitives in BMv2 (registers, meters, direct counters) to enable runtime policy adaptation with minimal per-packet latency. Experiments on a Mininet/BMv2 testbed demonstrate high policy enforcement accuracy (99.8%, within 95% CI), strong anomaly detection sensitivity (98\% true-positive rate), and high delivery >99.9% for 100--5~kpps; 99.8% at 10~kpps; 99.6\% at 16~kpps) with sub-millisecond per-packet latency. These results show that protocol-aware MQTT filtering can be efficiently realized in the programmable data plane, providing a practical foundation for edge IoT security. Future work will validate the design on production P4 hardware and integrate machine learning--based threshold adaptation.

#55 Simple Power Analysis of Polynomial Multiplication in HQC

著者: Pavel Velek, Tom\'a\v{s} Rabas, Ji\v{r}\'i Bu\v{c}ek

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07634

要約:
The Hamming Quasi-Cyclic (HQC) cryptosystem was selected for standardization in the fourth round of the NIST Post-Quantum Cryptography (PQC) standardization project. The goal of the PQC project is to standardize one or more quantum-resistant public-key cryptographic algorithms. In this paper, we present a single-trace Simple Power Analysis (SPA) attack against HQC that exploits power consumption leakage that occurs during polynomial multiplication performed at the beginning of HQC decryption. Using the ChipWhisperer-Lite board, we perform and evaluate the attack, achieving a 99.69% success rate over 10 000 attack attempts. We also propose various countermeasures against the attack and evaluate their time complexity.

#56 Hagenberg Risk Management Process (Part 1): Multidimensional Polar Heatmaps for Context-Sensitive Risk Analysis

著者: Eckehard Hermann, Harald Lampesberger

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07644

要約:
Traditional two-dimensional risk matrices (heatmaps) are widely used to model and visualize likelihood and impact relationships, but they face fundamental methodological limitations when applied to complex infrastructures. In particular, regulatory frameworks such as NIS2 and DORA call for more context-sensitive and system-oriented risk analysis. We argue that incorporating contextual dimensions into heatmaps enhances their analytical value. As a first step towards our Hagenberg Risk Management Process for complex infrastructures and systems, this paper introduces a multidimensional (ND) polar heatmap as a formal model that explicitly integrates additional context dimensions and subsumes classical two-dimensional models as a special case.

#57 Towards Automating Blockchain Consensus Verification with IsabeLLM

著者: Elliot Jones, William Knottenbelt

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07654

要約:
Consensus protocols are crucial for a blockchain system as they are what allow agreement between the system's nodes in a potentially adversarial environment. For this reason, it is paramount to ensure their correct design and implementation to prevent such adversaries from carrying out malicious behaviour. Formal verification allows us to ensure the correctness of such protocols, but requires high levels of effort and expertise to carry out and thus is often omitted in the development process. In this paper, we present IsabeLLM, a tool that integrates the proof assistant Isabelle with a Large Language Model to assist and automate proofs. We demonstrate the effectiveness of IsabeLLM by using it to develop a novel model of Bitcoin's Proof of Work consensus protocol and verify its correctness. We use the DeepSeek R1 API for this demonstration and found that we were able to generate correct proofs for each of the non-trivial lemmas present in the verification.

#58 TeeMAF: A TEE-Based Mutual Attestation Framework for On-Chain and Off-Chain Functions in Blockchain DApps

著者: Xiangyu Liu, Brian Lee, Yuansong Qiao

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07726

要約:
The rapid development of Internet of Things (IoT) technology has led to growing concerns about data security and user privacy in the interactions within distributed systems. Decentralized Applications (DApps) in distributed systems consist of on-chain and off-chain functions, where on-chain functions are smart contracts running in the blockchain network, while off-chain functions operate outside the blockchain. Since smart contracts cannot access off-chain information, they cannot verify whether the off-chain functions, i.e. the software components, they interact with have been tampered or not. As a result, establishing mutual trust between the on-chain smart contracts and the off-chain functions remains a significant challenge. To address the challenge, this paper introduces TeeMAF, a generic framework for mutual attestation between on-chain and off-chain functions, leveraging Trusted Execution Environments (TEE), specifically Intel Software Guard Extensions (SGX), SCONE (a TEE container on top of Intel SGX), and remote attestation technologies. This ensures that the deployed off-chain functions of a DApp execute in a provably secure computing environment and achieve mutual attestation with the interacting on-chain functions. Through a security analysis of TeeMAF, the reliability of deployed DApps can be verified, ensuring their correct execution. Furthermore, based on this framework, this paper proposes a decentralized resource orchestration platform (a specific DApp) for deploying applications over untrusted environments. The system is implemented on Ethereum and benchmarked using Hyperledger Caliper. Performance evaluation focusing on throughput and latency demonstrates that, compared to platforms without a mutual attestation scheme, the performance overhead remains within an acceptable range.

#59 SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations

著者: Mohammed Himayath Ali, Mohammed Aqib Abdullah, Mohammed Mudassir Uddin, Shahnawaz Alam

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07835

要約:
Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explanation; however, deployment in adversarial cybersecurity environments exposes critical vulnerabilities to prompt injection attacks where malicious instructions embedded in security artifacts manipulate model behavior. This paper introduces SecureCAI, a novel defense framework extending Constitutional AI principles with security-aware guardrails, adaptive constitution evolution, and Direct Preference Optimization for unlearning unsafe response patterns, addressing the unique challenges of high-stakes security contexts where traditional safety mechanisms prove insufficient against sophisticated adversarial manipulation. Experimental evaluation demonstrates that SecureCAI reduces attack success rates by 94.7% compared to baseline models while maintaining 95.1% accuracy on benign security analysis tasks, with the framework incorporating continuous red-teaming feedback loops enabling dynamic adaptation to emerging attack strategies and achieving constitution adherence scores exceeding 0.92 under sustained adversarial pressure, thereby establishing a foundation for trustworthy integration of language model capabilities into operational cybersecurity workflows and addressing a critical gap in current approaches to AI safety within adversarial domains.

#60 How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape

著者: Patrick Gage Kelley, Steven Rousso-Schindler, Renee Shelby, Kurt Thomas, Allison Woodruff

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06033

要約:
Generative AI (GenAI) is a powerful technology poised to reshape Trust & Safety. While misuse by attackers is a growing concern, its defensive capacity remains underexplored. This paper examines these effects through a qualitative study with 43 Trust & Safety experts across five domains: child safety, election integrity, hate and harassment, scams, and violent extremism. Our findings characterize a landscape in which GenAI empowers both attackers and defenders. GenAI dramatically increases the scale and speed of attacks, lowering the barrier to entry for creating harmful content, including sophisticated propaganda and deepfakes. Conversely, defenders envision leveraging GenAI to detect and mitigate harmful content at scale, conduct investigations, deploy persuasive counternarratives, improve moderator wellbeing, and offer user support. This work provides a strategic framework for understanding GenAI's impact on Trust & Safety and charts a path for its responsible use in creating safer online environments.

#61 Reliability and Admissibility of AI-Generated Forensic Evidence in Criminal Trials

著者: Sahibpreet Singh, Lalita Devi

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06048

要約:
This paper examines the admissibility of AI-generated forensic evidence in criminal trials. The growing adoption of AI presents promising results for investigative efficiency. Despite advancements, significant research gaps persist in practically understanding the legal limits of AI evidence in judicial processes. Existing literature lacks focused assessment of the evidentiary value of AI outputs. The objective of this study is to evaluate whether AI-generated evidence satisfies established legal standards of reliability. The methodology involves a comparative doctrinal legal analysis of evidentiary standards across common law jurisdictions. Preliminary results indicate that AI forensic tools can enhance scale of evidence analysis. However, challenges arise from reproducibility deficits. Courts exhibit variability in acceptance of AI evidence due to limited technical literacy and lack of standardized validation protocols. Liability implications reveal that developers and investigators may bear accountability for flawed outputs. This raises critical concerns related to wrongful conviction. The paper emphasizes the necessity of independent validation and, development of AI-specific admissibility criteria. Findings inform policy development for the responsible AI integration within criminal justice systems. The research advances the objectives of Sustainable Development Goal 16 by reinforcing equitable access to justice. Preliminary results contribute for a foundation for future empirical research in AI deployed criminal forensics.

#62 Nigeria's Digital Sovereignty: Analysis of Cybersecurity Legislation, Policies, and Strategies

著者: Polra Victor Falade, Oluwafemi Osho

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06050

要約:
This paper examines Nigeria's pursuit of digital sovereignty through two core instruments: the Cybercrimes (Prohibition, Prevention, etc.) Act and the National Cybersecurity Policy and Strategy (NCPS). Despite recent reforms, it remains unclear whether these frameworks effectively secure Nigeria's digital domain and advance its digital sovereignty amid escalating cross-border cyber threats. Using a multi-method, triangulated qualitative design that combines document analysis, secondary analysis of existing studies, expert insights, and direct observation of cybersecurity developments, the paper assesses how these instruments operate in practice. The Cybercrimes Act (2015, amended 2024) and NCPS (2015, revised 2021) have strengthened Nigeria's commitments to tackling cybercrime, regulating digital activities, and protecting critical infrastructure. Yet persistent gaps remain, including legislative ambiguities, weak enforcement, uneven threat prioritization, limited institutional coordination, and loss of skilled professionals. The paper argues that achieving digital sovereignty will require stronger implementation, sustainable resourcing, workforce retention, and clearer accountability mechanisms to translate policy ambition into tangible and durable security outcomes.

#63 AI Safeguards, Generative AI and the Pandora Box: AI Safety Measures to Protect Businesses and Personal Reputation

著者: Prasanna Kumar

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06197

要約:
Generative AI has unleashed the power of content generation and it has also unwittingly opened the pandora box of realistic deepfake causing a number of social hazards and harm to businesses and personal reputation. The investigation & ramification of Generative AI technology across industries, the resolution & hybridization detection techniques using neural networks allows flagging of the content. Good detection techniques & flagging allow AI safety - this is the main focus of this paper. The research provides a significant method for efficiently detecting dark side problems by imposing a Temporal Consistency Learning (TCL) technique. Through pretrained Temporal Convolutional Networks (TCNs) model training and performance comparison, this paper showcases that TCN models outperforms the other approaches and achieves significant accuracy for five dark side problems. Findings highlight how important it is to take proactive measures in identification to reduce any potential risks associated with generative artificial intelligence.

#64 Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning

privacy

著者: Quan Minh Nguyen, Min-Seon Kim, Hoang M. Ngo, Trong Nghia Hoang, Hyuk-Yoon Kwon, My T. Thai

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06641

要約:
Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL) as it allows adversaries to determine whether a client's private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server can insert adversarially crafted prompts and monitors their updates during collaborative training to accurately determine whether a target data point is in a client's private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack's advantage which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient or output based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.

#65 Forgetting Similar Samples: Can Machine Unlearning Do it Better?

privacy

著者: Heng Xu, Tianqing Zhu, Dayong Ye, Lefeng Zhang, Le Wang, Wanlei Zhou

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06938

要約:
Machine unlearning, a process enabling pre-trained models to remove the influence of specific training samples, has attracted significant attention in recent years. Although extensive research has focused on developing efficient machine unlearning strategies, we argue that these methods mainly aim at removing samples rather than removing samples' influence on the model, thus overlooking the fundamental definition of machine unlearning. In this paper, we first conduct a comprehensive study to evaluate the effectiveness of existing unlearning schemes when the training dataset includes many samples similar to those targeted for unlearning. Specifically, we evaluate: Do existing unlearning methods truly adhere to the original definition of machine unlearning and effectively eliminate all influence of target samples when similar samples are present in the training dataset? Our extensive experiments, conducted on four carefully constructed datasets with thorough analysis, reveal a notable gap between the expected and actual performance of most existing unlearning methods for image and language models, even for the retraining-from-scratch baseline. Additionally, we also explore potential solutions to enhance current unlearning approaches.

#66 A Robust Certified Machine Unlearning Method Under Distribution Shift

privacy

著者: Jinduo Guo, Yinzhi Cao

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.06967

要約:
The Newton method has been widely adopted to achieve certified unlearning. A critical assumption in existing approaches is that the data requested for unlearning are selected i.i.d.(independent and identically distributed). However,the problem of certified unlearning under non-i.i.d. deletions remains largely unexplored. In practice, unlearning requests are inherently biased, leading to non-i.i.d. deletions and causing distribution shifts between the original and retained datasets. In this paper, we show that certified unlearning with the Newton method becomes inefficient and ineffective under non-i.i.d. unlearning sets. We then propose a better certified unlearning approach by performing a distribution-aware certified unlearning framework based on iterative Newton updates constrained by a trust region. Our method provides a closer approximation to the retrained model and yields a tighter pre-run bound on the gradient residual, thereby ensuring efficient (epsilon, delta)-certified unlearning. To demonstrate its practical effectiveness under distribution shift, we also conduct extensive experiments across multiple evaluation metrics, providing a comprehensive assessment of our approach.

#67 Belief in False Information: A Human-Centered Security Risk in Sociotechnical Systems

著者: Fabian Walke, Thadd\"aa N\"urnberger

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07016

要約:
This paper provides a comprehensive literature review on the belief in false information, including misinformation, disinformation, and fake information. It addresses the increasing societal concern regarding false information, which is fueled by technological progress, especially advancements in artificial intelligence. This review systematically identifies and categorizes factors that influence the belief in false information. The review identifies 24 influence factors grouped into six main categories: demographic factors, personality traits, psychological factors, policy and values, media consumption, and preventive factors. Key findings highlight that lower education levels, high extraversion, low agreeableness, high neuroticism, and low cognitive reflection significantly increase belief in false information. The effectiveness of preventive strategies like labeling false information and promoting reflection about correctness is also discussed. This literature review conceptualizes belief in false information as a human-centered security risk in sociotechnical systems, as it can be exploited to manipulate decisions, undermine trust, and increase susceptibility to social engineering. It aims to inform preventive strategies that strengthen socio-technical security and societal resilience.

#68 PASS-Enabled Covert Communications With Distributed Cooperative Wardens

著者: Ji He

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07147

要約:
This paper investigates PASS-enabled downlink covert communication in the presence of distributed surveillance, where multiple wardens perform signal detection and fuse their local binary decisions via majority-voting rule. We consider a dual-waveguide architecture that simultaneously delivers covert information and randomized jamming to hide the transmission footprint, incorporating three representative PASS power-radiation laws-general, proportional, and equal. To characterize the system-level detectability, we derive closed-form expressions for local false-alarm and miss-detection probabilities. By leveraging a probability-generating-function (PGF) and elementary-symmetric-polynomial (ESP) framework, combined with a breakpoint-based partition of the threshold domain, we obtain explicit closed-form characterizations of the system-level detection error probability (DEP) under non-i.i.d. majority-voting fusion. Building on this analytical framework, we formulate a robust optimization problem to maximize the average covert rate subject to covertness constraint. To solve the resulting nonconvex design, we develop an MM-BCD-SCA algorithm that produces tractable alternating updates for power/radiation variables and PA positions via convex surrogates and inner approximations of the DEP value function. Numerical results validate the theoretical analysis and demonstrate the impact of cooperative monitoring and PASS radiation laws on the covertness-rate tradeoff.

#69 Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion

diffusion

著者: Li Zheng, Liangbin Xie, Jiantao Zhou, He YiMin

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.07253

要約:
Stable Diffusion (SD) often produces degraded outputs when the training dataset contains adversarial noise. Adversarial purification offers a promising solution by removing adversarial noise from contaminated data. However, existing purification methods are primarily designed for classification tasks and fail to address SD-specific adversarial strategies, such as attacks targeting the VAE encoder, UNet denoiser, or both. To address the gap in SD security, we propose Universal Diffusion Adversarial Purification (UDAP), a novel framework tailored for defending adversarial attacks targeting SD models. UDAP leverages the distinct reconstruction behaviors of clean and adversarial images during Denoising Diffusion Implicit Models (DDIM) inversion to optimize the purification process. By minimizing the DDIM metric loss, UDAP can effectively remove adversarial noise. Additionally, we introduce a dynamic epoch adjustment strategy that adapts optimization iterations based on reconstruction errors, significantly improving efficiency without sacrificing purification quality. Experiments demonstrate UDAP's robustness against diverse adversarial methods, including PID (VAE-targeted), Anti-DreamBooth (UNet-targeted), MIST (hybrid), and robustness-enhanced variants like Anti-Diffusion (Anti-DF) and MetaCloak. UDAP also generalizes well across SD versions and text prompts, showcasing its practical applicability in real-world scenarios.

#70 ThreatLinker: An NLP-based Methodology to Automatically Estimate CVE Relevance for CAPEC Attack Patterns

著者: Andrea Ciavotta, Alessandro Palma, Simone Lenti, Silvia Bonomi

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2501.07131

要約:
Threat analysis is continuously growing in importance due to the always-increasing complexity and frequency of cyber attacks. Analyzing threats demands significant effort from security experts: different cybersecurity knowledge bases support this task, but manual efforts are required to correlate heterogeneous sources into a unified view that would enable a more comprehensive assessment. To address this gap, we propose ThreatLinker, a methodology leveraging Natural Language Processing (NLP) to effectively and efficiently associate Common Vulnerabilities and Exposure (CVE) vulnerabilities with Common Attack Pattern Enumeration and Classification (CAPEC) attack patterns. The proposed technique combines semantic similarity with keyword analysis to improve the accuracy of association estimations. We contributed a larger dataset for CVE-CAPEC correlation, and experimental evaluations demonstrate superior performance compared to state-of-the-art models.

#71 Classifying Implementations of Cryptographic Primitives and Protocols that Use Post-Quantum Algorithms

著者: Tushin Mallick, Cristina Nita-Rotaru, Ashish Kundu, Ramana Kompella

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2503.17830

要約:
Classification techniques can be used to analyze system behaviors, network protocols, and cryptographic primitives based on identifiable traits. While useful for defense, such classification can also be leveraged by attackers to infer system configurations, detect vulnerabilities, and tailor attacks such as denial-of-service, key recovery, or downgrade attacks. In this paper, we study the feasibility of classifying post-quantum (PQ) algorithms by analyzing implementations of key exchange and digital signatures, their use within secure protocols, and their integration into SNARK generation libraries. Unlike traditional cryptography, PQ algorithms have larger memory requirements and variable computational costs. Our research examines two post-quantum cryptography libraries, liboqs and CIRCL, evaluating TLS, SSH, QUIC, OpenVPN, and OpenID Connect (OIDC) across Windows, Ubuntu, and macOS. We also analyze pysnark and lattice_zksnark for SNARK generation and verification on Ubuntu. Experimental results show that (1) classical and PQ key exchange and signature algorithms can be distinguished with accuracies of 98% and 100%; (2) specific PQ algorithms can be identified with 97% accuracy for key exchange and 86% for signatures; (3) implementations of the same algorithm in liboqs and CIRCL are distinguishable with up to 100% accuracy; and (4) within CIRCL, PQ and hybrid key exchange implementations can be distinguished with 97% accuracy. For secure protocols, we can determine whether key exchange is classical or PQ and identify the PQ algorithm used. SNARK generation and verification in pysnark and lattice_zksnark are distinguishable with 100% accuracy. We demonstrate real-world applicability by identifying PQ-enabled TLS domains in the Tranco dataset and integrating our methods into QUARTZ, an open-source risk and threat analyzer by Cisco.

#72 Provable Secure Steganography Based on Adaptive Dynamic Sampling

著者: Kaiyi Pang, Minhao Bai

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.12579

要約:
The security of private communication is increasingly at risk due to widespread surveillance. Steganography, a technique for embedding secret messages within innocuous carriers, enables covert communication over monitored channels. Provably Secure Steganography (PSS), which ensures computational indistinguishability between the normal model output and steganography output, is the state-of-the-art in this field. However, current PSS methods often require obtaining the explicit distributions of the model. In this paper, we propose a provably secure steganography scheme that only requires a model API that accepts a seed as input. Our core mechanism involves sampling a candidate set of tokens and constructing a map from possible message bit strings to these tokens. The output token is selected by applying this mapping to the real secret message, which provably preserves the original model's distribution. To ensure correct decoding, we address collision cases, where multiple candidate messages map to the same token, by maintaining and strategically expanding a dynamic collision set within a bounded size range. Extensive evaluations of three real-world datasets and three large language models demonstrate that our sampling-based method is comparable with existing PSS methods in efficiency and capacity.

#73 How to Backdoor the Knowledge Distillation

model extractionbackdoor

著者: Chen Wu, Qian Ma, Prasenjit Mitra, Sencun Zhu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2504.21323

要約:
Knowledge distillation has become a cornerstone in modern machine learning systems, celebrated for its ability to transfer knowledge from a large, complex teacher model to a more efficient student model. Traditionally, this process is regarded as secure, assuming the teacher model is clean. This belief stems from conventional backdoor attacks relying on poisoned training data with backdoor triggers and attacker-chosen labels, which are not involved in the distillation process. Instead, knowledge distillation uses the outputs of a clean teacher model to guide the student model, inherently preventing recognition or response to backdoor triggers as intended by an attacker. In this paper, we challenge this assumption by introducing a novel attack methodology that strategically poisons the distillation dataset with adversarial examples embedded with backdoor triggers. This technique allows for the stealthy compromise of the student model while maintaining the integrity of the teacher model. Our innovative approach represents the first successful exploitation of vulnerabilities within the knowledge distillation process using clean teacher models. Through extensive experiments conducted across various datasets and attack settings, we demonstrate the robustness, stealthiness, and effectiveness of our method. Our findings reveal previously unrecognized vulnerabilities and pave the way for future research aimed at securing knowledge distillation processes against backdoor attacks.

#74 BadPatches: Routing-aware Backdoor Attacks on Vision Mixture of Experts

backdoor

著者: Cedric Chan, Jona te Lintelo, Stjepan Picek

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.01811

要約:
Mixture of Experts (MoE) architectures have gained popularity for reducing computational costs in deep neural networks by activating only a subset of parameters during inference. While this efficiency makes MoE attractive for vision tasks, the patch-based processing in vision models introduces new methods for adversaries to perform backdoor attacks. In this work, we investigate the vulnerability of vision MoE models for image classification, specifically the patch-based MoE (pMoE) models and MoE-based vision transformers, against backdoor attacks. We propose a novel routing-aware trigger application method BadPatches, which is designed for patch-based processing in vision MoE models. BadPatches applies triggers on image patches rather than on the entire image. We show that BadPatches achieves high attack success rates (ASRs) with lower poisoning rates than routing-agnostic triggers and is successful at poisoning rates as low as 0.01% with an ASR above 80% on pMoE. Moreover, BadPatches is still effective when an adversary does not have complete knowledge of the patch routing configuration of the considered models. Next, we explore how trigger design affects pMoE patch routing. Finally, we investigate fine-pruning as a defense. Results show that only the fine-tuning stage of fine-pruning removes the backdoor from the model.

#75 SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models

intellectual property

著者: Zhebo Wang, Zhenhua Xu, Maike Li, Wenpeng Xing, Chunqiang Hu, Chen Zhi, Meng Han

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.06304

要約:
The protection of Intellectual Property (IP) for Large Language Models (LLMs) has become a critical concern as model theft and unauthorized commercialization escalate. While adversarial fingerprinting offers a promising black-box solution for ownership verification, existing methods suffer from significant limitations: they are fragile against model modifications, sensitive to system prompt variations, and easily detectable due to high-perplexity input patterns. In this paper, we propose SRAF, which employs a multi-task adversarial optimization strategy that jointly optimizes fingerprints across homologous model variants and diverse chat templates, allowing the fingerprint to anchor onto invariant decision boundary features. Furthermore, we introduce a Perplexity Hiding technique that embeds adversarial perturbations within Markdown tables, effectively aligning the prompt's statistics with natural language to evade perplexity-based detection. Experiments on Llama-2 variants demonstrate SRAF's superior robustness and stealthiness compared to state-of-the-art baselines, offering a practical black-box solution for ownership verification.

#76 Residual-PAC Privacy: Automatic Privacy Control Beyond the Gaussian Barrier

privacy

著者: Tao Zhang, Yevgeniy Vorobeychik

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2506.06530

要約:
The Probably Approximately Correct (PAC) Privacy framework [46] provides a powerful instance-based methodology to preserve privacy in complex data-driven systems. Existing PAC Privacy algorithms (we call them Auto-PAC) rely on a Gaussian mutual information upper bound. However, we show that the upper bound obtained by these algorithms is tight if and only if the perturbed mechanism output is jointly Gaussian with independent Gaussian noise. We propose two approaches for addressing this issue. First, we introduce two tractable post-processing methods for Auto-PAC, based on Donsker-Varadhan representation and sliced Wasserstein distances. However, the result still leaves wasted privacy budget. To address this issue more fundamentally, we introduce Residual-PAC (R-PAC) Privacy, an f-divergence-based measure to quantify privacy that remains after adversarial inference. To implement R-PAC Privacy in practice, we propose a Stackelberg Residual-PAC (SR-PAC) privatization mechanism, a game-theoretic framework that selects optimal noise distributions through convex bilevel optimization. Our approach achieves efficient privacy budget utilization for arbitrary data distributions and naturally composes when multiple mechanisms access the dataset. Our experiments demonstrate that SR-PAC consistently obtains a better privacy-utility tradeoff than both PAC and differential privacy baselines.

#77 AI Agent Smart Contract Exploit Generation

agent

著者: Arthur Gervais, Liyi Zhou

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.05558

要約:
Smart contract vulnerabilities have led to billions in losses, yet finding actionable exploits remains challenging. Traditional fuzzers rely on rigid heuristics and struggle with complex attacks, while human auditors are thorough but slow and don't scale. Large Language Models offer a promising middle ground, combining human-like reasoning with machine speed. Early studies show that simply prompting LLMs generates unverified vulnerability speculations with high false positive rates. To address this, we present A1, an agentic system that transforms any LLM into an end-to-end exploit generator. A1 provides agents with six domain-specific tools for autonomous vulnerability discovery, from understanding contract behavior to testing strategies on real blockchain states. All outputs are concretely validated through execution, ensuring only profitable proof-of-concept exploits are reported. We evaluate A1 across 36 real-world vulnerable contracts on Ethereum and Binance Smart Chain. A1 achieves a 63% success rate on the VERITE benchmark. Across all successful cases, A1 extracts up to \$8.59 million per exploit and \$9.33 million total. Using Monte Carlo analysis of historical attacks, we demonstrate that immediate vulnerability detection yields 86-89% success probability, dropping to 6-21% with week-long delays. Our economic analysis reveals a troubling asymmetry: attackers achieve profitability at \$6,000 exploit values while defenders require \$60,000 -- raising fundamental questions about whether AI agents inevitably favor exploitation over defense.

#78 Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs

著者: Shei Pern Chua, Zhen Leng Thai, Kai Jun Teh, Xiao Li, Qibing Ren, Xiaolin Hu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.05367

要約:
Large Language Model safety alignment predominantly operates on a binary assumption that requests are either safe or unsafe. This classification proves insufficient when models encounter ethical dilemmas, where the capacity to reason through moral trade-offs creates a distinct attack surface. We formalize this vulnerability through TRIAL, a multi-turn red-teaming methodology that embeds harmful requests within ethical framings. TRIAL achieves high attack success rates across most tested models by systematically exploiting the model's ethical reasoning capabilities to frame harmful actions as morally necessary compromises. Building on these insights, we introduce ERR (Ethical Reasoning Robustness), a defense framework that distinguishes between instrumental responses that enable harmful outcomes and explanatory responses that analyze ethical frameworks without endorsing harmful acts. ERR employs a Layer-Stratified Harm-Gated LoRA architecture, achieving robust defense against reasoning-based attacks while preserving model utility.

#79 Imitative Membership Inference Attack

privacy

著者: Yuntao Du, Yuetian Chen, Hanshen Xiao, Bruno Ribeiro, Ninghui Li

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2509.06796

要約:
A Membership Inference Attack (MIA) assesses how much a target machine learning model reveals about its training data by determining whether specific query instances were part of the training set. State-of-the-art MIAs rely on training hundreds of shadow models that are independent of the target model, leading to significant computational overhead. In this paper, we introduce Imitative Membership Inference Attack (IMIA), which employs a novel imitative training technique to strategically construct a small number of target-informed imitative models that closely replicate the target model's behavior for inference. Extensive experimental results demonstrate that IMIA substantially outperforms existing MIAs in various attack settings while only requiring less than 5% of the computational cost of state-of-the-art approaches.

#80 Constructions of Efficiently Implementable Boolean Functions with Provable Nonlinearity/Resiliency/Algebraic Immunity Trade-Offs

著者: Palash Sarkar

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.01720

要約:
We describe several families of efficiently implementable Boolean functions achieving provable trade-offs between resiliency, nonlinearity, and algebraic immunity. In particular, the following statement holds for each of the function families that we propose. Given integers $m_0\geq 0$, $x_0\geq 1$, and $a_0\geq 1$, it is possible to construct an $n$-variable function which has resiliency at least $m_0$, linear bias (which is an equivalent method of expressing nonlinearity) at most $2^{-x_0}$ and algebraic immunity at least $a_0$; further, $n$ is linear in $\max(m_0,x_0,a_0)$, and the function can be implemented using $O(n)$ 2-input gates, which is essentially optimal.

#81 Membership Inference Attacks on Tokenizers of Large Language Models

privacy

著者: Meng Tong, Yuntao Du, Kejiang Chen, Weiming Zhang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.05699

要約:
Membership inference attacks (MIAs) are widely used to assess the privacy risks associated with machine learning models. However, when these attacks are applied to pre-trained large language models (LLMs), they encounter significant challenges, including mislabeled samples, distribution shifts, and discrepancies in model size between experimental and real-world settings. To address these limitations, we introduce tokenizers as a new attack vector for membership inference. Specifically, a tokenizer converts raw text into tokens for LLMs. Unlike full models, tokenizers can be efficiently trained from scratch, thereby avoiding the aforementioned challenges. In addition, the tokenizer's training data is typically representative of the data used to pre-train LLMs. Despite these advantages, the potential of tokenizers as an attack vector remains unexplored. To this end, we present the first study on membership leakage through tokenizers and explore five attack methods to infer dataset membership. Extensive experiments on millions of Internet samples reveal the vulnerabilities in the tokenizers of state-of-the-art LLMs. To mitigate this emerging risk, we further propose an adaptive defense. Our findings highlight tokenizers as an overlooked yet critical privacy threat, underscoring the urgent need for privacy-preserving mechanisms specifically designed for them.

#82 From static to adaptive: immune memory-based jailbreak detection for large language models

著者: Jun Leng, Yu Liu, Litian Zhang, Ruihan Hu, Zhuting Fang, Xi Zhang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.03356

要約:
Large Language Models (LLMs) serve as the backbone of modern AI systems, yet they remain susceptible to adversarial jailbreak attacks. Consequently, robust detection of such malicious inputs is paramount for ensuring model safety. Traditional detection methods typically rely on external models trained on fixed, large-scale datasets, which often incur significant computational overhead. While recent methods shift toward leveraging internal safety signals of models to enable more lightweight and efficient detection. However, these methods remain inherently static and struggle to adapt to the evolving nature of jailbreak attacks. Drawing inspiration from the biological immune mechanism, we introduce the Immune Memory Adaptive Guard (IMAG) framework. By distilling and encoding safety patterns into a persistent, evolvable memory bank, IMAG enables adaptive generalization to emerging threats. Specifically, the framework orchestrates three synergistic components: Immune Detection, which employs retrieval for high-efficiency interception of known jailbreak attacks; Active Immunity, which performs proactive behavioral simulation to resolve ambiguous unknown queries; Memory Updating, which integrates validated attack patterns back into the memory bank. This closed-loop architecture transitions LLM defense from rigid filtering to autonomous adaptive mitigation. Extensive evaluations across five representative open-source LLMs demonstrate that our method surpasses state-of-the-art (SOTA) baselines, achieving a superior average detection accuracy of 94\% across diverse and complex attack types.

#83 Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

agent

著者: Jinbo Liu, Defu Cao, Yifei Wei, Tianyao Su, Yuan Liang, Yushun Dong, Yan Liu, Yue Zhao, Xiyang Hu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.04668

要約:
Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a framework that measures how network structure shapes leakage. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent's memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over 10 rounds, we measure leakage as exact-match recovery of ground-truth PII from attacker outputs. We evaluate six canonical topologies (complete, ring, chain, tree, star, star-ring) across $n\in\{4,5,6\}$, attacker-target placements, and base models. Results are consistent: denser connectivity, shorter attacker-target distance, and higher target centrality increase leakage; most leakage occurs in early rounds and then plateaus; model choice shifts absolute rates but preserves topology ordering; spatiotemporal/location attributes leak more readily than identity credentials or regulated identifiers. We distill practical guidance for system design: favor sparse or hierarchical connectivity, maximize attacker-target separation, and restrict hub/shortcut pathways via topology-aware access control.

#84 Taint-Based Code Slicing for LLMs-based Malicious NPM Package Detection

著者: Dang-Khoa Nguyen, Gia-Thang Ho, Quang-Minh Pham, Tuyet A. Dang-Thi, Minh-Khanh Vu, Thanh-Cong Nguyen, Phat T. Tran-Truong, Duc-Ly Vu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.12313

要約:
Software supply chain attacks targeting the npm ecosystem have become increasingly sophisticated, leveraging obfuscation and complex logic to evade traditional detection mechanisms. Recently, large language models (LLMs) have attracted significant attention for malicious code detection due to their strong capabilities in semantic code understanding. However, the practical deployment of LLMs in this domain is severely constrained by limited context windows and high computational costs. Naive approaches, such as token-based code splitting, often fragment semantic context, leading to degraded detection performance. To overcome these challenges, this paper introduces a novel LLM-based framework for malicious npm package detection that leverages code slicing techniques. A specialized taint-based slicing method tailored to the JavaScript ecosystem is proposed to recover malicious data flows. By isolating security-relevant logic from benign boilerplate code, the approach reduces the input code volume by over 99\% while preserving critical malicious behaviors. The framework is evaluated on a curated dataset comprising over \num{7000} malicious and benign npm packages. Experimental results using the DeepSeek-Coder-6.7B model demonstrate that the proposed approach achieves a detection accuracy of \num{87.04}\%, significantly outperforming a full-package baseline based on naive token splitting (\num{75.41}\%). These results indicate that semantically optimized input representations via code slicing not only mitigate the LLM context window bottleneck but also enhance reasoning precision for security analysis, providing an effective defense against evolving open-source software supply chain threats.

#85 Secure Digital Semantic Communications: Fundamentals, Challenges, and Opportunities

著者: Weixuan Chen, Qianqian Yang, Yuanyuan Jia, Junyu Pan, Shuo Shao, Jincheng Dai, Meixia Tao, Ping Zhang

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2512.24602

要約:
Semantic communication (SemCom) has emerged as a promising paradigm for future wireless networks by prioritizing task-relevant meaning over raw data delivery, thereby reducing communication overhead and improving efficiency. However, shifting from bit-accurate transmission to task-oriented delivery introduces new security and privacy risks. These include semantic leakage, semantic manipulation, knowledge base vulnerabilities, model-related attacks, and threats to authenticity and availability. Most existing secure SemCom studies focus on analog SemCom, where semantic features are mapped to continuous channel inputs. In contrast, digital SemCom transmits semantic information through discrete bits or symbols within practical transceiver pipelines, offering stronger compatibility with realworld systems while exposing a distinct and underexplored attack surface. In particular, digital SemCom typically represents semantic information over a finite alphabet through explicit digital modulation, following two main routes: probabilistic modulation and deterministic modulation. These discrete mechanisms and practical transmission procedures introduce additional vulnerabilities affecting bit- or symbol-level semantic information, the modulation stage, and packet-based delivery and protocol operations. Motivated by these challenges and the lack of a systematic analysis of secure digital SemCom, this paper reviews SemCom fundamentals, clarifies the architectural differences between analog and digital SemCom and their security implications, organizes the threat landscape for digital SemCom, and discusses potential defenses. Finally, we outline open research directions toward secure and deployable digital SemCom systems.

#86 OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

著者: Xin Wang, Yunhao Chen, Juncheng Li, Yixu Wang, Yang Yao, Tianle Gu, Jie Li, Yan Teng, Yingchun Wang, Xia Hu

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.01592

要約:
The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT architects a paradigm shift in automated red-teaming by introducing an adversarial kernel that enables modular separation across five critical dimensions: model integration, dataset management, attack strategies, judging methods, and evaluation metrics. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.

#87 Memory Poisoning Attack and Defense on Memory Based LLM-Agents

backdooragent

著者: Balachandra Devarangadi Sunil, Isheeta Sinha, Piyush Maheshwari, Shantanu Todmal, Shreyan Mallik, Shuchi Mishra

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.05504

要約:
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query only interactions that corrupt the agents long term memory and influence future responses. Recent work demonstrated that the MINJA (Memory Injection Attack) achieves over 95 % injection success rate and 70 % attack success rate under idealized conditions. However, the robustness of these attacks in realistic deployments and effective defensive mechanisms remain understudied. This work addresses these gaps through systematic empirical evaluation of memory poisoning attacks and defenses in Electronic Health Record (EHR) agents. We investigate attack robustness by varying three critical dimensions: initial memory state, number of indication prompts, and retrieval parameters. Our experiments on GPT-4o-mini, Gemini-2.0-Flash and Llama-3.1-8B-Instruct models using MIMIC-III clinical data reveal that realistic conditions with pre-existing legitimate memories dramatically reduce attack effectiveness. We then propose and evaluate two novel defense mechanisms: (1) Input/Output Moderation using composite trust scoring across multiple orthogonal signals, and (2) Memory Sanitization with trust-aware retrieval employing temporal decay and pattern-based filtering. Our defense evaluation reveals that effective memory sanitization requires careful trust threshold calibration to prevent both overly conservative rejection (blocking all entries) and insufficient filtering (missing subtle attacks), establishing important baselines for future adaptive defense mechanisms. These findings provide crucial insights for securing memory-augmented LLM agents in production environments.

#88 Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs

privacysynthetic data

著者: Honghao Liu, Xuhui Jiang, Chengjin Xu, Cehao Yang, Yiran Cheng, Lionel Ni, Jian Guo

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2601.05635

要約:
Preserving privacy in sensitive data while pretraining large language models on small, domain-specific corpora presents a significant challenge. In this work, we take an exploratory step toward privacy-preserving continual pretraining by proposing an entity-based framework that synthesizes encrypted training data to protect personally identifiable information (PII). Our approach constructs a weighted entity graph to guide data synthesis and applies deterministic encryption to PII entities, enabling LLMs to encode new knowledge through continual pretraining while granting authorized access to sensitive data through decryption keys. Our results on limited-scale datasets demonstrate that our pretrained models outperform base models and ensure PII security, while exhibiting a modest performance gap compared to models trained on unencrypted synthetic data. We further show that increasing the number of entities and leveraging graph-based synthesis improves model performance, and that encrypted models retain instruction-following capabilities with long retrieved contexts. We discuss the security implications and limitations of deterministic encryption, positioning this work as an initial investigation into the design space of encrypted data pretraining for privacy-preserving LLMs. Our code is available at https://github.com/DataArcTech/SoE.

#89 Within-Dataset Disclosure Risk for Differential Privacy

privacy

著者: Zhiru Zhu, Raul Castro Fernandez

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2310.13104

要約:
Differential privacy (DP) enables private data analysis. In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy. They do so by choosing the privacy parameter $\epsilon$, which controls the degree of privacy for all individuals in all possible datasets. However, it is challenging for controllers to choose $\epsilon$ because of the difficulty of interpreting the privacy implications of such a choice on the within-dataset individuals. To address this challenge, we first derive a relative disclosure risk indicator (RDR) that indicates the impact of choosing $\epsilon$ on the within-dataset individuals' disclosure risk. We then design an algorithm to find $\epsilon$ based on controllers' privacy preferences expressed as a function of the within-dataset individuals' RDRs, and an alternative algorithm that finds and releases $\epsilon$ while satisfying DP. Lastly, we propose a solution that bounds the total privacy leakage when using the algorithm to answer multiple queries without requiring controllers to set the total privacy budget. We evaluate our contributions through an IRB-approved user study that shows the RDR is useful for helping controllers choose $\epsilon$, and experimental evaluations showing our algorithms are efficient and scalable.

#90 On Membership Inference Attacks in Knowledge Distillation

privacymodel extraction

著者: Ziyao Cui, Minxing Zhang, Jian Pei

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2505.11837

要約:
Large language models (LLMs) are trained on massive corpora that may contain sensitive information, creating privacy risks under membership inference attacks (MIAs). Knowledge distillation is widely used to compress LLMs into smaller student models, but its privacy implications are poorly understood. We systematically evaluate how distillation affects MIA vulnerability across six teacher-student model pairs and six attack methods. We find that distilled student models do not consistently exhibit lower MIA success than their teacher models, and in some cases demonstrate substantially higher member-specific attack success, challenging the assumption that knowledge distillation inherently improves privacy. We attribute this to mixed supervision in distillation: for vulnerable training data points, teacher predictions often align with ground-truth labels, causing student models to learn overly confident predictions that amplify the separability between members and non-members; conversely, for non-vulnerable points, teacher predictions and ground truth frequently diverge, providing inconsistent learning signals. To mitigate this, we propose three practical interventions -- restricting distillation to non-vulnerable points, adding a low-dimensional Bottleneck Projection, and a normalization variant (NoNorm). Experiments show these methods reduce both aggregate and member-specific MIA success while preserving model utility, improving privacy-utility trade-offs for distilled LLMs.

#91 Hierarchical Secure Aggregation with Heterogeneous Security Constraints and Arbitrary User Collusion

著者: Zhou Li, Xiang Zhang, Jiawen Lv, Jihao Fan, Haiqiang Chen, Giuseppe Caire

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2507.14768

要約:
In hierarchical secure aggregation (HSA), a server communicates with clustered users through an intermediate layer of relays to compute the sum of users' inputs under two security requirements -- server security and relay security. Server security requires that the server learns nothing beyond the desired sum even when colluding with a subset of users, while relay security requires that each relay remains oblivious to the users' inputs under collusion. Existing work on HSA enforces homogeneous security where \tit{all} inputs must be protected against \tit{any} subset of potential colluding users with sizes up to a predefined threshold. Such a \homo formulation cannot capture scenarios with \tit{\het} \secty \reqs where \diff users may demand various levels of protection. In this paper, we study hierarchical secure aggregation (HSA) with heterogeneous security requirements and arbitrary user collusion. Specifically, we consider scenarios where the inputs of certain groups of users must remain information-theoretically secure against inference by the server or any relay, even if the server or any relay colludes with an arbitrary subset of other users. Under server security, the server learns nothing about these protected inputs beyond the prescribed aggregate sum, despite any such collusion. Under relay security, each relay similarly obtains no information about the protected inputs under the same collusion model. We characterize the optimal communication rates achievable across all layers for all parameter regimes. Furthermore, we study the minimum source keys required at the users to ensure security. For this source key requirement, we provide tight characterizations in two broad regimes determined by the security and collusion constraints, and establish a general information-theoretic lower bound together with a bounded-gap achievable scheme for the remaining regime.

#92 SoK: Market Microstructure for Decentralized Prediction Markets (DePMs)

著者: Nahid Rahman, Joseph Al-Chami, Jeremy Clark

公開日: Tue, 13 Jan 2026 00:00:00 -0500

リンク: https://arxiv.org/abs/2510.15612

要約:
Decentralized prediction markets (DePMs) allow open participation in event-based wagering without fully relying on centralized intermediaries. We review the history of DePMs which date back to 2011 and includes hundreds of proposals. Perhaps surprising, modern DePMs like Polymarket deviate materially from earlier designs like Truthcoin and Augur v1. We use our review to present a modular workflow comprising eight stages: underlying infrastructure, market topic, share structure and pricing, market initialization, trading, market resolution, settlement, and archiving. For each module, we enumerate the design variants, analyzing trade-offs around decentralization, expressiveness, and manipulation resistance. We also identify open problems for researchers interested in this ecosystem.

cs.CR updates on arXiv.org

📋 論文タイトル一覧