Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 94
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning Paper • 2408.01978 • Published Aug 4, 2024 • 1
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models Paper • 2509.19870 • Published Sep 24, 2025 • 1
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models Paper • 2411.13136 • Published Nov 20, 2024 • 1
Safety at Scale: A Comprehensive Survey of Large Model Safety Paper • 2502.05206 • Published Feb 2, 2025 • 3
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law Paper • 2507.18576 • Published Jul 24, 2025 • 8
Simulated Ensemble Attack: Transferring Jailbreaks Across Fine-tuned Vision-Language Models Paper • 2508.01741 • Published Aug 3, 2025 • 1
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? Paper • 2506.14805 • Published Jun 3, 2025 • 3
Imperceptible Jailbreaking against Large Language Models Paper • 2510.05025 • Published Oct 6, 2025 • 33
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs Paper • 2511.12710 • Published Nov 16, 2025 • 38
UniREditBench: A Unified Reasoning-based Image Editing Benchmark Paper • 2511.01295 • Published Nov 3, 2025 • 38