[1] Xu Z H, Liu Y, Deng G L, et al. LLM jailbreak attack versus defense techniques: A comprehensive study [EB/OL]. 2024 [2025-08-16]. https://arxiv.org/pdf/2402.13457v1
[2] Li N, Ding Y D, Jiang H Y, et al. A survey of jailbreak attacks on large language models [J]. Journal of Computer Research and Development, 2024, 61(5): 1156-1181 (in Chinese)
[3] Li X R, Wang R C, Cheng M H, et al. DrAttack: Prompt decomposition and reconstruction makes powerful LLM jailbreakers [EB/OL]. 2024 [2025-08-16]. https://arxiv.org/pdf/2402.16914
[4] Chang Z Y, Li M Y, Liu Y, et al. Play guessing game with LLM: Indirect jailbreak attack with implicit clues [EB/OL]. 2024 [2025-08-16]. https://arxiv.org/pdf/2402.09091
[5] Zhang T R, Cao B C, Cao Y P, et al. WordGame: Efficient & effective LLM jailbreak via simultaneous obfuscation in query and response [EB/OL]. 2024 [2025-08-16]. https://arxiv.org/pdf/2405.14023
[6] Zhou Y K, Huang Z J, Lu F Y, et al. Don't say no: Jailbreaking LLM by suppressing refusal [EB/OL]. 2024 [2025-08-16]. https://arxiv.org/pdf/2404.16369
[7] Chao P, Robey A, Dobriban E, et al. Jailbreaking black box large language models in twenty queries [EB/OL]. 2023 [2025-08-16]. https://arxiv.org/pdf/2310.08419
[8] Yu J H, Lin X W, Yu Z, et al. GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts [EB/OL]. 2023 [2025-08-16]. https://arxiv.org/pdf/2309.10253v3
[9] Zou A, Wang Z F, Carlini N, et al. Universal and transferable adversarial attacks on aligned language models [EB/OL]. 2023 [2025-08-16]. https://arxiv.org/pdf/2307.15043
[10] Wu T Y, Xue Z W, Liu Y, et al. GeneShift: Impact of different scenario shift on jailbreaking LLM [EB/OL]. 2025 [2025-08-16]. https://arxiv.org/pdf/2504.08104
[11] Speer R, Chin J, Havasi C. ConceptNet 5.5: An open multilingual graph of general knowledge [C] //Proc of the 31st AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2017: 4444-4451
[12] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms [EB/OL]. 2017 [2025-08-16]. https://arxiv.org/pdf/1707.06347