[1] Villalobos P, Sevilla J, Heim L, et al. Will we run out of data? An analysis of the limits of scaling datasets in machine learning [EB/OL]. (2022-10-26) [2024-05-12]. https://arxiv.org/abs/2211.04325
[2] Yang Qiang. AI and data privacy protection: The solution of federated learning [J]. Journal of Information Security Research, 2019, 5(11): 961-965 (in Chinese)
[3] McMahan B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data [C] //Proc of the 20th Int Conf on Artificial Intelligence and Statistics (AISTATS). New York: PMLR, 2017: 1273-1282
[4] Xu M, Song C, Tian Y, et al. Training large-vocabulary neural language models by private federated learning for resource-constrained devices [C] //Proc of the 2023 IEEE Int Conf on Acoustics, Speech, and Signal Processing (ICASSP). Piscataway, NJ: IEEE, 2023: 1-5
[5] Liu Xiaoqian, Xu Fei, Ma Zhuo, et al. Research on privacy protection technology in federated learning [J]. Journal of Information Security Research, 2024, 10(3): 194-201 (in Chinese)
[6] Li H, Xu M, Song Y. Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence [C] //Proc of the Association for Computational Linguistics (ACL). Stroudsburg, PA: ACL, 2023: 14022-14040
[7] Gu K, Kabir E, Ramsurrun N, et al. Towards sentence level inference attack against pre-trained language models [J]. Proceedings on Privacy Enhancing Technologies, 2023, 2023(3): 62-78
[8] Pan X, Zhang M, Ji S, et al. Privacy risks of general-purpose language models [C] //Proc of the 2020 IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2020: 1314-1331
[9] Song C, Raghunathan A. Information leakage in embedding models [C] //Proc of the 2020 ACM SIGSAC Conf on Computer and Communications Security (CCS). New York: ACM, 2020: 377-390
[10] Morris J X, Kuleshov V, Shmatikov V, et al. Text embeddings reveal (almost) as much as text [C] //Proc of the 2023 Conf on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: ACL, 2023: 12448-12460
[11] Balunović M, Dimitrov D, Jovanović N, et al. LAMP: Extracting text from gradients with language model priors [C] //Proc of the 36th Conf on Neural Information Processing Systems (NeurIPS). Cambridge: MIT Press, 2022: 7641-7654
[12] Gupta S, Huang Y, Zhong Z, et al. Recovering private text in federated learning of language models [C] //Proc of the 36th Conf on Neural Information Processing Systems (NeurIPS). Cambridge: MIT Press, 2022: 8130-8143
[13] Abadi M, Chu A, Goodfellow I, et al. Deep learning with differential privacy [C] //Proc of the 2016 ACM SIGSAC Conf on Computer and Communications Security (CCS). New York: ACM, 2016: 308-318
[14] Yu D, Naik S, Backurs A, et al. Differentially private fine-tuning of language models [EB/OL]. (2021-10-13) [2024-05-12]. https://arxiv.org/abs/2110.06500
[15] Qu C, Kong W, Yang L, et al. Natural language understanding with privacy-preserving BERT [C] //Proc of the 30th ACM Int Conf on Information & Knowledge Management (CIKM). New York: ACM, 2021: 1488-1497
[16] Shi W, Shea R, Chen S, et al. Just fine-tune twice: Selective differential privacy for large language models [C] //Proc of the 2022 Conf on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: ACL, 2022: 6327-6340
[17] Thakur N, Reimers N, Rücklé A, et al. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models [EB/OL]. (2021-04-17) [2024-05-12]. https://arxiv.org/abs/2104.08663
[18] Kwiatkowski T, Palomaki J, Redfield O, et al. Natural questions: A benchmark for question answering research [J]. Trans of the Association for Computational Linguistics, 2019, 7: 453-466
[19] Zhang Z, Yang Y, Dai Y, et al. FedPETuning: When federated learning meets the parameter-efficient tuning methods of pre-trained language models [C] //Proc of the Association for Computational Linguistics (ACL). Stroudsburg, PA: ACL, 2023: 9963-9977
[20] Wang A, Singh A, Michael J, et al. GLUE: A multi-task benchmark and analysis platform for natural language understanding [C] //Proc of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA: ACL, 2018: 353-355
[21] Novikova J, Dušek O, Rieser V. The E2E dataset: New challenges for end-to-end generation [C] //Proc of the 18th Annual Meeting on Discourse and Dialogue (SIGDIAL). Stroudsburg, PA: ACL, 2017: 201-206