[1] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554
[2] Sanders J, Kandrot E. GPU高性能编程CUDA实战[M]. 聂雪军, 译. 北京: 机械工业出版社, 2011
[3] 沈传年, 徐彦婷, 陈滢霞. 隐私计算关键技术及研究展望[J]. 信息安全研究, 2023, 9(8): 714-721
[4] Yao A C. Protocols for secure computations[C] //Proc of the 23rd Annual Symp on Foundations of Computer Science (SFCS 1982). Piscataway, NJ: IEEE, 1982: 160-164
[5] Beaver D. Efficient multiparty protocols using circuit randomization[C] //Advances in Cryptology—CRYPTO'91: Proceedings 11. Berlin: Springer, 1992: 420-432
[6] Zhang Feng, Chen Zheng, Zhang Chenyang, et al. An efficient parallel secure machine learning framework on GPUs[J]. IEEE Trans on Parallel and Distributed Systems, 2021, 32(9): 2262-2276
[7] Rivest R L, Adleman L, Dertouzos M L. On data banks and privacy homomorphisms[J]. Foundations of Secure Computation, 1978, 4(11): 169-180
[8] Fan J, Vercauteren F. Somewhat practical fully homomorphic encryption[J]. Cryptology ePrint Archive, 2012: 144
[9] Brakerski Z, Gentry C, Vaikuntanathan V. (Leveled) fully homomorphic encryption without bootstrapping[J]. ACM Trans on Computation Theory, 2014, 6(3): 1-36
[10] Cheon J H, Kim A, Kim M, et al. Homomorphic encryption for arithmetic of approximate numbers[C] //Advances in Cryptology—ASIACRYPT 2017. Berlin: Springer, 2017: 409-437
[11] 边松, 毛苒, 朱永清, 等. 全同态加密软硬件加速研究进展[J]. 电子与信息学报, 2024, 46(5): 1-16
[12] 卢凯, 赖志权, 李笙维, 等. 并行智能训练技术: 挑战与发展[J]. 中国科学: 信息科学, 2023, 53(8): 1441-1468
[13] 杜海舟, 黄晟. 分布式机器学习中的通信机制研究综述[J]. 上海电力大学学报, 2021, 37(5): 496-500, 511
[14] Chen Tianqi, Xu Bing, Zhang Chiyuan, et al. Training deep nets with sublinear memory cost[J]. arXiv preprint, arXiv: 1604.06174, 2016
[15] 马玮良, 彭轩, 熊倩, 等. 深度学习中的内存管理问题研究综述[J]. 大数据, 2020, 6(4): 56-68
[16] Korthikanti V A, Casper J, Lym S, et al. Reducing activation recomputation in large transformer models[J]. arXiv preprint, arXiv: 2205.05198, 2022
[17] Guo Jinrong, Liu Wantao, Wang Wang, et al. AccUDNN: A GPU memory efficient accelerator for training ultra-deep neural networks[C] //Proc of the 37th IEEE Int Conf on Computer Design (ICCD). Piscataway, NJ: IEEE, 2019: 65-72
[18] Shi Shaohuai, Wang Qiang, Chu Xiaowen, et al. Communication-efficient distributed deep learning with merged gradient sparsification on GPUs[C] //Proc of IEEE Conf on Computer Communications (IEEE INFOCOM 2020). Piscataway, NJ: IEEE, 2020: 406-415
[19] Tan S, Knott B, Tian Y, et al. CryptGPU: Fast privacy-preserving machine learning on the GPU[C] //Proc of 2021 IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2021: 1021-1038
[20] Watson J L, Wagh S, Popa R A. Piranha: A GPU platform for secure computation[C] //Proc of the 31st USENIX Security Symp (USENIX Security 22). Berkeley, CA: USENIX Association, 2022: 827-844
[21] Jiang Wuxuan, Song Xiangjun, Hong Shenbai, et al. Spin: An efficient secure computation framework with GPU acceleration[J]. arXiv preprint, arXiv: 2402.02320, 2024
[22] Thakkar V, Ramani P, Cecka C, et al. CUTLASS[CP/OL]. [2024-05-25]. https://github.com/NVIDIA/cutlass
[23] Shi R, Potluri S, Hamidouche K, et al. Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters[C] //Proc of the 21st Int Conf on High Performance Computing (HiPC). Piscataway, NJ: IEEE, 2014: 1-10
[24] Bell N, Garland M. Efficient sparse matrix-vector multiplication on CUDA, NVR-2008-004[R]. Nvidia Technical Report, Santa Clara: Nvidia Corporation, 2008
[25] Fan Shengyu, Wang Zhiwei, Xu Weizhi, et al. TensorFHE: Achieving practical computation on encrypted data using GPGPU[C] //Proc of 2023 IEEE Int Symp on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2023: 922-934
[26] Wang Zhiwei, Li Peinan, Hou Rui, et al. HE-Booster: An efficient polynomial arithmetic acceleration on GPUs for fully homomorphic encryption[J]. IEEE Trans on Parallel and Distributed Systems, 2023, 34(4): 1067-1081
[27] Wang Guibin, Lin Yisong, Yi Wei. Kernel fusion: An effective method for better power efficiency on multithreaded GPU[C] //Proc of 2010 IEEE/ACM Int Conf on Green Computing and Communications & Int Conf on Cyber, Physical and Social Computing. Piscataway, NJ: IEEE, 2010: 344-350
[28] Al Badawi A, Jin C, Lin J, et al. Towards the AlexNet moment for homomorphic encryption: HCNN, the first homomorphic CNN on encrypted data with GPUs[J]. IEEE Trans on Emerging Topics in Computing, 2020, 9(3): 1330-1343
[29] Deng Li. The MNIST database of handwritten digit images for machine learning research [Best of the Web][J]. IEEE Signal Processing Magazine, 2012, 29(6): 141-142
[30] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images[R]. Toronto: University of Toronto, 2009
[31] Al Badawi A, Veeravalli B, Lin J, et al. Multi-GPU design and performance evaluation of homomorphic encryption on GPU clusters[J]. IEEE Trans on Parallel and Distributed Systems, 2020, 32(2): 379-391