[1]. Farhadi A, Hejrati M, Sadeghi M A, et al. Every picture tells a story: Generating sentences from images [C]//Proc of the European Conf on Computer Vision. Berlin: Springer, 2010: 15–29
[2]. Ordonez V, Han Xufeng, Kuznetsova P, et al. Large scale retrieval and generation of image descriptions [J]. Int Journal of Computer Vision, 2016, 119(1): 46–59
[3]. Hodosh M, Young P, Hockenmaier J. Framing image description as a ranking task: Data, models and evaluation metrics [J]. Journal of Artificial Intelligence Research, 2013, 47: 853–899
[4]. Mason R, Charniak E. Nonparametric method for data-driven image captioning [C]//Proc of the 52nd Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2014: 592–598
[5]. Gupta A, Verma Y, Jawahar C V. Choosing linguistics over vision to describe images [C]//Proc of the 26th AAAI Conf on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2012: 606–612
[6]. Yang Yezhou, Teo C L, Daumé H, et al. Corpus-guided sentence generation of natural images [C]//Proc of the Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2011: 444–454
[7]. Kulkarni G, Premraj V, Ordonez V, et al. Baby talk: Understanding and generating simple image descriptions [J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891–2903
[8]. Mitchell M, Han Xufeng, Dodge J, et al. Midge: Generating image descriptions from computer vision detections [C]//Proc of the 13th Conf of the European Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2012: 747–756
[9]. Ushiku Y, Yamaguchi M, Mukuta Y, et al. Common subspace for model and similarity: Phrase learning for caption generation from images [C]//Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 2668–2676
[10]. Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 3156–3164
[11]. Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 2625–2634
[12]. Jia Xu, Gavves E, Fernando B, et al. Guiding the long-short term memory model for image caption generation [C]//Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 2407–2415
[13]. Wu Qi, Shen Chunhua, Liu Lingqiao, et al. What value do explicit high level concepts have in vision to language problems? [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 203–212
[14]. Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention [C]//Proc of the 32nd Int Conf on Machine Learning. CA: IMLS, 2015: 2048–2057
[15]. Li Linghui, Tang Sheng, Zhang Yongdong, et al. GLA: Global-local attention for image description [J]. IEEE Trans on Multimedia, 2018, 20(3): 726–737
[16]. You Quanzeng, Jin Hailin, Wang Zhaowen, et al. Image captioning with semantic attention [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 4651–4659
[17]. Lu Jiasen, Xiong Caiming, Parikh D, et al. Knowing when to look: Adaptive attention via a visual sentinel for image captioning [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 375–383