[1]. Farhadi A, Hejrati M, Sadeghi M A, et al. Every picture tells a story: Generating sentences from images [C]//Proc of the European Conf on Computer Vision. Berlin: Springer, 2010: 15–29
[2]. Ordonez V, Han Xufeng, Kuznetsova P, et al. Large scale retrieval and generation of image descriptions [J]. Int Journal of Computer Vision, 2016, 119(1): 46–59
[3]. Hodosh M, Young P, Hockenmaier J. Framing image description as a ranking task: Data, models and evaluation metrics [J]. Journal of Artificial Intelligence Research, 2013, 47: 853–899
[4]. Mason R, Charniak E. Nonparametric method for data-driven image captioning [C]//Proc of the 52nd Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2014: 592–598
[5]. Gupta A, Verma Y, Jawahar C V. Choosing linguistics over vision to describe images [C]//Proc of the 26th AAAI Conf on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2012: 606–612
[6]. Yang Yezhou, Teo C L, Daumé H, et al. Corpus-guided sentence generation of natural images [C]//Proc of the Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2011: 444–454
[7]. Kulkarni G, Premraj V, Ordonez V, et al. Baby talk: Understanding and generating simple image descriptions [J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891–2903
[8]. Mitchell M, Han Xufeng, Dodge J, et al. Midge: Generating image descriptions from computer vision detections [C]//Proc of the 13th Conf of the European Chapter of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2012: 747–756
[9]. Ushiku Y, Yamaguchi M, Mukuta Y, et al. Common subspace for model and similarity: Phrase learning for caption generation from images [C]//Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 2668–2676
[10]. Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 3156–3164
[11]. Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 2625–2634
[12]. Jia Xu, Gavves E, Fernando B, et al. Guiding the long-short term memory model for image caption generation [C]//Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2015: 2407–2415
[13]. Wu Qi, Shen Chunhua, Liu Lingqiao, et al. What value do explicit high level concepts have in vision to language problems? [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 203–212
[14]. Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention [C]//Proc of the 32nd Int Conf on Machine Learning. CA: IMLS, 2015: 2048–2057
[15]. Li Linghui, Tang Sheng, Zhang Yongdong, et al. GLA: Global-local attention for image description [J]. IEEE Trans on Multimedia, 2018, 20(3): 726–737
[16]. You Quanzeng, Jin Hailin, Wang Zhaowen, et al. Image captioning with semantic attention [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 4651–4659
[17]. Lu Jiasen, Xiong Caiming, Parikh D, et al. Knowing when to look: Adaptive attention via a visual sentinel for image captioning [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 375–383