Data Scarcity and Large Language Model Data Value Asymmetry

Journal of Information Security Research, 2023, Vol. 9, Issue 7: 637-.


Online: 2023-07-01 Published: 2023-07-01


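Wang Xiang 1,2, Zhou Hui 3, Li Zhipeng 4, Xing Yun 5

1 (National Customs Information Center, Beijing 100005)
2 (Customs Innovation Laboratory for International Trade Information Standardization Applications, Beijing 100005)
3 (Institute of Law, Chinese Academy of Social Sciences, Beijing 100720)
4 (Beijing Zhonghaitong Technology Co., Ltd., Beijing 100023)
5 (China Electronic Port Data Center, Beijing 100088)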

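Corresponding author: Wang Xiang, PhD, senior engineer. Main research interest: digital government. newonemail1@163.com
About the authors: Wang Xiang, PhD, senior engineer. Main research interest: digital government. newonemail1@163.com
Zhou Hui, PhD, associate research fellow. Main research interests: network governance, data privacy, and intelligent rule of law. 13811511697@163.com
Li Zhipeng, master's degree, senior engineer. Main research interest: customs clearance informatization. 15001091@qq.com
Xing Yun, senior engineer. Main research interest: customs clearance informatization. xingyun@chinaport.gov.cn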

Abstract

Abstract: As the large language model (LLM) industry develops rapidly, market competition is driving a fast expansion of model scale. On the supply side, however, the data available for training is relatively insufficient and growing scarcer; high-quality data in particular cannot keep pace with the exponential growth in LLM computation scale on the demand side. Under today's increasingly stringent institutional constraints on data as a factor of production, the operating mechanism of LLMs exhibits the characteristics of a natural monopoly. Differences among major economies in data governance philosophy, differences in technical conditions on the international segment, and algorithmic discrimination all widen the value asymmetry between supply and demand, affect how LLM data value is distributed, and thereby strengthen LLM owners' data monopoly. Although China's LLM industry faces a series of technical constraints on the international segment, it holds an advantage in data endowment, with great potential in both quantity and quality. To better accumulate the benefits of data value, it is necessary to strengthen the construction of self-supporting LLM platforms, input and output value indicators, and international rules, and to emphasize policy guidance for the future development of the LLM industry.

Key words: data scarcity, data value asymmetry, data monopoly, artificial intelligence generated content (AIGC), large language model (LLM), cross-border data chain


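Cite this article: Wang Xiang, Zhou Hui, Li Zhipeng, Xing Yun. Data Scarcity and Large Language Model Data Value Asymmetry[J]. Journal of Information Security Research, 2023, 9(7): 637-.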

