Privacy Risks, Regulatory Dilemmas and Improvement Approaches of Synthetic Data

Abstract: With the rapid development of data-driven technologies such as artificial intelligence, the scarcity of real-world data has become an increasingly severe problem. Ever stricter privacy regulation across countries has further aggravated the shortage of real-world data. Against this background, synthetic data, which is artificial yet fitted to real-world data, is widely regarded as a highly promising solution. However, synthetic data cannot completely eliminate privacy risks. Current privacy protection theory and practice pay insufficient attention to this point, so that in privacy regulation synthetic data faces many challenges, such as unclear regulatory positioning and ambiguous responsibility for re-identification. To fully realize the practical utility of synthetic data, privacy regulation of synthetic data should be improved in three respects: clarifying its regulatory positioning, promoting the formulation of technical standards, and ensuring supervision across the whole process. Doing so would support the innovative application of synthetic data while effectively protecting personal privacy rights and interests, and would in turn help the value of data elements to be better realized.

Key words: synthetic data; privacy risk; Personal Information Protection Law; anonymized data; data de-identification

Received: 2025-02-20

PACS: D912

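Funding: Sci-Tech Innovation 2030 "New Generation Artificial Intelligence" Major Project, "Research on Risk Prevention and Governance Measures for New Generation Artificial Intelligence" (2023ZD0121700); National Data Administration 2025 Major Research Project, "Research on the Mechanisms of Action, Security Risks and Governance Models of Artificial Intelligence in Economic and Social Development" (SJ-kj2025003).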

Corresponding author: Ren Baiyu

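About the author: Wu Zongze (b. 1997), male, from Sanming, Fujian; postdoctoral fellow and assistant research fellow; research interests: data governance and artificial intelligence governance.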

Cite this article:

Wu Zongze, Ren Baiyu. Privacy Risks, Regulatory Dilemmas and Improvement Approaches of Synthetic Data[J]. 中国科技论坛 (Forum on Science and Technology in China), 2025(8): 136-143.

Link to this article:

http://www.zgkjlt.org.cn/CN/Y2025/I8/136
