Caetano, C. orcid.org/0000-0002-1546-3740, Santos, G.O.D. orcid.org/0000-0003-2835-1331, Petrucci, C. orcid.org/0000-0001-9881-208X et al. (6 more authors) (2025) Neglected risks: The disturbing reality of children’s images in datasets and the urgent call for accountability. In: FAccT '25: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. FAccT '25: The 2025 ACM Conference on Fairness, Accountability, and Transparency, 23-26 Jun 2025, Athens, Greece. ACM, pp. 2542-2553. ISBN 9798400714825
Abstract
Including children’s images in datasets has raised ethical concerns, particularly regarding privacy, consent, data protection, and accountability. These datasets, often built by scraping publicly available images from the Internet, can expose children to risks such as exploitation, profiling, and tracking. Despite the growing recognition of these issues, approaches for addressing them remain limited. We explore the ethical implications of using children’s images in AI datasets and propose a pipeline to detect and remove such images. As a use case, we built the pipeline on a Vision-Language Model under the Visual Question Answering task and tested it on the #PraCegoVer dataset. We also evaluate the pipeline on a subset of 100,000 images from the Open Images V7 dataset to assess its effectiveness in detecting and removing images of children. The pipeline serves as a baseline for future research, providing a starting point for more comprehensive tools and methodologies. While we leverage existing models trained on potentially problematic data, our goal is to expose and address this issue. We do not advocate for training or deploying such models, but instead call for urgent community reflection and action to protect children’s rights. Ultimately, we aim to encourage the research community to exercise more than merely additional care in creating new datasets and to inspire the development of tools to protect the fundamental rights of vulnerable groups, particularly children.
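To make the detect-and-remove idea in the abstract concrete, here is a minimal sketch of a yes/no Visual Question Answering filter. It is not the authors' implementation: the question text, the function name `split_by_child_presence`, and the `ask` callable (a stand-in for any vision-language model call) are assumptions introduced for illustration only. The abstract does not state the prompt, model, or decision rule actually used.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical prompt; the source does not state the question the authors used.
QUESTION = "Is there a child in this image? Answer yes or no."


def split_by_child_presence(
    image_paths: Iterable[str],
    ask: Callable[[str, str], str],
    question: str = QUESTION,
) -> Tuple[List[str], List[str]]:
    """Return (kept, flagged) image paths based on a yes/no VQA answer.

    `ask(image_path, question)` is a placeholder for any vision-language
    model call that returns a free-text answer.
    """
    kept: List[str] = []
    flagged: List[str] = []
    for path in image_paths:
        words = ask(path, question).strip().lower().split()
        first = words[0].strip(".,!:;") if words else ""
        # Keep an image only on an explicit "no"; empty or ambiguous
        # answers are flagged so that errors lean towards removal.
        if first == "no":
            kept.append(path)
        else:
            flagged.append(path)
    return kept, flagged


if __name__ == "__main__":
    # Stub answers standing in for a real model, for demonstration only.
    stub = {"a.jpg": "Yes", "b.jpg": "No.", "c.jpg": "not sure"}
    kept, flagged = split_by_child_presence(stub, lambda path, q: stub[path])
    print("kept:", kept)
    print("flagged:", flagged)
```

Flagging every answer that is not an explicit "no" is one possible design choice: it trades more false positives for fewer missed images, which may suit a setting where the cost of retaining a child's image is high.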
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2025 The Authors. Except as otherwise noted, this author-accepted version of a paper published in FAccT '25: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | Children Rights; Human Rights; Vision-Language Models; Visual Question Answering |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 04 Jul 2025 11:00 |
Last Modified: | 04 Jul 2025 11:00 |
Status: | Published |
Publisher: | ACM |
Refereed: | Yes |
Identification Number: | 10.1145/3715275.3732166 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:228741 |