Artificial intelligence at the heart of a child pornography controversy
The Unexpected Use of AI Databases
When tech enthusiasts use AI systems to generate images, they are often unaware that these tools may have been trained on collections containing objectionable material. According to research conducted by the Stanford Internet Observatory and relayed by 404 Media, this alarming situation is linked to LAION-5B, a vast dataset used to train multiple models, including the well-known Stable Diffusion. This immense library of approximately six billion entries unfortunately includes thousands of illicit files, among them no fewer than 3,226 images classified as child pornography.
The controversial origin of the data
The nonprofit LAION was originally founded to provide publicly available machine-learning resources, and LAION-5B is among its main contributions. The database lists links to images from across the Internet, including social platforms where child pornography can unfortunately hide. Marcus Rogers of Purdue University offers a harsh criticism on this point: companies either lack the will to look into illicit content, or they have frankly lost control over the content they distribute.
Ineffective filtering attempts
As early as 2021, LAION's managers expressed doubts about the legal compliance of certain elements of their database. Despite cleanup attempts, questionable images remained, and LAION-5B was made available to the public. Reports from affected users were only taken seriously after several months, ultimately leading to the temporary suspension of LAION-5B and of another dataset, LAION-44M, while these tools were sanitized.
Implications for users
The implications are serious for anyone who downloads these databases in full without strict precautions: they may find themselves in possession of illegal content. David Thiel of the Stanford Internet Observatory points out that the filters developed by LAION to eliminate these images appeared only recently.
Summary Table
| Database | Number of images | Illicit content | Action by LAION |
| --- | --- | --- | --- |
| LAION-5B | ~6 billion | Yes, includes child pornography images | Filtering and temporary suspension |
| LAION-44M | Not specified | Potentially (preventive suspension) | Temporary suspension |
In conclusion, this case raises crucial ethical and legal questions about how data is managed by organizations that promote open source, as well as about the responsibilities of those who use these databases. It underscores the need to safeguard the integrity and safety of generative AI systems.
