Artificial intelligence at the heart of a child pornography controversy
The Unexpected Use of AI Databases
When tech enthusiasts use AI systems to generate images, they are often unaware that these tools may have been trained on collections containing objectionable material. According to research conducted by the Stanford Internet Observatory and relayed by 404 Media, this alarming situation is linked to LAION-5B, a vast dataset used to train multiple models, including the well-known Stable Diffusion. This immense library of approximately six billion entries unfortunately includes thousands of illicit files, among them no fewer than 3,226 images classified as child pornography.
The controversial origin of the data
The nonprofit LAION was originally founded to provide publicly available machine-learning resources, and LAION-5B is among its main contributions. The database lists links to images from across the Internet, including social platforms where child pornography can unfortunately hide. Marcus Rogers of Purdue University offers a harsh criticism on this point: companies either lack the will to look into illicit content, or they have frankly lost control over the content they distribute.
Ineffective filtering attempts
As early as 2021, LAION's managers expressed doubts about the legal compliance of certain elements of their database. Despite cleanup attempts, questionable images remained, and LAION-5B was made available to the public. Reports from affected users were only taken seriously after several months, ultimately leading to the temporary suspension of LAION-5B and of another dataset, LAION-44M, while these tools were sanitized.
Implications for users
The implications are serious for anyone who downloads these databases in full without strict precautions: they may find themselves in possession of illegal content. David Thiel of the Stanford Internet Observatory points out that the filters developed by LAION to eliminate these images appeared only recently.
Summary Table
| Database | Number of images | Illicit content | Action by LAION |
| --- | --- | --- | --- |
| LAION-5B | ~6 billion | Yes, includes child pornography images | Filtering and temporary suspension |
| LAION-44M | Not specified | Potentially (preventive suspension) | Temporary suspension |
In conclusion, this case raises crucial ethical and legal questions about how data is managed by organizations that promote open source, as well as about the responsibilities of those who use these databases. It underscores the need to safeguard the integrity and safety of generative AI systems.
