A new certification scheme for copyright-compliant AI has launched, but ChatGPT and other text generators won’t qualify.
Named Fairly Trained, the initiative arrives amid a brewing backlash against generative AI companies. Many of their tools — from OpenAI’s chatbots to Stability AI’s art generators — are trained on copyrighted content that’s scraped from the web. Inspired by this data, the systems then deliver endless creations in response to prompts. Frequently, the outputs are clear derivations of their source material.
The practice has enraged creators and copyright-holders. They argue that their work is stolen and remixed without their consent or compensation. It’s hardly a controversial claim: GenAI leaders have admitted to the practice.
To justify the process, companies point to the “fair use” doctrine, which can allow transformative and socially beneficial use of copyrighted content.
That argument has sparked opprobrium. One of the most prominent critics is Fairly Trained’s CEO, Ed Newton-Rex. A musician and computational creativity pioneer, Newton-Rex attracted headlines in December after quitting his role at Stability AI over the startup’s use of copyrighted content. The 36-year-old raised concerns that the company was “exploiting creators.”
Fairly Trained is his attempt to foster an alternative model. By certifying companies that get a license for their training data, the non-profit wants to create a fairer world for human creators.
The verification shows which companies consider creator consent to be important — and which ones don’t. Consumers can then make an informed decision on their use of GenAI.
The idea emerged after Newton-Rex resigned from Stability AI. In the debates triggered by his departure, he realised that licensed GenAI tools needed more exposure.
“Both for ethical and legal reasons, there are a lot of people and companies who would rather use generative AI models that are trained on licensed data,” Newton-Rex told TNW. “You’ve got a bunch of people who want to use licensed models and you’ve got a bunch of people who are providing those. I didn’t see any way of being able to tell them apart.”
Fairly Trained provides one way to differentiate them. The inaugural batch of certificates was awarded to nine GenAI organisations: Beatoven.ai, Boomy, Bria AI, Endel, LifeScore, Rightsify, Somms.AI, Soundful, and Tuney. While the majority are music makers, image generators are also represented and other media formats are “on the way,” Newton-Rex said.
There is, however, one big gap in the modalities: text. Newton-Rex isn’t aware of any major text generation model that could currently get certified.
“I don’t know any that would pass the bar, because every large language model that I’ve come across has been trained on a huge amount of copyrighted work,” he said.
LLM giants have argued that they have no other choice. In a recent submission to a British parliamentary committee, OpenAI said it would be “impossible” to create the likes of ChatGPT without training them on copyrighted works.
Because copyright protections cover virtually every sort of human expression — including blogs, photos, forum posts, scraps of code, and government documents — the company claims there’s no way to circumvent them.
Newton-Rex sympathises with the predicament, but he believes another approach is possible.
“I’m hopeful that language models will emerge that are trained on a small amount of data and end up being licensed,” he said. “I think there are other ways to do it as well.”
It might take time to create them, but there are real risks of continuing as normal. Newton-Rex believes that GenAI poses an “existential threat” to creative industries — and perhaps to human creativity as we know it.
If you’re interested in applying for the Fairly Trained certificate, you can start the process here.