The AI model collapse: as AI trains on AI content
06-17-2023 06:06 AM - edited 06-17-2023 06:12 AM
"A group of researchers from the UK and Canada have looked into this very problem and recently published a paper on their work in the open access journal arXiv. What they found is worrisome for current generative AI technology and its future: "We find that use of model-generated content in training causes irreversible defects in the resulting models." Specifically looking at probability distributions for text-to-text and image-to-image AI generative models, the researchers concluded that "learning from data produced by other models causes model collapse -- a degenerative process whereby, over time, models forget the true underlying data distribution ... this process is inevitable, even for cases with almost ideal conditions for long-term learning.""
Explaining what is happening: as more and more AI-created content floods the internet, AI will start to train on AI content, creating a loop that causes irreversible defects in the resulting models.
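For anyone who wants to see the loop in miniature, here is a toy sketch in Python. It is only an illustration under simple assumptions, not the researchers' actual experiment: the "model" is just a bell curve (mean and standard deviation), and each generation is fitted only to samples drawn from the previous generation. The function name `simulate_collapse` and its parameters are made up for this example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at each generation, starting
    with the assumed "true" human-data distribution N(0, 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the real data distribution
    history = [sigma]
    for _ in range(generations):
        # The current model produces the next generation's training set.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # The next model is fitted only to that synthetic data.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: sigma = {history[gen]:.6f}")
```

With a small sample size, the fitted spread tends to shrink toward zero over many generations: the model gradually "forgets" the tails of the original distribution, which is the flavor of collapse the quote describes.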
06-17-2023 06:23 AM
I've been reading about this, both the problem and the proposed solution. The proposed solution is curation. Scraping the internet at large worked initially, but now it will cause model collapse. Training on content that is verified as human-created is going to be necessary going forward. One article said this could create new jobs even as some current jobs are taken away: becoming a content creator for datasets rather than for a human audience.
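A rough way to picture why curation could help, again as a toy sketch and not anything from the paper or the articles: treat the "model" as a bell curve that is refitted every generation, but make sure part of each training set is fresh human-made data. The name `simulate_with_human_data` and the `human_fraction` parameter are hypothetical, chosen just for this illustration.

```python
import random
import statistics


def simulate_with_human_data(generations=200, sample_size=100,
                             human_fraction=0.5, seed=0):
    """Recursive refit where each training set keeps some real data.

    human_fraction is the assumed share of verified human-made samples,
    drawn fresh from the true distribution N(0, 1) every generation.
    Returns the fitted standard deviation per generation.
    """
    rng = random.Random(seed)
    n_human = round(sample_size * human_fraction)
    n_synthetic = sample_size - n_human
    mu, sigma = 0.0, 1.0  # generation 0: the real data distribution
    history = [sigma]
    for _ in range(generations):
        # Curated, verified human-made data from the true distribution.
        human = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        # Data produced by the previous generation's model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_synthetic)]
        training_set = human + synthetic
        mu = statistics.fmean(training_set)
        sigma = statistics.pstdev(training_set)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_with_human_data()
    print(f"smallest sigma seen: {min(history):.3f}")
    print(f"final sigma:         {history[-1]:.3f}")
```

In this simplified setup the real samples keep anchoring the fit, so the spread cannot drain away the way it can when a model only ever sees its predecessor's output. Real training pipelines are far more complicated, so take it as intuition only.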
06-17-2023 07:25 AM
Another possible solution is auto-labeling. I've read Adobe is working on this for Firefly but isn't there yet. If ChatGPT could figure out a way to add a label that would survive cut and paste, that would help. Of course, that only helps with future content. There is already a lot of AI content on the internet, and curating it out is getting less and less possible. ChatGPT passed the Turing test a while ago. So human-created content will have to be curated in until labeling becomes the standard. And it will have to be invented before it can become the standard.
06-17-2023 08:52 AM
06-17-2023 01:14 PM
This exactly.
06-17-2023 09:54 AM
Oh my! Reminds me of the movie, "Multiplicity" where the guy creates clones of himself to help him get all of his work done, and pretty soon he's making clones of clones - but each progressive generation of clones gets stupider and stupider. Sorta like making a tape recording from a tape recording (which, come to think of it, is an analogy that only people of a "certain age" will understand!)
Cat @ ZB Designs
06-17-2023 11:11 AM
I never saw that movie. I'll have to look for it.
Oh gosh, tapes. I go back to before 8-tracks, probably back to carving petroglyphs. LOL
06-17-2023 11:57 AM
Petroglyphs... bwahahaha! I used to work at a music school and we had an accountant who was VERY old school. He had trouble with things like email, so we used to joke that the best way to communicate with him was via carrier pigeon!
Anyhow, Multiplicity is worth the watch - not a great cinematic masterpiece by any stretch of the imagination, but it's a cute comedy with Michael Keaton & Andie MacDowell.
Cat @ ZB Designs
06-17-2023 01:04 PM
My poor husband is like that accountant, but he's slowly and resentfully learning to do elementary things on the computer. He even goes shopping on it, but only after he's driven all over the place looking for what he wants. He's totally appalled by AI, and I can't say I blame him.
06-17-2023 01:17 PM
I've made tape recordings of tape recordings (and xeroxes of xeroxes). Back in the day, making tape recordings of tape recordings was the only way to disseminate the music of local bands that only made tapes. The hiss and echo got so bad. The same will happen to the models if they are allowed to get too recursive.

