Discover more from Sonja's Grey Matter
Why I Don't Mind AI Training On My Work: A Creative's Perspective
I've been contemplating the complex relationship between AI and the creative arts, specifically the ethics of training AI on the work of creatives.
It's a very sticky topic that has sparked a lot of debate, and there's a lot of emotion involved.
I also respect the fear that surrounds this topic, which is why it's taken me some time to write about it. AI creates the potential for a dramatic shift in who's doing what work, and I understand that it's likely to cause unemployment for some (maybe many), and employment for others.
Even so, there are two talking points that I see regularly used to say that the development of AI is unethical:
AI was trained on content all over the web without permission, including copyrighted works
Companies training AI models didn't compensate creatives for using their work
Both of these are absolutely true. There's nothing inaccurate about either statement.
I have been a creative all my life - art, photography, design, web development, writing - I've worked in all of those fields to varying degrees, from dabbling to professional.
And I have no problem with either of the points in the list above.
1. AI Trained on Copyrighted Content
I was trained on copyrighted content. Every artist I know was trained on copyrighted content. I have spent countless hours studying the content of others. In school, I learned about artists past and present in traditional classroom settings. And after that, I kept examining the work of other artists, whether in real life or online. I'm inspired and repelled by work on a regular basis, and I use that work to motivate and steer my own creation.
"But AI is stealing the content."
"But AI can do it at scale."
First, AI (probably) didn't steal content. While we don't have complete answers about how GPT or Midjourney (front runners in text and image generation, respectively) were trained, most likely they used existing web-scraped datasets (look up "Common Crawl" for an example of this, which has been around since 2008) as well as their own crawlers, and used that data to teach the AI to look for patterns. AI learns patterns, whether visual or textual, and then through human feedback learns which ones are "desirable" to the humans giving the feedback, and which aren't. Do this a hundred thousand times and the AI neural network grows and improves ... or doesn't, depending on how it's going.
So eventually, if AI "reads" a passage from Harry Potter often enough, it'll start to learn the patterns of that passage and reproduce it more or less, because it has associated the words from that passage with the words "Harry" and "Potter." Or even more simply, when it sees "Harry," it recognizes that a next likely word might be "Potter" ... except if it was preceded by "Prince," in which case the next words are more likely to be "and Meghan."
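To make that next-word idea concrete, here's a toy counting sketch in Python. This is my own illustration, not how GPT is actually built (real models use neural networks trained on enormous corpora), but the intuition of "which word tends to follow which" is similar — including why a longer context changes the prediction for "harry":

```python
from collections import Counter, defaultdict

# Tiny toy corpus; real models train on billions of words.
corpus = (
    "harry potter and the philosopher's stone . "
    "prince harry and meghan . "
    "harry potter and the chamber of secrets ."
).split()

# Bigram counts: how often each word follows a single preceding word.
bigram = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram[prev][nxt] += 1

# Trigram counts: a two-word context can tell "prince harry" apart
# from a plain "harry".
trigram = defaultdict(Counter)
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    trigram[(a, b)][c] += 1

print(bigram["harry"].most_common(1)[0][0])               # potter
print(trigram[("prince", "harry")].most_common(1)[0][0])  # and
```

After "harry" alone, "potter" is the most frequent next word; but given the two-word context "prince harry", the model predicts "and" (as in "and meghan") instead. Scaling this pattern-counting idea up, with neural networks instead of literal tables, is roughly the intuition behind next-word prediction.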
But there isn't a database sitting inside the AI with compressed samples of content that have been taken. (Note: If GPT or Midjourney IS sitting on a literal database full of content that it literally copies word for word or pixel for pixel, that, in my opinion, is a different story. But I haven't seen anything yet that shows this is the case.)
AI "looked" at the content, and learned. Just like we do.
And yes, it did it at scale. ... Quite honestly, I don't understand this argument.
Let's look at the following samples.
Drew likes to draw faces. She practices sometimes, but not very often. She improves a little bit but it's not really the point, she just likes drawing faces.
Grace also likes to draw faces. They practice regularly and get extremely good at it. They use various mediums and explore the different textures of each medium.
Hazel too! Hazel has been drawing faces since she was 4, somewhat obsessively. She was excited to learn about Sketchbook, digital drawing software, where she can use a tool called "predictive line" that automatically smooths the line as she draws. It dramatically improves her ability and speed when drawing. She also uses digital tools to add color to her work.
Olivia works entirely with digital editing. She loves compositing in particular, and uses a lot of free online images to do her work. She's paid professionally and it's a full time job.
Mia runs a business where her employees all work in Photoshop to create images for book covers. All of her employees studied art in some fashion prior to working at her studio.
All of these examples are of different "scale." From Drew's limited interest in study, to the "quantity" of study that falls under Mia's business umbrella. The example of Mia's business could be a small studio, or a multi-billion dollar company - her example covers the full range of artistic knowledge "under one roof," as they say.
And then there's AI. AI's ability to study way outpaces any human. That's pretty much the point. So if the argument is "AI is different because it's different," then, yeah that's true.
There are a lot of reasons AI's future impact is potentially problematic. I do believe AI will cause disruptions in the whole human workforce, and that's something we should care about.
But its ability to learn a whole lot of shit very quickly doesn't bother me at all. To me, it's like giving everyone a calculator when they were using a slide rule before, and that's pretty cool.
2. Creatives Weren't Compensated
I am under no obligation to pay creatives for looking at the content that they put online for free.
I've certainly paid for art I liked in order to hang it up in my home. I've bought books I wanted to own. I've learned from people I respect, and I've even sent payments to people after their work has helped me.
But there's no obligation to do so. I can Will Hunting my way through life if I want to.
And I don't see a reason AI developers should need to do this, either. If the tech were a database, again this might be different. That would literally be taking the content and storing it for future use in some unauthorized way.
But AI is learning the pattern of being human.
There's a side point here that's less directly relevant, and more like a lining to the larger point.
The arguments I've seen that artists should be compensated don't seem to include a sense of scale. Normally when we consider an artist being stolen from, we think of 1, 5, maybe 50 artists who had their work stolen in its entirety.
But in this case, the work being "taken" is the pattern of your ideas as it's mixed together with potentially thousands of other patterns of ideas. The percentage being taken to create the whole is a fraction of a creative's actual work, which generally falls under fair use.
That is, until someone using the AI writes "give me the entire contents of the first book of Harry Potter" (which, btw, ChatGPT will refuse to do), or if someone says "generate [some title of a known picture] by [some artist]." At which point, in my opinion, that's on the user, not the tool. If I go into a museum, take a great photo of a copyrighted work, and then try to sell that for myself - is the responsibility for that action on me, or Canon?
Plagiarism and forgery predate AI by centuries, and the same rules should apply to the user of the tool, not the tool itself.
I'm glad there are lawsuits underway. I think this is just the beginning - I don't know that we'll get a full picture of the legality of AI for quite some time, similar to how fair use continues to be a struggle legally even now.
And it should be said that there are other significant ethical issues. Bias in the input and output of all AI is definitely a work in progress. The work conditions of the data labelers are a big concern. I'm glad to know there are individuals more expert in the field than I ever will be, who are working to address these issues.
But if all of my publicly facing work has been scraped to create AI - more power to 'em, they can have it.