Understanding the Copyright Issue
Saying that AI simply 'copies' what it sees is wrong and misleading. It does not help us understand the issue of copyright and AI. AI generates text and images based on what we might call a 'vocabulary' and a 'grammar', and a discussion of copyright in AI needs to address the appropriateness of the words and forms used by the AI to generate its output.
---------
There is no shortage of issues around the use of AI, and one of the most salient is the copyright issue. In a nutshell, the issue is this:
AI copies content from other sources on the internet and presents it as its own.
This would not be an issue at all were there no evidence that AI copies content. Unfortunately, it has not been hard to find evidence. Case closed, right? Well, no, but it takes some explaining. And, admittedly, in some cases there's no amount of explaining that will be sufficient.
As our starting point, let's take this article from Gary Marcus and Reid Southen in IEEE Spectrum. "The authors found that Midjourney could create all these images, which appear to display copyrighted material:"
They also pointed to extracts of text that appeared to be copied from the New York Times, saying, "We will call such near-verbatim outputs 'plagiaristic outputs,' because if a human created them we would call them prima facie instances of plagiarism."
Marcus and Southen pose the question as follows: "Can image-generating models be induced to produce plagiaristic outputs based on copyright materials?" The answer, obviously, is yes. We can see the evidence in the image displayed just above.
We need to be clear about what is being produced in the examples above, and we need to be clear about what we are accusing the generative AI systems of doing.
The images above, and the other examples produced by Marcus and Southen, are not exact copies of images from the movies or TV shows from which they appear to be drawn; Marcus and Southen are clear about this. In the original movie image of Thanos (lower right in our image above) there is a different background. So it's not an exact copy of the original, though large parts of it appear to be copied.
We want to be clear that what is not at stake here is any use of illegally obtained copies of the images. That is a separate issue. There are standard libraries of internet images and content used not only by AI engineers to train their models but also by scholars and researchers, for example, the Common Crawl dataset. For what follows, I'm going to assume that all content used to train AI was legally obtained, using methods that any person browsing the web in a legal manner could use.
Now, what are the AIs being accused of? There are different types of (what we'll loosely call) copying.
- There's outright copyright violation: taking an exact copy of something and selling it as the original.
- There's piracy: taking an exact copy of something and giving it away.
- There are cases of copying that are not actually copyright violations: where the use is non-commercial and educational, or transformative, or satirical, etc.
- There's plagiarism, which is the representation of someone's ideas as one's own.
- And there's even a soft sort of plagiarism, where you copy the text, correctly attribute it, but don't use quotation marks.
Marcus and Southen use the softer definition of plagiarism, but seem to imply the more serious offense of copyright violation. We need to be sure the evidence matches the accusation.
Now let's return to the prima facie case from above. I'll borrow from Alan Levine to make the case: "If you can enter 'popular 90s animated cartoon with yellow skin' and get back the Simpsons says something is fishy." I am inclined to agree. But what exactly is happening? I think a few things are happening that combine to produce the result we are seeing, and analyzing them helps us understand the copyright issue more deeply.
Now the AI need not make that leap to the Simpsons. Here's an image from D'Arcy Norman of a cartoon in a Simpsons style that clearly isn't the Simpsons:
But I think it is a straight one-move leap to jump from 'popular 90s animated cartoon with yellow skin' to 'the Simpsons'. Almost no other association is possible! I tried to illustrate this with a Google search of the same term, and yes, while there are some others fitting the description (SpongeBob SquarePants, for example), the overwhelmingly obvious choice is the Simpsons. So it would not be surprising for an AI to conclude that 'in the style of a popular 90s animated cartoon with yellow skin' and 'in the style of the Simpsons' are synonyms. And the AI definitely makes that leap.
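To make the 'synonym' point concrete, here is a minimal sketch, assuming (hypothetically) a text encoder that maps prompts into a shared vector space, as modern text-to-image systems do. The vectors below are invented for illustration; in a real system they would come from a trained model:

```python
# Toy illustration: two prompts whose embeddings nearly coincide act as
# synonyms for the generator. The vectors here are invented, not real.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = {
    "popular 90s animated cartoon with yellow skin": np.array([0.90, 0.80, 0.10]),
    "the Simpsons":                                  np.array([0.88, 0.82, 0.12]),
    "SpongeBob SquarePants":                         np.array([0.50, 0.90, 0.40]),
}

query = embeddings["popular 90s animated cartoon with yellow skin"]
for name, vec in embeddings.items():
    print(f"{name}: {cosine(query, vec):.3f}")

# The description sits almost on top of 'the Simpsons' in this space,
# so the model treats the two prompts as interchangeable.
```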
Aaron B. Smith says, "The argument /has/ been made that Google's Image Search is less than ideal, but at least they provide sourcing information for every image they show and don't create mashups without permission or citation, claiming instead that it is a new picture."
This is a point where we need to pause and think. If the AI is just producing a mashup, then there's really nothing to get excited about. It's just some automated image manipulation. And yes, the original artists would have a great deal to complain about. But it isn't just automated image manipulation - the image displayed by D'Arcy Norman isn't in any obvious way a copy of the Simpsons (or indeed any other cartoon that I am aware of) at all. So what's going on?
Using chatGPT 4 myself (which in turn uses DALL-E), I tried to generate a Simpsons-like cartoon that was not the Simpsons. It proved surprisingly impossible.
On my first attempt I simply got the Simpsons (I'd display the whole thread but chatGPT doesn't yet support the sharing of conversations with the images embedded).
I tried telling it not to use Simpsons characters:
Still pretty recognizably the Simpsons. Even when I explicitly told it the ways in which I did not want the output to resemble the Simpsons, it resembled the Simpsons.
I concluded, "I tried to get it to create cartoons of non-Simpsons characters but it was utterly unable to do so. It would create 'original' characters by changing their clothes or putting on sunglasses." I'm not sure DALL-E even understands the concept of a 'Simpsons character'.
So what does this tell me? DALL-E's understanding of a 'Simpsons style cartoon' is based on a set of basic elements: a few characteristic heads, body types, clothes, etc. It creates the cartoons by reassembling those elements, but it's impossible to do so without essentially recreating Simpsons characters. Here (after a couple of attempts) is a representation of that character set:
So now we construct a story of what's happening. The AI draws on what we might call a 'vocabulary' of cartoon parts. In some cases, it selects a 'Simpsons style' set of cartoon parts. In other cases (as in the D'Arcy Norman diagram) it starts with a different set of cartoon parts.
It then organizes those cartoon parts according to what we might call a 'grammar'. For example, 'eyes go on faces', 'shirts are placed below heads', etc. This grammar, though, isn't a set of rules; it's a neural network trained on examples of those specific cartoon parts organized in different ways. So we have a two-step process (sketched in toy code below):
- create a 'style' of characteristic image parts found in a set of visually similar images
- create a 'grammar' for that style by learning how those image parts are organized to form wholes in that set of visually similar images
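Here is a deliberately over-simplified sketch of that two-step story. Real diffusion models don't store an explicit library of parts; the part names, styles, and the `generate` function below are all hypothetical, used only to make the 'vocabulary plus grammar' framing concrete:

```python
# Toy sketch of the two-step story: a 'vocabulary' of style-specific
# parts, and a 'grammar' saying how parts compose into a whole.
import random

# Step 1: the 'vocabulary', grouped by style (invented examples).
VOCABULARY = {
    "simpsons-like": {
        "head": ["round yellow head", "spiky-haired yellow head"],
        "eyes": ["large round white eyes"],
        "body": ["short yellow body in a red dress", "yellow body in shorts"],
    },
    "stick-figure": {
        "head": ["plain circle head"],
        "eyes": ["dot eyes"],
        "body": ["simple line body"],
    },
}

# Step 2: the 'grammar'. In a real system this is a trained network,
# not an explicit rule list; a fixed ordering stands in for it here.
GRAMMAR = ["head", "eyes", "body"]  # eyes go on heads, bodies below, etc.

def generate(style: str) -> str:
    """Assemble a character by drawing one part per grammar slot."""
    parts = VOCABULARY[style]
    return ", ".join(random.choice(parts[slot]) for slot in GRAMMAR)

print(generate("simpsons-like"))
# Every sample drawn from the 'simpsons-like' vocabulary is recognizably
# a Simpsons character, no matter how the grammar rearranges the parts.
```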
Now this is obviously a simplification, and it's also a process that can be manipulated (by, say, pre-selecting the images that will be used and giving the style a name, like 'Simpsons', that can be inferred from textual clues as being what the user wants).
So is this plagiarism, or even a copyright infringement? Let's return to Aaron Smith's comments: "The argument /has/ been made that Google's Image Search is less than ideal, but at least they provide sourcing information for every image they show and don't create mashups without permission or citation, claiming instead that it is a new picture." To which he adds, "'I can't cite my sources' is an excuse my students try to use to get out of crediting the artists who did the work. It sounds like AI developers claimed 'It's toooo haaaaaard!' was a legal defense."
In theory, the developers can and should identify their image source (if only to reassure us that it was, in fact, legally obtained). But it should be clear that no part of the image is coming from a single source:
- the image parts are patterns that reoccur in a large number of images, and
- the grammar is learned from the entire set of images
So it's not that it's too hard, it's that there is - beyond the image set - literally no 'source' for any given image.
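A toy illustration of why that is, assuming a single model parameter nudged a little by every training example (the numbers and update rule here are invented for the sketch):

```python
# Sketch of why there is no per-image 'source': every model weight is
# moved by every training example, so attribution dissolves into the set.
import numpy as np

rng = np.random.default_rng(0)
images = rng.normal(size=(1000, 8))   # stand-ins for 1,000 training images
weight = np.zeros(8)                  # a single model parameter vector

for img in images:                    # gradient-style updates
    weight += 0.001 * (img - weight)  # each image moves the weight a little

# 'weight' is now a blend of all 1,000 images; no single image produced it.
print(weight.round(3))
```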
But that should not let the developers off the hook. I think there are two important questions that we need to ask:
1. How small or generic do the components have to be before they no longer count as copying a unique set of components copyrighted by an author?
Obviously a full Bart Simpson head is probably a component unique to Matt Groening:
Maybe even the familiar spiky hair is a Groening signature:
But what about a single spike, or the specific colour of yellow, or the thickness of the lines?
At a certain point, we get to the smallest possible image part, which is a single pixel, which Matt Groening can't copyright (though Pantone would like you to think it has copyrighted the colours of some types of pixels).
There's obviously no line here - it's a fuzzy concept, and that's OK. But it should be clear that the problem with the AI-generated image isn't that it's a copy of the original Simpsons image; it's that its vocabulary is far too coarse and limited to be recognizable as anything but a Simpsons image, and it is in many ways impossible to produce characters that are anything other than recognizable Simpsons characters.
2. Can an author copyright a 'grammar'? If so, since a grammar is fuzzy (i.e., there aren't specific rules, but rather an indeterminate set of neural network configurations that would produce a similar output), how generic can a copyrighted grammar be?
To be honest, I don't think there's any grammar that could be copyrighted, though there are certainly some grammars that are recognizable. Consider, for example, the grammar describing the arrangement of features on a face. Recognize this variant?
It's obviously a Picasso grammar (but not actually a Picasso - there's a school that teaches you how to paint like Picasso, and it's from there, unattributed).
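A rough sketch of what a 'grammar' of facial arrangement might look like, with the coordinates and jitter rule invented for illustration. The point is that the style lives in the arrangement, not in the features themselves:

```python
# Toy sketch: a face-layout 'grammar' as a fuzzy set of placement
# tendencies rather than strict rules. All values here are invented.
import random

# The 'vocabulary': ordinary facial features anyone may use,
# at conventional (x, y) positions on a unit canvas.
CONVENTIONAL = {"left eye": (0.3, 0.7), "right eye": (0.7, 0.7),
                "nose": (0.5, 0.5), "mouth": (0.5, 0.3)}

def cubist_layout(layout, jitter=0.25):
    """Same vocabulary, different grammar: displace each feature,
    producing a recognizably 'Picasso-like' arrangement."""
    return {feature: (round(x + random.uniform(-jitter, jitter), 2),
                      round(y + random.uniform(-jitter, jitter), 2))
            for feature, (x, y) in layout.items()}

print(cubist_layout(CONVENTIONAL))
# The style is in the arrangement (the grammar), not in the features;
# nothing here is a copy of any particular painting.
```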
The same sort of thing happens with text. Suppose, for example, we asked an AI to write a Simpsons script. Alan Levine provides us with 10 expressions in Bart Simpson style:
'Cow' is a word in the normal English vocabulary, but it is completely transformed when it's part of a Simpsons vocabulary, "Don't have a cow, man" (which is in turn a derived form of another slang vocabulary).
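One way to picture this, as a hedged sketch: a tokenizer trained on Simpsons scripts might treat the whole catchphrase as a single vocabulary entry, much as subword tokenizers merge frequent character sequences. The phrase list and `tokenize` helper below are hypothetical:

```python
# Hypothetical sketch: a whole catchphrase becomes one 'word' in a
# style vocabulary, while 'cow' on its own stays an ordinary word.
SIMPSONS_VOCAB = {"don't have a cow, man", "ay caramba", "d'oh"}  # invented list

def tokenize(text: str, style_vocab: set) -> list:
    """Return a style catchphrase as a single token, else ordinary words."""
    lowered = text.lower()
    for phrase in sorted(style_vocab, key=len, reverse=True):
        if phrase in lowered:
            return [phrase]          # the catchphrase is one unit
    return lowered.split()           # plain English: word by word

print(tokenize("The cow is in the field", SIMPSONS_VOCAB))
# -> ['the', 'cow', 'is', 'in', 'the', 'field']  : 'cow' is an ordinary word
print(tokenize("Don't have a cow, man", SIMPSONS_VOCAB))
# -> ["don't have a cow, man"]  : 'cow' is absorbed into a style unit
```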
The copyright issue - at least, this part of it - boils down to the question: what are we allowed to use as 'words' in our 'vocabulary', and what amounts to plagiarism or copyright infringement?
There's the naive view, expressed by Poritz: "I remember when we had to start teaching our students that not everything on the internet was there for them to use however they wanted without permission or attribution/citation." This in many cases was not true: Google could create thumbnails of whatever it found on the web and people could create 'lolcats' out of images and phrases, and of course public domain stuff and CC0 content was there for the taking.
There's the generous view, expressed by Alan Levine: "there are rules and norms and terms in licenses, but we also can act on our own agency; I would never uses said 90s popular yellow cartoon images, and anything I reuse I got to great ends to attribute, even if the licensry be vague. (which is why I attribute public domain even if the rules say I dont have to). The systems be damned, we can individually act better."
Poritz asks, "Why is taking a little bit of many people's creativity and running it all through a blender a reasonable way to behave? If I published a poem consisting of lots of lines glued together from many other poets' work, without giving them credit, that would be horrible ... this is a little more sophisticated, but really quite similar." But the question here is: can a line of poetry be used as a 'word', that doesn't need attribution, or does it constitute an 'expression', which does?
Pointing to an instance that is clearly on one side of that line is not evidence that the line does not exist. It is, at best, an argument for placement of the line. But there's no point being overly precise, because the line isn't clearly definable and moves around a lot according to context. Bricolage, for example, is art, not theft.
(David-Baptiste Chirot, 'Hidden in plain sight': found visual/sound poetries of feeling eyes and seeing hands, via Jerome Rothenberg.)
Saying that AI simply 'copies' what it sees is wrong and misleading. It does not help us understand the issue of copyright and AI. AI generates text and images based on what we might call a 'vocabulary' and a 'grammar', and a discussion of copyright in AI needs to address the appropriateness of the words and forms used by the AI to generate its output.