What was Gary Marcus thinking, in that interview with Geoff Hinton?


Background: 60 Minutes did an interview with 'the Godfather of AI', Geoffrey Hinton. In response, Gary Marcus wrote a column in which he inserted his own set of responses into the transcript, as though he were a panel participant. Neat idea. So, of course, I'm stealing it, and in what follows, I insert my own comments as I join the 60 Minutes panel with Geoffrey Hinton and Gary Marcus.

Usually I put everyone else's text in italics, but for this post I'll put it all in normal font, to keep the format consistent.

Image: AI-generated illustration via DALL-E (OpenAI), Micha Heilbron, via https://neurosciencenews.com/prediction-brain-21183/


Scott Pelley: Does humanity know what it's doing?

Geoffrey Hinton: No.

Gary Marcus: I tend to agree. When it comes to AI in particular, we are getting way ahead of our skis, rushing forward a technology we don’t fully understand. For all the differences we have had over the years, I salute you for speaking out.

Stephen Downes: Not knowing what we're doing is the norm. The real question is, can we adapt?

Geoffrey Hinton: I think we're moving into a period when for the first time ever we may have things more intelligent than us.  

Scott Pelley: You believe they can understand?

Geoffrey Hinton: Yes.

Scott Pelley: You believe they are intelligent?

Geoffrey Hinton: Yes.

Gary Marcus: As it happens I sharply disagree with all three of the points Geoff just made. To be sure, it’s all partly definitional. But I don’t think we are all that close to machines that are more intelligent than us, I don’t think they really understand the things that they say, and I don’t think they are intelligent in the sense of being able to adaptively and flexibly reason about things they haven’t encountered before, in a reliable way.

Stephen Downes: Well, Gary, you've changed the question there. They don't need to be "more intelligent than us" to understand, and they don't need to "adaptively and flexibly reason." My cat understands a lot of things, but clearly doesn't reason. At least, not very well.

Gary Marcus: What Geoff has left out is any reference to all of the colossally stupid and ungrounded things generative AI systems do routinely, like fabricating the other night that Liz Cheney had replaced Kevin McCarthy as Speaker, by a 220-215 vote that never happened, or learning that Tom Cruise is the son of Mary Pfeiffer and yet not being able to infer that Mary Pfeiffer is Tom Cruise’s mother, or claiming that two pounds of feathers weigh less than one pound of bricks. Geoff himself wrote a classic paper about trying to get neural networks to infer family relationships, almost forty years ago; it’s embarrassing to see these systems still struggle on such basic problems.

Stephen Downes: It's true, AI makes a lot of mistakes, especially about things it has never seen or never been told about. But it seems to me that this makes it more like human intelligence. Humans, especially young ones, make a lot of mistakes too. I used to think that the lines on the roads were for motorcycles. 

Gary Marcus: Since they can’t reliably solve them, I don’t think we should attribute “understanding” to them, at least not in any remotely deep sense of the word understanding. 

Stephen Downes: We should probably be talking about what we mean by 'understanding', then. I think Geoffrey is using it in a different sense than you are. One sense of 'understanding' means 'getting facts right and reasoning correctly'. But the early rules-based systems did that, and nobody said they could understand. We're talking about a different type of system here.

Gary Marcus: Emily Bender and Timnit Gebru have called these systems “stochastic parrots”, which in my view is a little unkind (to parrots) but also vividly captures something real: a lot of what we are seeing now is a kind of unreliable mimicry. I really wish you could have addressed both the question of mimicry and of reliability. (Maybe next time?) I don’t see how you can call an agent with such a loose grip on reality all that intelligent, nor how you can simply ignore the role of mimicry in all this.

Stephen Downes: You're referring to this 2021 paper by Bender, Gebru and two others. There's also a video, from a couple of years ago. They're talking about large language models (LLMs) specifically, not artificial intelligence in general. LLMs have as a weakness the fact that they are only trained using language. They don't have other senses, they don't go to school, they can't even look stuff up on the web. But we're off topic - we should be talking about what we mean by 'intelligent', not stories about mistakes they made.

Scott Pelley: [Turning to Geoff] You believe these systems have experiences of their own and can make decisions based on those experiences?

Geoffrey Hinton: In the same sense as people do, yes.

Stephen Downes: Right. The 'same sense' as in 'the same way'. The sorts of things artificial neural networks do are the sorts of things that humans do.

Gary Marcus: You can’t really mean this, can you? Do you think that large language models feel pain or joy?

Stephen Downes: He's not saying machines have the same sensations and emotions humans have. That's something quite different.

Gary Marcus: When Google’s large language model LaMDA said that it enjoyed “spending time with friends and family”, those were just empty words. It didn’t actually have friends or family that it spent time with. It just mimicked words that humans have said in similar contexts, without ever having experienced the same thing.

Stephen Downes: Well, arguably, you mimic words, too. After all, the words you just used aren't original to you - they were used by other people long before you. And you put them in the same order. Other people have actually uttered the phrase "without ever having experienced the same thing". Did you copy them? Or is it just a good use of familiar words? But the main point here is that the computer is learning how to use words the same way humans do, by imitating patterns in what they hear, and waiting for feedback. Patterns - like putting 'ed' at the end of a verb - as you well know, Gary. But again, this is off topic. We were talking about whether computers have experiences, not whether they have a fact-based internal representation of family structures.

Gary Marcus: Large language models may have experiences in some sense, but it is a bridge too far to say that those experiences are the “same” as those of people.

Stephen Downes: He doesn't say the 'same', he says "in the same sense".

Scott Pelley: Are they conscious?

Geoffrey Hinton: I think they probably don't have much self-awareness at present. So, in that sense, I don't think they're conscious.

Gary Marcus: But wait a minute, you just said they have experiences literally “in the same sense as people”, and now you don’t think they are conscious? How can the experience be in the same sense as people, if they are not conscious? Of course, I don’t think these machines are conscious, either. But you do seem to have contradicted yourself.

Stephen Downes: A fly has experiences in the same way as a human - that is, through activations and patterns in their neural network. But we wouldn't say a fly is conscious (I don't think). You're conflating 'same' and 'same way' again.

Scott Pelley: Will they have self-awareness, consciousness?

Geoffrey Hinton: Oh, yes.

Gary Marcus: What makes you sure? How are you defining consciousness? When you say “they” do you mean that the same kinds of systems as we are building now will somehow achieve consciousness? Or that you imagine that other kinds of AI, perhaps not yet discovered might? It would be great if you could clarify what you mean by this.

[Hinton doesn’t seem to hear my questions, and does not respond]

Stephen Downes [Interjecting]: Pretty sure he means the same type of system, more or less.

Scott Pelley: Yes?

Geoffrey Hinton: Oh, yes. I think they will, in time. 

Gary Marcus: How much time? What kinds of systems?

[Again no answers]

Stephen Downes: Well, it took humans millions of years, so it might take a bit of time.

Scott Pelley: And so human beings will be the second most intelligent beings on the planet?

Geoffrey Hinton: Yeah.

Stephen Downes: People treat this as though it's shocking, as though we didn't already have machines that are stronger than us (or than any animal we can imagine), faster than us, and can fly.

Geoffrey Hinton: It took much, much longer than I expected. It took, like, 50 years before it worked well, but in the end, it did work well.

Gary Marcus: “Work well” remains a tendentious claim; they still cannot be trusted, make random mistakes, have no basis in factuality. 

Stephen Downes: The same could be said of some political parties! We have to allow that some things can 'work well' in some senses, but be dysfunctional in others. If you want to remove all the speckling in a full-frame digital image in a few seconds, an artificial intelligence 'works well' - even if it doesn't understand what the photo is about.

Gary Marcus: They approximate intelligence, when what they need to say resembles something in a database of text written by humans, but they still have enough problems that we don’t yet have driverless cars we can trust, and many companies are looking at generative AI saying, “nice try, but it’s not sound enough yet”.

Stephen Downes: Well, 42,795 people died in motor vehicle traffic crashes in the U.S. last year, so I wouldn't exactly say we can trust human drivers either. Personally, I think it's remarkable that an AI can drive at all! (It's as though you're criticizing a talking dog because it doesn't understand calculus.) It's easy to say simply that they 'approximate' intelligence without talking about what that means. Being intelligent doesn't mean being flawless; it means (if you will, for lack of a better way to put it) processing information in the right way.

Gary Marcus: I think it’s fair to say that generative AI works better than most people expected. But to simply ignore their serious issues in reliability is one-sided, and misrepresents reality.

Stephen Downes: I don't think anyone is ignoring this. Every AI person I've ever talked to stresses their limitations.

Scott Pelley [with unflinching admiration]: At what point did you realize that you were right about neural networks and most everyone else was wrong?

Geoffrey Hinton: I always thought I was right.

Stephen Downes: Me too.

Gary Marcus: Actually … a lot of us still think you are declaring victory prematurely. It’s not just me either. For example, you should really check out MacArthur Award winner Yejin Choi’s recent TED talk. She concludes that we still have a long way to go, saying for example that “So my position is that giving true … common sense to AI, is still moonshot”. I do wish this interview could have at least acknowledged that there is another side to the argument.

Stephen Downes: It's more like Robert Goddard (had he lived to see it) saying "I was right" in 1964, 50 years after building his first rocket. To be sure, rockets couldn't go to the Moon yet, so his declaration might have seemed premature, but there was already enough evidence that rockets work, and no sign that anything else was going to get us to the Moon.

Scott Pelley: You think these AI systems are better at learning than the human mind.

Geoffrey Hinton: I think they may be, yes. And at present, they're quite a lot smaller. So even the biggest chatbots only have about a trillion connections in them. The human brain has about 100 trillion. And yet, in the trillion connections in a chatbot, it knows far more than you do in your hundred trillion connections, which suggests it's got a much better way of getting knowledge into those connections: a much better way of getting knowledge that isn't fully understood.

Gary Marcus: The connections in chatbots are very different from the connections in the brain; it’s a mistake to compare them directly in this way. (For example, in human brains the type of neuron being connected matters, and there are more than a thousand different types of neurons in the brain, but none of that is captured by the current batch of chatbots.)

Stephen Downes: Oh yes, that's quite true. We haven't begun to explore what's possible when combining different types of artificial neural network; we can account for some of the differences by varying parameters such as activation function and sensitivity (or 'bias'), but there is a wealth of discoveries to be made. But the important thing is that artificial neurons (aka 'nodes') and human neurons are the same sort of thing. It's the connections that matter, not the 'contents'. 

Gary Marcus:  And we can’t really compare human knowledge and the stuff chatbots are doing. I know for example that Elon Musk is still alive, but sometimes  a chatbot will say that he died in a car crash. I know that if Tom Cruise’s mother is Mary Pfeiffer, Tom Cruise has to be Mary’s son. I know that I don’t have a pet chicken named Henrietta, but a chatbot said last week with perfect confidence (and no sources) that I did. As they sometimes say in the military “frequently wrong, never in doubt.” There’s some information in there, but whatever’s there is often both patchy and problematic.

Stephen Downes: You can't just keep listing factual errors some neural networks have made. That's not an argument. Humans make mistakes too. You can compare what a human is doing and what a machine is doing, but because they have different experiences and (as Geoffrey mentioned) different capacity, you can't really compare the content. So stop trying! It's not about what information is there, it's about how the computer works with it.

Geoffrey Hinton: We have a very good idea of sort of roughly what it's doing. But as soon as it gets really complicated, we don't actually know what's going on any more than we know what's going on in your brain.

Scott Pelley: What do you mean we don't know exactly how it works? It was designed by people.

Geoffrey Hinton: No, it wasn't. What we did was we designed the learning algorithm.

Gary Marcus: Agreed.

Geoffrey Hinton: That's a bit like designing the principle of evolution. But when this learning algorithm then interacts with data, it produces complicated neural networks that are good at doing things. But we don't really understand exactly how they do those things.

Stephen Downes: Different data, different output.

Gary Marcus: Fully agree with Geoff here. I would only add that this is a serious problem, for many reasons. It makes current AI hard to debug (nobody knows for example how to ground them in facts), and it makes them difficult to predict, which means that, unlike calculators or spreadsheets, we don’t really know what’s going to happen when we ask them a question. This makes engineering with them exceptionally hard, and it’s one reason why some companies have been cautious about using these systems despite their strong points.

Stephen Downes: Ironically, we would say exactly the same thing about human learners. Humans are not like calculators or spreadsheets. We're so worried that they'll make mistakes that we test them over and over, giving them tons of feedback, to make sure they respond to factual questions correctly. You can't just 'engineer' with humans; you have to take into account their unpredictability. And actually - it's kind of funny. You say, on the one hand, that computers only copy people. But then you point to mistakes that no human would ever make. So there's some originality there, right? Even if it's just a mistake.

Scott Pelley: What are the implications of these systems autonomously writing their own computer code and executing their own computer code?

Geoffrey Hinton: That's a serious worry, right? So, one of the ways in which these systems might escape control is by writing their own computer code to modify themselves. And that's something we need to seriously worry about.

Gary Marcus: Agree again. But this problem is twofold; they might escape control because they are smarter than us, but also simply because they don’t really know what it is they are doing.

Stephen Downes:  Our big problem is that we won't know which of the two they're doing.

Gary Marcus: Just like we can’t guarantee that they won’t make stuff up, we don’t know how to guarantee that they won’t write flawed code. We are giving way too much authority to machines that we can’t control. 

Stephen Downes: Whoa, a bit too fast there. I don't think we've given machines any authority just yet. 

Gary Marcus: Put me, too, down in the column of people who are seriously worried about letting poorly understood neural networks write computer code.

Stephen Downes:  As opposed to humans, or as opposed to perfection?

Scott Pelley: What do you say to someone who might argue, "If the systems become malevolent, just turn them off"?

Geoffrey Hinton:  They will be able to manipulate people, right? And these will be very good at convincing people 'cause they'll have learned from all the novels that were ever written, all the books by Machiavelli, all the political connivances, they'll know all that stuff. They'll know how to do it.

Gary Marcus: Geoff is totally right about this. Of course current systems don’t really understand Machiavelli, but they don’t have to, if they parrot the right bits of text. We’ve already seen cases where machines have manipulated people, and we will see a lot more as time goes by; this is one of the reasons laws should be written to make machines disclose the fact that they are machines.

Stephen Downes:  On the other hand, we can turn this around. We can influence artificial neural network based AI. In fact, I think this will be the really significant function of humans in the future: to properly train and educate AIs. To, if you will, 'manipulate' them.

Scott Pelley: Confounding, absolutely confounding.

We asked Bard to write a story from six words.

Scott Pelley: For sale. Baby shoes. Never worn.

Scott Pelley: Holy Cow! The shoes were a gift from my wife, but we never had a baby…

Bard created a deeply human tale of a man whose wife could not conceive and a stranger, who accepted the shoes to heal the pain after her miscarriage. 

Scott Pelley: I am rarely speechless. I don't know what to make of this. 

Gary Marcus: Holy cow indeed. But it is I who am speechless. Baby shoes never worn is a very old story, sometimes attributed to Hemingway, with about 21 million Google hits, and an entire Wikipedia entry, as perhaps the best known example of very short fiction. I am floored that you didn’t bother to check if the story was original.

Stephen Downes: Every writer knows about the Hemingway story. I'm sure Scott did as well. It's pretty uncharitable to assume he didn't.

Gary Marcus: Your best example of a spectacular machine invention is in fact a perfect example of the kind of parroting and theft of intellectual property that is characteristic of large language models.

Stephen Downes: I think the six words were the prompt, not the story. And I don't think we've been shown at all that the story is an example of parroting and theft of intellectual property.

Gary Marcus: Chatbots are said to be language models that just predict the next most likely word based on probability. 

Stephen Downes: True. But so, technically, is all of science. I've seen it argued that human brains are just, in essence (recognition and) prediction machines. That's how we reason. We do it the same way the neural networks do it.

Geoffrey Hinton: You'll hear people saying things like, "They're just doing auto-complete. They're just trying to predict the next word. And they're just using statistics.”

Gary Marcus: I am in fact one of those people.

Geoffrey Hinton:  Well, it's true they're just trying to predict the next word. But if you think about it, to predict the next word you have to understand the sentences.

Stephen Downes: Exactly.

Gary Marcus: False. If you have a large enough database, you can do a half decent job just by looking up the most similar sentence in the database, and saying what was said in that context. 

Stephen Downes: Except that's not how they work. If we consider how transformers (the 'T' in GPT) work, we see that the neural network takes into account much more than just how we order similar words. It's not just copying an actual word order that is present in the database. That's why anti-plagiarism software can find only some, but not all, of the sequences used in GPT output, and fails utterly to actually detect AI-generated content.

Gary Marcus: Large language models are trained, as far as we know, on pretty much the entire internet. That gives them enormous databases to train on, and means that the feat of prediction doesn’t necessarily tell us anything about understanding. 

Stephen Downes: And everything on the entire internet is true, right...? No wonder the AI keeps making mistakes. It's just like those people who watch nothing but Fox News.

Gary Marcus: If I had a big enough database of Ancient Greek, I could do the same, but that wouldn’t mean I understand Greek.

Stephen Downes: But it wouldn't mean you didn't, either. You might understand Greek.

Gary Marcus:  To be fair, large language models aren’t just looking things up, but the idea that a good prediction of next word necessarily implies understanding is fallacious.

Geoffrey Hinton:   So, the idea they're just predicting the next word so they're not intelligent is crazy.

Stephen Downes: Because it's really, really hard to predict the next word. Much harder than saying Liz Cheney did not replace Kevin McCarthy as Speaker.

Gary Marcus: Let’s try this again: you can predict a next word to a reasonable degree without being intelligent, if you have enough data.

Stephen Downes: And if you study that data correctly. If you just count all the words, you'll never get it. But if you run it through a neural network that finds connections between those words, you might.

Gary Marcus: But the reason I don’t think the systems are intelligent isn’t just because these systems are next word predictors (which they are) but also because, for example, they are utterly incapable of fact checking what they say, even against their own databases, and because in careful tests they make silly errors over and over again.

Stephen Downes: But humans make mistakes all the time too. And unlike a hard-coded data model, both humans and artificial neural networks can be trained to stop making the mistake. 

Geoffrey Hinton: You have to be really intelligent to predict the next word really accurately.

Gary Marcus: They aren’t always accurate. We both know that. 2 kilograms of feathers don’t weigh less than one kilogram of bricks. They just don’t.

Stephen Downes: You can be really intelligent and not always be accurate. History is full of really intelligent people who didn't have the same data that we have and made really basic mistakes, like thinking the world is flat, or not knowing that germs cause disease.

Gary Marcus (Breaking the 4th wall): In the next bit, Pelley and Hinton show an example in which ChatGPT succeeds at reasoning, but they never consider any of those in which it fails—thereby inadvertently illustrating a very human failure, confirmation bias.

Stephen Downes (doing the same): The remarkable thing isn't that they get some things wrong, but that they get anything right! It's like that neural network that learned to play chess as well as a grandmaster simply by watching games! Gary wants us to focus on the fact that the AI failed to win one of the games it played.

Gary Marcus: Hey guys, what about the many cases that Yejin Choi and Ernie Davis and Melanie Mitchell and Subbarao Kambhampati and many others have shown where these systems failed? Are you ever going to mention them?

Stephen Downes: I'm pretty sure Garry Kasparov made some mistakes, but nobody stopped calling him intelligent.

[More Silence]

Scott Pelley: And in five years' time?

Geoffrey Hinton: I think in five years' time it may well be able to reason better than us.

Gary Marcus: In 2016 you said that it was “quite obvious that we should stop training radiologists” because deep learning was getting so good. You know how many radiologists have been replaced by machines seven years later? Zero.

Stephen Downes: We can hardly fault Hinton for not getting the economics of the U.S. health care system quite right. Nobody else understands it either. But if I may quote...

As of January 2023, the FDA has cleared almost 400 radiology AI algorithms, with more than half of the algorithms on the market receiving clearance from 2019 to 2022. What is driving this surge? For one, it is helpful that the optics around AI have shifted, thanks in part to an optimistic outlook of a future where machines help us, not hurt us. Computing power has also advanced and is less cost prohibitive. But arguably, the biggest catalyst is reimbursement. Since CMS implemented new payment models for AI software, the market has exploded with startups. Stroke software developed by companies have circumvented the diagnostic radiologist completely. These algorithms automatically review images to quickly identify patients who would benefit from immediate neurovascular intervention. If this software is used in a center where only a single radiologist is interpreting studies for the entire hospital, it’s not uncommon for the interventionalist to know which patients need treatment before the radiologist knows the exam has even been performed. 

Geoffrey Hinton: So an obvious area where there's huge benefits is health care. AI is already comparable with radiologists at understanding what's going on in medical images.

Gary Marcus: Scott, this is your chance! C’mon, hold him to account! [Silence]. Well, ok, so far we still get best results by combining machine vision with human understanding. I don’t really think machines get the big picture that human radiologists do; they are better on vision than understanding the case files and notes and so on.

Stephen Downes: That's because case files are written in the doctor's handwriting. (Laughs). More seriously, while it's true today that an AI and human working together offer the best solution, the AIs are getting to the point that a doctor would have to have a very good reason to disagree with the AI's diagnosis. I can easily imagine a case where a doctor relied on 'intuition' instead of the AI and lost the malpractice case in court.

Geoffrey Hinton: It's gonna be very good at designing drugs.

Gary Marcus: Another promise, no proof yet.

Stephen Downes: Well, the first ever fully AI-designed drug went into human trials last June. That's a pretty good sign.

Geoffrey Hinton: It already is designing drugs. So that's an area where it's almost entirely gonna do good. I like that area.

Gary Marcus: I like that area too, but as far as I know from AI we still just have what we call candidate drugs, nothing yet proven to work. So, some caution is advised, though I agree with Geoff that eventually AI will have a big impact on drug design. Perhaps with current techniques, perhaps not; we will have to see.

Stephen Downes: 'Not proven' is a pretty weak response.

Scott Pelley: The risks are what?

Geoffrey Hinton: Well, the risks are having a whole class of people who are unemployed and not valued much because what they-- what they used to do is now done by machines.

Gary Marcus: 100% agree, and I would add cybercrime. And emphasize that wholesale, automated fake news will be used both to manipulate markets and elections, and might undermine democracy.

Stephen Downes: Pretty easy predictions to make, Gary, when humans are already doing these on a massive scale. I would say that the real danger here is that AI will democratize it, so that anyone can be a cybercriminal, not just large corporations and nation states.

Scott Pelley: What is a path forward that ensures safety?

Geoffrey Hinton: I don't know. I-- I can't see a path that guarantees safety.

Gary Marcus: I can’t either; there’s a lot we can do to help, but nothing I can see either to absolutely guarantee safety. Rushing ahead is creating risk.

Stephen Downes: Standing pat is also creating risk. Nothing is risk free; we're just considering what the risks are, and who takes them, and why it always has to be the poor and the powerless. 

Geoffrey Hinton:  We're entering a period of great uncertainty where we're dealing with things we've never dealt with before. And normally, the first time you deal with something totally novel, you get it wrong. And we can't afford to get it wrong with these things. 

Gary Marcus: Absolutely, 100% agree.

Stephen Downes: Well, we can get some things wrong. It doesn't really matter if ChatGPT gets the name of the Speaker of the House wrong. It does matter if ChatGPT starts a nuclear war.

Scott Pelley: Can't afford to get it wrong, why?

Geoffrey Hinton: Well, because they might take over.

Scott Pelley: Take over from humanity?

Geoffrey Hinton: Yes. That's a possibility.

Scott Pelley: Why would they want to?

Geoffrey Hinton: I'm not saying it will happen. If we could stop them ever wanting to, that would be great. But it's not clear we can stop them ever wanting to.

Gary Marcus: I am much more worried about bad actors deliberately misusing AI, than machines deliberately wanting to take over. But Geoff’s right that we can’t fully rule it out either. And that’s really sobering.

Stephen Downes: I don't think it will happen either, and like Gary, I'm more concerned about bad actors - fallible human beings who can't be trusted, make mistakes, and are working with bad data. But the AI taking over is a possibility that science fiction writers have been wrestling with since the invention of AI. That, along with dozens more apocalyptic futures for humanity.

Geoffrey Hinton: It may be we look back and see this as a kind of turning point when humanity had to make the decision about whether to develop these things further and what to do to protect themselves if they did. I don't know. I think my main message is there's enormous uncertainty about what's gonna happen next. These things do understand. And because they understand, we need to think hard about what's going to happen next. And we just don't know.

Gary Marcus: Fully agreed with most—but not quite all—of that. Geoff and I can disagree all day (as we have for the last thirty years) about how smart current AI is, and what if anything they understand, but we are in complete agreement that we are at a turning point with enormous uncertainty, and that we need to make the right choices now.

Stephen Downes: Making the right choices means not starting with the bland assertion that they do nothing but parrot humans (except when they are making mistakes). It means understanding that, if they reason along much the same lines humans do, then because they have advantages of speed and scale, they will eventually surpass us.



