Is AI a new species? Microsoft’s Mustafa Suleyman thinks so
Table of Contents:
- The metaphor of AI being a new digital species
- Balancing creativity and caution in AI advancement
- Emphasizing emotional intelligence in AI technology
- The potential of voice interfaces in AI interaction
- Fostering creativity through AI collaboration
- The evolution of AI models, scaling, and distillation
- Opportunities for entrepreneurs with small AI models
- Considering the ambient sensing future
- Mustafa Suleyman: Are you all in?
Transcript:
REID HOFFMAN: Hi listeners. This week, we’re sharing my conversation with Mustafa Suleyman from the Masters of Scale Summit.
Mustafa is the CEO of Microsoft AI, and it was inspiring and thought-provoking to talk with him about the transformational moment we’re in.
I hope you enjoy this chat we had live onstage at the Presidio Theatre in San Francisco.
[THEME MUSIC]
The metaphor of AI being a new digital species
HOFFMAN: So other people have sometimes, you know, made parallels between AI and a species, thinking about this as…
MUSTAFA SULEYMAN: Yes…
HOFFMAN: Starting small like I said—
SULEYMAN: Let’s go.
HOFFMAN: Yes, so what’s a way to think about AI? What are the places in which it’s a good lens? What are the places in which it’s a misleading lens? What’s the way that it should guide kind of the global thinking for how we’re getting there?
SULEYMAN: I think when we have something that is fundamentally new and unlike anything we’ve seen before, each new wave of technology really does feel like that. I mean, if you think about how utterly magical and crazy it must have been to have electricity for the first time or even to speak to somebody via a telephone line across the Atlantic, it must have been mind-blowing. It adds a completely new mental representation to your worldview of what’s possible. Each time that happens, we struggle for the right metaphor to relate it to something we do know.
It isn’t ultimately going to be like that thing we do know, but it’s the best that we have before it arrives. I propose this new digital species metaphor because when you step back and look at the capabilities of these things, it’s really the closest corollary, even though it raises a lot of things that we don’t want, and I think it frames the question of containment in the correct way. I mean, these models will be able to see what you see, hear what you hear, understand and interact with text in real time, and take actions on your behalf. Those capabilities are now coming into vogue.
I think the right metaphor, the most similar alternative, is the species. That, I think, is a helpful frame for thinking about what we don’t want as well.
Balancing creativity and caution in AI advancement
HOFFMAN: And what would you say would be the one thing that’s really important that we do, and the one thing that’s really important that we try not to do in order to guide the digital species as we are? By the way, I recommend Mustafa’s book, “The Coming Wave,” which also goes in depth. This is like the 60-second version.
SULEYMAN: I think one of the incredible things about these models is that they don’t give you back exactly what you put in. That’s the great ambition of software, right? We want it to tell us something that we don’t know. So “hallucination,” I think, is sort of an unfortunate phrase. That’s not a downside to me; that’s the upside.
HOFFMAN: Called creativity.
SULEYMAN: Yeah, creativity. We want a wide variety of possible responses given some input, and that malleability and ambiguity are what we want. Having them learn their own representations of things rather than us handcrafting those features was the core motivation of the last years of machine learning.
It’s great that it’s now doing that, but we need to figure out where the boundary on that learning is, right? At the moment, there’s very little, if any, recursive self-improvement, so there’s not a closed loop of self-improvement that doesn’t have a human directly overseeing it, but we can see that coming into view. In 2025, teams will start experimenting with that.
So I think that’s one thing to watch as something to be cautious about. The other thing is just straight-up autonomy, right? It’s clearly going to increase the risk if these models can interact in arbitrary digital environments, spin up their own VMs, take actions on web pages, interact with APIs, and do all that kind of thing completely independently of human oversight and control. So those are two capabilities that we would be pretty concerned about.
HOFFMAN: Then on the positive side?
SULEYMAN: I think the positive side is that they will be immensely creative. They’re going to help us interact with the very best of ourselves. If well-designed, they don’t need to be snarky, judgmental, or shame-inducing. Most humans can be harsh, right? There’s no reason for these things to be mean. Some people will program AI companions to be all those things, but that isn’t an inevitable outcome. That is a choice that some designers will make, and I think structurally we should do everything in our power to limit those kinds of things in the ecosystem, in the norms, and in the values. But I think there’s plenty of space for these things to really help us be our best selves.
Like, I read this paper a couple of weeks ago that basically reported that a bunch of people who had real conspiracy theories, I’m talking about flat earth-level conspiracy theories…
HOFFMAN: There’s a few of those.
SULEYMAN: Yeah, I stay away from the election ones, but I think we should all agree that the flat earth one is pretty nuts. People who chatted with a chatbot over an extended period of time, I think it was about six weeks, reduced their propensity to believe the conspiracy theory. That’s because the chatbot is patient. It’s non-judgmental, it doesn’t put you down, it’s relentless, it always comes back, and it generally draws on the scientific literature with an evidence-based view, and so forth. So I think there are very promising signs that the upside really will be incredible.
Emphasizing emotional intelligence in AI technology
HOFFMAN: So, actually, I’m going to jump ahead to a question that I was planning on asking you in a bit, but I think this is a good frame for it. When you, Karén, and I started Inflection, one of the founding principles was that EQ is as important as IQ, right? Say a little bit about why, what that meant for Pi, and what the thinking was there. And why is that important across the board, not just for what Pi is doing?
SULEYMAN: Right, right. I mean, IQ we can generally consider to be the sort of accuracy, speed, comprehensiveness, relevance of an answer, and the extent to which it has real-time access to information.
On all of those fronts, there’s a steady march of progress. What I noticed was that people in the community of AI researchers in general tended to neglect the importance of the delivery vehicle for the information. It’s kind of a very nerdy thing to say, well, if I just lay out the facts, then people will clearly see this is correct.
HOFFMAN: Yes, the engineering mindset.
SULEYMAN: Yes, and it turns out that actually the tone, the style, the kind of emotional intelligence of these models, the extent to which they ask you questions, the extent to which they reflect back the sort of language that you might use, and so on, that delivery vehicle for the substance, is perhaps more important to the majority of consumers than just an objective regurgitation of Wikipedia. I think that’s going to be one of the key capabilities that everyone’s starting to wrestle with now. This kind of agentic future isn’t just about the actions people can clearly see; it’s also about personality. I’m very interested in how we engineer personality because I think that’s what people are really going to find valuable.
The potential of voice interfaces in AI interaction
HOFFMAN: So, speaking of the agentic future, give us a bit of a lens into it. What are the places and how you’re thinking about it from a co-pilot perspective? What are the ways that you think about the likelihood in the next two to five years about how agents will be playing a role in our life? What’s important from the species level now down to the specifics about these agents, and how should we navigate with them?
SULEYMAN: The first step for the agentic future is that your AI companion has to see what you see. Having an aide or an assistant or a companion that is really seeing the pixels that you see on the screen, in your browser, on your desktop, on your phone, means there’s a kind of constant awareness of your sensory input that enables your companion to observe what you’re seeing. Then you can use ambiguous references like, “Remember that thing I saw,” or, “Where were those things,” and that is a level of understanding we’ve never had before. It enables your AI to then act on your behalf, right, and that means navigating in the browser, using APIs, booking things, buying, and planning. We’ve got a lot of cool demos floating around of those kinds of things at the moment, but it seems we’re still a little way away from getting them ready for production. You can see it from all the previous waves: just before GPT-3, there were LLMs inside big companies and elsewhere.
That was probably around 2020-2021, and they were really flaky. I think that may be where we are with the AQ side of things, the actions quotient. Getting things to work 50 to 60 percent of the time is great, but we have to get them to something like 99 percent accuracy. You can see that with voice recognition and dictation; that’s been a 15- or 20-year trajectory. It’s really only in the last two to four years that it’s crossed the threshold to 99.5 percent accuracy and become personalized, and you’re starting to see a huge uptick in people going voice-first, partly because of the input, but also because of the generation. So I think we’re a few years away from that.
HOFFMAN: And what do you think is the intersection between the voice input? Because I actually think part of it, and you know this because we’ve talked about this a lot, but it’s the generative AI revolution that allows it to be in that conversation, allowing the voice input to work much better. Because you can just speak to it, and then it can interpret what you’re saying. How is that going to bring that extra elevation to the agentic universe?
SULEYMAN: Yeah, the interface, the shape of the interface, just very abstractly governs what you can put into it. Because the search box of a search engine was just a letterbox, we learned to speak the language of search, right?
We compressed our ideas into a three or four or five-word phrase. It’s not even…
HOFFMAN: 1.6 on average, just to be clear.
SULEYMAN: You go even, yeah, maybe 1.6. So I think what’s interesting about these voice experiences is that it unlocks a new part of your mind when interacting with a computer because you can speak in full sentences, you can self-correct, you can go forward and back, add in all the other junk we have when we’re just kind of talking off the top of our head, and then the model speaks back to you in paragraphs. Suddenly, you think to ask and talk about things which you never would have digitized previously.
SULEYMAN: So that, I think, is probably a good framework. I mean, I’m pretty sure it’s a good framework that tells you what is likely to happen on the action side of things. Because you have this always-available AI companion that can really do any digital task you can do, I think you will ask it to do things that you don’t do yourself on the computer today. That, I think, is a big shift because the barrier to entry to get something done is about to go through the floor. It’s both because it’s zero marginal cost and because the friction is really diminished, and then you’ll think of things that you hadn’t thought about yourself because it’s much less of a pain.
HOFFMAN: More with Mustafa Suleyman in just a minute.
[AD BREAK]
HOFFMAN: Welcome back to Masters of Scale. You can find this conversation and more on the Masters of Scale YouTube channel.
Fostering creativity through AI collaboration
HOFFMAN: What are the ways you think this will help us also become more creative? What kind of creative, inspiring, and inspirational moments come out of the interaction with these agents?
SULEYMAN: Think about how many random ideas, things that occur to you, questions throughout the whole day. If you just really deeply meditate on your subconscious, those moments when you’re like, “Oh, I wonder,” you don’t have someone with you all the time to listen to your crazy thoughts, other than you. You certainly don’t have the effort to go type something in all the time. Actually getting your phone out and typing something is kind of a high bar. Like for me, I search quite a lot, but probably five to eight times a day. It’s quite a bit of effort.
HOFFMAN: Yes.
SULEYMAN: So if the barrier to entry to getting those things down is now lower, then surely the range of creative thoughts you can have, which then get manifested in the context of your AI companion, has to go up. Because it remembers. In fact, the other big thing that’s going to come, way before actions, is memory.
We’re going to nail memory. I mean, I’m really confident that in 2025, memory is done: permanent memory. We already have memory on the web. We retrieve from the web all the time, quite accurately now. Copilot has really good citations; it’s up to date to 15 minutes ago, knows what’s happened in the news on the web, and so on. We’re just compressing that to do it for your personal knowledge graph, and then you can add in your own documents and your email and calendar and things like that. Memory is going to completely transform these experiences because it’s frustrating to have a meaningful conversation or go on an interesting exploration around some creative idea and then come back three or four or five sessions later.
And it’s like, “Let’s start again.” It’s completely forgotten what we talked about. So I think that’s going to be a big shift as well. Not only does it lower the barrier to entry to you expressing a creative idea, but those things don’t get forgotten too. You can do this ambiguous cross-reference back to something that you… What was that thing I said, like three weeks ago?
HOFFMAN: Yeah, and how does this relate to this thing we were talking about? Much more of a conversation.
SULEYMAN: Much more of a conversation. Yeah, exactly. It’s like having a second brain, like an extension of your mind, and that’s kind of why the EQ side of it is so important.
The evolution of AI models, scaling, and distillation
HOFFMAN: A hundred percent. So let’s go now down to a little bit more of the tactical with models. Because we have a lot of entrepreneurs and are kind of thinking, like, “Okay, here’s how to be thinking about how this landscape is evolving the next couple of years.” What are the things to watch for?
SULEYMAN: The good news is that models are both getting bigger and smaller at the same time, and that’s almost certainly going to continue. There’s a new flavor of methods that started coming into vogue in the last year, known as distillation: big, very smart models that are expensive at inference time teach small models, using reinforcement learning from AI feedback. That supervision seems to be pretty good; there’s good evidence of that now. Scale is definitely still going to be part of the game. I mean, we’ve got room to go. There will be plenty of data. I don’t see any slowdown, at least for the next two to three years, in scaled models delivering outsized performance.
There are also new modalities being put in. Of course, we’re adding video and image and stuff like that, but really what we’re interested in is trajectories of actions across complex digital surfaces. Jumping from a browser to the desktop, then handing off to your phone, then going from different ecosystems, whether in your walled garden or in the open web. We’re trying to understand these trajectories, collect lots of that data, and use supervised fine-tuning. I think that’s going to deliver a lot of impressive results.
Opportunities for entrepreneurs with small AI models
HOFFMAN: The other thing, obviously, is that there are tons of different angles from which data gets talked about. The classic one is, “Okay, which data can you train over, and what is its quality?”
I’ll leave that to the tons and tons of discussion already on the web. But what people don’t spend quite enough time thinking about is where new data will come from. For example, one of the things I think is interesting about synthetic data: we say, “Oh, if we actually had data like this, we could train much better small models and big models.” How do we get to that data? How do we make sure it’s integrated? What are some of the ways entrepreneurs should be thinking about this?
SULEYMAN: Think about a prompt, not just a question you ask. I think the language got a bit confused. When you ask a chatbot a question, that’s a question. It’s not a prompt. It’s a question. When you write a three-page style guide with a set of examples to imitate, that’s a prompt. And then you subsequently ask questions of a model that has been prompted. So, with that frame in mind, the prompt is kind of your data. It’s your high-quality set of instructions that give your pre-trained model direction to behave in a certain way.
It’s kind of remarkable that the model can take literally just a few pages of instructions and really behave very differently than a model that has been prompted in a very different way; that in itself is kind of crazy.
But if you step back one step further: in order for a model to perform with nuance, precision, and subtlety, and really adhere to the brand values of your business or the unique product that you’re trying to create, you have to show it tens of thousands of examples of good behavior, and you have to fine-tune those into the model, which is a continuation of the pre-training process with respect to some high-quality data that you know to be accurate. The good news is that tens of thousands of examples are very accessible in many niche domains or specific verticals, right? So, that’s an edge, and I think there’s plenty of room for start-ups doing high-quality fine-tuning of a pre-trained model. Then you’ll get much more stable adherence to the behavior policy you care about.
HOFFMAN: And how should entrepreneurs think about the use and deployment of small models? Obviously, they’ll be using Microsoft and OpenAI and Google and other things for the frontier models and scale models to help them with that because that’s where the multibillion-dollar models will be.
But how should entrepreneurs be thinking about what kinds of opportunities come about with small models? How can they do something interesting and unique with it?
SULEYMAN: Yeah, I mean, I think small is definitely going to be the future, because if you think about it, when you ask a query of a really frontier model, in a way it’s lighting up the neural representations of billions of pathways that are not relevant to the query at hand.
The crazy thing is, it does that incredibly efficiently. I mean, referencing hundreds of millions of nodes at each token that is generated is kind of crazy, but it doesn’t need to do that. If you have a tight use case, what I think is going to happen is that we’re going to compress knowledge into smaller, cheaper models, which can live on a fridge magnet, right? And you…
HOFFMAN: I haven’t heard you use that metaphor before. Yes.
SULEYMAN: I don’t know. It’s kind of the smallest digital thing I can think of. Well, maybe…
HOFFMAN: Not even sure it’s digital.
SULEYMAN: All right. Yeah, that’s true.
HOFFMAN: Yes. Yes. Yes. Yes.
Considering the ambient sensing future
SULEYMAN: A wearable, on my earring, or in a plant pot with a little sensor. So those things, the long-promised ambient sensing revolution, are going to come alive, I think. That’s the kind of compression trajectory: it’ll go to the extreme, where you can have really quite functional models. Obviously the fridge magnet is not going to know a lot about quantum computing, but it’s going to know what it needs to know in order to welcome you in the morning, give you the weather, talk about what may or may not be in the fridge, and remind you of your calendar. That, I think, is going to be maybe a few tens of millions of parameters. We haven’t pushed that yet, and it’s totally feasible for any two-person team to explore.
HOFFMAN: Yeah, exactly. And since this is also an entrepreneurship event, it’s a key thing.
So, I’m going to move to a slightly longer version of our last question, which is…
What’s the question that people should be thinking about for the next two days? To give you a few seconds to think about it, having sprung this upon you, I’ll start, right? For me, that would be… I’ll generalize off of what I was just saying, but what are the things that we need to bring in as technologists to be thinking about how to design a more human future? Frequently, when people think about “more human,” they think about the classic, like, okay, it’s what human beings have been over the last, you know, thousands of years, and that’s an important part of it. But it’s also important to look forward because as we evolve our technology, we evolve our humanity.
We evolve our humanity through mugs, stages, podcasting equipment, all of that is part of what changes who we are as human beings. It isn’t just remembering that we have emotions and passions. Yes, of course, we have compassion.
But how is that expressed through how we change and the stance we are with technology? That’s what I would offer as a big question to think about: that design. And now, having given you a couple of seconds to think about this…
Mustafa Suleyman: Are you all in?
SULEYMAN: I would say: ask yourself, “Are you all in?” Because this really is a transition moment, right? And I really think we’ve got enough evidence now from the last like five decades of big technology transitions. All of the structure of things gets reshaped. I think this is a moment to found companies, scale companies. It’s a moment to really pivot careers. Even if you’re not an entrepreneur, even if you’re an activist or an organizer, if you’re an academic, this is the moment to really pay attention because in 2050 the train will have left the station and it will be quite different. This is a moment where we really do have a chance collectively to shape and influence things. Nothing is predetermined. It’s within our reach to shape it for the very best of humanity. I think we’re very lucky to be alive at this moment. It feels incredibly empowering, and it’s a great responsibility.
HOFFMAN: Completely agree. And now you see why I was so excited to kick off with Mustafa. Let’s thank him.
HOFFMAN: I always love talking with Mustafa Suleyman. I hope you’re as inspired by his call to action as I am. I’m all in – ready to do what I can to help build the best possible future.
Are you all in?