A conversation with OpenAI's CPO Kevin Weil, Anthropic's CPO Mike Krieger, and Sarah Guo

A conversation with Mike Krieger (CPO of Anthropic) and Kevin Weil (CPO of OpenAI), moderated by Sarah Guo (Conviction).

Published: Jun 14, 2025
Uploaded: Jun 14, 2026
File type: YouTube
Source: youtube.com

Full transcript

AI-generated transcript with timestamped sections.

0:09-1:42

[00:09] Thank you. [00:18] All right, hello everyone. [00:21] Okay, okay. Sarah, you're the queen of AI investing. [00:25] A phrase never ever to be used again, but it's great to be here with both of you. [00:29] So I had two different ideas for our final discussion. The first was a product off because these two men have [00:39] the merge to prod button, both of them, and I was like, "Oh, please just release everything we know is coming over the next six or twelve months." [00:46] Ignore all internal guidelines. The second was we just redesign Instagram together, since they both actually ran Instagram before. [00:55] Both of these got fully shot down, and so instead I think we'll just trade notes among friends. [01:01] Lame, I know, but really excited to hear from you both anyway. So this is actually a relatively new role for both of you. [01:10] Kevin, let's start with you. [01:12] You've done a bunch of really different interesting things. Like what was the reaction you got when you took the job from friends and the team? [01:18] Generally excitement. I mean, it's -- I think it's one of the most interesting and impactful roles. There's so much to figure out. [01:26] Um... [01:27] I've never had such a challenging, interesting, [01:32] sleepless product role in my life. It's got all the challenges of a normal product role, where you're trying to figure out who you're building for,

1:42-3:21

[01:42] and what problems you can solve and things like that, [01:45] But normally when you're building product, [01:47] you're building off of kind of a fixed technology base, right? You know what you have to work with. [01:52] and [01:53] and you're trying to build the best product you can, [01:55] Here it's like, [01:57] every two months computers can do something computers have never been able to do before in the history of the world and you're trying to figure out how that changes your product and [02:06] The answer should probably be a fair amount. [02:09] And so it's so interesting. [02:13] and fascinating to see on the inside as AI gets developed, but I've been having a blast. [02:19] - Mike, what about you? I remember hearing the news, I was like, "Oh, I didn't know you could convince the founder of Instagram to go work on something that existed already." - Yeah, my favorite three reactions, like, people who know me were like, "Oh, that makes sense." [02:30] You're going to like [02:31] have fun there. The middle people were like, [02:34] "Why? You don't have to work. Why are you doing this?" And then if you knew me, you'd know me like, "I can't not." And I think that I couldn't stop myself. And the third was like, "Oh, you could hire the founder of Instagram," which was also fun. And it's like, I mean, not many people could, but this is probably a list of three companies that would have been interesting. [02:51] And so, yeah, there's like a range of reactions depending on how well you knew me and how, like, you've seen me in my, like, [02:57] semi-retired state, which lasted like six weeks. And I was like, all right, what are we doing next? [03:02] Thank you. [03:03] So we had dinner together with a bunch of friends recently, and I was impressed by the childish delight that you had around, like, yeah, I'm learning about all this enterprise stuff. Like, tell me if it's about serving customers that are not all of us with Instagram, or,

3:21-4:56

[03:21] Just working in an organization that's research driven, what's the biggest surprise so far? [03:27] Those are two, I think, both very worthwhile pieces of this role that are very new to me as well. When I was 18, I made this very 18-year-old vow, which was every year of my life I wanted to be different. I don't have the same year twice. And I was like, it's why I didn't... There's been times where I was like, oh, another social product. I'm like, doing that again? First of all, your bar is really distorted. And second of all, it just would feel like too much of the same thing. So yeah, enterprise has been wild. I'm really curious about your experience with that as well. You're like, you know... [03:56] uh... [03:57] you're [03:57] Your feedback loop, I actually imagine it's a lot more like investing, is far longer, right? You have that initial convo and you're like, "I think they like me," and then you're like, "Oh no, it's now in some requisition state and it's going to take six months before they even get to deployment before you know whether it's right." [04:13] getting used to that pace where like, [04:15] "Why hasn't this shipped yet?" And they're like, "Mike, you've been here two months. "This is like, it's making its way through the VPs. "It's gonna get there eventually." So getting used to different timelines for sure. [04:24] But like, [04:25] The part that is fun is actually getting the feedback in the kind of engagement where you're like, [04:29] Once it gets deployed, [04:30] You have somebody that can call you and you can call them and be like, how's it working for you? Like, is this good? Like, whereas users like, [04:35] you're doing data science in aggregate, and sure you can bring in one or two people, but it's like, [04:40] They don't have enough financial incentive riding on telling you where you suck and where you're doing well. [04:45] That's been a different but also rewarding side of that for sure. 
[04:49] Kevin, you've worked on such a wide range of products before. How much do your instincts apply?

4:56-6:26

[04:56] Yeah, I was going to add on to the enterprise point too, and then I'll get to that. [05:00] The other thing [05:01] interesting thing about enterprises it's not necessarily about the product right there's a buyer [05:05] And they have goals. [05:07] and you could build the best product in the world that all the people at the company might be happy to use, and it still doesn't necessarily matter. [05:14] Exactly. I was in a [05:17] a meeting with one of our big enterprise customers and they were like, [05:21] this is great, we're really happy, da-da-da, you know, the one thing we need [05:26] is-- [05:27] We really need you to tell us 60 days before you launch anything. [05:31] And I was like, [05:33] I also would like to know 60 days. [05:38] So very, very different actually, and it's interesting, right? Because at OpenAI, we have a consumer product, [05:44] and we have an enterprise product, and we have a developer product, so we're kind of doing all at once. [05:49] Um, [05:50] Instinct-wise, [05:53] In, I'd say in like half the job, it works. You know, when you have, [05:58] But when you have a sense of the product you're trying to build, [06:01] you know, we're getting towards, you know, the end of shipping advanced speech mode or something, or you're getting towards shipping Canvas. [06:10] and you're making final touches, [06:12] trying to understand who you're building for and like what, you know, exactly what problems you're trying to solve. [06:17] it works then because that's a little bit more like the tail end of it is shipping a normal product. [06:22] But the beginning of these things is nothing like that.

6:26-8:01

[06:26] Hey. [06:28] So, like, [06:31] there will just be these capabilities that we don't know [06:36] it. [06:36] you have some sense as you're training some new model that it might have capability X. [06:43] You don't really know. [06:44] nor does the research team, nor does anybody, right? You're like, "I think this might be possible." [06:49] And it's kind of like, [06:50] coming through the mist at you, [06:52] but it's this emergent property of a model and, you know, so you don't know whether it's going to really work. [06:57] And you don't know whether it's going to be like, [06:59] 60% good. [07:01] or 90% good or 99% good. [07:04] and [07:05] the product that you would build that would make sense with something that works 60 percent of the time is super different than 90 or 99 percent of the time right so [07:14] You're kind of just waiting, and you're, you know, [07:16] at least, I don't know if you feel this, checking in with the research team from time to time, like, "Hey guys, how's it going? "How's that model training? "Any insight on this?" And they're like, [07:27] It's research, we're working on it, you know, it's -- we don't know either. We're working through this at the same time. [07:33] And it's-- [07:34] I mean, it makes it super fun because you're kind of like discovering things together, [07:39] Very sort of stochastic too. It's the thing it most reminds me of from the Instagram days, where like Apple, like WWDC announcements, you're like, [07:47] this could either be awesome for us or could like absolutely like cause chaos for it. It's like that, but your own company is the one kind of disrupting you from within, which is like a very like it's very cool, but also like, oh, this might totally upend my product roadmap now.

8:01-9:30

[08:01] Yeah. [08:02] What does that cycle look like for both of you? You described it as [08:07] you know, like peering through the mist, trying to look at the next set of capabilities. I mean, can you plan if you don't know exactly what is coming, and what is the iteration cycle to discover new things that should belong in your product? [08:19] I think like on the intelligence side, you can sort of squint and see like, all right, it's advancing this way. And so the kinds of things that you'll want to do with the model, [08:27] and start building the product around that. There's three ways, right? Intelligence feels... [08:32] not predictable, but at least on like a slope that you can kind of watch. [08:36] There's the capabilities you decide to invest in from the product side and then do fine tuning with the actual research teams, something like artifacts, [08:43] We spent a lot of time between research. I think the same was true with Canvas, right? Like you're doing a like... [08:47] co-design, co-research, co-fine-tune, and that's like, I think, a real privilege of getting to work at this company and getting to do design there. And then there's the capability front, so maybe speech mode for OpenAI. For us, it's the computer use. [08:58] work that we released this week, you're like, [09:00] All right, 60%, all right, yes, all right. And like, so what we try to do is embed designers early in the process, but knowing that like, [09:09] you're not placing a bet, like the experimentation talk was saying, like, your output for experiments should be learning, not necessarily like perfect products you're going to ship every time. I think the same is true when you're partnering with research, like, your outcome is hopefully demos or informative things that, like, could spark product ideas, not like a predictable product process where you're like, [09:26] Well, it's this de-risked by now, which means it's going to look that way when research comes along.

9:31-11:06

[09:31] I've also, one thing that I really enjoy, because research is, at least parts of research are very product oriented, especially on the post training side, like Mike was saying. [09:40] and then parts of it are really like academic research at some level. [09:44] And so you will just also occasionally hear about some capability. [09:49] And we'll be in a meeting and you'll be like, oh, I really wish we could do this thing. [09:53] And a researcher on the team would be like, [09:55] "Oh no, we can do that, we've had that for three months." And we're like, "Really? "What does that mean?" Like, "Okay, where do I learn more?" And they're like, "Oh, well, we didn't think, "we didn't know it was important, so, you know, [10:06] I'm working on this other thing now." [10:08] But you do just get like magic happening sometimes too. [10:14] One thing we think a lot about when we're investing is actually like, [10:19] can you do anything with a model if it is 60% successful at a task instead of 99%? And on, like, lots of tasks, it's closer to 60. [10:27] Right? But the task is really important and valuable. [10:29] Like, how do you think about that internally in terms of evaluating [10:34] progression on a task and then what types of things you like put in [10:39] sort of the burden of product. [10:41] to make it graceful failure or to cross the miles of the user versus [10:46] you know, we just need to wait for the models to get better. [10:49] I'd argue there are a lot of things that you can actually do when something is 60% right. [10:54] you just need to really design for it. [10:55] You have to expect that there's a human in the loop a lot more than there would be otherwise. [11:00] If you look at, take like GitHub Copilot, right? That was kind of the first AI product that really

11:07-12:44

[11:07] opened people's eyes to like this thing can be useful not just as you know, Q&A, but for really economically valuable work. [11:14] And that launched, I don't know exactly which model that was built off of, but I mean, it was multiple generations ago. [11:20] So I guarantee you that model wasn't perfect at anything related to coding. - I think it was GPT-2, which is like pretty small, so. - Yeah, I mean, and so, but the fact that it was still valuable for you, 'cause if it got the code, [11:33] some significant fraction of the way there, that was still stuff you didn't have to type yourself and you could edit it. [11:38] And so there are experiences like that that I think totally work. I think we'll see the same kinds of things happening with, [11:43] with [11:45] sort of the shift towards agents and longer form tasks. [11:50] where it may not be perfect, [11:52] But... [11:53] If it can save you five or 10 minutes, that's still valuable, and even more, if the model [11:58] where it doesn't have confidence and can come back to you and say, I'm not sure about this. Can you actually help me with this? [12:04] then the combination of human and model together can be much higher than 60%. [12:09] I also find that 60% [12:11] this magic 60% number, it's kind of lumpy. - I made it up five minutes ago. - That was the takeaway, 60%. - 60%, that is our new, that's the Mendoza line of AI. [12:20] I think it's [12:21] often very lumpy, where it'll do very well on some tasks and not well on others. And I think that also helps [12:26] when we run pilot programs with customers, it's really interesting when we'll get the same day feedback from two different companies, we'll be like, [12:33] it solved our whole problem like we've been trying to do this for three months thank you another one be like it was way off it's like worse than the other model and so like uh it's also humbling to know that you have your own internal evals but like

12:44-14:27

[12:44] the rubber hitting the road and actually seeing the model out in the world is where it's kind of the equivalent of like you do all this design and then like you put it in front of one user and you're like, oh, wow, I was wrong. The model has that feeling as well where you're like, [12:55] we try as hard as we can to have a good sense, but then [12:59] People have their own custom data sets, they have their own internal use, they've prompted it a certain way. And like, so that, the lies, that sort of almost like bimodal nature of when you actually put it out in the world. [13:08] I'm curious if you feel this. [13:11] I think there's a very real sense in which models today are not [13:15] intelligence limited, they're eval limited, they can actually do much more and be much more correct on a wider range of things than they are today. [13:23] and it's really about sort of teaching them. They have the intelligence, you need to teach them certain specific topics that [13:29] maybe weren't in their original training set, but they can do it if you do it right. - Yeah, we've seen that all the time where like, [13:35] there was a lot of like, [13:36] exciting AI deployments that happened in like, you know, maybe three years ago. [13:40] And now they're like, we think the new models are better, but we never did evals, because all we were doing was just shipping cool AI features three years ago. And like the hardest hump to get people over is like, [13:49] Let's step back and like, what does success actually look like for you? Like, what problem are you solving? Like often the PM has rotated. So it's like somebody's inherited it and then be like, all right, [13:57] What does that look like? Let's write some evaluations. What we've learned is like Claude is actually good at writing evaluations and also grading them. So like we can automate a lot of this for you, but you have to tell us what success looks like. [14:07] And then... 
[14:08] let's go and actually iteratively improve our way over there. [14:11] That is often like the difference between like 60% of the task and like 85% of the task. If you come interview at Anthropic, which maybe you should at some point, maybe you're happy in your role, maybe not, you'll see one of the things we do in our interview process is actually like make you get a prompt from like,

14:27-16:02

[14:27] crappy eval to good and like just we want to see what you think but like [14:30] Not enough of that talent exists. [14:32] So we're trying to get that like, [14:34] If there's one thing we can teach people, that's probably the most important thing. Yeah, writing evals, I mean, I actually think it's going to become a core skill for PMs. [14:42] We actually had this, and maybe this is like a little inside baseball, but I thought this was interesting. [14:46] Internally we had our research PMs who like work a lot on model capabilities and model development and then we had our like more like [14:52] product surface PMs or API PMs, [14:55] And we ended up realizing that the job of a PM in 2024, 2025, building AI-powered features, [15:01] is looking more and more like the former than the latter in a lot of cases. [15:05] We launched like our [15:06] code analysis and like, basically, Claude can analyze CSVs and write code for you now. And the PM there was like getting it 80% of the way there and then having to hand it over to the PM that could write the evals and then go to like, [15:17] fine tune and prompt, and I was like, that's actually the same role. Like the quality of your feature is now gated on how well [15:23] you have done the evals and the prompts and so like [15:25] That PM definition is definitely just merging now. Yeah, absolutely. We set up a boot camp and took every PM through [15:35] writing evals and like [15:36] what it was like, difference between good and bad evals, and, you know, we're definitely not done there. We've got to keep [15:42] iterating and getting better on it, but it is such [15:44] a critical part of making a good product with AI. [15:48] As part of this recruiting call for any of the people who want to be good at building AI product or research product in the future, [15:56] We can't come to your boot camp, Kevin, so how do we develop some intuition for

16:02-17:33

[16:02] getting good at this. [16:03] eval and iteration, loop. [16:05] I actually think it's something you can use the models themselves for, like you were talking about. You can ask the models at this point, "What makes a good eval? Give me -- I want to do this. Can you write me a sample eval?" And it will [16:15] It will be pretty good. [16:16] Yeah. [16:17] I think that goes a long way. I think there's also this question of like, [16:22] It's, [16:23] And if you listen to everybody from Andrej Karpathy to others who have spent a lot of time in the field, nothing beats looking at data. And so people often get caught up, like, "Well, we already have these evaluations and the new model is 80% there rather than 78%. We can't ship." Or, "It's worse." [16:38] And I was like, have you looked at the cases where it fails? And you're like, oh, actually, this was better. It's just our grader is not as good, you know? Or... [16:44] It's funny, like, [16:46] like a little inside baseball, you know, like every model release has the model card. And some of these model, these evals we've seen, like, [16:52] even the golden answer, I'm like, I'm not sure a human would say it. Or like, I think that math is actually a little wrong. Like getting 100% is going to be really hard because even just grading them is very challenging. So like I'd encourage you to like the way you build intuition is go look at the actual answers, even a sample of them and be like, [17:07] All right. [17:08] Yeah, maybe we should evolve the evals or maybe like [17:10] the vibes are good, even if the eval is like tight. So like getting real and getting like deep on the data, I think matters. [17:17] I also think it'll be really interesting to see how this evolves as we go towards longer form, more agentic tasks, [17:24] It's one thing when your evals are like, [17:26] I gave you this math thing and you were able to like add four digit numbers and get to the right answer.
You know, it's easy to know what good looks like there.
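
Editor's sketch (not from the speakers): the advice above, to look at the failing cases before trusting a score because the grader itself may be the problem, can be illustrated with a tiny eval loop. Everything here is hypothetical and for illustration only: `EvalCase`, `run_eval`, the two graders, and `toy_model`, which stands in for a real model call. It assumes simple string answers to four-digit addition prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    prompt: str
    golden: str  # reference ("golden") answer for this prompt


def exact_match(answer: str, golden: str) -> bool:
    """Strict grader: the answer must equal the golden string."""
    return answer.strip() == golden.strip()


def lenient_number_match(answer: str, golden: str) -> bool:
    """Looser grader: ignore thousands separators before comparing."""
    return answer.replace(",", "").strip() == golden.replace(",", "").strip()


def run_eval(
    cases: List[EvalCase],
    model: Callable[[str], str],
    grader: Callable[[str, str], bool],
) -> Tuple[float, List[Tuple[EvalCase, str]]]:
    """Return the pass rate and the failing (case, answer) pairs for review."""
    if not cases:
        raise ValueError("need at least one eval case")
    failures = []
    for case in cases:
        answer = model(case.prompt)
        if not grader(answer, case.golden):
            failures.append((case, answer))
    score = 1 - len(failures) / len(cases)
    return score, failures


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model: canned answers, one with a comma."""
    canned = {"1234 + 4321": "5555", "2000 + 3000": "5,000"}
    return canned.get(prompt, "")


CASES = [EvalCase("1234 + 4321", "5555"), EvalCase("2000 + 3000", "5000")]

strict_score, strict_failures = run_eval(CASES, toy_model, exact_match)
lenient_score, lenient_failures = run_eval(CASES, toy_model, lenient_number_match)

# Reading the failures shows the "wrong" answer is only a formatting difference.
for case, answer in strict_failures:
    print(f"FAILED {case.prompt!r}: got {answer!r}, expected {case.golden!r}")
print(strict_score, lenient_score)
```

In this toy setup the strict grader reports a failure on an answer that is arithmetically correct, which is the kind of thing only reading the failing cases reveals. A real harness would sample failures from a much larger set and would likely need a more careful grader than either of these.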

17:34-19:04

[17:34] As the models start to do more long form, more ambiguous things, [17:40] go get me a hotel in New York City. [17:42] you know, what's -- [17:43] what's right there. A lot of it will be about personalization, [17:47] You know, if you ask any two humans who are perfectly competent, [17:51] they're going to do two different things. [17:53] So your grading becomes much softer, and it'll just be interesting. I think we'll have to evolve yet again. Speaking of having to reinvent. [18:01] stuff over and over again. [18:03] I think a lot like when you think about and I think both labs have some concept of like this is what capabilities look like as things evolve. Like it looks a little bit like a career ladder, like what bigger and longer horizon tasks are you taking? And maybe like evals start looking more like, [18:16] Performance review, I'm in performance review season, so this is the metaphor that's in my head, sorry. But it's like, did the model meet your expectation of what a competent human would have done? Did it exceed it, 'cause it did it twice as fast? [18:27] discovered some restaurant you wouldn't have known, did it greatly exceed, meets most, it starts being like, [18:31] more nuanced than just like right or wrong. [18:34] let alone you have humans writing these evals, and the models are getting to the point where they can often beat humans at certain tasks. Like, people prefer the model's answers to a human's answers, and so if you're humans writing your evals, like, [18:46] Yeah. You know, so what does that mean? [18:48] Go ahead. [18:49] Oh, what? Okay, evals are clearly the key. We're gonna go spend a bunch of time with these models [18:55] teaching ourselves to write evals. What other skills should product people be learning now? You're both on that learning path. [19:03] I think

19:04-20:36

[19:04] Prototyping with these models is a thing that is underused. Like our best PMs do this, where we'll get in some long conversation about like, should the UI be this or that? [19:12] And before our designers have even picked up their Figma, [19:16] like [19:16] Often our PMs, or sometimes our engineers, will be like, [19:20] Great, I prompted Claude, I did an A/B comparison of what these two UIs could look like, let's try them. And I'm like, oh, this is really cool. [19:27] Play that out and we'll be able to prototype a far greater variety and evaluate [19:32] like on a much faster scale than before. [19:35] skill of like, [19:36] using these tools to actually be in prototyping mode, I think is a really, really useful one. [19:42] That's a good one. I would also -- you sort of said this, but [19:46] I think it's also going to push PMs to go deeper into the tech stack. [19:51] Because it's, and maybe that changes over the years. Like if you were doing like, [19:56] database tech. [19:57] in, I don't know, 2005. [19:59] Maybe it required you to be able to go really deep in a different way than it would if you were doing database tech now. Like layers of abstraction get built and you maybe don't need to know all the fundamentals. [20:09] But-- [20:09] It's not like every PM needs to be a researcher by any means, but I think having an appreciation for it, spending time and... [20:16] Learning the language and gaining intuition for how this stuff works a little bit, I think will go a long way. [20:22] I think the other piece, like you're dealing with this stochastic, non-deterministic system, which like, [20:26] evals are our best attempts to do it, but like [20:29] Product design in a world where like, [20:31] you're [20:32] not in control of what the model is gonna say. You can try. And so like,

20:36-22:09

[20:36] What are the feedback mechanisms that you need to close that loop? Like how do you decide when like the model's gone astray? How do you collect that feedback in a [20:43] in a rapid way, you know, like what are the guardrails you want to put in, like, [20:46] How do you even know what it's doing in aggregate? Like it's a much more like your understanding, like, [20:52] the output of this intelligence across a lot of outputs over a lot of people every single day. It just requires a very different set than like, [20:59] oh, the bug report is you clicked on the button and didn't follow the user. It's like that's a pretty knowable kind of problem. Right. And maybe this will change, you know, five years from now when people are used to it. But I think we're all still in the mode of adapting to this sort of non-deterministic [21:14] user interface ourselves. [21:15] and certainly people who are not, you know, tech people here in this room working on tech products. [21:21] who are using AI are definitely not used to it. Like, it goes against all of the intuition that we've built up for the last, like, 25 years of using computers. [21:29] And so, like, the idea that you're gonna put in the exact same things [21:34] Normally, if you put in the exact same inputs, computers give you the exact same outputs, and that is no longer true. [21:40] And it, [21:41] It's not just that we have to adapt to it building products. We have to also put ourselves in the shoes of the people who are using our products. [21:48] and think about what this means for them. [21:50] And there's like, I mean, there are downsides to it. There are also really cool upsides. And so it's fun to kind of think about how [21:56] and how you can use that to your advantage in different ways. I remember we did a lot of rolling user research at Instagram. So we had the same-- or the researchers would bring in different people every single week. Whatever was prototype ready would get put through it.

22:09-23:44

[22:09] And we do the same thing at Anthropic, [22:11] What's interesting is like for those sessions, what would often surprise me is like, [22:15] how users were using Instagram. There's something interesting about their use case or their reaction to a new feature. And now it's like half that and half what the model did in that situation. You're like, "Oh, it did the right thing. This is great." So there's this very, almost like a sense of pride, maybe, of when it reacts well and you're in a user research environment. And then also the frustration where you're like, [22:34] "Oh no, you misunderstood the intent and now you're like, [22:37] 10 pages down into this answer." And so like, it's also like maybe a little getting zen about like letting go of control and you know, what's going to happen in those environments. [22:46] Thank you. [22:47] You have both worked on these consumer experiences that taught new behaviors to, you know, many hundreds of millions of people [22:54] quickly. [22:56] these AI products are happening actually faster than that, right? [23:00] If PMs and technical people don't have that much intuition, [23:05] naturally for how to use them how do you think about educating end users at the scale you're both working with [23:10] on something that is so unintuitive. [23:13] I mean, it... [23:14] It is kind of amazing how fast we all adapt. [23:18] I was talking to somebody the other day, and they were telling me about their first Waymo ride. [23:23] Who's ridden in a Waymo? [23:25] We rode one here. [23:26] Yeah, if you haven't ridden in a Waymo, you're in San Francisco, ride a Waymo to wherever you're going when you leave here. [23:33] it's a magical experience. [23:35] But [23:35] They were like, [23:36] My first 30 seconds, I was like, oh my God, watch out for that bicyclist. And then five minutes in, it was like...

23:44-25:15

[23:44] "Oh my God, I'm living in the future." [23:47] Thank you. [23:47] And then 10 minutes later, it was like, bored scrolling on your phone. Like, you know, how quickly we become used to something that is just absolute magic. Yeah. And I think there, I mean, ChatGPT is less than two years old. [24:01] Mm-hm. [24:02] It was [24:03] absolutely mind-blowing when it first came. And now I think if we had to go back and use the original [24:09] whatever it was, GPT 3.5, I think. The horror, yeah. Yeah, like, everybody would be like, ugh. [24:16] So dumb. How could I possibly, you know? [24:18] And you know, the stuff that's-- [24:20] happening today that we're working on, that you guys are working on, it all feels like magic. [24:25] 12 months from now, we're going to be like, can you believe we used that garbage? Because that's how fast this thing is moving. [24:33] But it's also amazing to me how quickly people adapt, because [24:36] I mean, as much as we try and bring people along, there are also [24:39] Um, [24:40] There's a lot of excitement. People understand that the world is moving in this direction, and [24:46] We've got to try and make it [24:48] the best possible move that we can, but it's happening and it's happening fast. I think we're trying to get better at and that is also letting the product be like [24:56] educational in a very literal way, which is like, the thing we did not do early and now we're changing is just tell Claude more about itself, which was like, you know, it's in its training set that it's, you know, artificial intelligence created by Anthropic, whatever. But now we're literally like, and here's how you use this feature we shipped, because people would ask. And again, this came from user research, because

25:15-26:51

[25:15] We'd be like, [25:16] they'd be like, how do I use this thing? And then Claude would be like, I don't know, have you tried looking at it on the internet? You're like, no, that's not helpful. And so like... [25:24] Like, we're really trying to ground it, and then at launch time, we're like, you know, it's a process we're improving, but it's cool to now see, like, this is the exact link to the documentation, like, here's how you do it, like, [25:34] I can help you step by step. Oh, you're stuck. I can help you here. So, [25:37] These things are actually very good at solving UI problems and user confusion, and we should use them more for that. [25:44] It's got to be different when you are trying to do change management in enterprise, though, right? Because there's a status quo for how are you doing things. There's organizational process. How do you think about educating entire organizations about productivity improvements or whatever else can come? I think the enterprise one is really interesting because even... [26:03] like these products have millions and millions of users, [26:07] The power users are very much, I think, still early adopters, people who like technology, and then there's a long tail. [26:13] Whereas when you go to enterprise, you're deploying to an organization that is like, often there's folks who are very non-technical. And I think that's really cool actually seeing [26:22] fairly non-technical users get exposed to, like, [26:25] Chat. [26:26] powered LLM for the first time and then getting to see it, then you have the luxury of getting to run [26:31] a session where you teach them about it and like have educational materials. And so I think we need to learn from what happens in those [26:38] and then say, like, that's what we need to do to teach the next 100 million people how to use these. [26:41] these UIs. [26:43] And they're usually power users internally, and they're excited to teach the rest of people, and, you know, like with OpenAI,

26:51-28:33

[26:51] We have these custom GPTs that you can make, and organizations make thousands of them often, and it's a way for the power users to make something. [26:58] that [26:59] makes, [27:00] AI easier and immediately valuable for the people that might not know how to use it otherwise. [27:05] So, like, that's one cool thing. You find the pockets of power users, and they actually will sort of be evangelists. [27:12] I have to ask you then, because your organizations are both like all power users. [27:17] So you're living in your little pocket of the future. I'll ask about one thing, but feel free to redirect. [27:24] How am I supposed to use computer use? [27:26] This is amazing. What are you guys doing? Yeah, well internally, we're... I mean this is Kevin's earlier comment around like, [27:33] When is it going to be ready? All right, like go. It's like it was [27:36] Pretty late breaking. We had conviction that it was like, this is good. We don't want to put this down. It's early still and it's still going to make mistakes, but how do we do this as well? The funniest use case, when we were beta testing it, was somebody was like, I wonder if I can get it to order us a pizza. And it did. And they were like, great. The moment where Domino's shows up at your office and it was ordered entirely by AI, it was a very cool seminal moment. And then we're like, oh, but it's Domino's. But it was definitely amazing. But it was AI. [28:06] It was good. And also, like, ordered quite a bit of pizza, so it was, like, maybe hungrier than intended. Some early things that we're seeing that we think are really interesting, one is UI testing, [28:15] I was, like, on Instagram, we had basically no UI tests, because they're hard to write, they're, like, brittle, [28:20] and they're often a little bit like, oh, we moved this button around, and it should still pass. That was the point of the PR, but now it's going to fail. We're going to have to do this whole other snapshot. 
And early signs are like, computer use works really well for, hey, does it work?

28:33-30:06

[28:33] as intended, as to do the thing that you want it to do. And I think that's been very, very interesting. And then what we're starting to get into, too, is what are the agentic things that just, like, [28:41] involve a lot of data manipulation. So we're looking at it with our support teams and our finance teams around like, [28:47] Those PR forms aren't going to fill themselves, but like, [28:49] It's very repetitive. You often have data in one silo. You want to put it in a different silo, and it just requires, like, [28:54] human time. Like, I keep using the word drudgery when I talk about computer use. [28:58] Can we automate the drudgery so you can focus on the creative stuff and not like [29:02] the, you know, [29:03] 30 clicks to do one single thing. [29:06] Kevin, I think we have a lot of teams that are experimenting with o1. You can obviously do much more sophisticated things. You also can't use it as a one-for-one replacement if you're already using one of the, you know, GPT-4o models or whatever in your application, like, [29:23] Can you give us some guidance, and what are you guys doing with it internally? [29:27] So I think [29:28] One thing that [29:29] the [29:30] people maybe don't realize that actually a lot of the most sophisticated customers of ours are doing, and that we're certainly doing internally, is it's not really about [29:38] one model for any particular thing, you end up putting together sort of workflows and orchestration between models. [29:45] And so you use them for what they're good at. o1's really good at reasoning, but it also takes a little bit of time to think, and it's not multimodal, and has other limitations. Can you define reasoning for the group? I realize it's a basic question, but... [29:56] Yeah, so people are, I think, pretty used to the concept of the scaling pre-training concept. You go GPT-2, 3, 4, 5, whatever.

30:07-31:42

[30:07] And you're doing bigger and bigger runs on pre-training. These models are getting smarter and smarter, or rather, maybe they know more and more. [30:15] But [30:16] they're kind of like system one thinking, right? It's, you ask it a question, you [30:21] immediately get an answer. [30:22] text completion, [30:24] - It's like me asking you questions right now, and you just have to stream the answer, one token at a time, keep going, don't think. - It's amazing actually how much human, like your intuition about how other humans work, [30:36] will often help you in intuiting about how these models work. [30:40] You know, you asked me a question, I got off onto the wrong, like, [30:44] sentence, it's hard to recover, the models totally do the same thing. [30:47] So, [30:48] But... [30:49] So you've got that sort of larger and larger pre-training. [30:53] o1 [30:54] is actually a different way of scaling intelligence [30:59] by doing it at query time, basically. [31:02] So instead of system one thinking, I ask you a question and it immediately tries to give you an answer, [31:07] It'll pause. Same thing I would, you know, you would do if I asked you a question and I said, "Solve this Sudoku, [31:12] do this New York Times Connections puzzle." [31:15] You know, you would start going, "Okay, these words, how do they group together? Okay, these might be these four. Well, no, I'm not sure." [31:21] Could be, you know, you're like forming hypotheses. [31:24] using what you know to refute these hypotheses or affirm them, [31:28] And then from that, continuing to reason on [31:32] It's how scientific breakthroughs are made. It's how we answer hard questions. [31:36] And so this is about teaching the models to do it and right now, you know, they'll think

31:42-33:13

[31:42] 30 or 60 seconds before they answer. [31:44] Imagine what happens if they can think for five hours or five days. [31:48] So it's basically a new way to scale intelligence, and we feel like we're just at the very beginning. You know, we're at the, like, GPT-1 phase of [31:58] of this new form of reasoning. [32:01] But in the same way, you don't use it for everything, right? There are some times when you ask me a question, you don't want me to wait 60 seconds, I should just give you an answer. [32:08] So, [32:10] we end up using our models in a bunch of different ways together. So for example, like cybersecurity, you would think, [32:17] not really a use case for models. They can hallucinate. That seems like a bad place to [32:23] but [32:23] you can [32:25] A, like fine-tune a model to be good at certain tasks, and then you can fine-tune models [32:30] to be very precise about the kinds of inputs and outputs that they expect, [32:33] and have these models start working in concert together, [32:37] and models that are checking the outputs of other models, realizing when something doesn't make sense, asking it to try again. [32:44] um, [32:45] and uh so like that ends up being how we get a ton of value out of our own models internally it's like [32:53] specific [32:55] use cases and [32:57] orchestrations of models together designed sort of working in concert to do specific tasks. [33:03] Which again, going back to like reasoning about how we work as humans, [33:06] How do we do complex things as humans? You have different people who often have different skill sets, and they work together to accomplish a hard task.
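Editor's illustration: the orchestration pattern described above (one model constrained to a precise output format, a second model checking that output and asking for a retry when it does not make sense) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a description of how OpenAI builds its internal workflows. The names `toy_worker`, `toy_checker`, `run_with_checker`, `Verdict`, and `ALLOWED_LABELS` are all hypothetical, and both "models" are plain Python stand-ins rather than real API calls.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical output contract: the worker must answer with one of these labels.
ALLOWED_LABELS = {"benign", "suspicious"}


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def toy_worker(task: str, feedback: str) -> str:
    """Stand-in for a task-specific model; not a real API call."""
    if not feedback:
        # The first attempt deliberately breaks the output contract.
        return "maybe"
    return "suspicious" if "failed login" in task else "benign"


def toy_checker(task: str, output: str) -> Verdict:
    """Stand-in for a second model that validates the first model's output."""
    if output in ALLOWED_LABELS:
        return Verdict(ok=True)
    return Verdict(ok=False, feedback=f"Answer must be one of {sorted(ALLOWED_LABELS)}.")


def run_with_checker(
    task: str,
    worker: Callable[[str, str], str],
    checker: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Call the worker, let the checker vet the output, retry with feedback."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        output = worker(task, feedback)
        verdict = checker(task, output)
        if verdict.ok:
            return output, attempt
        feedback = verdict.feedback  # passed to the worker on the next try
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


if __name__ == "__main__":
    label, attempts = run_with_checker(
        "50 failed login attempts from one IP", toy_worker, toy_checker
    )
    print(label, attempts)  # prints: suspicious 2
```

In a real deployment the two stand-ins would presumably be calls to different models, for example a fast model for the first pass and a slower reasoning model for checking, but that split is an assumption of this sketch rather than something stated in the conversation.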

33:14-34:48

[33:14] I can't let you guys get away without telling us something about the future and what's coming. [33:22] You don't have to give us release dates. I understand you don't know, but... [33:26] If you look out, I think the furthest anyone can look out in AI right now is like, well, tell me if you can see the future, but like, let's say like six months, 12 months, like, what's an experience that you imagine is going to be possible or prevalent? [33:39] I think a lot about like, [33:41] to breaking the-- well, [33:43] I think a lot about this all the time, but maybe two words to be like, [33:48] plant seeds in everybody's mind. One is proactivity. How do the models become more proactive? Once they know about you and they're monitoring... [33:55] they're reading your email in a good, not creepy way, and they're like, because you authorized them to, and then they spot an interesting trend, or you start your day with something that's a, like, [34:05] like a proactive like [34:07] recap of what's going on, some conversations you're going to have. I pre-did some research for you. Hey, your next meeting is coming up. [34:13] Here's what you might want to talk about. I saw you have this like [34:15] presentation coming up. Here's the first draft that I put together. Like that kind of proactivity, I think is going to be really, really powerful. [34:21] And then the other part is being more asynchronous. [34:24] I think o1 is like early UI in this exploration, which is like, [34:28] It's going to do a lot and it's going to tell you kind of what it's going to do along the way. And like you can sit there and wait for it, but you could also like be like, it's going to think for a while. I'm going to go like do something else, maybe tab back. Maybe it like can tell me when it's done. Like expanding the time horizon, both in terms of like you didn't ask a question. It just told you something. I think that's going to be interesting. 
And then you did ask a question and it can be like,

34:48-36:18

[34:48] Great. [34:49] I'm gonna go reason about it, I'm gonna go research it, I might have to ask another human about it, and then I'm gonna, like, [34:54] Maybe come up with my first answer. I'm going to vet that answer. [34:57] you'll hear back from me in like an hour. Like breaking free of those like constraints of like expecting an answer immediately, I think... [35:04] will let you do things like, [35:05] "Hey, I have this whole mini project plan. Go flesh it out." [35:09] not just like I want you to like change this one thing on the screen but like [35:12] Fix this bug for me. [35:14] Take my PRD and adapt it for these new market conditions, like adapt it for these three different marketing conditions that emerge. [35:20] Being able to push those dimensions, I think, is what I'm personally most excited about on the product side. [35:25] Yeah, I completely agree with all of that. And it's the models are going to get smarter at [35:32] an accelerating rate, I think, which is also part of how all of that comes to pass. [35:37] Another thing that will be really exciting is seeing the models able to interact in all the same ways that we as humans interact. Right now, you mostly type to these things, and, you know, [35:47] I mostly type to a lot of my friends on WhatsApp and other things. [35:51] I also speak. [35:52] I also can see [35:54] and we just launched this advanced voice mode relatively recently. [35:58] I was in [36:01] Korea and Japan, [36:03] having conversations with [36:04] And I would just, I would often be with somebody with whom I had no common language whatsoever, [36:11] Before this, we could not have said a word to each other, and [36:15] Instead, I was like, hey, ChatGPT, I want you to act as a translator.

36:18-37:49

[36:18] When I say something in English, I want you to say it in Korean. [36:21] And when you hear something in Korean, say it back to me in English. [36:24] And all of a sudden I had this universal translator and I was having business conversations with [36:29] another person and it was magical. [36:33] And you think what that can do, like not just in a business context, but think about people's willingness to travel to new places if you don't ever have to be worried about not speaking the language. [36:41] and you've got this Star Trek universal translator in your pocket. [36:45] You know? [36:46] Experiences like that, I think it's going to become commonplace fast. [36:50] but it's magical and I'm excited about that. [36:53] in combination with all the stuff Mike was just saying. [36:56] - Oh, one of my favorite pastimes now, just since the voice mode release, is actually watching... There's a genre of TikTok of... This just speaks to how old I am. There's a genre of TikTok where it's just young people talking. [37:12] to voice mode, like pouring their heart out, using it in all these ways where I'm like, oh my God, like there's this old term being like digitally native or mobile native, and I'm like, I like pretty strongly believe in this AI thing, and I would not think to interact [37:27] in this way. [37:28] But people who are 14 years old are like, well, I expect AI to be able to do that. And I love that. Have you ever given it to your kids? [37:35] I haven't yet. [37:37] My kids are like five and seven, knows them, so. But we'll get there. I mean, mine are eight and ten, but like on a car ride, they'll be like, "Can I talk to ChatGPT?" [37:45] And they will ask it the most bizarre things. They will just have

37:49-39:41

[37:49] weirdo conversations with it, but they're perfectly happy talking to an AI. [37:54] - Yeah, actually one of my favorite experiences, and maybe we'll close and ask you for like, the most surprising behavior, kids or not, is, [38:01] Um. [38:02] Like, when my parents read to me, like, I was lucky if I got to choose the book, and it wasn't my dad being like, we're gonna read this physics study I'm interested in, right? My kids, I don't know if it's just like parenting in the Bay Area, but my kids are like, [38:17] "Okay, Mom, make the images. [38:19] I want to tell a story about the dragon unicorn in this setting. I'm going to tell you exactly how it's going to happen, create it in real time." [38:28] That's a big ask, I'm glad you believe and know that's possible, but it's a wild way to create your own entertainment. [38:35] What is the most surprising behavior you've seen in your own products recently? [38:42] I think... [38:43] It's a behavior and a relationship. [38:46] people really start [38:48] understanding the nuance of what Claude is, or just like a new revision to the model, and it's like, they get the nuance. I guess the behavior is almost befriending, or really developing a lot of two-way empathy around what's happening, and then the model's like, "Oh, you know, the new model, [39:05] It felt like it was smarter, but maybe a little more distant." But maybe, you know, and it's like, it's like that kind of like nuance, which like you like, it's given me as a product person a lot more empathy around like, you're not just shipping a product. [39:18] you're shipping like intelligence, and intelligence and empathy are like what makes like interpersonal relationships important. And if somebody showed up and they're like, I was upgraded, like I say, you know, I scored two percent higher on this math score. But like I'm different in this way. You'd be like, oh, I got to adapt now and maybe, you know, be a little worried about it. 
So like, that's been an interesting journey for me, like understanding the mentality for people when they're using our products.

39:41-40:51

[39:41] Yeah, model behavior is absolutely a product role. [39:45] Like the personality of the model is [39:48] And there are interesting questions around how much should it customize, [39:51] versus how much should OpenAI have one personality and Claude has some distinct personality. [39:57] And are people gonna use one versus the other because they happen to like it? I mean, that's a very human thing, right? We're friends with different people because we happen to like different people better than others. [40:07] That's an interesting thing to think about. [40:10] We did something recently, [40:13] And [40:14] it sort of went viral on Twitter, people started asking the model [40:18] based on everything you know about me, based on all of our past interactions, you know, what would you say about me? [40:24] And the model will respond and it will give it a description of what it kind of thinks, based on all of your past interactions. [40:33] And it is this sort of, you're starting to interact with it almost like [40:38] some sort of person or entity in interesting ways. [40:42] Anyways, it was fascinating to see people's reaction to that. [40:47] Kevin, Mike, thank you so much for doing this and giving us a glimpse into the future. [40:51] Thank you so much.
