Today, we check in a year after the first Unsupervised Learning x Latent Space Crossover special to discuss everything that has changed (there is a lot) in the world of AI. This episode was recorded just after AIE Europe, but before the Cursor-xAI deal.
Unsupervised Learning is a podcast that interviews the sharpest minds in AI about what’s real today, what will be real in the future and what it means for businesses and the world - helping builders, researchers and founders deconstruct and understand the biggest breakthroughs.
Thanks to Jacob and the UL production team for hosting and editing this!
Jacob Effron
Full Episode on Their YouTube
We discuss:
swyx’s view from the center of the AI engineering zeitgeist: OpenClaw, harness engineering, context engineering, evals, observability, GPUs, multimodality, and why conference tracks now reveal what matters most in AI
Whether AI infrastructure has finally stabilized: why “skills” may be the minimal viable packaging format for agents, why infra companies have had to reinvent themselves every year, and why application companies have had an easier time surviving model volatility
The vertical vs. horizontal AI startup debate: why application companies can act as the outsourced AI team for enterprises, why some horizontal companies still matter, and why sandboxes may be the clearest reinvention of classic cloud infrastructure for the AI era
The “agent lab” playbook: starting with frontier models, specializing for your domain, then training your own models once you have enough data, workload, and user behavior to justify the cost and latency savings
Why domain-specific model training is real, not just marketing: how companies like Cursor and Cognition can get users to choose their in-house models, and why search, domain specialization, and distillation are becoming more important
Open models, custom chips, and alternative inference infrastructure: why swyx has turned more bullish on open source, why non-NVIDIA hardware is suddenly getting real attention, and why every 10x speedup can unlock new product experiences
What it means to sell to agents instead of humans: why agent experience may mostly just be good developer experience by another name, why APIs and docs matter more than ever, and how pretraining-data incumbents are compounding advantages in an agent-first world
Why memory and personalization may become the next big wedge: today’s models mostly reward frequency of mentions, but in the future, swyx expects product choice to be shaped much more by personalized memory systems
The state of the AI coding wars: why coding has become one of the largest and fastest-growing categories in AI, how Anthropic, OpenAI, Cursor, and Cognition have all ridden the wave, and why the category may still have more room to run
Capability exploration vs. efficiency: why the industry is still in a token-maxing, experiment-heavy phase where people are rewarded for spending more rather than less
Claude Code vs. Codex and the strange stickiness of coding products: why first magical product experiences may matter more than expected, and why the bigger mystery may be why only a few names have emerged as real winners so far
What the end state of the coding market might look like: two major players, a longer tail of niche products, and possible disruption if Microsoft, Mistral, xAI, or the Chinese labs push harder into coding
Where application companies still have room against the labs: why frontier labs are trying to expand into verticals like finance and healthcare, but still leave space for focused companies that own the workflow and the last mile
Why coding may be a preview of every other AI market: the first category to truly go parabolic, the clearest example of foundation model companies colliding with application companies, and a template for how future vertical AI markets may develop
Why AI valuations now feel unbounded: from billion-dollar ARR products built in a year to trillion-dollar market caps, swyx and Jacob unpack how the AI market has broken traditional startup intuitions about scale and durability
Consumer AI vs. coding AI: why ChatGPT’s consumer category may have plateaued on frequency and product design, while coding continues to feel like a daily-use category with real momentum
The next product frontier beyond coding: consumer agents, computer use, and “coding agents breaking containment,” with swyx’s thesis that 2025 was the year of coding agents and 2026 may be the year they begin to do everything else
Whether foundation models are really killing startup categories: why swyx is less worried for early founders, more worried for mid-size startups and traditional SaaS, and why building something ambitious may now be the best job interview for a frontier lab
AI vs. SaaS and the internal culture war around adoption: the tension between AI-native employees who want to rip out expensive software and skeptics who think quick AI-built replacements create fragile systems
Why traditional SaaS may be under real pressure: swyx’s own experience spending six figures on event and sponsor management software, the temptation to rebuild it cheaply with AI, and the broader question of whether teams will trust custom AI-native replacements
Biosafety, security, and frontier model access: why swyx raised biosafety at a dinner with Anthropic’s Mike Krieger, why Krieger argued security is the bigger issue, and what restricted model releases reveal about Anthropic vs. OpenAI
The era of giant models: why 10T+ parameter systems may only be a temporary rationing phase before bigger clusters arrive, why labs may increasingly keep their most powerful models private for distillation, and why scale alone no longer feels like a complete answer
Memory as the slowest scaling factor in AI: why context windows have improved far more slowly than people hoped, why million-token context still has not changed most real workflows, and why memory may be the key bottleneck for the next generation of systems
What swyx changed his mind on in the past year: becoming more bullish on open models, more convinced that the top tier of agent startups behaves very differently from the median AI company, and more optimistic about fine-tuning and specialized model adaptation
“Dark factories” and zero-human-review coding: the next frontier after zero human-written code, where models not only write the code but ship it without human review, forcing companies to rethink testing and verification from first principles
Why RL and post-training may matter more than people assumed: even if the resulting models get thrown out every few months, the data, workflows, and domain-specific improvements persist
Synthetic rubrics, Dr. GRPO, and multi-turn RL: why reinforcement learning is becoming much more domain-specific and multi-step than many people realize, opening the door to much deeper customization
The next frontier after coding: memory, personalization, and world models, including why swyx thinks world models matter not just for robotics or gaming, but for giving AI something closer to lived understanding
Fei-Fei Li, spatial intelligence, and the Good Will Hunting analogy: the idea that today’s LLMs may know everything by reading it all, but still lack the lived experience that turns knowledge into a deeper kind of intelligence
Timestamps
00:00:00 Intro preview: AI coding wars, startup pressure, and market structure
00:00:28 Welcome to the Latent Space × Unsupervised Learning crossover
00:01:17 What AI builders are focused on now: OpenClaw, harnesses, and infra
00:04:33 Why AI infra is harder than apps, and where startups can still win
00:06:39 Should companies train their own models?
00:09:28 Open models, custom chips, and the new inference race
00:11:25 Designing products for agents, not just humans
00:16:49 The state of the AI coding wars in 2026
00:19:27 Capability exploration, token-maxing, and why coding is going parabolic
00:21:41 What the end state of the coding market could look like
00:23:50 Where app companies still have room against the labs
00:27:02 Why AI valuations and market swings feel unprecedented
00:28:56 Consumer AI vs. coding AI, and why sticky products still matter
00:32:28 What the next breakthrough product experience might be
00:32:53 2026 thesis: coding agents break containment and eat the world
00:35:27 Are foundation models wiping out startup categories?
00:37:33 AI vs. SaaS, vibe coding, and internal team tensions
00:40:01 Biosafety, security, and the politics of restricted model releases
00:42:19 Giant models, compute constraints, and the limits of scale
00:44:30 Memory as the real bottleneck in AI
00:44:57 Why swyx changed his mind on open models
00:47:44 Dark factories and the future of zero-human-review coding
00:49:36 Why post-training and RL may matter more than people think
00:51:50 Memory, world models, and the next frontier of intelligence
00:53:54 The Good Will Hunting analogy for LLMs
00:54:21 Outro
Transcript
[00:00:00] swyx: Isn’t that crazy? That number is just mind boggling.
[00:00:03] Jacob Effron: What is the state of the AI coding wars today?
[00:00:05] swyx: We’re in a phase of, sort of, capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else.
[00:00:16] Jacob Effron: Do you worry about the foundation models just getting into a bunch of these startup categories?
[00:00:21] swyx: Mid-size startups. Yes.
[00:00:23] Jacob Effron: What do you think the end state of this market is?
[00:00:25] swyx: For the market structure to significantly change, there would be...
[00:00:28] Jacob Effron: Today on Unsupervised Learning, we had a fun episode and what’s really become an annual tradition: a crossover episode with our friends at Latent Space.
swyx and I sat down and talked about everything happening in the AI ecosystem today: what we thought of the various changes at the model layer, what’s happening in the infra world, the coding wars, and a bunch of other things. It’s a ton of fun to do this with someone I really respect and another great podcaster in the game.
Without further ado, here’s our episode. Well, swyx, this is super fun to be back with another Unsupervised Learning x Latent Space crossover episode.
[00:01:02] swyx: Yeah,
[00:01:02] Jacob Effron: I feel like there are a lot of places we could start, but you know, one thing I always find fascinating about the way you spend your time is that you are, like, at the epicenter of the AI engineering movement and community, and you run these events and conferences and put on these awesome talks, and I think you just have a great pulse on the zeitgeist of what’s going on.
[00:01:16] swyx: Yeah.
[00:01:17] Jacob Effron: Maybe to start: what are the biggest topics people are thinking about right now?
[00:01:21] swyx: Yeah, so I just came back from London, where we did AIE Europe, and we’re doing roughly one per quarter now, which...
[00:01:27] Jacob Effron: You’ve really upped the pace.
[00:01:29] swyx: We’re trying. We’re trying to match AI speed, you know?
[00:01:30] Jacob Effron: Yeah, exactly. The topics would be completely different, I imagine.
[00:01:33] swyx: Yeah. You know, I definitely curate the tracks, so you can see what I think when you look at the track list and the speakers that I invite. Obviously OpenClaw is, like, the story of the last four or five months, and then just below that I would consider harness engineering and context engineering to be two related topics in agents and RAG. And then there’s a long tail of evergreen stuff like evals, observability, GPUs, and LLM infra in general. We also have other updates on, like, multimodality and generative media, let’s call it. But definitely the first three that I mentioned are top of mind for people. Yeah.
[00:02:13] Jacob Effron: I think harnesses in particular are so interesting. You know, there was this tweet from Harrison Chase, the LangChain CEO, that caught my eye recently, where he said, you know, it finally feels like we have stability around AI infrastructure.
And I think what he was basically implying is: look, over the past two, three years, as a company at the epicenter of AI infrastructure, it was a bit like playing whack-a-mole, right? You were constantly moving around with however the building patterns were evolving.
[00:02:36] swyx: For Harrison, for sure. Right? Like, he’s basically had to reinvent the company every year since he started LangChain. Right? It was LangChain, then LangGraph, then deep agents, and, like, I think he’s one of the most nimble, adept, sharp people about this. Yeah. Yeah.
[00:02:49] Jacob Effron: And he’s saying now is finally the time of stability.
[00:02:51] swyx: Yeah.
[00:02:52] Jacob Effron: Yeah. Um, do you buy that, or what do you make of that take?
[00:02:56] swyx: I think it’s very expensive to say “this time is different.” Sometimes, when you’re just writing code, it’s actually okay to just try to make a call, and I think it may not even matter if the call is right or not. Like, I just don’t care that much, because you can be right on a thesis, but if you don’t figure out how to monetize the thesis, then who cares if you said something first?
That said, it does feel like, for example, we went through a lot of different ways of packaging integrations up with agents, and it feels like we’ve landed at skills, which is, like, the minimal viable format. Yeah. Which is just a markdown file with some scripts attached to it, and I don’t see how it can be more simple than that. And so there is some justification for the stability around harnesses. I feel like there may be more adaptation with regard to, maybe, the real-time elements, or subagents, or memory, or any of those, like, agent disciplines, let’s call it, in agent engineering.
But if the thesis is that agents are LLMs with tools in a loop with a file system, where they can do retrieval with skills and all this standard tooling that now seems to be relatively consensus, then probably that makes sense. Um, I just think there’s no point trying to stake your reputation on the thesis that we’re there, because if it changes again, just change with it.
It’s fine.
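As a rough illustration of the packaging format swyx describes, a minimal “skill” might look like the sketch below. The directory layout, frontmatter fields, and file names here are assumptions for illustration; the conversation only specifies “a markdown file with some scripts attached to it.”

```
my-skill/
├── SKILL.md              # markdown telling the agent when and how to use the skill
└── scripts/
    └── fetch_report.sh   # helper script the agent can invoke

# SKILL.md (hypothetical contents)
---
name: my-skill
description: Fetch the weekly report and summarize it.
---
Run scripts/fetch_report.sh, then summarize its output in three bullet points.
```

The appeal of this format is exactly what the transcript argues: it is hard to get simpler than prose instructions plus optional executables, so it makes a plausible floor for agent packaging.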
[00:04:33] Jacob Effron: Yeah. You know, I’ve always been struck by how that is much more challenging for infrastructure companies than application companies. Obviously, on the application side you’ve seen, you know, Brett Taylor from Sierra, Max from Lara; they’re like, look, we build what’s ahead of the models, and we’re willing to throw everything out every three months as the models get better and better.
Exactly. Yeah. But the thing you at least have there is an end customer, right, that’s decently sticky. You know, they’ll mostly stick; they’ll give you a shot, at least, at building these things. What I’ve always found more challenging, at the kind of reinvent-yourself-every-three-months infrastructure layer, is that developers are definitely a pickier audience than an accounting firm or, you know, a bank.
Yeah. And so it’s definitely a more challenging position to be in, to have to constantly reinvent yourself.
[00:05:17] swyx: Yeah. Yeah. And, like, when they churn, it’s very complete. They’ll leave for, like, the hot new thing, because there’s, like, no defensibility, I guess. Even if you are a database, people can migrate workloads off databases; it’s a known thing. So I think basically what we’re talking about is the vertical versus horizontal debate in AI startups. And the way I think about it is also just that when you are a Lara, when you are a Bridge, you are the outsourced AI team, right? Your job is to apply whatever state-of-the-art AI methods.
[00:05:55] Jacob Effron: Yeah. Like this translation layer between model capabilities and the end customers.
[00:05:57] swyx: Yeah, to the end customers. And, like, well, if they didn’t have you, they’d have to hire in-house, and they’re not gonna hire in-house, so they have you. And I think that’s a reasonable position, very robust to whatever trends and discoveries people make in the engineering layer.
I do think there are, like, sort of useful horizontal companies being built, but they’re all very much, like, reinventions of classic cloud in the AI era, the primary one being sandboxes. Yeah. Which, like, it’s another form of compute, guys, let’s not get too excited about it.
But I mean, the workloads are enormous.
[00:06:38] Jacob Effron: Right.
[00:06:38] swyx: Yeah.
[00:06:39] Jacob Effron: It’s interesting, and I feel like, as part of this, among the questions folks are asking around infrastructure, there’s a lot around the extent to which companies should have their own AI teams and what they should be doing in-house.
And, you know, I think there are questions around: should people be training their own models? Should people be doing RL in-house based on the data they have? I feel like one has to evolve their takes on this every three months with the pace of things. But where are you at on this today?
[00:07:00] swyx: I think, well, I mean, actually, my stock on own models has gone up. Obviously I’m involved in Cognition, and Cursor is also doing a lot of their own model training. And I think that is part of what I’ve been calling the agent lab playbook, where you start off with the state-of-the-art models from the big labs and you specialize for your domain.
But once you have enough workload and enough high-quality data from your users, then you can obviously train your own models and, like, save a lot on cost and latency and all that good stuff. Um, you also get, like, a marketing bonus of calling it some fancy name and putting out some research.
[00:07:38] Jacob Effron: From my seat, I can’t tell how much of it is, like, actual value provided to the end user, and how much of it is that marketing bonus. Right? It seems some combination of the two.
[00:07:45] swyx: I think it’s both.
[00:07:46] Jacob Effron: Yeah.
[00:07:46] swyx: Um, no, no. There actually is real value, and you know that for a number of reasons. One: even when it’s not subsidized, people do choose it as, like, one of the top four or five.
This is both Composer 2 and SWE-1.6, each one of the top five models. Like, in a fair market, in a free market, in a model switcher, people do choose it, and it’s not subsidized. So that’s as good as it gets. But beyond that, domain-specific models, for example for search, which both companies have, absolutely make a ton of sense.
Everyone says, like, yeah, we should always do this. And honestly, I think the infrastructure for that is becoming easier, with, um, Thinking Machines’ Tinker as well as, like, primary lab stuff. Yeah, I mean, this is one of those reversals of the bitter lesson, where you first bootstrap on the large, general-purpose models to get big, and as you get very well-defined workloads that are just high quantity but not high variance, you distill down to a smaller model and run that on your own. Right. Which, like, totally makes sense.
[00:08:50] Jacob Effron: What I’m less clear on is the kind of DIY RL use case, which I think is really mostly around, you know, improved quality for different things.
Obviously there are probably, like, more efficient ways to get a smaller model that’s faster and cheaper. And it’ll be interesting to see whether... you know, two, three years ago you had this whole class of companies that were pre-training and claiming better outcomes in their domains, then getting kind of cooked as each model iteration improved.
You know, I wonder whether a similar story plays out in the RL space. Yeah, for the focus on pure outcomes and quality, not the cost side, where clearly your own models for cost at scale make a ton of sense.
[00:09:28] swyx: I think those are two sides of the same coin.
Like, you basically always want to hold quality constant, or trade off a little bit of quality for a drastic decrease in cost. And that’s true for everyone. One element I wanted to bring out, which is very much in favor of open models, is custom chips. So this would be Cerebras, but also Talu, and then there’s a huge range of stuff in between.
This has been a huge story this past year: just, like, everything non-NVIDIA is getting bid up, including, like, freaking MatX, which is very rewarding for me. But I think it’s one of those things where, like, oh, suddenly, because the number of alternative hardware options is increasing, the inference that you can get is insanely high.
Like, um, we’re talking thousands of tokens per second instead of less than a hundred. So the trade-off for quality doesn’t hold as much anymore, because the speed is so high.
[00:10:24] Jacob Effron: Have you seen a lot of companies go all in on the alternative chip?
[00:10:26] swyx: So Cognition has, yeah, on Cerebras, and so has OpenAI.
And, no, I don’t think so beyond that; that’s mostly foreshadowing. Yeah. I used to be kind of a skeptic, in terms of, like, okay, so what if I get my inference sped up from a hundred tokens per second to 200 tokens per second? It’s only 2x faster; it’s not that big a deal. Um, but I think every 10x does unlock a different usage pattern. And we have proof in Talas and some of the others that you can actually drastically improve inference speed, and what happens from there, I don’t even really know. It’s so hard to predict when entire applications just appear at once.
Yeah. And it also isn’t that expensive, right? So, like, this is one of those things where I think the investment cycle is gonna be multi-year, and I would caution people not to dismiss it too quickly.
[00:11:25] Jacob Effron: Yeah. I mean, one other infra question I was curious to get your thoughts on: it obviously seems that increasingly a lot of the cutting-edge infra companies are building for agents as the buyers, or users, of their product, right?
[00:11:35] swyx: Ooh, another huge theme. Yeah.
[00:11:38] Jacob Effron: And I’m trying to figure out: what do you have to do differently about selling to agents? Are they just the ultimate rational developers? Or is there, you know...
[00:11:46] swyx: No, absolutely not. I think they are easily prompt-injected, and very tuned towards, like, basically compounding existing winners.
[00:11:57] Jacob Effron: Yeah,
[00:11:57] swyx: so like if, like, congrats if you won the lottery for getting into the training data right before 2023, because now you’re like installed in there for the foreseeable future. But yeah. Uh, you know, one stat that Versal, uh, CTO Malta dropped at my conference was that there are now, uh, 60% of traffic to Elle’s, um, like app arch, like admin app architecture for like configuring versal applications, uh, is bought.
It’s not, it’s not human. Uh, so like your primary customer is agents now. Um, and it’s mostly co like mostly coding agents, mostly people using CLI on CP or whatever. But yeah, I mean, I think. More. I, I think step one, if it doesn’t exist as an API that agents can use, it doesn’t exist. Right, right. Which I think is like, uh, it’s a good hygiene thing anyway, to, to make everything API available, but not as like an extra, um.
Push on like products, people to not only work on the ui, um, you should probably work on the on SCLI stuff. Beyond that, I think honestly there is like, so I, I come from the sensibility of, I think everything that you are trying to do for agents experience now, which is the term that Matt Bowman and Nullify is trying to coin, is the same thing that you should have been doing for developer experience.
That you should have had good docs, you should have had a consistent API, uh, that is. Mostly stateless. Um, you should have, I guess, discoverable or progressive disclosure or like search or like whatever. And so now that people have energy in like finding these customers to do that, that’s great. Um, do I believe in.
Extending beyond that into something like a EO, um, for gaming The chatbots? Not necessarily, but obviously there’s gonna be huge advantages when people who figure out the short term wins. Yeah. And short term wins can compound.
[00:13:43] Jacob Effron: Do you think these compounding advantages to, like, the pre-training-data-cutoff companies persist? Obviously, over some period of time, I imagine they don’t. And so as you think about, I dunno, three, four years from now, and what the selection criteria end up being, do you think it still mirrors exactly what you were saying before? Like, it’s exactly what you should have been doing all along to sell a good product to developers?
[00:14:01] swyx: It could be, except that I think in three, four years we’ll probably have much better memory and personalization.
So then general AEO or GEO doesn’t really matter as much. I think whatever memory or personalization system we end up with will probably determine what you end up choosing, much more than what is currently the case, which is just frequency of mentions, let’s call it. Yeah,
[00:14:26] Jacob Effron: yeah.
[00:14:26] swyx: Uh, so you just spa quantity and I think that’s, I mean, that’s something I’m looking forward to.
I do think, like, like, you know, I, I think that the fundamental exercise to work through for yourself is if you start a new, um, sort of. Uh, disruptor company. Now there’s a, there’s a big incumbent that everyone knows, like, like superb base. Super base is like, kind of like the Postgres, like database, uh, incumbent.
If you wanna start like new superb base, how would you compete with them? And I don’t necessarily have the answer, but I, I, I do think like people, like resend like relatively new. I think they would start like 20, 23 and still there was, there was a recent survey where like, people. Checked what Claude recommends by default.
If you just don’t prompt it with anything, just say, gimme an email provider and says, resent as in like 70, 70% of each cases. Like the fact that you can get in there with like such a relatively short existence, I think is, is encouraging.
[00:15:14] Jacob Effron: Yeah.
[00:15:14] swyx: I do think like. Um, you do want to do whatever it is to, to like to, to get in that Very short mentions this because, um, it’s not gonna be 20 of them, it’s gonna be like three.
[00:15:26] Jacob Effron: No, definitely. It feels like, you know, probably more consolidation than ever, or kind of, like, a winner-take-most market, versus what the physics of go-to-market in the past might have enabled.
[00:15:38] swyx: The other thing also is, like, semantic association is gonna be very important, in the sense that you want to do, like, the combo articles where you’re like, use my thing with Vercel, with blah, blah.
And, like, that all gets picked up in a corpus. And so that’s probably one thing that you wanna do. Well, I don’t know what else. It’s one of those things where, like, I feel I’m behind. I don’t know how you feel about this, but, like...
[00:16:04] Jacob Effron: I think AI is just everyone constantly feeling like they’re behind somehow.
[00:16:08] swyx: Yeah.
[00:16:09] Jacob Effron: I wanna meet the person that doesn’t feel behind,
[00:16:11] swyx: But, like, with AX, right? So, like, my stance was exactly what I said before: everything that you should do for agents is something that you should have done for humans anyway. Yeah. And so, to the extent that you’re just getting more energy to do things for agents, great.
But, like, it’s hard to articulate what new thing, apart from just, like, more spam, you should be doing. Anyway, that would be my take right now. I do think there will be more turns at this. I think the personalization turn that is coming will be big, and I don’t know what that looks like, because, like, basically, we feel kind of tapped out on the memory side of things.
[00:16:49] Jacob Effron: Yeah. I guess since we last chatted, you took this role over at Cognition, and you obviously have a front-row seat to the AI coding space today. You know, I feel like coding, in many ways, besides being, like, the mother of all markets and this massive opportunity, is kind of a preview of what’s to come for many other spaces.
Yeah. You know, I feel like agents are most advanced in coding, and I also feel like the competition between foundation models and application companies mirrors what we may see in other spaces. And so, maybe for our listeners, can you just lay out: what is the state of the AI coding wars today?
[00:17:25] swyx: Um, it is massive, right? Like, I don’t think, last time we talked about this, we necessarily appreciated the size of it.
[00:17:32] Jacob Effron: No, I wish we did.
[00:17:33] swyx: The state of the AI coding wars today: both OpenAI and Anthropic have made it their priority to compete in coding. And Anthropic is, like, $2.5 billion in ARR just from Claude Code.
The way they recognize ARR is up for debate. OpenAI, I don’t think a public number is known, but let’s call it $2 billion as well. And then Cursor is, like, rumored to be $2 billion, you know? And those are, like, the public numbers that are known. Yeah. Um, so, like, huge markets that have just been created in the past one year.
Like, Anthropic... like, Claude Code just recently celebrated its one-year anniversary, which is, yeah, pretty nice. Um, and then I think the other thing that I see is that there are some other people who are like, oh, here’s the sort of relative penetration of Claude use cases, right?
And it’s, like, coding 50%, and then legal, health, whatever, are the remaining ones. And there was a very popular tweet that was like, okay, look at the empty space in all these other use cases. If you are a new founder today, you should be betting on the other stuff, on a sort of catch-up theory. Yeah.
And my pushback is the same pushback that I had on apps over Google, which is, like, well, why is this time different? Like, if it went from, let’s say, 10 to 50% in the past year, why can’t it keep going? And getting that wrong is actually a very painful one, because you could have just done the momentum bet instead of the mean reversion bet.
So I think that is the state of things now: people are very, very much into this psychosis. They’re getting rewarded for spending more rather than spending less. And I think we’re not in a phase of efficiency; we’re in a phase of, sort of, capability exploration.
So I think people who are more crazy, who are more creative, get rewarded comparatively. Yeah.
[00:19:27] Jacob Effron: Well, it’s interesting. It feels like behind these token-maxing leaderboards and whatnot, the first phase of this transition from a workforce perspective is that you just have to show your employer: hey, I use these tools.
[00:19:37] swyx: Here’s the number of tokens I cost, and that’s it. They don’t care about the quality. It’s maybe distasteful to someone who cares about the craft, but directionally everyone just wants the number to go up regardless. So it’s not very discerning, and it’s probably very sloppy, but I think it’s net fine, because we’re still probably underusing AI in general.
And I think that’s very interesting. We had Ryan La Poplar from OBI on the podcast, who spends a billion tokens a day. For those counting at home, that’s something like $10,000 worth of API tokens a day at market rates, and most of us can’t afford that.
And probably a lot of what he does is slop.
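The arithmetic behind that $10,000 figure is easy to sanity-check. A minimal sketch, assuming a blended rate of $10 per million tokens, which is an illustrative round number, not a quoted price from any provider:

```python
# Sanity check: ~1 billion tokens/day at an assumed blended rate of
# $10 per million tokens works out to ~$10,000/day in API spend.
TOKENS_PER_DAY = 1_000_000_000
USD_PER_MILLION_TOKENS = 10.0  # assumed blended input/output rate

daily_cost = TOKENS_PER_DAY / 1_000_000 * USD_PER_MILLION_TOKENS
print(f"${daily_cost:,.0f} per day")  # $10,000 per day
```

The actual number would shift with the real mix of input vs. output tokens and per-model pricing, but the order of magnitude holds.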
[00:20:25] Jacob Effron: Right.
[00:20:25] swyx: But if there were a new capability, he would discover it before you, because he was trying and you were not. You only do things that work? Well, good for you.
But the people who are going to discover the next hot thing are living at the edge.
[00:20:42] Jacob Effron: Right, and increasingly living at the edge means having the compute budget to run these experiments. It’s similar to how living at the edge on the research side has always worked: you were constrained in many ways by the amount of compute you had to run experiments.
It feels similar now on the builder side, actually putting these tools to use.
[00:20:56] swyx: Yeah. The other thing that’s very obvious is that Anthropic is kind of the high-price, premium player, where restricting limits, or even restricting model releases, is the name of the game.
Whereas Codex is like: come on in, guys, use our SDK, use our login, we don’t care, we’ll reset limits. Whatever you do, you want to exploit the subsidies where you can get them, and Codex is definitely super subsidized right now. Gemini is also very subsidized. So I guess you should make hay while that’s going on; it’s not that bad to be a capabilities explorer on just the $200-a-month plan from Claude Code or from OpenAI.
My sense is that most people aren’t even there yet.
[00:21:41] Jacob Effron: How do you think this market ultimately plays out? It’s obviously such a big market that any slice of it is interesting for anyone going after it. But what makes the coding market particularly interesting is that it feels like
foreshadowing for what will happen in any other application market the foundation models eventually turn to, aim their models at, and gather data around. Do you think there ends up being room for lots of different kinds of players? What do you think the end state of this market is, and is it applicable to other markets?
[00:22:10] swyx: I feel like there will be. The status quo is probably the most likely outcome: two big players plus a small range of longer-tail players that fit use cases the big two don’t. That feels right to me. For the market structure to change significantly, there would need to be a significant change in the economics, the brand building, or the value propositions of the companies involved, and I
haven’t seen anything in the last six months that really changed those stories materially. So I feel like they just keep going until something else happens. "Something else happens" meaning, say, Microsoft wakes up and goes: guys, we have GitHub, we’ll do something much bigger here than just Copilot.
That would be a big change. MSL has put out a model now, and I was at a breakfast with Alex Wang where they were like: yeah, we really, really want to go after the coding use case; we haven’t done anything there yet. But don’t underestimate them. And similarly for the Chinese labs.
They’re trying to go after it too. Z.ai is doing stuff with GLM (Z.ai and GLM are the same thing). So everyone’s trying to get a piece of that pie, but I feel like the status quo has been pretty stable for almost a year now.
[00:23:39] Jacob Effron: Yeah. And is the room for the application companies more on the enterprise side? What surface area do the model companies leave for application companies?
[00:23:50] swyx: Yeah, that’s a good one. It’s very much evolving. I will say, OpenAI didn’t have this level of attention on coding a year ago, so we just don’t have that much history. And it seems like, for example, the big push at OpenAI now is the super app. Is that a consumer thing?
Is that a product-portfolio-rationalization thing? How much is that going to take attention away from coding at the very time they actually want to put more into coding? I think it’s very unclear. So in both big labs (Anthropic and OpenAI; DeepMind and xAI are separate cases),
they’re trying to seed the next expansion areas: Claude for finance, Claude Cowork, all those things. Whereas Cursor and Cognition are comparatively just focused on coding. So I do think the labs leave space, and for the other verticals that means the same thing:
they’re not going to be that intensely focused on any one domain. Except I would mark out finance and healthcare as the next ones they’re clearly going after. Comparatively, healthcare seems more thorny; there have been some announcements about it, but I respect the finance work a lot more, just because the path to money is a lot clearer.
[00:25:12] Jacob Effron: Yeah. And maybe similar to the space being left in those other domains, there’s obviously a lot required to actually implement these tools in enterprises, versus just giving folks model access out of the box.
[00:25:27] swyx: Yeah. So the agent-lab pitch is: we’ll do the last mile for you. Whereas the model labs tend to just trust the model and be minimalist about it. Both approaches work.
[00:25:38] Jacob Effron: Yeah.
[00:25:38] swyx: I don’t necessarily think one beats the other for every use case. All I do know is that
the large enterprises do seem to want a dedicated partner that isn’t just the model labs, which is kind of interesting.
[00:25:55] Jacob Effron: We’ve been in this phase of pure capability exploration, and I think nothing has been better for the large labs, right? They’re always going to be at the frontier of capability exploration,
and so they have very good relationships with a lot of these enterprises. But ultimately, over time, the incentive structure of these labs is always going to be maximal token consumption from the end customers they work with. And there are just so few companies that have actually gotten to massive scale.
Maybe coding again is the most interesting: it’s the first space that has just completely gone vertical. You must love it; every day is absolutely insane. And I think it
[00:26:32] swyx: gets even crazier. Okay, I mean, we say good things about Cursor and Cognition, but the sheer liftoff of both Anthropic and OpenAI,
which have independent valuations, and let’s throw xAI in there too, now at 1.2 trillion: those numbers are just mind-boggling. In normal investing or normal startups, there’s kind of a ceiling market cap or valuation that you reach, where you go: all right, it’s going to be chiller from now on.
And these guys are not slowing down.
[00:27:02] Jacob Effron: The dynamic with some of these later-stage companies is also fascinating. In the past, in the venture world, if you got to a certain level of scale, the question around you was really more a valuation question. That’s why there were different types of venture investors, and the late-stage growth people were incredible at assessing a bit of the ultimate market opportunity, but mostly the right way to value a company.
You knew the outcome sat within some band. Sure, there’s variance, but the band was relatively understood, and maybe over time you got surprised to the upside. Whereas now, for any later-stage company, even the labs themselves, the band of what that company might be worth, even in a year or two, is so massive because of how fast the ecosystem changes.
Even for later-stage companies, every three months could be an existential-level event, to the upside or the downside. And you’re obviously seeing the positive version with coding. If you think about a company like Anthropic: for a while it was unclear whether they were going to have access to enough capital to really stay in the race, right?
Then coding hit at exactly the right time, they had the perfect model for it, they executed brilliantly, and now they’re one of the most valuable companies in the world.
[00:28:13] swyx: At the same time, I have zero sympathy for OpenAI, because they’re crushing it and they’re all rich.
This is a high-class champagne problem: being number two at coding, or whatever. Who cares? You’re doing great.
[00:28:27] Jacob Effron: Yeah, it’s funny though. You’d be closer to this than me, given you’re in the AI coding space, but a lot of people I talk to think Codex is just as good as, if not better than, Claude Code.
One thing that’s really surprised me, and maybe Claude Code is a better product in some ways, I’m curious for your thoughts, is that in consumer AI with ChatGPT you saw this big first-mover advantage. Admittedly, today Claude and Gemini are great products,
and it’s not abundantly clear ChatGPT is any better, but people stick with ChatGPT because it was the first thing to introduce them.
[00:28:56] swyx: They stay, but they’re not growing anymore. I don’t know if you’ve seen
[00:28:59] Jacob Effron: Right, but that to me is more of a product problem. It’s not like they’ve lost share to someone else.
My understanding is the overall problem with consumer AI today is much more: how do you take this tool, which for folks like us, knowledge workers, is an incredible, magical thing, and make it a daily-active-use tool for a lot of people around the world?
It’s a category-wide problem. In coding, for example, the entire space has gone parabolic. There may be some relative growth among other consumer AI players, but it’s not like consumer AI as a category is going parabolic and they’re failing to capture most of it. The larger problem is that the category has hit a bit of a plateau: people haven’t figured out how to bring tons more users on board
or increase the frequency of existing users. So it seems more like a category-wide problem than a massive market-share shift. I was going to draw the comparison to the coding space, where Claude Code was obviously the first product to introduce people to this magical experience.
By all accounts, Codex is pretty damn close to as good, if not better. But Claude Code is still the first product. You would have thought that wouldn’t be a super-sticky product surface area, and it turns out it is. It feels like the first lab to introduce you to an experience really does keep a lot of the focus.
[00:30:12] swyx: Maybe it’s still early days. ChatGPT is three-plus years old, and Claude Code just turned one. So give it time, you know? Definitely a lot of people have switched to Codex, and maybe that will keep going; it’s really hard to tell.
I do think that because we are in this high-volatility, high-temperature phase, loyalty and stickiness to first movers and category creators isn’t as high as it might be in some other areas we’ve looked at in our careers.
[00:30:47] Jacob Effron: Yeah, though I’ve been surprised by the Claude Code thing. In many ways, I always worried about the enterprise...
[00:30:52] swyx: You thought it would’ve been gone by now?
[00:30:53] Jacob Effron: Not gone. But I always worried that the consumer business of these companies would be quite sticky, while the enterprise API business was actually, in some ways, your least loyal set of buyers; they would move.
[00:31:05] swyx: Right, right. But what they worked out is that it wasn’t the enterprise API, it was the enterprise product.
[00:31:09] Jacob Effron: Totally, and maybe that was the secret. But the amount of lock-in, or just default behavior, that has happened in this space is more than I might have imagined, with two products that by all accounts are pretty damn similar.
[00:31:22] swyx: No fight there. I will say, I do think Codex is still playing catch-up in terms of my personal experience. The things I like out of Codex: Spark, and I feel like the skills integration is a little better, and the speed is a bit better,
maybe because it’s written in Rust or whatever. Very minor things that you’re almost telling yourself rather than objectively assessing between the two. Vibes-wise, I think that’s what’s going on. But the missing question in this whole debate is: why is this so concentrated in only two names, right?
Where is the Gemini presence? Where’s the xAI presence? They are trying; they just haven’t made that much progress yet.
[00:32:12] Jacob Effron: But what the Claude Code moment does show, and in some ways it makes you a little more bullish on the potential for someone else to catch up, is that if you’re the first to introduce some magical, net-new product experience, that might actually be stickier than one would have imagined.
[00:32:27] swyx: Right, right, right. Okay. Yeah.
[00:32:28] Jacob Effron: And so everyone can believe they have a shot at that.
[00:32:29] swyx: What do you think that new product experience might be? This is a failure of imagination on my part; people always say the thing that will save us is being first to the next new thing.
Like, what is it?
[00:32:41] Jacob Effron: Yeah.
[00:32:42] swyx: It’s like,
[00:32:45] Jacob Effron: I don’t know, something around a consumer-agent, computer-use hybrid. I think we’re obviously just scratching the surface on the consumer side.
[00:32:53] swyx: So my current theory is that OpenClaw is a vision of things to come.
[00:32:58] Jacob Effron: Totally.
[00:32:58] swyx: And it’s good that OpenAI has the association with OpenClaw, but by no means do they have the rights to win it.
The general thesis I’ve been pursuing now is that, the same way 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else. So coding agents continue to win, but because they generate software, and software eats the world, it’s kind of a transitive
property: software eats the world, coding agents eat software, therefore coding agents eat the world. Which is interesting.
[00:33:30] Jacob Effron: Yeah, and breaking containment is always easier in the consumer context than the enterprise one. You’ve seen people run these really cool experiments in their own personal lives.
I think,
[00:33:37] swyx: yes.
[00:33:38] Jacob Effron: Figuring out how you create these experiences; obviously everyone’s focused on the enterprise side now. People love these narratives that everything has completely shifted. But actually, OpenAI,
organizational volatility aside, has great products, a great team, great models. And everyone else in the world is incentivized for there to be two or three more great model companies; everyone would love more of them. So I feel like the natural forces of the world revolt when any one company is too much the star of the show, right?
So many people in the ecosystem are incentivized for that not to happen. And so I’d be shocked if we don’t have a reversion of vibes, maybe not completely the other way, but at least a little more equal, at some point over the next six to twelve months.
[00:34:24] swyx: I think there are just different stages. When you talk about the world wanting more model companies, I think about the neolabs.
[00:34:30] Jacob Effron: Yeah.
[00:34:31] swyx: And, I mean, is it fair to say none of them have really broken through in the past year?
[00:34:35] Jacob Effron: I think that’s totally fair.
[00:34:37] swyx: Which is rough. Well then, how are we going to grow that diversity of choice? That’s the question.
[00:34:46] Jacob Effron: Yeah, it’ll be really interesting to see what ends up happening there.
And you’ve seen folks like NVIDIA, who are very incentivized to make sure there’s a broader platform of other model providers.
[00:34:57] swyx: People say this, but I don’t think they try that hard. NVIDIA tries harder to build neoclouds
[00:35:05] Jacob Effron: Yeah.
[00:35:06] swyx: than neolabs.
[00:35:07] Jacob Effron: Well, they try pretty damn hard to build neoclouds, so
[00:35:09] swyx: that’s,
[00:35:09] Jacob Effron: yeah.
[00:35:10] swyx: But, you know, the CoreWeaves of the world are in a much happier place than any neolab built on top of them.
[00:35:18] Jacob Effron: Yeah, one might argue it’s easier to enable a neocloud to be successful; you can’t will a neolab into existence the same way. NVIDIA
[00:35:25] swyx: has more direct control over it, for sure.
[00:35:27] Jacob Effron: What else is catching your eye today on the startup side? I mean, there’s obviously this whole narrative where the foundation models announce a product and every related stock goes down 15%.
[00:35:36] swyx: Yeah.
[00:35:37] Jacob Effron: Do you worry about the foundation models just eating into a bunch of these startup categories?
[00:35:43] swyx: Not really. Okay, there’s the point of view of being an investor in startups, and there’s the point of view of: do you want to start something? And honestly, the downside for all of these is so minimal, in the sense that the worst case is you just get hired into one of these labs anyway.
So for the people who just do things, try things, and execute in a competent way, even if it doesn’t work out commercially, even if it just wasn’t that great: that was your job interview to get into one of these labs anyway. So I don’t feel that worry
from the very-small-startup perspective. Mid-size startups, yes. I will say there’s been a lot of death: a lot of LLM infra consolidation, like the Langfuses of the world getting absorbed into ClickHouse. And I think people have maybe worked out the domain-specific playbook, and I think that’s okay. So I’m not that worried about that.
I’d be more worried about traditional SaaS, especially low-NPS SaaS. This is the whole AI-versus-SaaS debate that’s been going on. And I’m literally going through that exact thing in my company, so I’m thinking it through on a very visceral level, right? On one hand, you have the people who say: you vibe coders don’t appreciate the amount of work that goes into a CRM. You think you can rip out Salesforce? So did the 30 entrepreneurs before you. You classically underestimate the things you don’t
deeply know, and the target audience is not you. At the same time, we have never been able to build software so easily and customize software so easily, and yeah, you’re not going to use 90% of what’s in Salesforce.
[00:37:33] Jacob Effron: So what have you done internally?
[00:37:34] swyx: So the main SaaS we use is for event management and sponsor management, and we pay $200K a year for it. Not huge, but chunky at my scale. And I could probably spend $2,000 and build a custom version of it. The trick has been dealing with the rest of my team and getting them on board,
because I’m the most technical person on my team, but I can’t make that decision myself. And in the same way, I’ve been telling other CEOs and team leaders: you can be super Claude-pilled, you can be deep in LLM psychosis and think that’s okay, but you have to bring your team with you.
And I think the widening disparity in LLM psychosis within companies is causing real rifts. On one hand, the people who are less AI-native are not getting with the picture. They’re actually behind; they’re not waking up to the fact that everything they think is necessary is not actually that necessary,
and that they’d be better off if they just held their nose, went in, and came out the other side talking to agents in natural language. Their life would actually be better, and they’re just being close-minded. That’s one perspective. The other perspective is: oh, you vibe coder,
you did this in a weekend, you got the 80% solution, and now the rest of the employees have to pick up your mess. You thought you were so hot, but actually you didn’t figure it out, and actually LLMs are still useless at this, and so on.
So I think this huge debate is going on in every company right now. I have a small microcosm of it, and yeah, it’s making me hesitate to pull the trigger. But I will at some point; maybe I’ve put it off for one year, but not for five. So SaaS is definitely getting squeezed.
It does make me wonder, though. I do think there’s an opportunity for a more AI-native system-of-record thing that isn’t just Postgres or MongoDB, although both are very good. Maybe it’s something like Convex; people bring up Convex a lot. I just feel like the quote-unquote Firebase of AI apps isn’t really a thing yet,
beyond what we have. Which is fine; it just means we could probably start in a more rapid iteration cycle first before scaling up to a Postgres or MongoDB, which are older tech. I was at a dinner with Mike Krieger, the CPO of Anthropic, and we were going around the room asking: what are people most worried about?
For me, instead of security, I brought up biosafety.
[00:40:21] Jacob Effron: Classic.
[00:40:22] swyx: Like I said, I thought it was cliché and classic, but the rest of the table was like: what do you mean, someone sitting at home can manufacture a virus that wipes out half of humanity?
[00:40:32] Jacob Effron: Almost like the OG Geoffrey Hinton: this is why you should be scared.
[00:40:35] swyx: I’m like, yeah, read the risk reports; this is the thing. And Mike was just sitting there, knowing he was sitting on Mythos, going: actually, it’s security. And I think part of it is
very good marketing. Too good, actually. I would advise Anthropic to tune down the marketing, because it is just a very good model and you don’t have to make so many marketing claims around it. At the same time, it is not really a private model if you give it to 40 companies,
each of which has 10,000 employees or whatever, right? That’s not private; there are bad actors in there.
[00:41:18] Jacob Effron: Yeah, hopefully not as bad as releasing it widely. But it’s an interesting case study for how many model releases might go. This might be the first model release that looks like the rest of them from now on, right?
[00:41:31] swyx: So there’s an overall product strategy for Anthropic: restrict access, bundle product with model, maybe.
Whereas OpenAI has definitely been more philosophically aligned with: we will just enable access everywhere, and we don’t know what will come out of it.
[00:41:51] Jacob Effron: Right, though in this current moment, the cynical take is it also just ties to the amount of compute both companies have.
[00:41:56] swyx: Yeah, right. I think that’s true. I do think the dawn of larger-than-10-trillion-parameter models is very interesting, but I think it’s a temporary phenomenon, because much larger compute clusters are coming online for everyone over the next three to five years.
That’s already written in the cards.
[00:42:18] Jacob Effron: Yeah.
[00:42:19] swyx: So will we have rationing of models above 10 trillion parameters in, like, two years? I don’t think so. I think everyone will have them; we’ll just
[00:42:29] Jacob Effron: have rationing of the next phase.
[00:42:30] swyx: Right, but that’s almost as it should be.
My classic example, and this is just me theorizing, nothing confirmed by Google: when Google announced Gemini, they actually announced three sizes, Flash, Pro, and Ultra. They never released Ultra; they only have Pro and Flash. So my theory is they have Ultra sitting in a basement and they just keep distilling from it for Flash and Pro.
Which, yeah, I actually think is as it should be for any lab.
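The distillation idea being described here is that a big unreleased teacher model produces soft targets that smaller released models train against. A minimal sketch of temperature-scaled knowledge distillation in the generic Hinton-style recipe, nothing specific to Gemini; the logits below are toy numbers:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 as in the standard distillation recipe."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kl.mean() * temperature ** 2)

teacher = np.array([[4.0, 1.0, 0.5]])  # toy "big teacher" logits
student = np.array([[3.0, 1.2, 0.4]])  # toy smaller-model logits
loss = distill_loss(teacher, student)
print(loss)  # small positive number; exactly 0.0 if student matches teacher
```

In a real training setup the student would minimize this loss (usually mixed with a hard-label cross-entropy term) over the teacher's outputs at scale.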
[00:43:02] Jacob Effron: Yeah, just because those are the models people actually want to end up using, and it’s just cost-prohibitive.
[00:43:06] swyx: Yeah, it’s the cost. It’s not the want; it’s just the cost.
I do think it’s interesting that for a while I was considering the theory that models capped out at 2 trillion parameters, and that’s proving to be wrong. Well then, if I’m wrong, how wrong am I? Do we do 200 trillion? Do we do two quadrillion? I don’t think we have a straight answer to that. But it’s interesting that we’re continuing to scale the number of parameters when everyone can kind of see that we’re not going to get the next 1,000x or 1,000,000x from this paradigm.
So others, like the Ilyas of the world, are working on other model-architecture improvements. We need a different scaling law, I guess, because I feel like people already feel we’re tapped out on this one. The end state of this paradigm is we turn most of the world into data centers, and I don’t know
if we want that.
[00:44:08] Jacob Effron: Yeah, I mean, if the returns on intelligence are there, maybe it’s not so bad.
[00:44:13] swyx: I think there’s just a sheer amount of unscalability that’s wrangling people’s sensibilities right now, especially in terms of context lengths.
My classic quote is that context length is the slowest-scaling factor in LLMs.
[00:44:30] Jacob Effron: Yeah.
[00:44:30] swyx: We took maybe three years to go from a 4,000-token context length to a million, and that’s about it. Gemini has had a million-token context length for two years now, and no one’s using it.
So memory is probably going to be the biggest limiting constraint on all these things.
[00:44:50] Jacob Effron: Yeah, it certainly seems that way. I’m curious: over the last year since you last recorded, what’s one thing you’ve changed your mind on?
[00:44:57] swyx: I feel like I was kind of bearish on open models last year, in the sense that I had just done the podcast with Ankur
[00:45:07] Jacob Effron: Yeah.
[00:45:08] swyx: of Braintrust, where he, and he has a good cross-section of all the top AI companies, said market share of open source is 5% and going down. I think that's changed. I think it's going up. And even if,
[00:45:22] Jacob Effron: even though the capability gap does seem to be increasing, depending on the
[00:45:26] swyx: time? It's hard to tell. Yeah, it's really hard to tell, 'cause, okay, for listeners, "capability gap increasing" is on public benchmarks, and let's say you're comparing a frontier model versus, I don't know, GPT-OSS or GLM 5.1. It's really hard to tell, 'cause even if they were closing, you also wouldn't believe they were closing that much, because it's very easy to game the benchmarks. So you just don't really know. All you know is there are somewhat objective OpenRouter stats on what people choose in a free market, and people do choose some of these open models in significant volume, except that a lot of them are heavily discounted. So you need to kind of price-adjust these things.
So even if that were true, which I'm not sure of, I feel like the numbers are just up now instead of down. I think the separation between what the top-tier agent labs are doing versus the average startup in AI, or the average GPT wrapper, is significant enough that you should not worry about the mean industry number. You should cohort things: here's the median, here's the bottom 80%, and here's the top 20%. And the top 20% acts very differently than the bottom 80%. So the top 20%, which is all I care about, is definitely going towards more open models. The Fireworks and the Togethers are crushing it, and so will all the fine-tuners. I think maybe last time we even said things like fine-tuning as a service doesn't work. Well, now it's gonna work. It's a derivative of the open models market.
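An editor's aside for readers: the "price adjust" point above can be made concrete with a toy calculation. Raw token share on a router can overstate open-model demand when those tokens are heavily discounted, so you can weight each model's tokens by its price. All names and numbers below are made up for illustration; they are not real OpenRouter figures.

```python
# Toy price adjustment of token market share (hypothetical data).

def share(values):
    """Normalize a dict of values to fractions of the total."""
    total = sum(values.values())
    return {m: v / total for m, v in values.items()}

# Made-up numbers: billions of tokens served, and $ per 1M tokens.
tokens = {"closed-frontier": 40, "open-model": 60}
price_per_m = {"closed-frontier": 10.0, "open-model": 1.0}

raw = share(tokens)
revenue = {m: tokens[m] * price_per_m[m] for m in tokens}
adjusted = share(revenue)

print(raw["open-model"])       # 0.6 — open models "win" on raw volume
print(adjusted["open-model"])  # ~0.13 — a much smaller share of spend
```

Same usage data, very different picture once the discounting is priced in, which is why the raw router leaderboards need a grain of salt.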
[00:47:01] Jacob Effron: Well, and also the workloads are scaling to the point where people care about cost and speed more and more.
[00:47:06] swyx: Yeah.
[00:47:06] Jacob Effron: And moving from pure use-case discovery, of what these models can even do, to: okay, we know what they're gonna do at scale, now let's do 'em cheaper and faster.
[00:47:14] swyx: Yeah. So that change, I think, is probably the most significant in my mind. And I always like to do the mental math of, think of it like scheduling a learning rate: when you've been wrong once, what else were you wrong on? I'm kind of working through it. To me, the other thing was the coding one, which obviously I have now come full 360 on. But I think people are not appreciating dark factories enough, which I don't know if you've discussed on the pod yet.
[00:47:44] Jacob Effron: No.
[00:47:45] swyx: So this is kind of a Simon Willison-circle term. The general idea is, okay, there are different levels of AI coding psychosis you can have. The very first level, which by the way I first encountered at Cognition five months ago, is zero human-written code. Right? Which seems like a reasonable thing now; it was less reasonable five months ago. The next frontier, which sounds as crazy today as zero human-written code did in the past, is zero human review.
[00:48:17] Jacob Effron: Yeah.
[00:48:18] swyx: Like, just check it in without even reviewing it. Very few people are doing that, but OpenAI is exploring this, and I feel like it's definitely the only scalable way to do this. It just means you have to kind of flip the SDLC, or change large amounts of what you normally do, which is probably stuff you should have done anyway: more testing, more automated verification, whatever. But that is a frontier where, when you have unlocked it in your company, you are just gonna produce much more software than you've ever had. And it's gonna be so disposable, so cheap, that you can probably innovate on quality a lot as well. That quantity helps you get to quality.
[00:49:00] Jacob Effron: Yeah.
[00:49:01] swyx: Which I think people are very uncomfortable with, 'cause people associate more quantity with slop.
[00:49:07] Jacob Effron: Right. No, it's back to exactly the discussion we were having on the reaction to these token-maxing scoreboards, and the idea that today maybe that's not the best sign of productivity or efficiency, but going forward,
[00:49:18] swyx: yeah, but you still get rewarded for it. So they're like, fuck it, whatever. But I think the people who do well in 2026 are not the cynics who go, "Oh, that's just slop, I'm not gonna participate in that." They're like, "Okay, this is happening with or without me. Let me bend it the right way."
[00:49:36] Jacob Effron: Yeah, no, I love that. For me, a related thing on the open-source model side: for so long, I really didn't think it made any sense to do any sort of RL, post-training, pre-training, anything you could do to improve overall quality. For latency and cost it always made sense to me, but for overall quality, God, you just get that for free in the models three to six months later. What I'm starting to change my tune on a little bit is, hearing all these app companies talk about how they build stuff and then throw it out three months later as the models improve, you're like, okay, well then what you're doing for capability improvement is just another version of that, right? I still don't think your RL or post-training is gonna give you a better model for years and years to come. But I think you still have to be pretty rigorous about: is that the single best thing you can do to solve a customer problem? Oftentimes it's literally just, add more data, feed more data via connectors to these models, or do some clever engineering on the back end, whatever it is. But if the single best thing you can do in that three-month period to improve your customers' outcomes is post-training in some way that really improves the output of the model, even if you throw it out three months later because the general models catch up, it still might have been worth doing. And so I think I'm more open to
[00:50:45] swyx: You throw out the results, but you don't throw out the raw data.
[00:50:47] Jacob Effron: Totally.
[00:50:48] swyx: And so, like,
[00:50:48] Jacob Effron: Right, then you just run it again. So basically, obviously at the level of, say, $10 million of cost, maybe that's too much, but there's some level of cost where
[00:50:55] swyx: No, it's not even 10 million.
[00:50:56] Jacob Effron: Right, no, of course it's not. There's obviously some level of investment at which it's the equivalent of just staffing four engineers to go build something for three months.
[00:51:04] swyx: Yeah. So the other thing, for listeners, I'm just gonna leave some droplets of info: look into the long-trajectory and synthetic-rubrics work that people are doing. It's very important, including something called Dr. GRPO. I'll just leave those key search terms in there. I think what it means is that RL is going much more multi-turn than people think, and that means you can customize models along way more specific dimensions than traditional, let's call it SFT, or the sort of shallow RL that was done a year ago. So, like, hundreds of turns.
[00:51:44] Jacob Effron: Yeah.
[00:51:45] swyx: And I think that leads you down a path of complete domain specificity.
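An editor's aside for anyone chasing the Dr. GRPO search term: as I understand the paper ("Understanding R1-Zero-Like Training"), Dr. GRPO modifies GRPO's group-relative advantages by dropping the per-group standard-deviation division (and the per-response length normalization), which otherwise bias updates toward low-variance groups and long responses. A minimal sketch of just the advantage computation, under that reading; this is not any lab's actual training code.

```python
# Hedged sketch of GRPO-style vs. Dr. GRPO-style group advantages.

def grpo_advantages(rewards):
    """GRPO: mean-center rewards within a group of sampled responses,
    then divide by the group's reward standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    std = std or 1.0  # avoid dividing by zero when all rewards tie
    return [(r - mean) / std for r in rewards]

def dr_grpo_advantages(rewards):
    """Dr. GRPO: keep the mean-centering, drop the std division,
    which otherwise over-weights low-variance (too-easy or hopeless) prompts."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

group = [1.0, 0.0, 1.0, 0.0]  # e.g. pass/fail rewards from 4 sampled rollouts
print(grpo_advantages(group))     # [1.0, -1.0, 1.0, -1.0]
print(dr_grpo_advantages(group))  # [0.5, -0.5, 0.5, -0.5]
```

In a multi-turn setting these advantages would be applied across whole trajectories of hundreds of turns, which is the customization lever swyx is pointing at.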
[00:51:50] Jacob Effron: What else? What unanswered questions in AI today are you looking at, paying close attention to, in the next year?
[00:51:58] swyx: I have a few theses for what the sort of next frontier is. One is memory, the memory and personalization we talked about. The other is really world models, which we've done a small little series on, from Fei-Fei Li to even Moon Lake and General Intuition. And there's a lot of debate as to the relative importance of this. A lot of it manifests as 3D static worlds that you kind of inhabit for a little bit and walk around, and they're like, cool, but how does this help me with my B2B SaaS? Right? And
[00:52:29] Jacob Effron: it’s like all the hype now is robotics, right?
[00:52:31] swyx: Yeah. And there's obviously a correlation between world models and embodied vision and experience, which leads to robotics. But I think world models are very interesting for improving intelligence itself, beyond the next-token-prediction paradigm. So people are kind of testing the edges around that. One of our top articles this year so far has been on adversarial reward models. And I do think, if you don't do anything else, just read Fei-Fei's essay on spatial intelligence, on why LLMs don't have it. She may not have the solution yet, but she has the right problem statement, and everyone else is trying to solve that problem statement in their own way. Let's see who wins. But I don't think it does you any favors to equate world models with robotics, or world models with gaming, or the current manifestations, because what is at stake is a much more important conception of intelligence than just answering questions. It's: does the AI understand what a table is? What matter is? What physics is? For the movie fans, it's almost like Good Will Hunting, where Matt Damon knows everything because he read it in a book, but he's never lived it.
[00:53:54] Jacob Effron: Great scene with Robin Williams.
[00:53:55] swyx: With Robin Williams. And I look at that scene and go: that's exactly the difference between a very intelligent LLM that knows everything but hasn't experienced anything.
[00:54:04] Jacob Effron: Wow, that's an awesome note to end on. Have you used that before? That's great.
[00:54:08] swyx: Yeah. So one thing I've done with Latent Space is I moved to adding daily writeups. And one of the times I was doing a daily writeup, I wrote that.
[00:54:16] Jacob Effron: That’s a great
[00:54:17] swyx: one. I love
[00:54:17] Jacob Effron: that. Um, well, so it’s been a ton of fun. Thanks so much
[00:54:19] swyx: for, for Coming Man.
[00:54:21] Jacob Effron: I’m Jacob Effron and this has been Unsupervised Learning. A podcast where I get to talk to the smartest people in AI and ask them tons of questions about what’s happening with models and what it means for businesses in the world.
As I hope is clear, I have a ton of fun doing this. It’s a nights and weekends project in addition to my day job as an investor at RedPoint, but our ability to get these incredible guests on really comes from folks like you subscribing to the podcast, sharing it with friends. It’s really what ultimately makes this whole thing work.
And so please consider doing that. And thank you so much for your support and listening. We’ll see you next episode.