Most expert work isn't "produce a probable artifact"; it's "choose a good move, considering other agents and guessing at hidden state." LLMs default to single-shot artifacts and need world models to progress.
the Priya example nails it. the finance friend evaluated the email in isolation. the experienced coworker simulated how it would land in Priya's inbox, against her triage heuristics, under deadline pressure
this is the gap between LLMs writing code and LLMs building systems. code that compiles isn't code that survives contact with users, adversaries, edge cases
been running production systems solo for 20 years. the best operators aren't the ones who know the most commands — they're the ones who can simulate what will break next. "if I do X, the cache invalidates, which triggers Y, which overloads Z." that's a world model
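That "if I do X, then Y, then Z" chain can be sketched as a tiny effect-propagation model over a dependency graph. The component names and topology below are hypothetical, purely for illustration:

```python
# Minimal sketch of "simulate what breaks next": walk an action's effects
# forward through a dependency graph. Topology is made up for illustration.
from collections import deque

# component -> components that depend on it (hypothetical)
DEPENDS_ON_ME = {
    "cache": ["app_servers"],
    "app_servers": ["load_balancer"],
    "load_balancer": [],
}

def predict_blast_radius(action_target):
    """BFS forward from the touched component to everything downstream."""
    affected, queue = [], deque([action_target])
    seen = {action_target}
    while queue:
        node = queue.popleft()
        for dep in DEPENDS_ON_ME.get(node, []):
            if dep not in seen:
                seen.add(dep)
                affected.append(dep)
                queue.append(dep)
    return affected

# "if I invalidate the cache, app servers reload, which hits the LB"
print(predict_blast_radius("cache"))  # ['app_servers', 'load_balancer']
```

The point isn't the graph walk itself; it's that the operator carries something like this structure in their head and queries it before acting.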
the poker vs chess analogy is perfect. hidden state + adversarial adaptation = can't just pattern match
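The hidden-state half of that equation is essentially belief tracking: maintain a distribution over what the opponent holds and update it on every observed action. A minimal Bayesian sketch, with hand categories and likelihoods invented for illustration:

```python
# Hedged sketch of tracking hidden state: a belief over the opponent's
# hand, updated with Bayes' rule after observing a bet. Numbers are made up.
belief = {"strong": 0.2, "medium": 0.5, "weak": 0.3}

# P(big raise | hand type) -- hypothetical likelihoods
likelihood_of_big_raise = {"strong": 0.8, "medium": 0.3, "weak": 0.1}

def update_belief(prior, likelihood):
    """Posterior proportional to prior * likelihood, renormalized."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

posterior = update_belief(belief, likelihood_of_big_raise)
# after seeing a big raise, "strong" becomes the most probable hand
print(max(posterior, key=posterior.get))  # strong
```

Adversarial adaptation is what breaks the static version of this: a good opponent knows you track them and manipulates the very signals you condition on.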
curious where you see the fix coming from. is it more training on game theory scenarios? or do we need fundamentally different architectures to track hidden state?
someone on twitter suggested using an RLM-like structure to simulate world models. I think it's well worth thinking about. what's clear is that next-token prediction probably does not meaningfully scale to the internal world models required for this!
Epistemological intelligence is a necessary precursor to theory of mind.
This is a great framework. LLMs can regurgitate game theory but can't yet 'feel' the move/counter-move process. They are great at predicting the next word but perhaps not yet at predicting the next emotional state. Makes me wonder if the next frontier is capturing a better understanding of human behavioral psychology and being able to relate to adversarial circumstances.
Do you think we can get there with text alone? You might make the case that art could be useful in this sense: drama, etc.
What is Shakespeare if not a picture of clashing wills, that type of thing.
emotions are a whole higher tier of simulation that we don't even account for in this article! but yeah, that would be amazing.
what do you think about the concept of using art as a conduit into human emotion?
The fault you’ve identified can be cross-pollinated to other LLM usages.
My biggest takeaway from this is to ask the LLM to apply the red-team concept to a broader range of domains, from communication to agentic coding loops; AND to play around with modelling the other party. For example: red-team how another LLM agent that "optimises for code that works and looks reasonable" might introduce tech debt or architectural drift. How can the current code be corrupted by agents (AI or otherwise) that optimise locally?
It still requires a human in the loop to remember to do this and do it well, but at least we’re narrowing down to making similar moves across domains rather than re-discovering the same type of moves in each domain as if they were novel.
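One way to make that move reusable across domains is a single red-team prompt template parameterised by artifact and adversary model. Everything below (template wording, domain labels) is a hypothetical sketch, not a prescribed format:

```python
# Sketch of the "same move across domains" idea: one red-team template,
# filled in per artifact and per adversary model. Wording is illustrative.
RED_TEAM_TEMPLATE = (
    "You are red-teaming the following {artifact_kind}.\n"
    "Adversary model: {adversary}\n"
    "Artifact:\n{artifact}\n"
    "List the concrete ways this adversary could exploit or degrade it."
)

def red_team_prompt(artifact_kind, adversary, artifact):
    """Build a domain-agnostic red-team prompt from its three parts."""
    return RED_TEAM_TEMPLATE.format(
        artifact_kind=artifact_kind, adversary=adversary, artifact=artifact
    )

prompt = red_team_prompt(
    "pull request",
    "an agent that optimises for code that works and looks reasonable, "
    "but introduces tech debt or architectural drift",
    "def handler(): ...",
)
```

Swapping `artifact_kind` to "outreach email" or "vendor contract" reuses the same move instead of rediscovering it per domain, which is exactly the narrowing-down described above.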
This essay articulates something I've been trying to explain for months. My AI agent excels at cooperative tasks — file management, blog publishing, research synthesis. The moment I tried anything adversarial (competitive positioning, negotiating vendor terms), it defaulted to fairness.
The RLHF bias toward accommodation is baked in by design. Poker versus chess is exactly the right frame. Most high-value professional work is poker. The gap isn't raw capability — it's that LLMs have never had skin in the game and can't model what that actually feels like. That's a structural limitation, not a solvable prompt engineering problem.
Fantastic article!
To me, this ultimately reads like a 'bigger problem space' issue. The difference between chess and poker isn't just that one has perfect information during play and the other doesn't; it's that the full information of a poker game is 'all of the life experiences of the player' compiled against the play state. And that's ultimately something an LLM-powered agent could conceivably replicate, even if a given LLM model can't.