In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI's io announcement, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the detailed Claude 4 recap to AINews, but we think that both Gemini's progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT-5 ships this summer).
Will Brown's talk at AIE NYC and his open-source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he previewed his AIEWF talk on Agentic RL for those with the temerity to power thru bad meetup audio.
Full Video Episode
Timestamps
00:00 Introduction to the Podcast and Guests
01:00 Discussion on Claude 4 and AI Models
03:07 Extended Thinking and Tool Use in AI
06:47 Technical Highlights and Model Trustworthiness
10:31 Thinking Budgets and Their Implications
13:38 Controversy Surrounding Opus and AI Ethics
18:49 Reflections on AI Tools and Their Limitations
21:58 The Chaos of Predictive Systems
22:56 Marketing and Safety in AI Models
24:30 Evaluating AI Companies and Their Strategies
25:53 The Role of Academia in AI Evaluations
27:43 Teaching Taste in Research
28:41 Making Educated Bets in AI Research
30:12 Recent Developments in Multi-Turn Tool Use
32:50 Incentivizing Tool Use in AI Models
34:45 The Future of Reward Models in AI
39:10 Exploring Flexible Reward Systems