[AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect

Latent Space: The AI Engineer Podcast

May 23, 2025

In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT-5 ships this summer).

Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper, Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he previewed his AIEWF talk on Agentic RL for those with the temerity to power through bad meetup audio.

Full Video Episode

Timestamps

00:00 Introduction to the Podcast and Guests

01:00 Discussion on Claude 4 and AI Models

03:07 Extended Thinking and Tool Use in AI

06:47 Technical Highlights and Model Trustworthiness

10:31 Thinking Budgets and Their Implications

13:38 Controversy Surrounding Opus and AI Ethics

18:49 Reflections on AI Tools and Their Limitations

21:58 The Chaos of Predictive Systems

22:56 Marketing and Safety in AI Models

24:30 Evaluating AI Companies and Their Strategies

25:53 The Role of Academia in AI Evaluations

27:43 Teaching Taste in Research

28:41 Making Educated Bets in AI Research

30:12 Recent Developments in Multi-Turn Tool Use

32:50 Incentivizing Tool Use in AI Models

34:45 The Future of Reward Models in AI

39:10 Exploring Flexible Reward Systems
