NVIDIA's AI Engineers: Agent Inference at…

Mar 10

NVIDIA welcomes AI Engineers with a special pre-GTC episode!

2 Comments

I love how they kept it real, being who they are instead of trying to be something they're not. Hard to do but really obvious in a world of vibe-coded products and vibe-coded cultures.

Really indexing into passion was an interesting way of putting it. I kinda like that. "Momentum is the only authority" is also a good one.

Two very fun guys.

Qarp

Mar 26

The framing of "speed of light" inference is fascinating — it maps directly to a tension that's playing out at the hardware layer too. Google's recent TurboQuant work (KV cache compression from 16-bit to 3-bit, 8x attention speedup on H100s) suggests we may be entering a phase where algorithmic efficiency gains outpace the need for raw memory bandwidth. Curious whether Dynamo's scheduling approach accounts for this kind of variable precision — or whether that's an orthogonal optimization layer. Wrote about what it means for AI builders here: https://agenticwork.substack.com/p/turboquant-the-algorithm-that-crashed