
Information Theory for Language Models: Jack Morris

Our last AI PhD grad student feature was Shunyu Yao, who happened to focus on Language Agents for his thesis and immediately went to work on them at OpenAI. Our pick this year is Jack Morris, who bucks the "hot" trends by not working on agents, benchmarks, or VS Code forks, and is instead known for his work on the information-theoretic understanding of LLMs, starting from embedding models and latent space representations (always close to our heart).

Jack is an unusual combination: he does underrated research but is still able to explain it well to a mass audience. So we felt this was a good opportunity to do a different kind of episode, going through the greatest hits of a high-profile AI PhD and relating them to questions from AI Engineering.

Papers and References Mentioned

  • AI grad school:

  • A new type of information theory:

  • Embeddings

    • Text Embeddings Reveal (Almost) As Much As Text: https://arxiv.org/abs/2310.06816

    • Contextual Document Embeddings: https://arxiv.org/abs/2410.02525

    • Harnessing the Universal Geometry of Embeddings: https://arxiv.org/abs/2505.12540

  • Language models

    • GPT-style language models memorize 3.6 bits per param:

    • Approximating Language Model Training Data from Weights: https://arxiv.org/abs/2506.15553

  • LLM Inversion

  • “There Are No New Ideas in AI… Only New Datasets”

  • misc reference: https://junyanz.github.io/CycleGAN/

For others hiring AI PhDs, Jack also wanted to shout out Zach Nussbaum, his coauthor on Nomic Embed: Training a Reproducible Long Context Text Embedder.

Full Video Episode

Timestamps

00:00 Introduction to Jack Morris
01:18 Career in AI
03:29 The Shift to AI Companies
03:57 The Impact of ChatGPT
04:26 The Role of Academia in AI
05:49 The Emergence of Reasoning Models
07:07 Challenges in Academia: GPUs and HPC Training
11:04 The Value of GPU Knowledge
14:24 Introduction to Jack's Research
15:28 Information Theory
17:10 Understanding Deep Learning Systems
19:00 The "Bit" in Deep Learning
20:25 Wikipedia and Information Storage
23:50 Text Embeddings and Information Compression
27:08 The Research Journey of Embedding Inversion
31:22 Harnessing the Universal Geometry of Embeddings
34:54 Implications of Embedding Inversion
36:02 Limitations of Embedding Inversion
38:08 The Capacity of Language Models
40:23 The Cognitive Core and Model Efficiency
50:40 The Future of AI and Model Scaling
52:47 Approximating Language Model Training Data from Weights
01:06:50 The "No New Ideas, Only New Datasets" Thesis
