We’re putting the “news” back in “newsletter”: presenting our first monthly recap of everything we’ve heard that is relevant to the AI Engineer. This one is free for current subscribers; future recaps will move to supporters each month. Let us know what you think!
AI Fall. I originally wanted to title this letter “AI Fall” - somewhere between AI Summer and Winter - and the month started out that way, with Suhail and Sarah calling peak AI hype and Rand piling on the negative ChatGPT datapoints.
August is traditionally a slow news month (everything kicks back up after Labor Day in the US tech news cycle)…
…But by the end of the month VC markets were “completely wild” again, with Anthropic, Modular, HuggingFace and W&B all raising >$50m rounds and blowout Nvidia earnings showing no slowdown.
The real AI Fall calendar is packed, with at least 3 AI conferences happening in San Francisco (TED AI, AIconf, and our own AI Engineer Summit) and both Google and Anthropic teasing large model releases on the way.
Finetuning is all you need? Probably the strongest theme of the month was finetuning everything. OpenAI launched their finetuning API for GPT 3.5, and last month’s Llama releases were followed by CodeLlama from Meta. Both are being actively finetuned by the cutting edge of AI Engineers (see notes below for rabbit holes), with notable results from Phind finetuning CodeLlama to beat GPT-4. The debate between Prompting and Finetuning is ongoing, and those who have explored it before have generally ended up prompting for its ease of use and iteration speed. However, the AI Eng community will likely make much more headway this time around with all the new resources we got this month: both LangChain and LlamaIndex have already released great resources for finetuning (see below).
Multiplayer Agents. This is an ongoing theme since May, but both MetaGPT and the now-open-source Generative Agents paper are continuing to push forward the idea of autonomous agents collaborating with each other to do more than one can on its own. This is, in a sense, “very advanced chain of thought”, spending a horrendous amount of tokens for improved reflection and using roleplay to self critique, but there is real value here and lots to learn if you want to see the SOTA in prompt engineering.
We speculated that the infrastructure complement to multi agent systems is agent clouds, and this month E2B shipped their first AI Playgrounds. There is also multi-company collaboration proceeding on the Agent Protocol which will hopefully see a more formal announcement at the AI Eng Summit.
One Year of Stable Diffusion. Did you miss it? Stable Diffusion 1.0 was released in August 2022 to much hype (which precipitated our first post), followed by SD 2.0 in Nov 2022 (which we covered) and SDXL this July. Stability has also made progress in other domains, releasing StableLM in April and StableCode this month. But the company has been racked by a string of executive departures, with Bloomberg publishing a constant barrage of hit pieces and Emad defending himself. Meanwhile MidJourney seems to be doing well, and the former Google Imagen team has now emerged with Ideogram, making the text-to-image space that much more competitive, though Stability is still king in open source.
RWKV: Reinventing RNNs for the Transformer Era — our first pod on an alternative to Transformers and the open source, non-US AI community
Cursor.so: The AI-first Code Editor — Cursor.so blew up this month, and we were proud to be Aman’s first ever pod.
The Mathematics of Training LLMs - we tend to focus on inference over training, but were honored to follow up Transformers Math 101 with Quentin, our first, and definitely not last, guest from Eleuther AI. A big hit on HN.
LLMs Everywhere - our dive into Machine Learning Compilation with TQ, followed up later this month by the launch of WebLLM, which can run Llama2 70B in the browser
Summer AI Technical Roundup - our third crossover pod with other AI podcasts, with a personal favorite, NLW of the AI Breakdown.
We ran Anything But Wrappers: Llama Finetune Edition with Brev.dev (tweet recap, video)
The raw notes from which I draw everything above. You can always see the raw material on GitHub. Of course I can’t see everything; check the other newsletters and podcasters that do.
ChatGPT UI Updates (tweet, news coverage)
1. Example prompts: No more staring at a blank page!
2. Suggested replies: ChatGPT automatically synthesizes follow up questions.
3. GPT-4 by default: When starting a new chat as a Plus user, ChatGPT will remember your previously selected model.
4. Uploading multiple files is now supported in the Code Interpreter beta for all Plus users.
5. Stay logged in: You’ll no longer be logged out every 2 weeks + new welcome page
6. Keyboard shortcuts: Work faster with shortcuts. Try ⌘ (Ctrl) + / to see the complete list.
V4 of TypeScript/Node.js SDK (v3 → v4 migration guide, tweets)
Streaming responses for chat & completions
Carefully crafted TypeScript types
Support for ESM, Vercel edge functions, Cloudflare workers, & Deno
Better file upload API for Whisper, fine-tune files, & DALL·E images
Improved error handling through automatic retries & error classes
Increased performance via TCP connection reuse
Simpler initialization logic
Big win for StainlessAPI, a new startup that autogenerates SDKs.
GPT3.5 Finetuning API (blog, docs, Scale AI as official partner)
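The new finetuning API expects chat-formatted training data as JSONL: one JSON object per line, each with a "messages" list in the same shape as a Chat Completions request. A minimal sketch of preparing such a file (the example content and filename are illustrative):

```python
# Write a chat-format JSONL training file for GPT-3.5 finetuning.
# Each line is one training example with system/user/assistant turns.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You answer in pirate speak."},
        {"role": "user", "content": "Where is the treasure?"},
        {"role": "assistant", "content": "Arr, 'tis buried on yonder isle!"},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

You would then upload this file via the Files API and create a finetuning job against the gpt-3.5-turbo base model, per the docs linked above.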
Acquired Global Illumination - a small game design studio
ChatGPT Enterprise launched
we previously talked to Shyamal Anandkat when it was still called ChatGPT for Business, announced in April
A weird move - Microsoft launched an “Azure ChatGPT” repo which was then taken down. Must have been fun convos between OpenAI and Microsoft.
OpenAI Passes $1 Billion Revenue Pace as Big Companies Boost AI Spending
Sherwin Wu (EM of API team) gave a talk on function calling
Logan picked out top 7 cookbooks to know
more Custom Instructions discussions in the community
I will focus on notable text, code, vision, and speech models here, and put the other modalities in the Misc section
Meta CodeLlama (Blog, Paper, GitHub, HN discussion)
With lots of interest in finetuning Llama and CodeLlama
Good month for Chinese-origin models competitive with US ones
IDEFICS vision model (HF space, Jim Fan)
Meta’s Seamless M4T translation model (post)
The first multimodal model representing a significant breakthrough in speech-to-speech and speech-to-text translation and transcription. Publicly released under a CC BY-NC 4.0 license, the model supports nearly 100 languages for input (speech + text), 100 languages for text output and 35 languages (plus English) for speech output.
I've tried the demo. It’s good.
(thanks to Roboflow team for filling me in on the latest in CV news)
Langchain released an “expression language” - upcoming guest!
LlamaIndex 0.8.0 - was a big update!
“We’ve made huge changes that improve our default values for everything: LLM, prompts, text splitters, and more. Get more performant RAG/agents out of the box without painful customization/tuning. NOTE: breaking changes” (thread recap)
also continuing to push on Data Agents.
Hegel AI Prompttools, repo
Outlines from Normal Computing
Generates valid JSON and text matching a regex; compare with JSONformer and Guidance.
They are quite confident in their methodology for speed: “in each state we get a list of symbols which correspond to completions that partially match the regular expression. We mask the other symbols in the logits returned by a large language model, sample a new symbol and move to the next state. The subtlety is that language models work with tokens, not symbols, so we derive a new FSM whose alphabet is the model's vocabulary. We can do this in only one pass over the vocabulary.”
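To make the FSM idea concrete, here is a toy sketch (not the Outlines implementation; the regex, DFA, and vocabulary are all made up for illustration): a hand-written DFA for the regex `[0-9]+\.[0-9]+`, and a masking function that, in one pass over the vocabulary, keeps only the tokens whose characters leave the DFA alive from the current state.

```python
# Hand-coded DFA for the regex [0-9]+\.[0-9]+ (a decimal number).
# States: 0 = start, 1 = integer digits seen, 2 = dot seen, 3 = fraction digits (accepting).
DIGITS = set("0123456789")

def step(state, ch):
    """One DFA transition; None means the match is dead."""
    if state == 0 and ch in DIGITS:
        return 1
    if state == 1:
        if ch in DIGITS:
            return 1
        if ch == ".":
            return 2
    if state in (2, 3) and ch in DIGITS:
        return 3
    return None

def token_dest(state, token):
    """Run a whole (multi-character) token through the DFA."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

VOCAB = ["1", "23", ".", "4.5", "abc"]  # toy multi-character token vocabulary

def allowed(state, vocab=VOCAB):
    """The 'mask': tokens that keep the FSM alive from `state`.
    In a real decoder, all other tokens' logits are set to -inf."""
    return [t for t in vocab if token_dest(state, t) is not None]
```

From the start state only tokens that begin with digits survive the mask; once past the dot, the dot token itself is masked out, so every sampled continuation stays inside the regex by construction.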
Langfuse - open source o11y for LLM apps
Wrappers Delight - Yohei’s latest
Lightweight open-source OpenAI wrapper to add:
Auto-log every interaction, Simple analytics, AI-assisted query of logs, (optional) Reflection of prompts, UI-template
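The auto-logging idea is simple enough to sketch in a few lines (this is a hypothetical illustration, not Wrappers Delight's actual code): wrap the LLM call in a closure that records prompt, response, and latency before returning.

```python
# Toy logging wrapper around an LLM call: every interaction is appended
# to `log` as a dict with prompt, response, and elapsed time.
import time

def with_logging(llm, log):
    def wrapped(prompt):
        t0 = time.time()
        response = llm(prompt)
        log.append({
            "prompt": prompt,
            "response": response,
            "latency_s": round(time.time() - t0, 3),
        })
        return response
    return wrapped
```

Analytics and AI-assisted querying then become ordinary operations over the accumulated log entries.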
Thiggle - exposes ReLLM and ParserLLM projects as API
DSPy - the framework for solving advanced tasks w/ LMs.
Express *any* pipeline as clean, Pythonic control flow. Just ask DSPy to compile your modular code into auto-tuned chains of prompts or finetunes for GPT, Llama, and/or T5 (thread)
builds on DSP from January: “Demonstrate–Search–Predict (DSP), a framework for composing search and LMs w/ up to 120% gains over GPT-3.5. No more prompt engineering - Describe a high-level strategy as imperative code and let DSP deal with prompts and queries”
Roboflow launched Inference: An easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
Sweep.dev - File Issues to PR
Cloudflare launched an AI microsite
Poozle - "Plaid for LLMs"
Rag-Stack: Deploy a private ChatGPT alternative hosted within your VPC. Connect it to your organization's knowledge base and use it as a corporate oracle. Supports open-source LLMs like Llama 2, Falcon, and GPT4All.
Smol Podcaster (tweet) We use smol-podcaster to take care of most of Latent Space’s transcription work.
Generative Agents: Interactive Simulacra of Human Behavior “Smallville” code was open sourced (Jim Fan)
MetaGPT (paper, author tweet), framework that incorporates efficient human workflows as a meta programming approach into LLM-based multi-agent collaboration.
scores 82 on HumanEval vs GPT-4’s 67
“Specifically, MetaGPT encodes Standardized Operating Procedures (SOPs) into prompts to enhance structured coordination. Subsequently, it mandates modular outputs, empowering agents with domain expertise comparable to human professionals, to validate outputs and minimize compounded errors. In this way, MetaGPT leverages the assembly line paradigm to assign diverse roles to various agents, thereby establishing a framework that can effectively and cohesively deconstruct complex multi-agent collaborative problems.”
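The "assembly line" framing can be sketched as a toy pipeline (this is a hypothetical illustration of the SOP-as-prompt idea, not MetaGPT's actual code): each role gets a standardized prompt template plus the previous role's output, and artifacts flow down the line.

```python
# Toy SOP assembly line: each role is a (name, instructions) pair; each
# agent's prompt embeds the upstream artifact, and its output becomes the
# next role's input. `llm` is any callable prompt -> completion.
def run_role(name, instructions, upstream, llm):
    prompt = f"You are the {name}. {instructions}\nUpstream output:\n{upstream}"
    return llm(prompt)

def assembly_line(task, llm):
    sop = [
        ("Product Manager", "Write requirements as a bullet list."),
        ("Architect", "Design the system from the requirements."),
        ("Engineer", "Implement the design as code."),
        ("QA Engineer", "Review the code and list defects."),
    ]
    artifact = task
    for name, instructions in sop:
        artifact = run_role(name, instructions, artifact, llm)
    return artifact
```

MetaGPT adds the pieces this sketch omits: mandated structured output formats per role and validation of each upstream artifact, which is what keeps compounded errors down.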
Dangbot - “an experimental autonomous agent platform”
ChatDev - “stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer, Chief Technology Officer, Programmer, Tester, and more. These agents form a multi-agent organizational structure”
AgentBench - A Comprehensive Benchmark to Evaluate LLMs as Agents - my comments and paper reading
Lists galore! You need an agent to go through the lists of agents
JungleGym: Open Source Analytics Playground for AI agents
AgentFlow: Complex LLM Workflows from Simple JSON.