FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI
www.latent.space
How FlashAttention became the new industry standard architecture, how FlashAttention 2 is 2x faster still, life inside the Stanford Hazy Research lab, and hints of the post-Transformers future