GPT-Fast - blazingly fast inference with PyTorch (w/ Horace He)

Published On Mar 7, 2024

Become a Patreon:   / theaiepiphany
👨‍👩‍👧‍👦 Join our Discord community:   / discord

Horace He joined us today to talk more about how to make inference fast using just PyTorch native operations!

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
https://pytorch.org/blog/accelerating...
https://github.com/pytorch-labs/gpt-fast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00 - 00:45 Intro
00:45 - 02:23 HyperStack GPUs! (sponsored)
02:23 - 08:40 What is GPT-Fast?
08:40 - 28:15 PyTorch compile
28:15 - 32:15 int8 quantization
32:15 - 40:12 Speculative Decoding
40:12 - 42:05 int4 quantization
42:05 - 45:25 Putting it all together, tensor parallelism
45:25 - 58:10 Bonus optimizations
58:10 - 01:05:04 Outro, questions

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 SPONSOR

The AI Epiphany -   / theaiepiphany
One-time donation - https://www.paypal.com/paypalme/theai...

Huge thank you to these AI Epiphany patrons:
Eli Mahler
Petar Veličković

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

💼 LinkedIn -   / aleksagordic
🐦 Twitter -   / gordic_aleksa
👨‍👩‍👧‍👦 Discord -   / discord

📺 YouTube -    / theaiepiphany
📚 Medium -   / gordicaleksa
💻 GitHub - https://github.com/gordicaleksa
📢 AI Newsletter - https://aiepiphany.substack.com/

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#gptfast #inference #pytorch
