GPT-Fast - blazingly fast inference with PyTorch (w/ Horace He)

Published on Mar 7, 2024

Become a Patreon:   / theaiepiphany  
👨‍👩‍👧‍👦 Join our Discord community:   / discord  

Horace He joined us today to talk more about how to make inference fast using just PyTorch native operations!

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
https://pytorch.org/blog/accelerating...
https://github.com/pytorch-labs/gpt-fast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

⌚️ Timetable:
00:00 - 00:45 Intro
00:45 - 02:23 HyperStack GPUs! (sponsored)
02:23 - 08:40 What is GPT-Fast?
08:40 - 28:15 PyTorch compile
28:15 - 32:15 int8 quantization
32:15 - 40:12 Speculative Decoding
40:12 - 42:05 int4 quantization
42:05 - 45:25 Putting it all together, tensor parallelism
45:25 - 58:10 Bonus optimizations
58:10 - 01:05:04 Outro, questions

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 SPONSOR

The AI Epiphany -   / theaiepiphany  
One-time donation - https://www.paypal.com/paypalme/theai...

Huge thank you to these AI Epiphany patrons:
Eli Mahler
Petar Veličković

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

💼 LinkedIn -   / aleksagordic  
🐦 Twitter -   / gordic_aleksa  
👨‍👩‍👧‍👦 Discord -   / discord  

📺 YouTube -    / theaiepiphany  
📚 Medium -   / gordicaleksa  
💻 GitHub - https://github.com/gordicaleksa
📢 AI Newsletter - https://aiepiphany.substack.com/

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#gptfast #inference #pytorch
