Published On Mar 7, 2024
Become a patron on Patreon: / theaiepiphany
👨👩👧👦 Join our Discord community: / discord
Horace He joined us today to talk about how to make inference fast using only native PyTorch operations!
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
https://pytorch.org/blog/accelerating...
https://github.com/pytorch-labs/gpt-fast
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00 - 00:45 Intro
00:45 - 02:23 HyperStack GPUs! (sponsored)
02:23 - 08:40 What is GPT-Fast?
08:40 - 28:15 PyTorch compile
28:15 - 32:15 int8 quantization
32:15 - 40:12 Speculative decoding
40:12 - 42:05 int4 quantization
42:05 - 45:25 Putting it all together, tensor parallelism
45:25 - 58:10 Bonus optimizations
58:10 - 01:05:04 Outro, questions
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 SPONSOR
The AI Epiphany - / theaiepiphany
One-time donation - https://www.paypal.com/paypalme/theai...
Huge thank you to these AI Epiphany patrons:
Eli Mahler
Petar Veličković
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💼 LinkedIn - / aleksagordic
🐦 Twitter - / gordic_aleksa
👨👩👧👦 Discord - / discord
📺 YouTube - / theaiepiphany
📚 Medium - / gordicaleksa
💻 GitHub - https://github.com/gordicaleksa
📢 AI Newsletter - https://aiepiphany.substack.com/
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#gptfast #inference #pytorch