The matrix math behind transformer neural networks, one step at a time!!!
StatQuest with Josh Starmer

Published on Apr 7, 2024

Transformers, the neural network architecture behind ChatGPT, do a lot of math. Fortunately, all of that math can be expressed as matrix operations, which GPUs are optimized to perform quickly. Matrix math is also what we use when we code neural networks, so learning how ChatGPT does it will help you code your own. Thus, in this video, we go through the math one step at a time and explain what each step does, so that you can use it on your own with confidence.
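
To give you a taste of the matrix math covered in the video, here is a minimal NumPy sketch of the Self Attention step (4:28). The toy encodings and the randomly initialized weight matrices are illustrative assumptions, not values from the video; in a real transformer the weights are learned, but the matrix math is the same.

```python
import numpy as np

# Toy encodings for 3 tokens, each represented by 2 numbers
# (word embedding plus position encoding, as in the video).
# NOTE: these values are made up for illustration.
encodings = np.array([[1.16,  0.23],
                      [0.57,  1.36],
                      [4.41, -2.16]])
d_model = encodings.shape[1]

# Hypothetical weight matrices; in a real transformer these are learned.
rng = np.random.default_rng(seed=0)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

# One matrix multiplication per projection computes the Queries,
# Keys, and Values for ALL tokens at once.
Q = encodings @ W_q
K = encodings @ W_k
V = encodings @ W_v

# Dot-product similarity of every Query with every Key,
# scaled by sqrt(d_model) as in the original transformer paper.
scores = Q @ K.T / np.sqrt(d_model)

# SoftMax each row (subtracting the row max for numerical stability)
# so that each token's attention weights sum to 1.
stable = scores - scores.max(axis=1, keepdims=True)
weights = np.exp(stable) / np.exp(stable).sum(axis=1, keepdims=True)

# Each output row is a weighted sum of the Value rows.
attention = weights @ V
print(attention)  # shape: (3 tokens, d_model)
```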

NOTE: This StatQuest assumes that you are already familiar with:
Transformers: • Transformer Neural Networks, ChatGPT'...
The essential matrix algebra for neural networks: • Decoder-Only Transformers, ChatGPTs s...

If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest

...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/

...or just donating to StatQuest!
paypal: https://www.paypal.me/statquest
venmo: @JoshStarmer

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
/ joshuastarmer

0:00 Awesome song and introduction
1:43 Word Embedding
3:37 Position Encoding
4:28 Self Attention
12:09 Residual Connections
13:08 Decoder Word Embedding and Position Encoding
15:33 Masked Self Attention
20:18 Encoder-Decoder Attention
21:31 Fully Connected Layer
22:16 SoftMax

#StatQuest #Transformer #ChatGPT
