How to Build an LLM from Scratch | An Overview
Shaw Talebi

Published on Oct 5, 2023

Book a call: https://calendly.com/shawhintalebi

This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review the key aspects of developing a foundation LLM, drawing on how models such as GPT-3, Llama, Falcon, and others were built.

Series Playlist: Large Language Models (LLMs)
📰 Read more: https://towardsdatascience.com/how-to...

More Resources
[1] BloombergGPT: https://arxiv.org/pdf/2303.17564.pdf
[2] Llama 2: https://ai.meta.com/research/publicat...
[3] LLM Energy Costs: https://www.statista.com/statistics/1...
[4] arXiv:2005.14165 [cs.CL]
[5] Falcon 180b Blog: https://huggingface.co/blog/falcon-180b
[6] arXiv:2101.00027 [cs.CL]
[7] Alpaca Repo: https://github.com/gururise/AlpacaDat...
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[11] SentencePiece: https://github.com/google/sentencepie...
[12] Tokenizers Doc: https://huggingface.co/docs/tokenizer...
[13] arXiv:1706.03762 [cs.CL]
[14] Andrej Karpathy Lecture: Let's build GPT: from scratch, in cod...
[15] Hugging Face NLP Course: https://huggingface.co/learn/nlp-cour...
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[22] Train with Mixed Precision (NVIDIA): https://docs.nvidia.com/deeplearning/...
[23] DeepSpeed Doc: https://www.deepspeed.ai/training/
[24] https://paperswithcode.com/method/wei...
[25] https://towardsdatascience.com/what-i...
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]
[31] https://huggingface.co/blog/evaluatin...
[32] https://www.cs.toronto.edu/~hinton/ab...

--
Homepage: https://shawhintalebi.com/

The Data Entrepreneurs
🎥 YouTube: / @thedataentrepreneurs
👉 Discord: / discord
📰 Medium: / the-data
📅 Events: https://lu.ma/tde
🗞️ Newsletter: https://the-data-entrepreneurs.ck.pag...

Support ❤️
https://www.buymeacoffee.com/shawhint

Intro - 0:00
How much does it cost? - 1:30
4 Key Steps - 3:55
Step 1: Data Curation - 4:19
1.1: Data Sources - 5:31
1.2: Data Diversity - 7:45
1.3: Data Preparation - 9:06
Step 2: Model Architecture (Transformers) - 13:17
2.1: 3 Types of Transformers - 15:13
2.2: Other Design Choices - 18:27
2.3: How big do I make it? - 22:45
Step 3: Training at Scale - 24:20
3.1: Training Stability - 26:52
3.2: Hyperparameters - 28:06
Step 4: Evaluation - 29:14
4.1: Multiple-choice Tasks - 30:22
4.2: Open-ended Tasks - 32:59
What's next? - 34:31