How the Gemma/Gemini Tokenizer Works - Gemma/Gemini vs GPT-4 vs Mistral
Chris Hay

Published on Feb 25, 2024

In this video, we go under the hood of the tokenizer used by Gemini, Gemma-7B and Gemma-2B. We look at the large vocabulary and the impact it has on the size of the model, and at how Google has prioritized people, places, culture, languages and things over an efficient vocabulary and frequent sub-words. Chris also introduces his new tokenizer benchmark test, dataset and tokenizer visualizer tools.
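
As a rough illustration of why vocabulary size affects model size: a token-embedding table holds one vector per vocabulary entry, so its weight count is the vocabulary size times the hidden dimension. The sketch below is a minimal example under assumed figures. The function name `embedding_params` and the vocabulary sizes and hidden dimension in `configs` are hypothetical and are not taken from the video.

```python
def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Weights in a token-embedding table: one hidden_dim-sized row per vocabulary entry."""
    return vocab_size * hidden_dim


# Hypothetical figures chosen only to show the scaling, not measured values.
configs = {
    "smaller vocabulary (32k entries)": (32_000, 2_048),
    "larger vocabulary (256k entries)": (256_000, 2_048),
}

for name, (vocab_size, hidden_dim) in configs.items():
    count = embedding_params(vocab_size, hidden_dim)
    print(f"{name}: {count:,} embedding parameters")
```

With the same hidden dimension, an eightfold larger vocabulary gives an eightfold larger embedding table, which matters most for small models where the embeddings are a large share of the total weights.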

github
---------------
https://github.com/chrishayuk/tokeniz...
