LLaVA: A large multimodal language model
Learn Data with Mark

Published on Dec 10, 2023

In this video, we'll learn about LLaVA (Large Language and Vision Assistant), a multimodal model that combines a CLIP vision encoder with the Vicuna LLM.

We'll see how it gets on describing a cartoon cat, a photo of me with AI-generated parrots, and a bunch of images created by the Midjourney generative AI tool.

And most importantly, we'll find out whether it knows who Cristiano Ronaldo is!
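
If you want to try LLaVA yourself before watching, here's a minimal sketch of querying a LLaVA llamafile from Python. It's not the code from the video: the llamafile name, the default port (8080), and the llama.cpp-style /completion API with image_data are assumptions based on how llamafile's built-in server worked at the time.

    # Minimal sketch: describe an image with a locally running LLaVA llamafile.
    # Assumes you've already started one, e.g.: ./llava-v1.5-7b-q4.llamafile
    # which serves a llama.cpp-style HTTP API on http://localhost:8080.
    import base64
    import requests

    # Base64-encode the image we want described.
    with open("cat.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    # The /completion endpoint accepts base64 images via image_data;
    # the [img-1] tag in the prompt refers to the image with id 1.
    response = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "USER: [img-1] Describe this image.\nASSISTANT:",
            "image_data": [{"data": image_b64, "id": 1}],
            "n_predict": 256,
        },
    )
    print(response.json()["content"])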

#AI #MultimodalModels #llava #GPT4 #ImageRecognition #Streamlit #MachineLearning #AndrewNg #llms

LLaVA - https://llava-vl.github.io/
llamafile - https://github.com/Mozilla-Ocho/llama...
Midjourney dataset - https://huggingface.co/datasets/vivym...
Streamlit app - https://github.com/mneedham/LearnData...
Code repository - https://github.com/mneedham/LearnData...
