Published on Dec 10, 2023
In this video, we'll learn about LLaVA (Large Language and Vision Assistant), a multimodal model that connects a CLIP vision encoder to the Vicuna LLM.
We'll see how well it does at describing a cartoon cat, a photo of me with AI-generated parrots, and a set of images created by the Midjourney generative AI tool.
And most importantly, we'll find out whether it knows who Cristiano Ronaldo is!
#AI #MultimodalModels #llava #GPT4 #ImageRecognition #Streamlit #MachineLearning #AndrewNg #llms
LLaVA - https://llava-vl.github.io/
llamafile - https://github.com/Mozilla-Ocho/llama...
Midjourney dataset - https://huggingface.co/datasets/vivym...
Streamlit app - https://github.com/mneedham/LearnData...
Code repository - https://github.com/mneedham/LearnData...