LLaVA
#LLaVA
Introduction
LLaVA (Large Language and Vision Assistant) is a large multimodal model jointly released by researchers from the University of Wisconsin-Madison, Microsoft Research, and Columbia University. The model demonstrates image and text understanding capabilities approaching those of multimodal GPT-4, achieving a relative score of 85.1% compared to GPT-4. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 reached a new state of the art with 92.53% accuracy.
Features
- Free image recognition capabilities
- Support for adjusting generation parameters (see the usage sketch below)
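As a minimal sketch of these features, the snippet below asks LLaVA a question about an image via the Hugging Face `transformers` integration. The `llava-hf/llava-1.5-7b-hf` checkpoint, the image URL, and the prompt template are assumptions for illustration; they are not part of the original release instructions.

```python
# Minimal sketch: image question answering with a community LLaVA checkpoint.
# Assumptions: the "llava-hf/llava-1.5-7b-hf" model id and the sample image URL.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, not from the post
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# Any RGB image works; this URL is only a placeholder example.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 style prompt: the <image> token marks where image features are inserted.
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")

# Generation parameters (max_new_tokens, sampling, etc.) can be adjusted
# to trade off response length and creativity.
output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Passing different values for `max_new_tokens` or enabling sampling (`do_sample=True`, `temperature=...`) is one way the adjustable-parameter feature shows up in practice.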