
LLaVA


Introduction

LLaVA (Large Language and Vision Assistant) is a large multimodal model jointly released by researchers from the University of Wisconsin-Madison, Microsoft Research, and Columbia University. The model demonstrates image and text understanding capabilities approaching those of multimodal GPT-4, achieving a relative score of 85.1% compared with GPT-4. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieved a new state-of-the-art accuracy of 92.53%.

Features

  1. Free image recognition capabilities
  2. Support for adjusting generation parameters (see the usage sketch below)
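For readers who would rather query LLaVA from code than through a web demo, here is a minimal sketch using Hugging Face Transformers. The checkpoint name `llava-hf/llava-1.5-7b-hf`, the example image URL, and the sampling parameters are illustrative assumptions, not details from this post.

```python
# Minimal sketch: querying LLaVA via Hugging Face Transformers.
# Assumptions (not from the original post): the community checkpoint
# "llava-hf/llava-1.5-7b-hf", a placeholder image URL, and the
# sampling parameters chosen below.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Load an image to ask about (URL is a placeholder example).
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 prompt format: the <image> token marks where image features go.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# "Adjusting parameters" in practice: temperature, top_p, max_new_tokens, etc.
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(processor.decode(output[0], skip_special_tokens=True))
```

Raising `temperature` (or `top_p`) makes answers more varied, while lowering it makes them more deterministic; `max_new_tokens` caps the length of the reply.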
