Qwen released QVQ, the first open-weight model for visual reasoning
Alibaba
December 25, 2024
Building on the foundation of Qwen2-VL-72B, QVQ integrates architectural improvements that enhance cross-modal reasoning. Its open-weight design underscores the team's commitment to making advanced AI more accessible.
Article
Tweets
🎄 Happy holidays, and we hope you have enjoyed this year. Before moving on to 2025, Qwen has one last gift for you: QVQ!
— Qwen (@Alibaba_Qwen) December 24, 2024
🎉 This may be the first open-weight model for visual reasoning. It is called QVQ, where V stands for vision. It just reads an image and an instruction, starts… pic.twitter.com/BX1ORiltIf
QVQ: A Milestone in Visual Intelligence - An Analysis of the Qwen Team's Latest Multimodal Reasoning Model
— meng shao (@shao__meng) December 24, 2024
TL;DR
QVQ is a groundbreaking AI model built on Qwen2-VL-72B. By combining visual and language capabilities, it delivers outstanding performance on complex tasks such as mathematical reasoning and scientific analysis, notably scoring 70.3 on the MMMU benchmark, marking a major breakthrough in AI's visual understanding and reasoning abilities.
Introduction
-… https://t.co/hhun89O3Qd pic.twitter.com/tvANoJo0O5
QvQ-72B-Preview now on MLX 🚀🎄
— Prince Canuma (@Prince_Canuma) December 24, 2024
TLDR
🏆SoTA open-source multimodal
🧠 Capable of step-by-step reasoning
💪🏾 Competitive MMMU score with o1, GPT-4o and Sonnet 3.5
🔥 Beats GPT-4o and Sonnet 3.5 on MathVista and MathVision
You can now run inference and fine-tune (QLoRA) locally on… https://t.co/qaVJ2AhoPA pic.twitter.com/hUq8EChYwW