DeepSeek open-sourced DeepSeek-V3-Base, a 685B parameter model
DeepSeek · December 25, 2024
LiveBench results, as reported on r/LocalLLaMA: DeepSeek V3 is the best open-weight LLM and the second-best non-reasoning LLM, after `gemini-exp-1206`
Video
DeepSeek-V3 (Free API) + Cline & Aider : This is The BEST AI Coding Setup Right Now! (Beats Cursor!)
Article
Tweets
‼ DeepSeek chat is powered by V3 and is powerful ‼
— Ivan Fioravanti ᯅ (@ivanfioravanti) December 25, 2024
Here an MVP of Asteroids game with AI companies logos. Fully built with it in few minutes!
Sonnet 3.5 is not the King 👑 anymore 🤷‍♂️
Anthropic it's your turn!
🧵Artifact created in the comment pic.twitter.com/FCMZTb52fQ
Resource constraints are a beautiful thing. Survival instinct in a cut-throat AI competitive land is a prime drive for breakthroughs.
— Jim Fan (@DrJimFan) December 27, 2024
I’ve been following DeepSeek for a long time. They had one of the best open coding models last year. Superior OSS models put huge pressure on… https://t.co/ARtRjAXiOJ
Cool things from DeepSeek v3's paper:
— Daniel Han (@danielhanchen) December 27, 2024
1. Float8 uses E4M3 for forward & backward - no E5M2
2. Every 4th FP8 accumulate adds to master FP32 accum
3. Latent Attention stores C cache not KV cache
4. No MoE loss balancing - dynamic biases instead
More details:
1. FP8: First large… pic.twitter.com/06AO8EFv4p
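Item 4 in the thread above (dynamic biases instead of a balancing loss) can be illustrated with a toy simulation. This is a minimal sketch, not DeepSeek's implementation: the function names, the random skewed scores, and the step size are all assumptions made for illustration. It assumes the bias is added to each expert's score only when choosing the top-k experts, and that after each step the bias moves down for overloaded experts and up for underloaded ones.

```python
import random


def route_tokens(scores, bias, top_k):
    """Pick top_k experts per token, ranking by (score + bias).

    The bias only influences which experts are selected here; it is a
    hypothetical helper, not an API from any library.
    """
    n_experts = len(bias)
    choices = []
    for token_scores in scores:
        ranked = sorted(
            range(n_experts),
            key=lambda e: token_scores[e] + bias[e],
            reverse=True,
        )
        choices.append(ranked[:top_k])
    return choices


def update_bias(bias, choices, step_size):
    """Lower the bias of overloaded experts, raise it for underloaded ones."""
    n_experts = len(bias)
    load = [0] * n_experts
    for chosen in choices:
        for e in chosen:
            load[e] += 1
    mean_load = sum(load) / n_experts
    new_bias = []
    for e in range(n_experts):
        if load[e] > mean_load:
            new_bias.append(bias[e] - step_size)
        elif load[e] < mean_load:
            new_bias.append(bias[e] + step_size)
        else:
            new_bias.append(bias[e])
    return new_bias, load


def simulate(n_steps=200, n_tokens=256, n_experts=8, top_k=2,
             step_size=0.01, seed=0):
    """Run the toy router and return (first-step load, last-step load, bias)."""
    rng = random.Random(seed)
    # Assumed skew: low-index experts get systematically higher scores,
    # so unbiased routing would overload them.
    skew = [0.6 * (n_experts - e) / n_experts for e in range(n_experts)]
    bias = [0.0] * n_experts
    first_load = last_load = None
    for step in range(n_steps):
        scores = [
            [rng.random() + skew[e] for e in range(n_experts)]
            for _ in range(n_tokens)
        ]
        choices = route_tokens(scores, bias, top_k)
        bias, load = update_bias(bias, choices, step_size)
        if step == 0:
            first_load = load
        last_load = load
    return first_load, last_load, bias


if __name__ == "__main__":
    first, last, bias = simulate()
    print("load at step 0:   ", first)
    print("load at last step:", last)
    print("learned biases:   ", [round(b, 2) for b in bias])
```

In this toy setup the per-expert load starts out heavily skewed and flattens as the biases adapt, with no extra loss term involved. The tweet itself gives no update rule, so the sign-based update here should be read as one plausible form of the idea.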
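DeepSeek V3 hands-on test: coding ability compared with Claude 3.5 Sonnet and o1 Pro
— nicekate (@nicekate8888) December 27, 2024
This video takes an in-depth look at DeepSeek's newly released V3, including core specs such as its 671B parameters and 14.8T-token pre-training.
Over multiple rounds of testing, it is compared against Claude 3.5 Sonnet and o1 Pro across programming languages including Python, JavaScript, Swift, and Java.
Timestamps
0:00 -… pic.twitter.com/KtvxViaqTZ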
A full day (24h) of continuously generating with Deepseek V3 costs $1.50
— Tom Dörr (@tom_doerr) December 27, 2024
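The arithmetic behind a daily-cost figure like this is easy to check. The sketch below is only illustrative: the tweet does not say which throughput or price it used, so the 60 tokens per second and $0.28 per million output tokens passed in here are assumed values, and `daily_generation_cost` is a hypothetical helper.

```python
def daily_generation_cost(tokens_per_second, usd_per_million_output_tokens,
                          hours=24.0):
    """Cost in USD of generating continuously for `hours` at a fixed rate."""
    total_tokens = tokens_per_second * hours * 3600
    return total_tokens * usd_per_million_output_tokens / 1_000_000


if __name__ == "__main__":
    # Assumed inputs, not taken from the tweet.
    cost = daily_generation_cost(tokens_per_second=60,
                                 usd_per_million_output_tokens=0.28)
    print(f"${cost:.2f}")  # prints $1.45
```

Under those assumptions a single stream produces about 5.2 million tokens in 24 hours, which puts the cost in the same range as the $1.50 quoted above.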