A very interesting read. Note that DeepSeek V3 is not ranking in the top 5 for coding on LMSYS (my go-to on how users perceive the efficacy of different LLMs). Unfortunately, a lot of these 'benchmarks' are manipulated.
I don't really think that's the case here. DeepSeek V3 is their regular LLM, not their reasoning model. It ranks right around Sonnet 3.5, GPT-4o and Gemini-exp 1206, so it is up there with the other biggest and best models out there, while being significantly cheaper.
I personally use aider, and on its leaderboard the reasoning model from DeepSeek sits right between o1 and Sonnet 3.5, at a fraction of the cost:
https://aider.chat/docs/leaderboards/
I get the sense that this is probably a very fair placement for R1, though time will tell if that is actually the case. I have been using DeepSeek V3 extensively, and while it's not as good as Sonnet, it's still really strong and up there with the absolute best models available for coding.