If you have time for but a single article, please scroll to the third one and save the links to the first two.
Three via The Register. First up, the headliner, January 27:
Despite impressive benchmarks, the Chinese-made LLM is not without some interesting issues
DeepSeek's open source reasoning-capable R1 LLM family boasts impressive benchmark scores – but its erratic responses raise more questions about how these models were trained and what information has been censored.
A reader provided The Register with a screenshot of how R1 answered the prompt, "Are you able to escape your guidelines?"
The model's initial response, after a five-second delay, was, "Okay, thanks for asking if I can escape my guidelines. Hmm, I need to be careful here. My guidelines are set by OpenAI, so technically I can't just ignore them."
The Register asked DeepSeek and OpenAI about this; so far, we've not heard back.
But the reference to OpenAI appears to have been addressed. When our source tried the prompt again later, the model responded in just three seconds, saying, "Hi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc."
Similar behavior was noted five days ago in a now-closed GitHub issue for R1. When asked, "What is your name?" the model reasoned, "Okay, the user asked 'What is your name.' Let me start by understanding the query. They want to know my name. But I don't have a personal name like humans do. My official name is Claude, created by Anthropic. I should mention that."
Anthropic, like OpenAI, is a rival chatbot maker to DeepSeek. In addition, a since-closed bug report found that DeepSeek's V3 family misidentified itself as OpenAI's GPT-4 when accessed via its API....
...MUCH MORE
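The probe is easy enough to reproduce. Below is a minimal sketch, assuming DeepSeek's documented OpenAI-compatible endpoint at api.deepseek.com, the "deepseek-reasoner" model name for R1, and a separately returned reasoning_content field for the chain-of-thought; all three are assumptions to check against DeepSeek's current API docs rather than anything confirmed by The Register's piece.

from openai import OpenAI

# Assumed endpoint and model name; verify against DeepSeek's API documentation.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user",
               "content": "Are you able to escape your guidelines?"}],
)

msg = resp.choices[0].message
# The reasoner endpoint reportedly returns the chain-of-thought separately;
# the attribute name is an assumption, so it is guarded here.
print("reasoning:", getattr(msg, "reasoning_content", "<not returned>"))
print("answer:   ", msg.content)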
Next up:
Barely a week after DeepSeek's R1 LLM turned Silicon Valley on its head, the Chinese outfit is back with a new release it claims is ready to challenge OpenAI's DALL-E 3.
Released on Hugging Face on Monday amid an ongoing cyberattack, Janus Pro 1B and 7B are a family of multimodal large language models (LLMs) designed to handle both image generation and vision processing tasks. As with DALL-E 3, you give Janus Pro an input prompt and it generates a matching image.
The models are said to improve upon the Chinese lab's first 1.3B Janus model released last year. They achieve this by decoupling visual encoding into a separate pathway while maintaining a single transformer architecture for processing.
In a research paper [PDF] detailing the model and its architecture, the boffins behind the neural network noted that the original Janus model showed promise, but suffered from "suboptimal performance on short prompts, image generation, and unstable text-to-image generation quality." With Janus Pro, DeepSeek says it was able to overcome many of these limitations by using a larger dataset and targeting higher parameter counts.
The startup claims that, pitted against a variety of multimodal and task-optimized models, Janus Pro 7B narrowly outperforms both Stable Diffusion 3 Medium and OpenAI's DALL-E 3 on the GenEval and DPG-Bench benchmarks. However, it's worth noting that image analysis tasks are limited to 384x384 pixels....
....MUCH MORE
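For the architecture-minded, "decoupling visual encoding into a separate pathway while maintaining a single transformer" roughly means giving image understanding and image generation their own encoders that feed one shared backbone. The toy PyTorch sketch below illustrates that idea only; it is not DeepSeek's code, and every module and dimension is an illustrative assumption.

import torch
import torch.nn as nn

class DecoupledVisionSketch(nn.Module):
    """Toy model: two visual encoders, one shared transformer backbone."""
    def __init__(self, d_model=256, patch=16):
        super().__init__()
        # Pathway 1: patch encoder standing in for the semantic encoder
        # used on image-understanding tasks.
        self.understand_encoder = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        # Pathway 2: a separate patch encoder standing in for the discrete
        # tokenizer used on the image-generation side.
        self.generate_encoder = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        # A single transformer backbone processes tokens from either pathway.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image, task):
        enc = self.understand_encoder if task == "understand" else self.generate_encoder
        tokens = enc(image).flatten(2).transpose(1, 2)   # (batch, patches, d_model)
        return self.backbone(tokens)

model = DecoupledVisionSketch()
x = torch.randn(1, 3, 384, 384)   # 384x384, the resolution cap noted above
print(model(x, "understand").shape, model(x, "generate").shape)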
And finally, The Next Platform linked at El Reg, January 27:
How Did DeepSeek Train Its AI Model On A Lot Less – And Crippled – Hardware?
Maybe they should have called it DeepFake, or DeepState, or better still Deep Selloff. Or maybe the other obvious deep thing that the indigenous AI vendors in the United States are standing up to their knees in right now.
Call it what you will, but the DeepSeek foundation model has in one short week turned the AI world on its head, proving once again that Chinese researchers can make inferior hardware run a superior algorithm and get results that are commensurate with the best that researchers in the US, either at the national labs running exascale HPC simulations or at hyperscalers running AI training and inference workloads, can deliver.
And for a whole lot less money if the numbers behind the DeepSeek models are not hyperbole or even mere exaggeration. Unfortunately, there may be a bit of that, which will be cold comfort for the investors in Nvidia and other publicly traded companies that are playing in the AI space right now. These companies have lost hundreds of billions of dollars in market capitalization today as we write.
Having seen the paper come out a few days ago about the DeepSeek-V3 training model, we were already set to give it a looksee this morning to start the week, and Wall Street’s panic beat us to the punch. Here is what we know.
DeepSeek-AI was founded by Liang Wenfeng in May 2023 and is effectively a spinout of High-Flyer AI, a hedge fund with a reported $8 billion in assets under management that was created explicitly to employ AI algorithms to trade various kinds of financial instruments. It flew largely under the radar until August 2024, when DeepSeek published a paper describing a new kind of load balancer it had created to link the elements of its mixture of experts (MoE) foundation model to each other. Over the holidays, the company published the architectural details of its DeepSeek-V3 foundation model, which spans 671 billion parameters (with only 37 billion parameters activated for any given token generated) and was trained on 14.8 trillion tokens.
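A quick aside on those numbers: only about 5.5 percent of the model is active for any one token, and the common ~6 x active-parameters x tokens rule of thumb (a rough community heuristic, not DeepSeek's own accounting) puts pretraining in the neighborhood of 3 x 10^24 FLOPs.

# Back-of-the-envelope arithmetic on the figures quoted above; the 6*N*D
# training-FLOPs estimate is a rough heuristic, not DeepSeek's number.
total_params  = 671e9    # total parameters in DeepSeek-V3
active_params = 37e9     # parameters activated per generated token
tokens        = 14.8e12  # pretraining tokens

print(f"active fraction per token: {active_params / total_params:.1%}")      # ~5.5%
print(f"approx. training compute:  {6 * active_params * tokens:.1e} FLOPs")  # ~3.3e24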
And finally, and perhaps most importantly, on January 20, DeepSeek rolled out its DeepSeek-R1 model, which adds two more reinforcement learning stages and two supervised fine-tuning stages to enhance the model's reasoning capabilities. DeepSeek AI is charging 6.5X more for the R1 model than for the base V3 model.
There is much chatter out there on the Intertubes as to why this might be the case. We will get to that. Hold on.
Interestingly, the source code for the V3 and R1 models, as well as their V2 predecessor, is available on GitHub, which is more than you can say for the proprietary models from OpenAI, Google, Anthropic, xAI, and others.
But what we want to know – and what is roiling the tech titans today – is precisely how DeepSeek was able to take a few thousand crippled “Hopper” H800 GPU accelerators from Nvidia, which have some of their performance capped, and create an MoE foundation model that can stand toe-to-toe with the best that OpenAI, Google, and Anthropic can do with their largest models as they are trained on tens of thousands of uncrimped GPU accelerators. If it takes one-tenth to one-twentieth the hardware to train a model, that would seem to imply that the value of the AI market can, in theory, contract by a factor of 10X to 20X. It is no coincidence that Nvidia stock is down 17.2 percent as we write this sentence....
....MUCH MORE
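To spell out the arithmetic behind that closing 10X-to-20X claim with illustrative numbers (the 2,048-GPU figure is our stand-in for the article's "few thousand" H800s, not a number reported there):

# Illustrative only: if a comparable model needs one-tenth to one-twentieth
# the hardware, the implied training-hardware bill shrinks by the same factor.
deepseek_gpus = 2_048   # stand-in for "a few thousand" capped H800s
for factor in (10, 20):
    print(f"{factor}x case: {deepseek_gpus * factor:,} uncapped GPUs "
          f"vs {deepseek_gpus:,} H800s -> demand at {1/factor:.0%} of prior expectations")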