2025-01-28
arXiv

Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies

Manojkumar Parmar, Yuvaraj Govindarajulu
The paper discusses the limitations of using Reinforcement Learning (RL) to ensure safety in advanced LLMs like DeepSeek-R1 and proposes a hybrid approach combining RL and Supervised Fine-Tuning (SFT) to mitigate harmful outputs.
Large Language Models (LLMs) have achieved remarkable progress in reasoning, alignment, and task-specific performance. However, ensuring harmlessness in these systems remains a critical challenge, particularly in advanced models like DeepSeek-R1. This paper examines the limitations of Reinforcement Learning (RL) as the primary approach for reducing harmful outputs in DeepSeek-R1 and compares it with Supervised Fine-Tuning (SFT). While RL improves reasoning capabilities, it faces challenges such as reward hacking, generalization failures, language mixing, and high computational costs. We propose hybrid training approaches combining RL and SFT to achieve a robust reduction in harmful outputs. Usage recommendations and future directions for deploying DeepSeek-R1 responsibly are also presented.
2025-01-22
arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu
The paper introduces DeepSeek-R1-Zero, a model trained with reinforcement learning that exhibits strong reasoning capabilities but suffers from poor readability and language mixing. To address these issues, the authors develop DeepSeek-R1, which adds multi-stage training and cold-start data before RL and achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. Both models, along with six distilled dense models, are open-sourced.
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, numerous powerful and intriguing reasoning behaviors emerge naturally in DeepSeek-R1-Zero. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.