
DeepSeek's Approach Only Works in Limited Technical Domains

By Bbenzon @bbenzon

If you look at their excellent paper & code, the reward model is a logical function that was handcrafted & programmed by engineers.
DeepSeek's RL approach is impressive in the sense that it reduces the need for tedious supervised fine-tuning (SFT), but it isn't really general.

— Chomba Bupe (@ChombaBupe) February 1, 2025
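
To make the point concrete, here is a minimal sketch of what a handcrafted, rule-based reward might look like. It is loosely modeled on the format and accuracy rewards described in the DeepSeek-R1 paper, but it is not DeepSeek's code: the function names, the regular expressions, and the equal weighting of the two terms are all assumptions made for illustration.

```python
import re

# Hypothetical patterns: reasoning wrapped in <think> tags, then a final
# answer given as \boxed{...}. Both conventions are assumed for this sketch.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*(?P<answer>.+)$", re.DOTALL)
BOXED_PATTERN = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion is <think>...</think> followed by an answer."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last boxed value matches the reference string exactly."""
    matches = BOXED_PATTERN.findall(completion)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum the two rule-based terms (equal weights are an assumption)."""
    return format_reward(completion) + accuracy_reward(completion, reference)


if __name__ == "__main__":
    math_answer = r"<think>2 + 2 is 4</think> The answer is \boxed{4}"
    essay = "I think the poem is moving because of its imagery."
    print(total_reward(math_answer, "4"))  # verifiable task: rules can score it
    print(total_reward(essay, ""))         # open-ended task: nothing to check
```

A reward like this only works when the answer can be checked mechanically, as with math results or code that either passes its tests or doesn't. For an essay or a piece of criticism there is no reference string to compare against, which is the sense in which the approach isn't general.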
