China has reportedly found a way around the limitations of NVIDIA’s reduced-capability AI accelerators. DeepSeek, leading this effort, has released a project that reportedly delivers a multi-fold jump in usable TFLOPS from the Hopper H800 AI accelerators through software alone.
In a significant move, China’s AI industry seems determined to maximize the potential of its hardware rather than relying solely on external sources. DeepSeek, a prominent player in this arena, has demonstrated how advanced software can push those limits. By tuning for NVIDIA’s “cut-down” Hopper H800 GPUs, DeepSeek has optimized memory consumption and resource allocation across inference requests to significantly boost performance.
Recently, DeepSeek announced its “Open Source” week, a series of releases making its in-house technologies accessible to everyone through GitHub repositories. On the first day, it introduced FlashMLA, a decoding kernel designed specifically for NVIDIA’s Hopper GPUs. Before diving into its mechanics, it’s worth noting the headline numbers that have drawn so much attention.
DeepSeek reports achieving 580 TFLOPS for BF16 matrix multiplication on the Hopper H800 in compute-bound workloads, which the company claims is roughly eight times what typical unoptimized decoding kernels extract from the chip. Paired with FlashMLA’s efficient memory handling, memory bandwidth reaches 3000 GB/s in memory-bound workloads, approaching the H800’s theoretical ceiling rather than exceeding it. What’s remarkable is that these figures come solely from clever coding, not hardware upgrades.
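To put the reported figures in context, a back-of-envelope check against the H800’s approximate public specifications (the peak values below are assumptions drawn from NVIDIA’s published Hopper specs, not from DeepSeek’s release) shows how close FlashMLA gets to the hardware’s ceilings:

```python
# Rough utilization check for the reported FlashMLA numbers.
# Peak figures are approximate public specs for the H800 SXM (assumptions).
PEAK_BF16_TFLOPS = 990.0   # approx. dense BF16 peak throughput
PEAK_BW_GBPS = 3350.0      # approx. HBM3 memory bandwidth

achieved_tflops = 580.0    # compute-bound figure reported by DeepSeek
achieved_bw_gbps = 3000.0  # memory-bound figure reported by DeepSeek

print(f"compute utilization:   {achieved_tflops / PEAK_BF16_TFLOPS:.1%}")
print(f"bandwidth utilization: {achieved_bw_gbps / PEAK_BW_GBPS:.1%}")
```

Under these assumed peaks, the reported bandwidth figure sits near 90% of the H800’s theoretical limit, which is why it cannot represent a doubling of that limit.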
FlashMLA employs “low-rank key-value compression,” which might sound complex but essentially projects the key-value cache into a compact latent representation, speeding up processing while cutting memory consumption by a notable 40%-60%. Moreover, its block-based paging system allocates cache memory in fixed-size blocks on demand rather than reserving a fixed maximum per request. This adaptability allows variable-length sequences to be processed more efficiently, leading to better overall performance.
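The two ideas can be sketched together in a few lines of NumPy. This is a minimal illustration, not DeepSeek’s implementation: the dimensions, block size, and random projection matrices below are all hypothetical stand-ins (real projections would be learned during training), and the actual kernel runs fused on the GPU.

```python
import numpy as np

D_MODEL = 1024   # per-token hidden size (illustrative)
D_LATENT = 128   # compressed latent size (illustrative)
BLOCK = 64       # tokens per page in the block-based cache (illustrative)

# Random matrices stand in for the learned down-/up-projections of
# low-rank key-value compression; real weights come from training.
rng = np.random.default_rng(0)
W_down = rng.standard_normal((D_MODEL, D_LATENT)).astype(np.float32)
W_up_k = rng.standard_normal((D_LATENT, D_MODEL)).astype(np.float32)
W_up_v = rng.standard_normal((D_LATENT, D_MODEL)).astype(np.float32)

class PagedLatentCache:
    """Caches only the low-rank latent per token, stored in fixed-size
    blocks that are allocated on demand (block-based paging)."""
    def __init__(self):
        self.blocks = []   # each block holds BLOCK latent vectors
        self.length = 0

    def append(self, hidden):                  # hidden: (D_MODEL,)
        latent = hidden @ W_down               # compress to D_LATENT floats
        if self.length % BLOCK == 0:           # current page full -> new page
            self.blocks.append(np.empty((BLOCK, D_LATENT), np.float32))
        self.blocks[-1][self.length % BLOCK] = latent
        self.length += 1

    def keys_values(self):
        latents = np.concatenate(self.blocks)[: self.length]
        return latents @ W_up_k, latents @ W_up_v  # reconstruct K and V

cache = PagedLatentCache()
for _ in range(100):                           # a variable-length sequence
    cache.append(rng.standard_normal(D_MODEL).astype(np.float32))

K, V = cache.keys_values()
full_floats = 2 * cache.length * D_MODEL             # uncompressed K+V
stored_floats = len(cache.blocks) * BLOCK * D_LATENT # latents actually cached
print(f"cache footprint: {stored_floats / full_floats:.0%} of full K/V")
```

With these illustrative sizes the cache holds only the small latent vectors plus one partially filled page, so the footprint lands well below half of an uncompressed key-value cache; a sequence of 100 tokens allocates just two 64-token pages instead of a pre-reserved maximum.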
DeepSeek’s strides underline that AI performance isn’t won on any single front; it comes from hardware and software working in concert. For now, FlashMLA is tailored to Hopper GPUs, and many are curious what gains the same techniques could unlock on the H100.