Model Introduction
MiniMax-Text-01 is a 456-billion-parameter language model (with 45.9 billion parameters activated per token) that combines Lightning Attention, Softmax Attention, and a Mixture-of-Experts (MoE) feed-forward design. Using advanced parallel strategies, it is trained with a context window of 1 million tokens and can extrapolate to 4 million tokens at inference time, delivering top-tier performance on long-context and general benchmarks.
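To make the hybrid layout concrete, below is a minimal, self-contained PyTorch sketch of the idea: a stack of decoder blocks in which most blocks use a linear (lightning-style) attention while a periodic block uses standard softmax attention, and every block ends with a top-2 MoE feed-forward. All module names, layer counts, sizes, the 1-in-8 softmax ratio, and the ELU-based non-causal linear attention are illustrative assumptions for this sketch, not the actual MiniMax-Text-01 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearAttention(nn.Module):
    """Simplified non-causal linear attention with an ELU+1 feature map (O(n) in sequence length)."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.heads, self.dh = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, self.dh).transpose(1, 2) for t in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1                      # positive feature map
        kv = torch.einsum("bhnd,bhne->bhde", k, v)             # K^T V summary, independent of n
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(2)) + 1e-6)
        o = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)      # normalized attention output
        return self.out(o.transpose(1, 2).reshape(b, n, d))


class SoftmaxAttention(nn.Module):
    """Standard multi-head softmax self-attention."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return out


class Top2MoE(nn.Module):
    """Feed-forward mixture-of-experts with top-2 token routing (toy-sized)."""

    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 512):
        super().__init__()
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights, idx = self.router(x).softmax(-1).topk(2, dim=-1)   # (b, n, 2) routing weights / indices
        out = torch.zeros_like(x)
        for slot in range(2):                                       # combine the two selected experts per token
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


class HybridBlock(nn.Module):
    """One decoder block: linear or softmax attention, then an MoE feed-forward."""

    def __init__(self, dim: int, use_softmax: bool):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = SoftmaxAttention(dim) if use_softmax else LinearAttention(dim)
        self.ffn = Top2MoE(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        return x + self.ffn(self.norm2(x))


# Toy stack: every 8th block uses softmax attention, the rest use linear attention.
dim, num_layers = 256, 16
blocks = nn.Sequential(*[HybridBlock(dim, use_softmax=(i + 1) % 8 == 0) for i in range(num_layers)])
tokens = torch.randn(2, 128, dim)       # (batch, sequence, hidden)
print(blocks(tokens).shape)             # torch.Size([2, 128, 256])
```

The design intuition the sketch tries to capture: the linear-attention blocks keep per-token cost roughly constant as the sequence grows, which is what makes million-token contexts tractable, while the occasional softmax-attention block preserves the exact token-to-token interactions that pure linear attention can lose.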