In a surprising breakthrough, Microsoft has unveiled its latest language model, Phi-1, with 1.3 billion parameters. Contrary to the conventional belief that larger models perform better, Microsoft’s approach focuses on the quality of the training data. Phi-1, trained on a meticulously curated “textbook-level” dataset, has outperformed the 175-billion-parameter GPT-3.5 on the HumanEval coding benchmark.

Microsoft’s Phi-1 language model, built on the Transformer architecture, has garnered attention for its impressive performance. The team behind Phi-1 placed its emphasis on the quality of training data, a departure from the prevailing trend of simply increasing model size. The training corpus combines “textbook-level” content filtered from the internet with additional synthetic, textbook-style material generated by GPT-3.5. With 8 Nvidia A100 GPUs, the training process was completed in just four days.
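To make the data-curation idea concrete, here is a minimal sketch of filtering a raw corpus down to “textbook-quality” samples. Microsoft’s actual pipeline relies on an LLM-assisted quality classifier; the `score_quality` heuristic below is a hypothetical stand-in for illustration only.

```python
# Minimal sketch of "textbook-quality" data filtering (illustrative only).
# The real pipeline uses an LLM-assisted quality classifier; score_quality
# is a hypothetical stand-in heuristic, not Microsoft's actual method.

def score_quality(snippet: str) -> float:
    """Toy quality score: rewards comments and docstrings."""
    lines = snippet.splitlines() or [""]
    commented = sum(1 for ln in lines if ln.strip().startswith("#")) / len(lines)
    has_docstring = 1.0 if '"""' in snippet else 0.0
    return 0.7 * commented + 0.3 * has_docstring

def filter_corpus(snippets: list[str], threshold: float = 0.2) -> list[str]:
    """Keep only snippets whose quality score clears the threshold."""
    return [s for s in snippets if score_quality(s) >= threshold]

corpus = [
    'def add(a, b):\n    """Return the sum of a and b."""\n    return a + b',
    "x=1;y=2;print(x+y)",
]
print(filter_corpus(corpus))  # only the documented snippet survives
```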

According to Microsoft, the focus on enhancing training data quality, rather than escalating parameter count, has yielded promising results. On the HumanEval coding benchmark, Phi-1 achieved an accuracy score of 50.6%, surpassing the roughly 47% posted by GPT-3.5 despite the latter’s staggering 175 billion parameters.
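For context, HumanEval results are typically reported as pass@k: the probability that at least one of k sampled completions passes the problem’s unit tests. The unbiased estimator from the original HumanEval (Codex) paper is easy to compute; the numbers below are made-up inputs for illustration.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval/Codex paper:
    n = samples generated per problem, c = samples that pass the tests,
    k = evaluation budget. pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 20 samples per problem, 10 of them pass.
print(round(pass_at_k(n=20, c=10, k=1), 3))  # 0.5
```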

Microsoft plans to open source Phi-1 on Hugging Face, strengthening the model’s accessibility and collaborative potential. This isn’t the first time Microsoft has developed a smaller language model: it previously introduced Orca, a 13-billion-parameter model trained on synthetic explanation data generated with GPT-4, which has been reported to outperform ChatGPT on several benchmarks. The research paper on Phi-1, “Textbooks Are All You Need,” has been published on arXiv and offers a comprehensive overview of the model’s architecture and training methodology for those interested in the technical details.
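Once the weights land on the Hugging Face Hub, trying the model should take only a few lines with the `transformers` library. This sketch assumes a repository ID of `microsoft/phi-1`, which had not been confirmed at the time of writing.

```python
# Hedged sketch: assumes the weights are published as "microsoft/phi-1".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```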

Microsoft’s Phi-1 language model challenges the notion that sheer parameter count is essential for improved performance. By focusing on high-quality training data, Phi-1 has achieved remarkable accuracy, surpassing far larger models. The planned open sourcing of Phi-1 further demonstrates Microsoft’s commitment to advancing the field of natural language processing.
