In a surprising breakthrough, Microsoft has unveiled its latest language model, Phi-1, with 1.3 billion parameters. Contrary to the conventional belief that larger models perform better, Microsoft’s approach focuses on the quality of the training data. Phi-1, trained on a meticulously curated “textbook-level” dataset, has outperformed the 175-billion-parameter GPT-3.5 on the HumanEval coding benchmark.

Microsoft’s Phi-1 language model, built on the Transformer architecture, has garnered attention for its impressive performance. The team behind Phi-1 placed its emphasis on the quality of training data, a departure from the prevailing trend of simply increasing model size. The training corpus combines “textbook-level” content filtered from the internet with additional synthetic, textbook-style material generated by GPT-3.5. With 8 Nvidia A100 GPUs, the training process was completed in just four days.
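To make the data-curation idea concrete, here is a minimal sketch of filtering a raw corpus down to “textbook-quality” samples. Microsoft’s actual pipeline relies on an LLM-assisted quality classifier; the `score_quality` heuristic below is a hypothetical stand-in for illustration only.

```python
# Minimal sketch of "textbook-quality" data filtering (illustrative only).
# The real pipeline uses an LLM-assisted quality classifier; score_quality
# is a hypothetical stand-in heuristic, not Microsoft's actual method.

def score_quality(snippet: str) -> float:
    """Toy quality score: rewards comments and docstrings."""
    lines = snippet.splitlines() or [""]
    commented = sum(1 for ln in lines if ln.strip().startswith("#")) / len(lines)
    has_docstring = 1.0 if '"""' in snippet else 0.0
    return 0.7 * commented + 0.3 * has_docstring

def filter_corpus(snippets: list[str], threshold: float = 0.2) -> list[str]:
    """Keep only snippets whose quality score clears the threshold."""
    return [s for s in snippets if score_quality(s) >= threshold]

corpus = [
    'def add(a, b):\n    """Return the sum of a and b."""\n    return a + b',
    "x=1;y=2;print(x+y)",
]
print(filter_corpus(corpus))  # only the documented snippet survives
```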

According to Microsoft, the focus on enhancing training data quality, rather than escalating parameter count, has yielded promising results. On the HumanEval coding benchmark, Phi-1 achieved an accuracy score of 50.6%, surpassing the roughly 47% posted by GPT-3.5 despite the latter’s staggering 175 billion parameters.
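For context, HumanEval results are typically reported as pass@k: the probability that at least one of k sampled completions passes the problem’s unit tests. The unbiased estimator from the original HumanEval (Codex) paper is easy to compute; the numbers below are made-up inputs for illustration.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval/Codex paper:
    n = samples generated per problem, c = samples that pass the tests,
    k = evaluation budget. pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 20 samples per problem, 10 of them pass.
print(round(pass_at_k(n=20, c=10, k=1), 3))  # 0.5
```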

Microsoft plans to open source Phi-1 on Hugging Face, strengthening the model’s accessibility and collaborative potential. This isn’t the first time Microsoft has developed a smaller language model: it previously introduced Orca, a 13-billion-parameter model trained on synthetic explanation data generated with GPT-4, which has been reported to outperform ChatGPT on several benchmarks. The research paper on Phi-1, “Textbooks Are All You Need,” has been published on arXiv and offers a comprehensive overview of the model’s architecture and training methodology for those interested in the technical details.
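Once the weights land on the Hugging Face Hub, trying the model should take only a few lines with the `transformers` library. This sketch assumes a repository ID of `microsoft/phi-1`, which had not been confirmed at the time of writing.

```python
# Hedged sketch: assumes the weights are published as "microsoft/phi-1".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```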

Microsoft’s Phi-1 language model challenges the notion that sheer parameter count is essential for improved performance. By focusing on high-quality training data, Phi-1 has achieved remarkable accuracy, surpassing far larger models. The planned open sourcing of Phi-1 further demonstrates Microsoft’s commitment to advancing the field of natural language processing.
