Microsoft Phi 3 Mini: The Tiny Model That Runs on Your Phone

NISHANT TIWARI 25 Apr, 2024

Introduction

In the field of artificial intelligence (AI), there has long been a belief that bigger is better. But Microsoft has shaken things up with its latest creation, Phi-3-mini. This small AI model is turning heads by showing that size isn’t everything: despite being much smaller than its counterparts, Phi-3-mini holds its own in language understanding and reasoning. That challenges the idea that only large language models (LLMs) can do the heavy lifting in AI. This article delves into what the new model is all about and how it is redefining AI innovation.


Why Big Isn’t Always Better in AI

Recently, there has been a significant focus on scaling up LLMs, in the belief that bigger models deliver better performance. Phi-3-mini complicates that picture: it achieves a level of language understanding and reasoning comparable to much larger models, yet it remains fundamentally limited by its size on certain tasks. The model cannot store extensive factual knowledge, which leads to lower performance on knowledge-heavy benchmarks such as TriviaQA. That limitation has prompted exploration of augmenting the model with a search engine. Its language capabilities are also mostly restricted to English, making multilingual support an important next step for Small Language Models (SLMs).

Phi-3: A Family of Powerful Small Language Models (SLMs)

Microsoft’s Phi-3-mini is part of a family of powerful SLMs developed to challenge the assumption that bigger is always better. These SLMs have been designed to achieve high performance with a significantly smaller number of parameters than larger models. The Phi-3-mini model, with 3.8 billion parameters, has been trained on 3.3 trillion tokens.

Despite its small size, it demonstrates performance that rivals much larger models such as Mixtral 8x7B and GPT-3.5. The innovation lies in the training dataset, a scaled-up version of the one used for Phi-2, consisting of heavily filtered web data and synthetic data. This approach has enabled powerful SLMs that can be deployed on devices with limited computational resources.
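To make the filtering idea concrete, here is a purely illustrative sketch of how a web corpus might be reduced to its highest-quality slice. The quality classifier, its interface, and the threshold are all hypothetical; Microsoft has not published its exact pipeline.

```python
# Hypothetical sketch of "heavily filtered web data": score each document
# with a quality classifier and keep only the high-scoring slice.
# `quality_model` and its `score` method are assumptions for illustration.
def filter_web_corpus(documents, quality_model, threshold=0.9):
    """Keep only documents the quality model rates above the threshold."""
    kept = []
    for doc in documents:
        score = quality_model.score(doc)  # hypothetical: quality in [0, 1]
        if score >= threshold:
            kept.append(doc)
    return kept
```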


Inside Phi-3

Phi-3 refers to a series of language models developed by Microsoft, with Phi-3-mini being a notable addition. Phi-3-mini is a 3.8 billion parameter language model trained on 3.3 trillion tokens, designed to be as powerful as larger models while being small enough to run on a phone. Despite its compact size, it rivals much larger models such as Mixtral 8x7B and GPT-3.5, scoring 69% on MMLU and 8.38 on MT-bench, which showcases its prowess in language understanding and reasoning.
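Trying the model yourself is straightforward. Below is a minimal sketch using the Hugging Face transformers library; the checkpoint name microsoft/Phi-3-mini-4k-instruct is an assumption based on the public release, not something stated in this article.

```python
# Minimal sketch: load Phi-3-mini and generate a completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision keeps memory modest
    device_map="auto",            # place weights on GPU if available
    trust_remote_code=True,       # needed for newly released checkpoints
)

prompt = "Explain in one sentence why small language models matter."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```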

Furthermore, Phi-3-mini can be quantized to 4 bits, at which point its weights occupy approximately 1.8GB of memory, making it suitable for deployment on mobile devices. The arithmetic checks out: 3.8 billion parameters at 4 bits (half a byte) each comes to roughly 1.9GB of weights. The model’s training data, a scaled-up version of the one used for Phi-2, is composed of heavily filtered web data and synthetic data, contributing to its remarkable capabilities.
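One common way to run a 4-bit model on consumer hardware is bitsandbytes quantization through transformers. This is a sketch of that route, not Microsoft’s published deployment recipe; the compute dtype and other settings are ordinary defaults.

```python
# Sketch: load the model with 4-bit weights via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # store weights in 4 bits
    bnb_4bit_compute_dtype=torch.bfloat16,   # run matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",  # assumed checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)
# 3.8B parameters x 0.5 bytes = ~1.9GB, in line with the ~1.8GB figure.
```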

(Figure: Microsoft Phi-3 Mini vs. other language models)

The Secret Sauce of Phi-3’s Success

The success of Phi-3 can be attributed to its training methodology, which uses high-quality training data to lift the performance of SLMs. The training data consists of heavily filtered web data and synthetic data, following the line of work initiated in “Textbooks Are All You Need.” This method allows Phi-3-mini to reach the level of highly capable models such as GPT-3.5 with only 3.8B parameters, showcasing the effectiveness of the approach. Additionally, the model is chat-finetuned, aligning it for robustness, safety, and the chat format, which further contributes to its success.
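Because the model is chat-finetuned, prompts should follow its chat format. The safest way to do that is the tokenizer’s built-in chat template, as in this sketch (again assuming the microsoft/Phi-3-mini-4k-instruct checkpoint):

```python
# Sketch: prompt the chat-finetuned model via its chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize Phi-3-mini in two lines."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```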

Where Phi-3 Shines and What It Still Learns

Phi-3-mini’s strengths are its compact size, impressive performance, and suitability for deployment on mobile devices. Training on high-quality data and chat-finetuning allow it to rival larger models in language understanding and reasoning.

However, the model is fundamentally limited by its size on certain tasks: it cannot store extensive factual knowledge, leading to lower performance on benchmarks such as TriviaQA. Efforts to address this weakness are underway, including augmenting the model with a search engine (sketched below) and exploring multilingual capabilities for Small Language Models.
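Search augmentation works by retrieving snippets at query time and letting the model answer from the provided context rather than from its limited parametric memory. The sketch below assumes the model and tokenizer objects from the earlier loading example; web_search is a hypothetical helper, not a real API.

```python
# Hedged sketch of search-engine augmentation for a small model.
def web_search(query: str, k: int = 3) -> list[str]:
    """Hypothetical helper: return the top-k text snippets for a query."""
    raise NotImplementedError("plug in a real search client here")

def answer_with_search(question: str) -> str:
    snippets = web_search(question)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```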

Safety First with Phi-3

Phi-3-mini was developed with a strong emphasis on safety and responsible AI principles, in line with Microsoft’s guidelines. Ensuring safety involved safety alignment in post-training, red-teaming, automated testing, and evaluations across multiple categories of responsible AI (RAI) harm. The training data was carefully curated and modified to address RAI harm categories, leveraging both existing datasets and in-house generated ones.

An independent red team at Microsoft played a crucial role in identifying areas of improvement during post-training, leading to refinement of the dataset and a significant decrease in harmful response rates. The post-training process itself consisted of supervised finetuning (SFT) and direct preference optimization (DPO), which used high-quality data across diverse domains to steer the model away from unwanted behavior. A sketch of the DPO objective follows below.
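DPO trains directly on preference pairs: it raises the likelihood of the chosen response relative to the rejected one, measured against a frozen reference model. This is a minimal sketch of that objective, not Microsoft’s training code; it assumes you already have summed log-probabilities for each response.

```python
# Sketch of the DPO loss over a batch of preference pairs.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy prefers each response than the frozen
    # reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Maximize the margin between chosen and rejected, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```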

(Figure: Microsoft Phi-3 Mini safety standards)

Despite the diligent RAI efforts, challenges around factual inaccuracies, biases, inappropriate content generation, and safety issues remain, as is the case with most LLMs. However, the use of carefully curated training data and targeted post-training, along with insights from red-teaming, has significantly mitigated these issues.

Conclusion

The Phi-3 family, comprising Phi-3-mini, Phi-3-small, and Phi-3-medium, has been extensively evaluated against other available language models. The benchmark results demonstrate impressive reasoning ability, language understanding, and performance in multi-turn conversations, along with the capacity to handle long-context tasks while maintaining high quality.

Additionally, the post-training process, including the development of a long-context version of Phi-3-mini, has further enhanced the model’s capabilities. Going forward, the main planned advancements are multilingual support and search-engine augmentation to improve factual knowledge. Overall, Phi-3 has shown promising results and clear potential for further development and application.

