Microsoft launches its smallest AI model, Phi-3

Microsoft has introduced Phi-3 Mini, the latest iteration of its lightweight AI model and the first of three small models the company plans to release.


Phi-3 Mini has 3.8 billion parameters and is trained on a far smaller dataset than large language models like GPT-4. It is now available on Azure, Hugging Face, and Ollama. Microsoft’s roadmap includes subsequent releases of Phi-3 Small (7 billion parameters) and Phi-3 Medium (14 billion parameters); parameter count loosely indicates how complex an instruction a model can understand.
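To make that availability concrete, here is a minimal sketch of loading Phi-3 Mini through Hugging Face’s transformers library. The model ID, the trust_remote_code flag, and the chat-template call are assumptions based on Microsoft’s published checkpoints, not an official quickstart, so check the model card for exact usage.

```python
# Minimal sketch: running Phi-3 Mini locally with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the release initially shipped custom modeling code
)

# Build a chat-style prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Why are small language models cheaper to run?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a reply and print only the newly generated tokens.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

On Ollama, the equivalent is reportedly a single terminal command, `ollama run phi3`, which downloads the model and opens an interactive prompt.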

Following December’s launch of Phi-2, which performed comparably to larger models such as Llama 2, Microsoft asserts that Phi-3 surpasses its predecessor and can deliver responses on par with those of a model ten times its size. Eric Boyd, corporate vice president of Microsoft Azure AI Platform, likens Phi-3 Mini’s capabilities to those of LLMs such as GPT-3.5, albeit in a more compact form.

Compared to their larger counterparts, small AI models are often cheaper to run and perform better on personal devices like smartphones and laptops. Reports earlier this year indicated that Microsoft was focusing effort on developing lighter-weight AI models. In addition to Phi, the company has developed Orca-Math, a model tailored to solving mathematical problems.

Rival companies have also introduced small AI models of their own, targeting tasks such as document summarization and coding assistance. Google’s Gemma 2B and 7B are suited to simple chatbots and language-related work. Anthropic’s Claude 3 Haiku can read dense research papers, graphs included, and summarize them quickly, while Meta’s recently unveiled Llama 3 8B may find use in some chatbot applications and coding assistance.

Boyd explains that developers trained Phi-3 with a “curriculum,” inspired by how children learn from simplified material: they used an LLM to generate “children’s books” from a list of more than 3,000 words, and those books were then used to teach Phi.

Phi-3 builds upon its predecessors’ advances: Phi-1 focused on coding, and Phi-2 added reasoning capabilities. While Phi-3 is proficient at coding and reasoning and has some general knowledge, it cannot match the breadth of GPT-4 or other LLMs trained on a far larger slice of the internet.

For many companies, smaller models like Phi-3 are a better fit for custom applications: internal datasets tend to be relatively small, and the models demand far less computing power, making them a cost-effective choice.