Microsoft is expanding its Phi-3 family of small language models with the introduction of Phi-3-vision. Unlike its siblings, Phi-3-vision isn’t just focused on text – it’s a multimodal model that can analyze and understand images as well.
This 4.2 billion parameter model is designed to run on mobile devices and excels at general visual reasoning tasks. Users can ask Phi-3-vision questions about images or charts, and it answers based on their visual content. It is not an image generation tool like DALL-E or Stable Diffusion; its strength is image analysis and comprehension.

The arrival of Phi-3-vision comes on the heels of Phi-3-mini, the smallest member of the Phi-3 family at 3.8 billion parameters. The complete family now includes Phi-3-mini, Phi-3-vision, Phi-3-small (7 billion parameters), and Phi-3-medium (14 billion parameters).
This focus on smaller models reflects a growing trend in AI development. Smaller models require less processing power and memory, making them ideal for mobile devices and other resource-constrained environments. Microsoft has already seen success with this approach, with its Orca-Math model reportedly surpassing larger competitors in solving math problems. Phi-3-vision is currently available in preview, while the rest of the Phi-3 family (mini, small, and medium) can be accessed through Azure’s model library.