Microsoft built a supercomputer for OpenAI after investing $1 billion in the startup in 2019. The challenge was to train a large set of artificial intelligence programs called models, which required powerful cloud computing services for long periods of time. To meet this challenge, Microsoft had to string together tens of thousands of Nvidia Corp.’s A100 graphics chips and change the way servers were positioned on racks to prevent power outages.

The supercomputer allowed OpenAI to release ChatGPT, a viral chatbot that attracted more than 1 million users within days of going public in November 2020. Microsoft now uses the same set of resources to train and run its own large artificial intelligence models, including the new Bing search bot introduced last month, and sells the system to other customers.

Microsoft and openai logo
v

Training a massive AI model requires a large pool of connected graphics processing units in one place, like the AI supercomputer Microsoft assembled. Once a model is in use, answering all the queries users pose requires a slightly different setup, and Microsoft deploys graphics chips for inference, but those processors are geographically dispersed throughout the company’s more than 60 regions of data centers.

Microsoft is adding the latest Nvidia graphics chip for AI workloads, the H100, and the newest version of Nvidia’s Infiniband networking technology to share data even faster. The new Bing is still in preview with Microsoft gradually adding more users from a waitlist. The team working on this holds a daily meeting to figure out how to bring greater amounts of computing capacity online quickly and fix problems that crop up.

Microsoft services

The pit crew has to deal with a shortage of cable trays, so they designed a new cable tray that Microsoft could manufacture itself or find somewhere to buy. They’ve also worked on ways to squish as many servers as possible in existing data centers around the world so they don’t have to wait for new buildings. Microsoft had to think about where the machines were placed and where the power supplies were located to prevent power outages.

RELATED:

(via)