Key Highlights
- Microsoft unveiled three proprietary AI models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, accessible via Microsoft Foundry.
- MAI-Transcribe-1 achieves superior accuracy across 25 languages, surpassing OpenAI’s Whisper and Google Gemini Flash in benchmark testing.
- A contract renegotiation with OpenAI in late 2025 granted Microsoft permission to develop frontier AI models independently.
- Development teams consisted of fewer than 10 engineers per model, utilizing approximately 50% fewer GPU resources than rival offerings.
- Microsoft AI CEO Mustafa Suleyman announced plans to develop a frontier large language model, pursuing complete AI autonomy.
Microsoft has made its boldest move yet toward AI self-sufficiency, unveiling three proprietary models on Wednesday that position the company as a direct rival to OpenAI, Google, and emerging AI ventures.
The trio of models — MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 — launched through Microsoft Foundry alongside a new MAI Playground interface. They cover speech-to-text conversion, text-to-speech synthesis, and image generation. Mustafa Suleyman, Microsoft's AI CEO, described the release as the first deployment from his "superintelligence team," formed six months ago.
MICROSOFT ANNOUNCED PLANS TO DEVELOP ADVANCED AI MODELS BY 2027.
— First Squawk (@FirstSquawk) April 2, 2026
Microsoft shares just closed their weakest quarter since 2008, down roughly 17% year-to-date. The release is Suleyman's first public answer to shareholder demands for measurable returns on the company's heavy AI spending.
MAI-Transcribe-1 is the flagship offering. It posts the lowest average Word Error Rate (WER) on the FLEURS benchmark across the 25 languages most used in Microsoft products: 3.8%. The company says it beats OpenAI's Whisper-large-v3 in all 25 languages and Google's Gemini 3.1 Flash in 22 of 25. The model accepts MP3, WAV, and FLAC files up to 200MB and batch-processes audio 2.5 times faster than Azure's existing solution. Internal testing is underway in Teams and Copilot Voice.
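Word Error Rate, the metric behind the FLEURS figures above, is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal sketch of the standard calculation (not Microsoft's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown fox"))        # 0.0
print(wer("the quick brown fox", "the quik brown fox jumps"))   # 0.5
```

In the second example, one substitution ("quik") and one insertion ("jumps") against a four-word reference yield 2/4 = 0.5, i.e. 50% WER; MAI-Transcribe-1's reported 3.8% average means fewer than four such errors per hundred reference words.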
MAI-Voice-1 generates 60 seconds of natural-sounding audio in under one second and can clone a custom voice from just a few seconds of sample recording. Pricing is $22 per million characters. MAI-Image-2 holds a top-three position on the Arena.ai leaderboard and is being integrated into Bing and PowerPoint, priced at $5 per million input tokens and $33 per million image output tokens. WPP is among the first enterprise customers deploying it at scale.
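At the published rates, per-request costs are straightforward to estimate. A hedged sketch using only the prices quoted above; the function names and the example token counts are illustrative assumptions, not from Microsoft's documentation:

```python
VOICE_PER_M_CHARS = 22.0       # MAI-Voice-1: $22 per million characters
IMAGE_IN_PER_M_TOKENS = 5.0    # MAI-Image-2: $5 per million input tokens
IMAGE_OUT_PER_M_TOKENS = 33.0  # MAI-Image-2: $33 per million image output tokens

def voice_cost(characters: int) -> float:
    """Dollar cost to synthesize `characters` of text with MAI-Voice-1."""
    return characters / 1_000_000 * VOICE_PER_M_CHARS

def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for one MAI-Image-2 generation."""
    return (input_tokens / 1_000_000 * IMAGE_IN_PER_M_TOKENS
            + output_tokens / 1_000_000 * IMAGE_OUT_PER_M_TOKENS)

# Example: a 500-character narration, and an image from a 100-token prompt
# producing 4,000 output tokens (token counts assumed for illustration).
print(f"voice: ${voice_cost(500):.4f}")        # voice: $0.0110
print(f"image: ${image_cost(100, 4000):.4f}")  # image: $0.1325
```

Even generous per-request sizes land at fractions of a cent to a few cents, which is the basis of the "undercut the hyperscalers" positioning discussed below.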
Contract Revision Enabled Independence
This launch would have been impossible twelve months earlier. Until October 2025, Microsoft faced contractual restrictions preventing independent pursuit of artificial general intelligence under its original 2019 OpenAI agreement.
When OpenAI pursued additional compute resources beyond Microsoft — establishing partnerships with SoftBank and other entities — Microsoft renegotiated terms. The updated agreement permits Microsoft to construct its own frontier models while maintaining licensing rights to all OpenAI developments through 2032.
Suleyman told VentureBeat: "Back in September of last year, we renegotiated the contract with OpenAI, and that enabled us to independently pursue our own superintelligence." He emphasized that the OpenAI partnership continues through at least 2032.
Compact Teams, Ambitious Performance
Among the launch's most striking revelations: each model came from a team of fewer than 10 engineers. Suleyman said the audio model took 10 people and attributed the performance advantages to architectural design and data strategy rather than workforce size.
"Our image team, equally, is less than 10 people," he said. That approach contrasts sharply with prevailing industry patterns: Meta, for example, has reportedly offered individual researchers compensation packages worth between $100 million and $200 million.
Microsoft says its pricing is deliberately competitive, structured to undercut Amazon and Google; Suleyman called it "the cheapest of any of the hyperscalers." The company is planning frontier-scale GPU infrastructure deployments over the next 12 to 18 months.
Suleyman confirmed that a large language model is on the development roadmap, saying Microsoft aims for "complete independence" while delivering "state of the art models across all modalities."
