
Microsoft Unveils Three In-House AI Models to Challenge OpenAI and Google Across Speech, Voice, and Images

03.04.2026 · Riene Dimakiling

Microsoft has launched three new AI models developed in-house, aiming to compete more directly with OpenAI, Google, and other AI companies, even as it continues its partnership with OpenAI.

Microsoft AI released MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 on Thursday. This launch is the strongest indication so far that Microsoft wants to develop its own models, not just distribute those from other companies.

Microsoft’s new model lineup spans three key AI tasks

The three models cover speech transcription, voice generation, and image creation, three tasks that businesses rely on heavily.

TechCrunch reported that MAI-Transcribe-1 turns speech into text in 25 languages and is 2.5 times faster than Microsoft’s previous Azure Fast service.

VentureBeat noted that MAI-Voice-1 can create 60 seconds of natural-sounding audio in just one second.

MAI-Image-2 is Microsoft’s improved image model, now available through Microsoft Foundry and the new MAI Playground. TechCrunch also mentioned that MAI-Image-2 first appeared on MAI Playground on March 19 before this wider launch.

Mustafa Suleyman’s team is putting Microsoft’s own models front and center

The models were developed by Microsoft’s MAI Superintelligence team, the AI group led by Mustafa Suleyman, CEO of Microsoft AI.

Suleyman formed the team, which was announced in November 2025, as part of a push toward what he has called AI self-sufficiency.

In Microsoft’s blog post, Suleyman said the company is building Humanist AI and is focusing on models that are optimized for how people actually communicate and trained for practical use.

Transcription model is being positioned as the flagship release

MAI-Transcribe-1 is the flagship product. It achieved an average word error rate (WER) of 3.8% on the FLEURS benchmark across the top 25 languages used by Microsoft customers.
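For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The Python sketch below illustrates that standard definition only. The function name is hypothetical, the text is split on whitespace without the case or punctuation normalization that benchmark scoring usually applies, and it is not a description of how Microsoft scored FLEURS.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch: splits on whitespace and applies no text
    normalization, so it will not reproduce published benchmark scores.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

On this definition, a 3.8% average means roughly four word errors for every 100 reference words, averaged across the languages tested.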

Microsoft claims this model outperforms OpenAI’s Whisper-large-v3 in all 25 languages, Google’s Gemini 3.1 Flash in 22 out of 25, and both ElevenLabs’ Scribe v2 and OpenAI’s GPT-Transcribe in 15 out of 25. Suleyman said Microsoft now has the best transcription model in the world and that the company can run it using half the GPUs of state-of-the-art competitors.

Pricing and rollout show Microsoft wants enterprise traction quickly

Microsoft is also competing on price in a busy market. MAI-Transcribe-1 starts at $0.36 per hour, MAI-Voice-1 at $22 per million characters, and MAI-Image-2 at $5 per million tokens for text input and $33 per million tokens for image output.
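To put those figures in concrete terms, the Python sketch below estimates a bill from the starting prices quoted above. The function and parameter names are hypothetical, and the calculation assumes simple linear pricing; volume tiers, minimums, and any other charges are not covered in this article and are left out.

```python
# Starting prices as quoted in this article, in US dollars.
TRANSCRIBE_PER_HOUR = 0.36
VOICE_PER_MILLION_CHARS = 22.0
IMAGE_TEXT_INPUT_PER_MILLION_TOKENS = 5.0
IMAGE_OUTPUT_PER_MILLION_TOKENS = 33.0


def estimate_cost(audio_hours=0.0, voice_chars=0,
                  image_text_tokens=0, image_output_tokens=0):
    """Return a per-model cost breakdown plus a total, in US dollars.

    Assumes linear pricing at the quoted starting rates (an assumption,
    not a published billing formula).
    """
    breakdown = {
        "MAI-Transcribe-1": audio_hours * TRANSCRIBE_PER_HOUR,
        "MAI-Voice-1": voice_chars / 1_000_000 * VOICE_PER_MILLION_CHARS,
        "MAI-Image-2": (
            image_text_tokens / 1_000_000 * IMAGE_TEXT_INPUT_PER_MILLION_TOKENS
            + image_output_tokens / 1_000_000 * IMAGE_OUTPUT_PER_MILLION_TOKENS
        ),
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    example = estimate_cost(audio_hours=1_000, voice_chars=5_000_000)
    for name, dollars in example.items():
        print(f"{name}: ${dollars:,.2f}")
```

At these rates, 1,000 hours of transcription would come to about $360 and 5 million characters of generated speech to about $110.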

MAI-Image-2 is now a top-three model family on the Arena.ai leaderboard and generates images at least twice as fast as the previous version on Foundry and Copilot. WPP is one of the first large partners using the image model at scale.

New models arrive as Microsoft faces pressure to prove its AI strategy

This launch comes at a challenging time, as Microsoft’s stock just ended its worst quarter since 2008 and investors want proof that big AI investments will pay off. The new models are seen as Suleyman’s first major response to these concerns, especially since they are priced competitively and could reduce Microsoft’s own AI costs.

Even with these new models, Suleyman confirmed that Microsoft’s partnership with OpenAI is unchanged.

A more self-reliant Microsoft AI strategy is taking shape

Overall, these three launches show that Microsoft is moving beyond mainly supporting OpenAI models and is working to become a full AI provider with its own multimodal AI stack.

The company is now making its boldest move yet against competitors in speech transcription, voice generation, and image creation. The message from Microsoft is clear: it still values its partnership with OpenAI, but it also wants to control more of its own AI technology.
