Apple’s latest hardware is doing something pretty unexpected on the AI side, though it comes with a clear catch. The iPhone 17 Pro has been shown running a 400-billion-parameter language model locally, which sounds almost unreal for a phone.

The demo comes from an open-source project called Flash-MoE, shared by developer @anemll. Models of this size usually need well over 200GB of memory just to load, so running one on a device with 12GB of RAM shouldn’t be possible in the usual sense.
What’s happening here is a bit different. Instead of loading the whole model into memory, the system streams pieces in from storage as they’re needed. It also relies on a Mixture of Experts (MoE) architecture, where only a small fraction of the model’s parameters is active for any given token. That combination is what makes it run at all.
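To make the trick concrete, here’s a minimal Python/NumPy sketch of the general technique. This is illustrative only, not Flash-MoE’s actual code, and the sizes are toy values: each expert’s weights sit in their own file standing in for flash storage, a router scores every expert, and only the top-k winners are memory-mapped in for the matmul.

```python
import numpy as np, tempfile, os

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D_MODEL = 8, 2, 64   # toy sizes for the demo

# Write each expert's weights to its own file, as if they lived in flash storage.
tmp = tempfile.mkdtemp()
expert_paths = []
for i in range(NUM_EXPERTS):
    path = os.path.join(tmp, f"expert_{i}.npy")
    np.save(path, rng.standard_normal((D_MODEL, D_MODEL), dtype=np.float32))
    expert_paths.append(path)

def moe_layer(x, router_w):
    # The router scores every expert, but only the top-k are ever read from disk.
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                       # softmax over the selected experts

    out = np.zeros_like(x)
    for g, idx in zip(gate, top):
        # mmap_mode="r" maps the file lazily: only the pages the matmul
        # actually touches get paged in from storage.
        w = np.load(expert_paths[idx], mmap_mode="r")
        out += g * (x @ w)
    return out

x = rng.standard_normal(D_MODEL, dtype=np.float32)
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS), dtype=np.float32)
print(moe_layer(x, router_w).shape)   # (64,) — computed with 2 of 8 experts
```

With top-2 routing over dozens of experts per layer, only a few percent of the expert weights are ever touched for a given token, which is why a model whose full weights dwarf the phone’s RAM can still make forward progress, just very slowly.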
The problem is speed. Or rather, the lack of it. The model generates at about 0.6 tokens per second, and since an English word usually spans more than one token, that works out to a couple of seconds per word. It’s slow enough that even simple prompts start to feel like a test of patience. Battery drain is another likely issue here, though that’s expected with this kind of workload.
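For a sense of what 0.6 tokens per second means in practice, here’s the back-of-the-envelope arithmetic (the ~1.3 tokens-per-word ratio is a rough average for English text, not a figure from the demo):

```python
tokens_per_second = 0.6          # reported generation speed
tokens_per_word = 1.3            # rough English average (assumption)

seconds_per_word = tokens_per_word / tokens_per_second
print(f"~{seconds_per_word:.1f} s per word")                  # ~2.2 s

reply_words = 100
minutes = reply_words * tokens_per_word / tokens_per_second / 60
print(f"~{minutes:.1f} min for a {reply_words}-word reply")   # ~3.6 min
```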
Still, it’s interesting to see. Not because it’s usable right now, but because it shows where things might be heading. Not long ago, running something this large entirely on-device, without relying on the cloud, wasn’t even part of the conversation.
For now, though, there’s a clear gap between what’s possible and what actually makes sense to use. Smaller models are still the practical choice. But experiments like this do give a glimpse of what future phones might eventually handle more comfortably.
(Source: @anemll on X)