Introduction
In 2026, innovation is not slowing down, least of all at Apple. Ollama, a popular tool for running large language models locally and a key enabler of coding agents and personal assistants, is now powered by MLX, Apple's machine learning framework, on Apple Silicon. This integration promises markedly faster inference, changing how developers and businesses run these models on Mac hardware.
What is MLX?
MLX is Apple's machine learning framework, designed to take advantage of Apple Silicon's unified memory architecture, in which the CPU and GPU share the same pool of RAM. By incorporating MLX, Ollama can now fully exploit the new M5, M5 Pro, and M5 Max chips, improving both time to first token and token generation speed.
Breathtaking Performance
Recent tests show significant performance improvements in Ollama. With version 0.19, token generation speed has nearly doubled compared to the previous version: prefill (prompt processing) now reaches 1,851 tokens/s, and decoding (token generation) accelerates to 134 tokens/s. These figures illustrate MLX's impact on efficiency and speed.
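To put those rates in perspective, a quick back-of-the-envelope calculation shows what they mean for a typical request. This is a rough estimate only; the function name and the 2,000-token prompt / 300-token answer are illustrative, and real latency depends on the model, chip, and context length.

```python
# Back-of-the-envelope latency estimate using the rates quoted above.
PREFILL_TOK_PER_S = 1851   # prompt processing (prefill)
DECODE_TOK_PER_S = 134     # token generation (decode)

def estimated_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end latency: time to prefill the prompt plus time
    to decode the answer, ignoring fixed overheads."""
    return prompt_tokens / PREFILL_TOK_PER_S + output_tokens / DECODE_TOK_PER_S

# e.g. a 2,000-token prompt with a 300-token answer
print(f"{estimated_latency_s(2000, 300):.1f} s")  # → 3.3 s
```

Note how decoding dominates: even with fast prefill, most of the wall-clock time goes to generating the answer token by token.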
Using NVFP4 for Quality Responses
Ollama now supports NVIDIA's NVFP4 format, a 4-bit floating-point weight format that preserves model accuracy while sharply reducing memory and storage requirements compared to 16-bit weights. Ollama users can thus expect the same high-quality results as models served in production environments, and the compatibility also opens the door to models optimized with NVIDIA's Model Optimizer.
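As a rough intuition for how block-scaled 4-bit formats like NVFP4 trade precision for memory, here is a toy sketch. The integer grid and helper names are illustrative only, not Ollama's or NVIDIA's actual implementation (real NVFP4 uses a 4-bit floating-point value grid with an FP8 scale per 16-value block), but the principle is the same: store a coarse value per weight plus one scale per small block.

```python
# Toy illustration of block-scaled 4-bit quantization -- the general
# idea behind formats like NVFP4, NOT the real format. Shows why 4-bit
# weights cut memory roughly 4x versus FP16 with a small rounding error.

BLOCK = 16    # NVFP4 also scales weights in blocks of 16
LEVELS = 7    # toy signed grid: each weight becomes an integer in [-7, 7]

def quantize_block(block):
    """Scale a block onto the integer grid and round; keep the scale."""
    scale = max(abs(x) for x in block) / LEVELS or 1.0
    return [round(x / scale) for x in block], scale

def dequantize_block(q, scale):
    """Recover approximate weights from integers plus the shared scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.41, -0.88, 0.2,
           0.05, -0.3, 0.7, -0.15, 0.6, -0.44, 0.02, 0.9]
q, s = quantize_block(weights)
restored = dequantize_block(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # 16 small integers + 1 scale, vs 16 full-width floats
print(max_err)  # worst-case error stays below half a quantization step
```

The per-block scale is what keeps accuracy acceptable: a single outlier weight only coarsens its own block, not the whole tensor.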
Cache Improvements for Increased Responsiveness
Ollama's upgraded cache system is another major advantage. By reducing memory usage and increasing cache hits, it speeds up coding and agent tasks, which tend to resend long, overlapping prompts. Intelligent checkpoints and smarter cache eviction keep efficiency high even under agent workloads such as Claude Code.
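The reason cache hits matter so much for agents can be sketched with a toy prompt-prefix cache. This is an illustrative LRU sketch under assumed names (`PrefixCache`, `longest_hit`), not Ollama's actual cache or eviction code: the point is that a repeated system prompt only needs to be prefilled once.

```python
# Illustrative prompt-prefix cache with LRU eviction -- a sketch of why
# agent workloads, which resend the same long system prompt, benefit
# from caching. Not Ollama's implementation.
from collections import OrderedDict

class PrefixCache:
    """Tiny LRU cache keyed by prompt prefixes (tuples of tokens)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # prefix -> cached state placeholder

    def longest_hit(self, tokens):
        """Return the longest cached prefix of `tokens`, refreshing LRU order."""
        for end in range(len(tokens), 0, -1):
            prefix = tuple(tokens[:end])
            if prefix in self.entries:
                self.entries.move_to_end(prefix)  # mark as recently used
                return prefix
        return ()

    def store(self, tokens):
        """Cache the state for this prefix, evicting the LRU entry if full."""
        self.entries[tuple(tokens)] = object()
        self.entries.move_to_end(tuple(tokens))
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = PrefixCache(capacity=8)
system = ["<sys>", "You", "are", "a", "coding", "agent"]
cache.store(system)

# A follow-up request shares the system-prompt prefix, so only the
# three new tokens would need to be prefilled.
request = system + ["Fix", "the", "bug"]
hit = cache.longest_hit(request)
print(len(hit), "of", len(request), "tokens served from cache")
```

Smarter eviction then decides which of these prefixes to keep when memory runs short, so frequently reused agent prompts stay warm.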
Real-World Use Cases
Take the example of a startup building a personal-assistant app. With Ollama on MLX, the assistant can process user requests almost instantly, and entirely on-device, improving responsiveness and customer satisfaction. Developers can also point coding agents such as Claude Code at a local Ollama model for fast, accurate assistance, reducing development time and operational costs.
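Wiring such an assistant to a local Ollama server is straightforward via its REST API. A minimal sketch follows; it assumes Ollama is running on its default port (11434) and that the model named below has already been pulled. The model name and prompt are placeholders.

```python
# Minimal sketch of an assistant backend talking to a local Ollama
# server through its /api/generate endpoint (non-streaming mode).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST the request and return the generated text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server, e.g.:
# ask("llama3.2", "Summarize today's unread messages in two sentences.")
```

Because the server speaks plain HTTP on localhost, the same call works from any language, which is what makes dropping a local model into an existing assistant or agent stack so cheap.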
The Future of AI on Apple Silicon
With this advance, Apple Silicon and MLX position Ollama as a preferred solution for companies looking to automate and optimize their AI operations locally. The outlook is promising for those who adopt the technology early, with real potential for growth and innovation.
Conclusion
Ollama, powered by MLX on Apple Silicon, paves the way for a new era of efficiency and innovation in AI. For those looking to stay at the cutting edge of technology and maximize productivity, this is an opportunity not to be missed.
Want to automate your operations with AI? Book a 15-min call to discuss.
