Apple Foundation Models: 3 Key Upgrades in Third Gen


Apple’s third-generation foundation models are not a single model but a family of five custom-built systems, developed in collaboration with Google. They span both on-device AI and server-based AI, so quick local responses and heavier server-side processing can each be used where they fit best.

The lineup includes AFM 3 Core, AFM 3 Core Advanced, AFM 3 Cloud, ADM 3 Cloud, and AFM 3 Cloud Pro. Each model serves a specific purpose, from handling everyday tasks on your device to tackling more complex requests in the cloud. This approach lets Apple tailor performance to the task at hand, whether you need quick on-device responses or the heavier lifting of server-based processing.

On-Device Models: Sparse vs. Dense Architecture

When you look at the on-device side of Apple’s latest AI push, you’ll find two distinct approaches. The Apple foundation models lineup includes AFM 3 Core and AFM 3 Core Advanced, each designed for different on-device scenarios. AFM 3 Core is a dense 3-billion-parameter model, a workhorse that handles everyday tasks efficiently. AFM 3 Core Advanced, on the other hand, is a 20-billion-parameter multimodal model that uses a sparse architecture to stay within your device’s memory limits.


AFM 3 Core: A Dense 3B Parameter Workhorse

Think of AFM 3 Core as the reliable, always-ready option. Its dense design means all 3 billion parameters are active for every request. This keeps things simple and fast, making it ideal for quick on-device inference tasks like text suggestions or photo analysis. Because it’s purpose-built for Apple silicon, it runs efficiently on the Apple Neural Engine without draining your battery.

AFM 3 Core Advanced: Sparse Multimodal with Mixture of Experts

AFM 3 Core Advanced takes a different route. It’s a sparse mixture of experts (MoE) model, which means it doesn’t use all 20 billion parameters at once. Instead, it activates just 1 to 4 billion parameters per request. This is a clever trick for DRAM optimization: your device’s memory can’t hold a full 20-billion-parameter model, but it can hold the 1 to 4 billion parameters that a given request actually uses. The trade-off? More complexity and potential latency, since the model has to decide which experts to call on for each task.
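To make the DRAM argument concrete, here is a rough estimate of how much memory the weights alone would need. It assumes 4-bit quantized weights (Apple has not published AFM 3’s precision), ignores activations and the KV cache, and assumes inactive experts can stay out of DRAM; treat the numbers as illustrative, not as Apple’s figures.

```python
# Back-of-the-envelope weight-memory estimate for dense vs. sparse (MoE) models.
# Assumptions: weights dominate memory (activations and KV cache ignored), and
# weights are 4-bit quantized. Apple has not published AFM 3's actual precision.

BYTES_PER_GB = 1e9
BITS = 4  # hypothetical quantization level


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed to hold `num_params` weights at `bits_per_weight` bits each."""
    return num_params * bits_per_weight / 8 / BYTES_PER_GB


full_moe = weight_memory_gb(20e9, BITS)    # all 20B parameters resident at once
active_low = weight_memory_gb(1e9, BITS)   # smallest active subset per request
active_high = weight_memory_gb(4e9, BITS)  # largest active subset per request
dense_core = weight_memory_gb(3e9, BITS)   # AFM 3 Core: every parameter always active

print(f"20B MoE, all weights:   {full_moe:.1f} GB")
print(f"20B MoE, active subset: {active_low:.1f}-{active_high:.1f} GB")
print(f"3B dense:               {dense_core:.1f} GB")
```

Under these assumptions, the full 20B model needs about 10 GB for weights alone, while a 1-to-4B active subset needs 0.5 to 2 GB, in the same range as the 1.5 GB of the dense 3B model.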

How Instruction-Following Pruning Works

The secret sauce behind AFM 3 Core Advanced’s efficiency is a technique called Instruction-Following Pruning. Here’s how it works: when you make a request, a lightweight dense block first analyzes the prompt. It then routes the task to a combination of shared and routed experts, the specialized sub-networks within the model. Because this selection happens once per prompt rather than token by token, only the chosen experts need to be active for that request, which keeps on-device inference fast and memory-friendly. It’s a practical solution for running powerful AI on hardware that has real-world limits.
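Apple has not published the internals of Instruction-Following Pruning, so the sketch below substitutes a generic top-k router to show the shape of per-prompt selection. A single scoring layer (standing in for the lightweight dense block) scores every routed expert once for the whole prompt, the highest-scoring routed experts are kept, and the shared experts are always included. All sizes and names, including `select_experts`, are hypothetical.

```python
import numpy as np

# Toy sketch of per-prompt expert selection. Sizes and names are hypothetical;
# a generic top-k router stands in for Apple's unpublished routing logic.

NUM_ROUTED_EXPERTS = 16  # hypothetical
NUM_SHARED_EXPERTS = 2   # hypothetical
TOP_K = 3                # hypothetical number of routed experts kept per prompt
HIDDEN = 8               # toy hidden size

rng = np.random.default_rng(0)
# Stand-in for the lightweight dense block: one linear scoring layer.
router_weights = rng.normal(size=(HIDDEN, NUM_ROUTED_EXPERTS))


def select_experts(prompt_embedding: np.ndarray, k: int = TOP_K) -> dict:
    """Choose experts once for the whole prompt (not once per token)."""
    scores = prompt_embedding @ router_weights        # one score per routed expert
    top_routed = np.argsort(scores)[::-1][:k]         # indices of the k highest scores
    return {
        "shared": list(range(NUM_SHARED_EXPERTS)),    # always active
        "routed": sorted(int(i) for i in top_routed), # chosen for this prompt only
    }


prompt_embedding = rng.normal(size=HIDDEN)  # stand-in for an encoded prompt
selection = select_experts(prompt_embedding)
print(selection)
```

Because `select_experts` runs once per prompt, its output could be used to decide which expert weights to load before generation starts; a per-token router, as used in many MoE models, would instead need every expert within reach throughout generation.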

Server-Based Models: Cloud, ADM, and Cloud Pro

When on-device capabilities reach their practical limits, Apple’s server models step in to handle the heavier lifting. The Apple foundation models lineup extends beyond your device with three distinct server options: AFM 3 Cloud, ADM 3 Cloud, and AFM 3 Cloud Pro. Each model is purpose-built for a specific role, from efficient cloud inference to advanced image generation and agentic AI tasks. Think of them as a scalable toolkit — you get the lightweight on-device experience for everyday requests, and the cloud models kick in when you need more power.
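The division of labor described here can be sketched as a simple dispatch table. Apple has not documented how requests are routed between on-device and server models, so the task categories, the mapping, and `pick_model` are all hypothetical; the mapping only mirrors the roles this article assigns to each model.

```python
from enum import Enum, auto

# Hypothetical routing table for the AFM 3 lineup. The task categories and the
# "prefer on-device, escalate to the server" rule are illustrative assumptions.


class Task(Enum):
    TEXT_SUGGESTION = auto()
    IMAGE_UNDERSTANDING = auto()
    LONG_DOCUMENT_SUMMARY = auto()
    IMAGE_GENERATION = auto()
    MULTI_STEP_AGENT = auto()


ON_DEVICE = {
    Task.TEXT_SUGGESTION: "AFM 3 Core",
    Task.IMAGE_UNDERSTANDING: "AFM 3 Core Advanced",  # the multimodal on-device model
}
SERVER = {
    Task.LONG_DOCUMENT_SUMMARY: "AFM 3 Cloud",
    Task.IMAGE_GENERATION: "ADM 3 Cloud",
    Task.MULTI_STEP_AGENT: "AFM 3 Cloud Pro",
}


def pick_model(task: Task) -> tuple[str, str]:
    """Prefer an on-device model; escalate to a server model only when needed."""
    if task in ON_DEVICE:
        return ON_DEVICE[task], "on-device"
    return SERVER[task], "Private Cloud Compute"


print(pick_model(Task.IMAGE_GENERATION))  # ('ADM 3 Cloud', 'Private Cloud Compute')
```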

AFM 3 Cloud: Purpose-Built for Apple Silicon

The AFM 3 Cloud model is built from the ground up for Apple Silicon, ensuring tight integration with Apple’s server infrastructure. Like the on-device models, it benefits from the same architectural efficiencies, making cloud inference faster and more resource-friendly. When you send a complex request that your device can’t handle alone, this model takes over without introducing unnecessary latency. It’s the workhorse for general-purpose cloud tasks, designed to scale efficiently across Apple’s own hardware.

ADM 3 Cloud: Image Generation Capabilities

Need to generate or edit images on the fly? That’s where ADM 3 Cloud comes in. This is a dedicated image generation model optimized for cloud inference, so you can create visuals without taxing your local hardware. It works alongside the other server models, giving you a complete creative toolkit — text understanding from the AFM models, image creation from ADM — all routed through the cloud as needed.

AFM 3 Cloud Pro: Enabling Agentic Tool Use and Complex Reasoning

The most capable server model is AFM 3 Cloud Pro, built for agentic AI — tasks that require multiple steps, tool use, and deeper reasoning. Apple hasn’t disclosed its exact parameter count, but its role is clear: it powers features that go beyond standard chatbot interactions, allowing the system to plan, execute actions, and adapt as it goes. Interestingly, while the other cloud models run on Apple Silicon, Cloud Pro operates on NVIDIA GPU hardware within Google Cloud. This gives it the extra muscle needed for complex reasoning workloads that demand raw computational throughput rather than tight integration efficiency.

Privacy Guarantees with Private Cloud Compute

That raw computational power in the cloud naturally raises a big question: what happens to your data once it leaves your device? Apple’s answer is Private Cloud Compute, a system designed to keep your information completely private even when it’s being processed remotely. The core promise is straightforward: user data is never stored or shared with anyone, including Apple itself. This isn’t just a policy statement; it’s a technical guarantee built into the infrastructure.

How Private Cloud Compute Protects User Data

Private Cloud Compute works by creating a secure, isolated environment for each request. When you send a query to a server-based model, the system processes it in a temporary space that leaves no trace behind: the data is used only for that single interaction and then discarded. This stateless design, a form of confidential computing, is built so that even Apple’s own engineers cannot access your information. It’s a practical layer of protection that brings cloud AI close to the privacy of on-device processing.
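The “process, then discard” principle can be illustrated at the application level with a request-scoped buffer that is wiped when the request ends. This is a toy illustration, not how Private Cloud Compute is implemented: the function names are hypothetical, and Apple enforces its guarantees with hardware and operating-system measures rather than application code like this.

```python
from contextlib import contextmanager

# Toy illustration of stateless, request-scoped processing in the spirit of
# Private Cloud Compute. NOT Apple's implementation; names are hypothetical.


@contextmanager
def ephemeral_buffer(payload: bytes):
    """Copy request data into a mutable buffer and zero it when the request ends."""
    buf = bytearray(payload)
    try:
        yield buf
    finally:
        for i in range(len(buf)):  # overwrite every byte before releasing the buffer
            buf[i] = 0


def handle_request(payload: bytes) -> int:
    # Hypothetical "inference": count words. Nothing is written to disk or logged.
    with ephemeral_buffer(payload) as buf:
        return len(bytes(buf).split())


print(handle_request(b"hello private world"))  # 3
```

Python itself cannot truly guarantee erasure (the caller’s original `bytes` object and the temporary copy survive until garbage collection), which is why guarantees like these have to be enforced below the application layer.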

Extending Privacy to NVIDIA GPUs in Google Cloud

What makes this setup particularly interesting is how Apple extended Private Cloud Compute to AFM 3 Cloud Pro. Even though this model runs on NVIDIA GPUs inside Google Cloud, the same privacy guarantees apply. Apple adapted its secure cloud inference architecture to work on third-party hardware while maintaining the same strict data protections. In this design, the privacy guarantees follow the data rather than the hardware vendor. For you, it means your sensitive information stays protected whether the heavy lifting is done on Apple’s own servers or on partner hardware.

Collaboration with Google: Custom-Built Models and Infrastructure

Behind that trust sits a multifaceted partnership with Google. It goes beyond standard cloud services, involving custom-built models and specialized infrastructure for AFM 3 Cloud Pro.


The Scope of the Apple-Google Collaboration

The five foundation models were custom-built with Google, indicating joint development or co-design. This suggests a close working relationship on model architecture and training, though specifics like dataset handling or optimization remain undisclosed. For you, this means the Apple foundation models benefit from Google’s cloud AI expertise while adhering to Apple’s stringent privacy standards.

Infrastructure for AFM 3 Cloud Pro: NVIDIA GPUs in Google Cloud

AFM 3 Cloud Pro is deployed on NVIDIA GPUs in Google Cloud, a clear sign of deep infrastructure collaboration. This deployment provides the high-throughput computing that complex, multi-step AI tasks require. Exact details of any joint training or co-optimization are not public, but the hardware arrangement gives Cloud Pro room to scale, while Private Cloud Compute carries Apple’s privacy protections onto that hardware.

Overall, the Google Cloud partnership enhances the Apple foundation models by combining Apple’s design approach with Google’s infrastructure capabilities. This collaboration supports both privacy and performance, giving you a seamless AI experience across Apple devices.

Use Cases Across the Model Lineup

With the infrastructure foundation in place, Apple’s third-generation models are designed to handle very different jobs. Each model in the lineup is purpose-built, so you get the right balance of speed, privacy, and capability depending on what you’re doing. Whether you need instant results on your phone or heavy lifting in the cloud, there’s a model tuned for that task.

On-Device Applications: Real-Time and Private

The on-device models are where you’ll notice Apple foundation models most directly. These lightweight models run entirely on your iPhone, iPad, or Mac, which means they can process data without sending anything to a server. This keeps your information private and delivers results with almost no delay. Everyday on-device AI tasks include text prediction as you type, smart replies in messages, and real-time image understanding — like identifying objects in a photo or recognizing text in a scene. Because the model is always available locally, features like live captions or contextual suggestions feel instant and responsive.

Cloud-Based Reasoning and Generation

For more demanding work, Apple’s cloud models step in. The ADM 3 Cloud model specializes in image generation, handling tasks like creating custom emoji or generating visuals from a description. Meanwhile, the AFM 3 Cloud model focuses on general server-side reasoning — answering complex questions, summarizing long documents, or analyzing data that requires more processing power than your device can provide. These cloud models are designed to work seamlessly with on-device intelligence, so you only notice the extra capability, not the transition.

Agentic Workflows with AFM 3 Cloud Pro

The most advanced model in the lineup, AFM 3 Cloud Pro, pushes into agentic tool use. This means the model can take autonomous actions — like booking a reservation, sending a follow-up email, or coordinating multiple apps to complete a task. It handles complex reasoning steps in sequence, checking its own work and adjusting as it goes. For example, you could ask it to plan a trip, and it would research flights, check your calendar, and draft an itinerary without you needing to guide each step. This is where Apple Intelligence features move from simple assistance to proactive, multi-step help.
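The trip-planning example maps onto a familiar agent pattern: run tool calls in sequence, feed each result into shared state, and check whether the rest of the plan still makes sense. In this minimal sketch the plan is fixed up front, whereas a model like AFM 3 Cloud Pro would presumably choose each next step itself. The tool functions, state keys, and `run_agent` are hypothetical; Apple has not published how Cloud Pro orchestrates tools.

```python
from typing import Callable

# Hypothetical tools standing in for real app integrations.


def check_calendar(state: dict) -> dict:
    return {"free": state["date"] not in state["busy_dates"]}


def search_flights(state: dict) -> dict:
    return {"flight": f"{state['origin']}->{state['destination']} 09:40"}


def draft_itinerary(state: dict) -> dict:
    return {"itinerary": f"{state['date']}: fly {state['flight']}"}


TOOLS: dict[str, Callable[[dict], dict]] = {
    "check_calendar": check_calendar,
    "search_flights": search_flights,
    "draft_itinerary": draft_itinerary,
}


def run_agent(plan: list[str], state: dict, max_steps: int = 10) -> dict:
    """Run planned tool calls in order, merging each result into shared state."""
    for tool_name in plan[:max_steps]:
        state.update(TOOLS[tool_name](state))
        # Check step: abandon the rest of the plan if a result rules it out.
        if state.get("free") is False:
            state["itinerary"] = None
            break
    return state


trip = run_agent(
    ["check_calendar", "search_flights", "draft_itinerary"],
    {"origin": "SFO", "destination": "JFK", "date": "2025-06-12", "busy_dates": {"2025-06-20"}},
)
print(trip["itinerary"])  # 2025-06-12: fly SFO->JFK 09:40
```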

Missing Details: Parameter Counts and Performance Benchmarks

Still, for all the ambition behind these Apple foundation models, Apple has left some key questions unanswered. If you like to compare model sizes or dig into benchmark comparisons, you will notice a few gaps. The company has not disclosed the exact parameter counts of its server-side models: AFM 3 Cloud, ADM 3 Cloud, and the flagship AFM 3 Cloud Pro. This makes it hard to judge how these models scale compared to others in the industry.

Beyond raw size, there is also a lack of public benchmark comparison data. Apple has not released any LLM evaluation scores that pit these third-generation models against their own previous versions or against external competitors. When you are trying to decide if an upgrade matters, model performance metrics like accuracy on reasoning tasks or speed of text generation are crucial. Without them, you have to rely on subjective impressions of how Siri or writing tools behave in practice.

Undisclosed Parameter Sizes for Server Models

Parameter counts are a common shorthand for a model’s complexity. Larger counts often mean more capability, but also higher computational cost. Apple’s silence on the exact numbers for its server models leaves a gap for anyone doing a technical comparison. You can still get a sense of capability from how the models perform on specific tasks, but the lack of hard numbers makes it harder to benchmark them systematically.
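A common rule of thumb shows why parameter counts matter for cost: a transformer’s forward pass takes roughly two floating-point operations (one multiply, one add) per active parameter for each generated token, ignoring the extra attention cost of long contexts. Applied to the on-device figures Apple did publish, it gives a rough comparison; for the server models, the undisclosed parameter count is exactly what prevents the same estimate.

```latex
\begin{aligned}
C_{\text{token}} &\approx 2\,N_{\text{active}} \\
\text{AFM 3 Core (dense, } N_{\text{active}} = 3\times10^{9}) :\quad C_{\text{token}} &\approx 6\times10^{9}\ \text{FLOPs} \\
\text{AFM 3 Core Advanced (MoE, } N_{\text{active}} \in [1, 4]\times10^{9}) :\quad C_{\text{token}} &\approx [2, 8]\times10^{9}\ \text{FLOPs}
\end{aligned}
```

By this estimate, the sparse model’s per-token compute can land below or above the dense 3B model’s, depending on how many parameters a given prompt activates.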

Absence of Public Performance Benchmarks

Similarly, the company has not published any standard benchmarks that compare these models to prior versions or to other systems. There is no public data on latency versus accuracy trade-offs for the on-device sparse model, for example. This means you cannot easily verify claims about speed or efficiency. For now, the best way to evaluate these models is to use Apple Intelligence features directly and see if they meet your needs, rather than relying on published scores.
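Until Apple publishes numbers, latency comparisons have to be measured directly. If the feature you want to test can be triggered from a script, a small harness like this one times repeated calls and reports median and 95th-percentile latency. `measure_latency` is a hypothetical helper, and the zero-argument function passed to it stands in for whatever call you are testing.

```python
import statistics
import time
from typing import Callable


def measure_latency(run: Callable[[], object], trials: int = 20) -> dict:
    """Time `run` repeatedly and report median and 95th-percentile latency in ms."""
    samples_ms = []
    for _ in range(trials):
        start = time.perf_counter()
        run()
        samples_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    p95 = statistics.quantiles(samples_ms, n=20, method="inclusive")[18]
    return {"p50_ms": statistics.median(samples_ms), "p95_ms": p95}


# Stand-in workload; replace with the call you actually want to measure.
print(measure_latency(lambda: sum(range(100_000))))
```

Reporting the 95th percentile alongside the median matters for interactive features, where occasional slow responses are more noticeable than the typical case.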

Frequently Asked Questions

How does AFM 3 Core Advanced overcome DRAM constraints with its sparse architecture?

AFM 3 Core Advanced activates only a subset of its 20 billion parameters, roughly 1 to 4 billion, for each request. Because only that subset has to be active at once, the model fits within the DRAM limits of Apple devices while still drawing on a much larger pool of specialized experts. Apple has not published benchmarks that quantify the resulting speed or accuracy trade-offs.

What is the difference between the dense AFM 3 Core and the sparse AFM 3 Core Advanced?

The dense AFM 3 Core uses all 3 billion of its parameters for every request, which keeps it simple and predictable. The sparse AFM 3 Core Advanced is far larger at 20 billion parameters, but it activates only the 1 to 4 billion most relevant to each request. That selective activation is what lets a larger, multimodal model run on devices with limited DRAM, although Apple has not published benchmarks showing how its accuracy compares.

What privacy guarantees do the server-based models offer through Private Cloud Compute?

Private Cloud Compute processes your data without storing or logging it, whether the request runs on Apple’s own servers or, for AFM 3 Cloud Pro, on NVIDIA GPUs in Google Cloud. The system uses end-to-end encryption and audited code to ensure that no one, including Apple, can access your personal information. You get the power of cloud-based AI while maintaining strong privacy protections.
