Grafana Pyroscope 2.0 Makes Continuous Profiling Practical

The Four Pillars and the Missing Piece

For years, the observability community has rallied around three core telemetry signals: metrics, logs, and traces. Together, they form a powerful toolkit for understanding system behavior. Metrics tell you what is happening. For example, CPU usage spiked to 95%. Traces tell you where the latency is coming from. Perhaps the payment service is the bottleneck. Logs provide the detailed context, such as error messages or stack traces from a crash.

Yet a critical piece was missing. None of these signals could tell you why a specific function was slow or which line of code was responsible for the memory allocation. Continuous profiling fills this void. It captures the call stack of an application at a high frequency. This creates a time-series database of stack traces. Engineers can then answer questions like, “Which function is consuming the most CPU right now?” or “Where are my memory allocations going?”
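As a rough illustration of that idea, the sketch below models samples as timestamped call stacks and answers the first question by counting which function is the innermost (running) frame most often within a time window. The `Sample` type, the function names, and the data are invented for illustration; this is not Pyroscope's storage format or query API.

```python
from collections import Counter
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp: int            # seconds since some epoch (hypothetical)
    stack: tuple[str, ...]    # call stack, outermost frame first


def hottest_functions(samples, start, end, top_n=3):
    """Count how often each function was the innermost (running) frame
    among samples with start <= timestamp < end."""
    on_cpu = Counter(
        s.stack[-1] for s in samples if start <= s.timestamp < end and s.stack
    )
    return on_cpu.most_common(top_n)


samples = [
    Sample(100, ("main", "handle_order", "serialize_json")),
    Sample(101, ("main", "handle_order", "serialize_json")),
    Sample(102, ("main", "handle_order", "validate")),
    Sample(200, ("main", "idle")),
]

print(hottest_functions(samples, start=100, end=150))
# [('serialize_json', 2), ('validate', 1)]
```

Because every sample carries a timestamp, the same data can be sliced to "right now", to the last hour, or to the window around an incident.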

The challenge has always been the overhead. Profiling generates massive amounts of data. A single profile can be tens of megabytes. Storing, querying, and managing this data at scale required significant infrastructure and operational effort. This is precisely the problem that Grafana Labs set out to solve with the release of Pyroscope 2.0 on 21 April 2026.

Why Continuous Profiling Matters

Debugging a production performance issue often feels like searching for a black cat in a dark room. You know something is wrong. Your metrics dashboards are flashing red. Your traces pinpoint the offending microservice. But the exact function consuming all the CPU cycles remains frustratingly hidden.

Consider a hypothetical scenario. A fintech application slows down during peak trading hours. Metrics show high CPU usage on the order-processing service. Traces reveal that the bottleneck is a specific endpoint. But which function inside that endpoint is the culprit? Is it the JSON serialization? The database query builder? The data validation logic? Without profiling, you are left guessing. You might add more instrumentation, restart the service, or simply throw more hardware at the problem.

As Christian Simon, a staff engineer at Grafana Labs, wrote in the official announcement, “Continuous profiling captures these moments as they happen, so you don’t have to rely on luck with a debugger.” This level of detail is essential for making targeted optimizations rather than simply adding hardware to mask the problem.

The Architecture That Made It Hard

To appreciate the leap forward, it helps to understand the limitations of the original design. The first version of Pyroscope was built on Cortex. This is the same foundational architecture used by early versions of Grafana Mimir and Loki. While this allowed for a rapid initial release, it also inherited some architectural trade-offs that became painful as adoption grew.

The 3x Write Amplification Problem

One of the most significant cost drivers in v1 was write-path replication. For resilience and durability, every incoming profile was written three times. When a single profile payload can easily exceed 10 MB, writing it three times creates a massive amplification factor.

To put this in concrete terms, imagine you are profiling 1,000 services. Each service produces a 10 MB profile every 60 seconds. That is roughly 10 GB of raw profile data per minute. With 3x replication, you are storing 30 GB per minute. That is over 40 TB per day. The monthly object storage cost for that volume can easily run into the tens of thousands of dollars. Storage bills grew quickly, especially for organizations profiling hundreds or thousands of services. Simon noted in the announcement that this 3x amplification had a “meaningful effect on storage bills.”
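The arithmetic can be checked with a short calculation. The inputs are the hypothetical figures from this example, with decimal units assumed (1 TB = 1,000,000 MB); they are not measurements from a real deployment.

```python
def daily_storage_tb(services, profile_mb, interval_s, replication):
    """Daily stored volume in decimal terabytes (1 TB = 1,000,000 MB)."""
    profiles_per_day = 86_400 / interval_s
    total_mb = services * profile_mb * profiles_per_day * replication
    return total_mb / 1_000_000


replicated = daily_storage_tb(1_000, 10, 60, replication=3)
single = daily_storage_tb(1_000, 10, 60, replication=1)
print(f"3x replicated: {replicated:.1f} TB/day, single copy: {single:.1f} TB/day")
# 3x replicated: 43.2 TB/day, single copy: 14.4 TB/day
```

Changing `interval_s` or `profile_mb` shows how quickly the total moves with profile size and collection frequency.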

Stateful Components and Scaling Headaches

The read path in v1 was stateful. Query processing was tied to specific nodes. This meant that capacity had to be provisioned for peak load, even if that capacity sat idle 99% of the time. Profiling data has a notoriously bursty access pattern. During an incident, dozens of engineers might hammer the system with queries simultaneously. Outside of incidents, query traffic might be near zero.

Stateful architectures struggle with this kind of workload. Scaling up required careful data rebalancing. Scaling down was risky. Deploying a new version of the software could take 8 to 12 hours. This operational complexity created a high barrier to entry. Teams that could have benefited from continuous profiling were often priced out by the storage costs or deterred by the operational overhead. The technology was powerful, but it was not practical for the average engineering organization.

The Symbolization Tax

A raw profile is just a series of memory addresses. To make it human-readable, you need symbol files. These files map those memory addresses to function names, file paths, and line numbers. Symbol files can be very large, especially for compiled languages like C++ or Go. In v1, these symbols were stored redundantly alongside each profile. This meant that if you had 1,000 profiles from the same service, you stored the same symbol information 1,000 times. This redundancy was a major driver of storage costs.
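To make the address-to-name mapping concrete, here is a minimal sketch of symbolization as a sorted lookup. The symbol table, addresses, and file names are hypothetical, and real symbol formats (DWARF, for example) carry far more detail, including end addresses and line tables, which this sketch omits.

```python
import bisect

# Hypothetical symbol table: (start address, function, file), sorted by address.
SYMBOLS = [
    (0x1000, "main", "main.go"),
    (0x1400, "handleOrder", "orders.go"),
    (0x1A00, "encodeJSON", "encode.go"),
]
STARTS = [start for start, _, _ in SYMBOLS]


def symbolize(address):
    """Return the function whose start address is the closest one at or below `address`."""
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return "<unknown>"
    _, name, source_file = SYMBOLS[i]
    return f"{name} ({source_file})"


raw_stack = [0x1010, 0x1480, 0x1A2C]
print([symbolize(a) for a in raw_stack])
# ['main (main.go)', 'handleOrder (orders.go)', 'encodeJSON (encode.go)']
```

The table is the same for every profile taken from the same binary, which is why storing a copy with each profile is wasteful.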

Pyroscope 2.0: A Ground-Up Rearchitecture

Grafana Labs took the lessons learned from v1 and from the broader evolution of Mimir and Loki. They rebuilt Pyroscope from the ground up. The result is an architecture that is simpler, cheaper, and more scalable. It applies the principle of making object storage the single source of truth. This principle is adapted specifically for the unique characteristics of profiling data: large payloads, significant symbolic information, and bursty query patterns.

Single Writes and Data Co-location

The most impactful change is the elimination of write-path replication. Pyroscope 2.0 writes each profile exactly once to object storage. Storing one copy instead of three cuts the stored profile volume by roughly 67% compared to v1, before any other optimizations are applied.

But the savings do not stop there. The new architecture introduces data co-location. Profiles from the same service are stored together. This allows the system to aggressively deduplicate symbolic information. Function names, file paths, line numbers, and stack traces are stored once per service. In Grafana’s own production environment, this deduplication reduced the symbol storage footprint by up to 95%. For a company running a large-scale profiling deployment, this translates to thousands of dollars in monthly savings on object storage bills.
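The general technique behind this kind of deduplication can be sketched as a shared string table: each distinct symbol is stored once, and profiles refer to it by integer ID. This is an illustrative sketch only, with invented names and data; it does not reproduce Pyroscope's actual on-disk format, and each profile is simplified to a single stack.

```python
class SymbolTable:
    """Stores each distinct symbol string once and hands out integer IDs."""

    def __init__(self):
        self._ids = {}
        self.strings = []

    def intern(self, symbol):
        if symbol not in self._ids:
            self._ids[symbol] = len(self.strings)
            self.strings.append(symbol)
        return self._ids[symbol]


def encode_profiles(profiles):
    """Replace every frame name with an ID into one shared table."""
    table = SymbolTable()
    encoded = [[table.intern(frame) for frame in stack] for stack in profiles]
    return table, encoded


# 1,000 hypothetical profiles from one service, all with the same frames.
stack = ["main", "handleOrder", "encodeJSON"]
profiles = [list(stack) for _ in range(1_000)]

table, encoded = encode_profiles(profiles)
naive = sum(len(frame) for p in profiles for frame in p)
shared = sum(len(s) for s in table.strings)
print(f"symbol characters stored: {naive} naive vs {shared} shared")
# symbol characters stored: 25000 naive vs 25 shared
```

The saving depends on how much the profiles overlap, which is why grouping profiles from the same service matters: they share almost all of their symbols.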

Stateless Queriers for Elastic Scaling

The read path has been completely redesigned to be stateless. Any querier can process any query. This means the querier fleet can scale up and down elastically in response to demand, without complex data rebalancing. In a Kubernetes environment, you can configure a Horizontal Pod Autoscaler (HPA) based on query queue depth or the CPU utilization of the queriers.
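For intuition about how such an autoscaler reacts, the sketch below implements the core scaling rule from the Kubernetes HPA documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The replica bounds and CPU numbers are hypothetical, and the real controller also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=50):
    """Core HPA rule: ceil(current * currentMetric / targetMetric),
    clamped to the configured bounds. Tolerance and stabilization are omitted."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical incident: 4 queriers at 90% average CPU with a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
# Quiet period: 6 queriers at 5% CPU shrink back to the minimum.
print(desired_replicas(6, current_metric=5, target_metric=60))   # 1
```

This rule only works cleanly when replicas are interchangeable, which is exactly what a stateless read path provides.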

This is particularly valuable given the bursty nature of profiling queries. Consider a scenario where an LLM-powered agent is automatically investigating a performance regression. It might fire off hundreds of queries in rapid succession. With stateless queriers, the system can spin up additional replicas to handle the load. It can spin them down just as quickly when the investigation is complete. You only pay for the capacity you use, when you use it. Simon describes this as handling spikes “gracefully without paying for idle capacity the rest of the time.”

Operational Simplicity

Fewer stateful components means fewer failure modes. The segment writer is now diskless. The store-gateway component has been removed entirely. Deployments that previously took 8 to 12 hours now complete in minutes. This dramatic reduction in operational overhead makes continuous profiling accessible to smaller teams with fewer dedicated infrastructure resources. The system is simply easier to run.

New Capabilities Unlocked by the Clean Slate Design

The architectural improvements are not just about cost and speed. They enable entirely new capabilities that were simply not feasible with the v1 architecture.

Metrics Derived from Profiles

Instead of querying individual profiles, you can now aggregate profiling data into fleet-wide metrics. This allows you to compare CPU or memory consumption across different services, deployments, or versions at a glance. This is incredibly powerful for capacity planning and for identifying regressions introduced by a new release. For example, you can see that version 2.3.1 of your checkout service uses 15% more CPU than version 2.3.0.
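Conceptually, this is a group-by over profile data. The sketch below sums CPU time per version label and computes the relative change between two versions. The record layout, label names, and numbers are hypothetical and are not Pyroscope's query language or API; comparing raw totals also assumes both versions ran on the same number of pods for the same duration.

```python
from collections import defaultdict


def cpu_by_label(profiles, label):
    """Sum CPU seconds across profiles, grouped by one label's value."""
    totals = defaultdict(float)
    for p in profiles:
        totals[p["labels"][label]] += p["cpu_seconds"]
    return dict(totals)


def percent_change(baseline, candidate):
    """Relative change of `candidate` against `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100


# Hypothetical per-pod profile summaries for a checkout service.
profiles = [
    {"labels": {"service": "checkout", "version": "2.3.0"}, "cpu_seconds": 40.0},
    {"labels": {"service": "checkout", "version": "2.3.0"}, "cpu_seconds": 60.0},
    {"labels": {"service": "checkout", "version": "2.3.1"}, "cpu_seconds": 55.0},
    {"labels": {"service": "checkout", "version": "2.3.1"}, "cpu_seconds": 60.0},
]

totals = cpu_by_label(profiles, "version")
print(totals)  # {'2.3.0': 100.0, '2.3.1': 115.0}
print(f"{percent_change(totals['2.3.0'], totals['2.3.1']):.0f}% more CPU")  # 15% more CPU
```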

Single Profile Inspection

Sometimes the aggregate view is not enough. You need to look at a single instance of a profile to understand a specific anomaly. Perhaps a single pod is behaving differently from the rest of the fleet. Pyroscope 2.0 makes it possible to inspect individual profiles directly, giving engineers the granularity they need for deep debugging.

Heatmap Queries

Understanding how profiles change over time is crucial. The new architecture supports heatmap queries. These visualize the distribution of profiling data across a time range. This makes it easy to spot patterns, such as a function that periodically consumes more CPU, or a memory leak that grows over time. Simon describes these new capabilities as a natural consequence of the cleaner data model and stateless read path, rather than a separate engineering effort.
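The data behind a heatmap is a two-dimensional histogram: one axis buckets time, the other buckets the measured value, and each cell holds a count. The following sketch shows that bucketing on invented data; the function name, bucket sizes, and values are assumptions for illustration and say nothing about how Pyroscope computes its heatmaps.

```python
from collections import Counter


def heatmap(points, time_step, value_step):
    """Count points per (time bucket, value bucket) cell.

    `points` are (timestamp_seconds, value) pairs; each cell key is the
    lower bound of its bucket on both axes."""
    cells = Counter()
    for ts, value in points:
        cell = (ts // time_step * time_step, value // value_step * value_step)
        cells[cell] += 1
    return cells


# Hypothetical CPU milliseconds attributed to one function, per profile.
points = [(0, 12), (10, 18), (30, 15), (65, 95), (70, 110), (125, 14)]
for cell, count in sorted(heatmap(points, time_step=60, value_step=50).items()):
    print(cell, count)
# (0, 0) 3
# (60, 50) 1
# (60, 100) 1
# (120, 0) 1
```

In this toy data the second minute stands out: its samples land in higher value buckets than the minutes before and after it, which is the kind of periodic spike a heatmap makes visible.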

Alignment with OpenTelemetry and the Broader Ecosystem

The release of Pyroscope 2.0 comes at a pivotal moment for the observability industry. Continuous profiling is gaining recognition as a standard telemetry signal, alongside metrics, logs, and traces.

OpenTelemetry Adopts Profiling

In August 2024, OpenTelemetry announced that it had incorporated continuous profiling as a core telemetry signal. Elastic donated its continuous profiling agent to the project. The OpenTelemetry Profiles signal is now in alpha. Pyroscope 2.0 supports the OpenTelemetry Protocol (OTLP) for profiling. This means it can ingest profiles from any OTLP-compatible agent. This alignment with the open standard reduces vendor lock-in. It makes it easier for organizations to adopt profiling as part of their broader observability strategy. You can use the same OpenTelemetry Collector pipeline to collect metrics, logs, traces, and profiles.

The Rise of eBPF and Alternative Approaches

Pyroscope is not the only open source player in the continuous profiling space. Polar Signals builds Parca, another popular open source profiling project. Parca leverages eBPF (Extended Berkeley Packet Filter) for low-overhead kernel-level instrumentation. eBPF allows you to profile the kernel and applications without any code changes. Pyroscope, on the other hand, offers deep language-specific SDKs that provide richer context, such as exact line numbers and custom labels. The existence of multiple strong open source projects validates the importance of continuous profiling as a practice. It also gives the community options, driving innovation in instrumentation, storage, and query performance.

Security and Compliance Considerations

It is worth noting that stack traces can sometimes contain sensitive information. File paths, user names, or even data values can appear in a stack trace. Organizations with strict compliance requirements should carefully consider what they profile and how they store the resulting data. Pyroscope 2.0 provides mechanisms for controlling data retention and access. You can configure retention policies to automatically delete profiles older than a certain age. You can also set up access controls to restrict who can view profiling data. This allows you to align profiling with your security policies.

Practical Steps for Adopting Pyroscope 2.0

If you are considering adding continuous profiling to your observability stack, here are some practical steps to get started.

Assess Your Current Observability Infrastructure

Do you already run Grafana? Do you use metrics, logs, and traces? Profiling is most powerful when it is integrated with your existing telemetry signals. Pyroscope 2.0 integrates natively with the Grafana ecosystem. You can correlate a CPU spike in a metrics dashboard with a flame graph from a profile. You can click on a span in a trace and see the profile from that specific request.

Choose Your Instrumentation Strategy

You can profile applications using the Pyroscope SDKs, or you can use an OTLP-compatible agent. For languages like Go, Java, Python, Ruby, and Rust, the SDKs provide deep integration. They allow you to add custom labels and context to your profiles. For environments where you cannot modify the application code, eBPF-based agents offer a compelling alternative. You can profile the kernel or container runtimes without any code changes.
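As one example of the SDK route, the sketch below prepares settings for the Pyroscope Python SDK (the `pyroscope-io` package) and passes them to `pyroscope.configure()`. The service name, version tag, and server address are placeholders, and the helper functions are hypothetical wrappers written for this sketch. It also assumes the existing SDK works unchanged against a Pyroscope 2.0 server, which should be confirmed against the official documentation.

```python
try:
    import pyroscope  # provided by the `pyroscope-io` package
except ImportError:   # keep the sketch importable without the SDK
    pyroscope = None


def profiler_settings(service, version, server="http://localhost:4040"):
    """Build keyword arguments for pyroscope.configure()."""
    return {
        "application_name": service,
        "server_address": server,      # placeholder address
        "tags": {"version": version},  # static labels attached to every profile
    }


def start_profiling(settings):
    """Start the profiler if the SDK is installed; report whether it started."""
    if pyroscope is None:
        return False
    pyroscope.configure(**settings)
    return True


settings = profiler_settings("checkout", "2.3.1")
print(settings)
```

An application would call `start_profiling(settings)` once at startup. The SDK also offers `pyroscope.tag_wrapper({...})` as a context manager for attaching labels to a specific block of code, which is how request-level context such as an endpoint name can be added.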

Start with a Single Service

Do not try to profile everything at once. Pick a service that is known to have performance issues or high resource consumption. Instrument it, deploy Pyroscope 2.0, and start exploring the flame graphs. Once you are comfortable with the workflow, you can expand to other services. A good candidate is a service that is frequently the subject of incident reviews.

Learn to Read Flame Graphs

Flame graphs are the most common way to visualize profiling data. The x-axis shows the stack profile population, sorted alphabetically; it does not represent the passage of time. The y-axis shows stack depth. Each rectangle represents a function. The width of the rectangle represents the proportion of samples in which that function was on the stack, which includes time spent in the functions it calls. A wide rectangle at the leaf end of a stack, with little or nothing stacked beyond it, is a strong signal that the function itself is a hot path. Pyroscope 2.0 integrates flame graphs directly into the Grafana UI, making them accessible alongside your metrics and traces. Spend some time exploring the flame graphs for your service. Look for functions that are unexpectedly wide.
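To see where rectangle widths come from, the sketch below computes two shares for each function from a set of sampled stacks: the total share (the function appears anywhere on the stack, which is what the rectangle width shows) and the self share (the function is the innermost frame). The stacks and counts are invented for illustration.

```python
from collections import Counter


def frame_widths(stack_counts):
    """Return {function: (total share, self share)} of all samples.

    total = the function is anywhere on the stack,
    self  = the function is the innermost frame."""
    total, self_ = Counter(), Counter()
    n = sum(stack_counts.values())
    for stack, count in stack_counts.items():
        for fn in set(stack):          # count recursion once per sample
            total[fn] += count
        self_[stack[-1]] += count
    return {fn: (total[fn] / n, self_[fn] / n) for fn in total}


# Hypothetical sample counts per call stack, outermost frame first.
stack_counts = {
    ("main", "handle_order", "serialize_json"): 60,
    ("main", "handle_order", "validate"): 30,
    ("main", "log"): 10,
}

for fn, (total, self_share) in sorted(frame_widths(stack_counts).items()):
    print(f"{fn:15s} total={total:.0%} self={self_share:.0%}")
```

Here `main` and `handle_order` are wide only because of what they call, while `serialize_json` is wide on its own account, so it is the one worth optimizing first.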

Integrate Profiling into Your Incident Response Playbooks

When an incident occurs, make it a standard step to check the profiles. Is there a function that suddenly started consuming more CPU? Is a memory allocation pattern causing the garbage collector to run hot? Continuous profiling provides the answers that metrics and traces cannot. This can drastically reduce mean time to resolution (MTTR). Add a link to the profiling dashboard in your incident response runbook.

The journey of continuous profiling from a niche practice to a standard pillar of observability has been long. With Pyroscope 2.0, Grafana Labs has made it practical for organizations of all sizes to see exactly what their code is doing in production. The black cat in the dark room has finally been found.
