Managing stateful applications in a containerized world often feels like trying to catch smoke with your bare hands. While orchestrators like Kubernetes are brilliant at managing ephemeral workloads, the moment you introduce persistent data, the complexity skyrockets. For many platform engineers, finding a reliable Kubernetes backup tool that can handle the nuances of Custom Resource Definitions (CRDs) and persistent volumes without breaking under pressure is a constant struggle. This struggle reached a significant turning point during KubeCon + CloudNativeCon Europe 2026 in Amsterdam, where a major shift in the ecosystem was announced that promises to change how we approach disaster recovery and cluster portability.

A New Era for Cloud-Native Resilience
Broadcom has officially transitioned Velero, a cornerstone of the Kubernetes ecosystem, into the Cloud Native Computing Foundation (CNCF) as a Sandbox project. This move is much more than a simple administrative change; it represents a fundamental shift in how the industry views the governance of essential infrastructure. For years, Velero has been the go-to solution for teams needing to snapshot their cluster states, but its lineage has always been tied to specific commercial entities. By moving into the CNCF Sandbox, the project is shedding its single-vendor identity to become a truly neutral, community-driven standard.
The announcement has sent ripples through the DevOps community. To some, it looks like a strategic retreat, but to those deeply embedded in the cloud-native lifecycle, it looks like a massive expansion. When a tool moves from being “owned” by a company to being “stewarded” by a foundation, the horizon of its potential expands. It allows for a broader range of contributors, more diverse use cases, and, most importantly, a level of trust that is essential for mission-critical data protection. If you are building a production environment, you want to know that your recovery strategy isn’t tied to the quarterly earnings or the acquisition roadmap of a single corporation.
This transition addresses a long-standing psychological barrier in the enterprise space. Many large-scale organizations hesitate to standardize their entire disaster recovery pipeline on a tool that carries a “VMware-centric” or “Broadcom-centric” label. Even if the tool performs flawlessly, the fear of vendor lock-in remains a potent deterrent. The CNCF governance model provides the “trust repair” necessary to convince the skeptics that Velero is a permanent fixture of the landscape, regardless of where Broadcom decides to focus its commercial energy next.
The Evolution of Velero: From Heptio to the CNCF
To understand why this move is so significant, we have to look back at where Velero started. The project’s DNA traces back to Heptio, a company founded by industry luminaries Joe Beda and Craig McLuckie, both of whom were instrumental in the early days of Kubernetes at Google. Heptio was built on the idea that Kubernetes needed better tooling to handle real-world, stateful workloads. When VMware acquired Heptio in late 2018, Velero became part of a massive corporate ecosystem. Eventually, through the complex series of acquisitions and shifts that led to Broadcom’s current position, Velero came under Broadcom’s stewardship.
Throughout this journey, Velero has maintained an impressive level of community respect, evidenced by its nearly 9,900 stars on GitHub. It isn’t just another utility; it is a sophisticated engine that operates at the Kubernetes API layer. Unlike traditional backup solutions that might look at the storage layer or the hypervisor, Velero understands the “language” of Kubernetes. It uses Custom Resource Definitions (CRDs) to capture the entire state of a cluster, including namespaces, RBAC policies, and persistent volume claims. This makes it a uniquely capable Kubernetes backup tool because it doesn’t just save data; it saves the context required to make that data useful again.
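To make this concrete, a backup in Velero is itself a custom resource you declare against the API server. The sketch below shows roughly what such an object looks like; the namespace name and backup name are hypothetical placeholders, not values from any real cluster:

```yaml
# Velero Backup custom resource: captures the Kubernetes objects in a
# namespace and, optionally, snapshots of its bound persistent volumes.
# "payments" is an illustrative application namespace.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: payments-backup
  namespace: velero         # Backup objects live in Velero's own namespace
spec:
  includedNamespaces:
    - payments              # scope the backup to one application namespace
  includedResources:
    - '*'                   # capture every resource type, including CRDs
  snapshotVolumes: true     # also snapshot the bound PersistentVolumes
  ttl: 720h0m0s             # let the backup expire after roughly 30 days
```

Because the backup is just another Kubernetes object, it can be versioned, audited, and applied with the same GitOps tooling you use for the rest of the cluster.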
The move to the CNCF Sandbox is the logical culmination of this evolution. It takes a tool born from the very creators of the technology it supports and places it in the hands of the global community. While Broadcom remains a primary maintainer alongside heavyweights like Red Hat and Microsoft, the decision-making process is shifting. The project now utilizes a consensus-based model involving supermajority voting and a five-day lazy-consensus period for reviews. This ensures that no single company can unilaterally steer the project in a direction that only benefits their own proprietary products.
Why Governance Matters for Your Disaster Recovery Strategy
In the world of software, there is a vital distinction between vendor-neutral governance and vendor-independent operations. Understanding this difference is crucial for any CTO or Lead Architect planning their long-term infrastructure. Governance refers to who makes the decisions: who decides which features are added, how the code is reviewed, and how the roadmap is shaped. Operations, on the other hand, refer to how the tool actually runs in your environment.
Even with CNCF governance, Velero’s operational requirements remain the same. It still relies on external object storage (like AWS S3 or Google Cloud Storage), IAM credential chains, and a healthy target cluster for restores. The governance change doesn’t change the “how” of the tool, but it profoundly changes the “why” and the “who.” By moving to a neutral foundation, the project ensures that the “how” will continue to evolve in ways that benefit the entire ecosystem, not just a subset of users.
Consider a hypothetical scenario where a major cloud provider decides to change its API in a way that breaks certain backup workflows. In a single-vendor model, you are at the mercy of that vendor’s willingness to patch the tool. In a CNCF-governed model, the community can rally, contributors from different companies can collaborate, and a fix can be pushed much more rapidly. This collective intelligence is the greatest asset of the open-source movement, and Velero is finally positioned to fully leverage it.
The Challenges of Kubernetes State Management
Before we dive into how to implement these solutions, we must acknowledge the sheer difficulty of what we are trying to achieve. Kubernetes was originally designed for stateless microservices. If a pod dies, you just spin up a new one. But in the real world, we have databases, message queues, and file systems. These are “stateful” workloads. If you simply snapshot a disk while a database is writing to it, you might end up with a corrupted, unmountable filesystem upon restoration.
The challenges generally fall into three categories:
- Data Consistency: Ensuring that the snapshot of the persistent volume matches the state of the Kubernetes API at that exact moment.
- Metadata Complexity: A cluster isn’t just disks; it’s a web of secrets, configmaps, service accounts, and network policies. If you restore the data but forget the security policies, your application won’t run.
- Portability: Moving a workload from an on-premises cluster to a managed service like GKE or EKS is notoriously difficult because of the underlying storage drivers and networking plugins.
This is why a specialized Kubernetes backup tool like Velero is indispensable. It attempts to solve the “consistency” problem by coordinating the backup of both the Kubernetes objects and the underlying data volumes simultaneously.
Implementing a Robust Backup Workflow
If you are looking to implement Velero to solve these challenges, you shouldn’t just “turn it on” and hope for the best. A professional-grade implementation requires a structured approach. Here is a step-by-step guide to building a resilient backup architecture.
Step 1: Define Your Storage Backend
Velero needs somewhere to store its backups. This should always be an object storage service that is highly durable and geographically distributed. If your Kubernetes cluster is in US-East, your backup bucket should ideally be in a different region or even a different provider to protect against regional outages. Use S3-compatible storage and ensure you have strict lifecycle policies in place to prevent your storage costs from spiraling out of control as backups accumulate.
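A minimal sketch of such a backend, expressed as Velero’s BackupStorageLocation resource. The bucket name and region are placeholders; the point is that the region is deliberately different from where the cluster runs:

```yaml
# Where Velero writes its backups. Assumes the AWS object-store plugin,
# but any S3-compatible provider plugin follows the same shape.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero-backups-dr     # placeholder bucket name
  config:
    region: eu-west-1             # deliberately distant from the cluster's region
    s3ForcePathStyle: "true"      # often required for S3-compatible stores (e.g. MinIO)
```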
Step 2: Configure Volume Snapshots
There are two primary ways Velero handles data: File System Backup (using a tool like Restic or Kopia) and Volume Snapshots (using the CSI – Container Storage Interface). For high-performance databases, you should lean toward CSI snapshots. CSI allows Velero to communicate directly with your storage provider to take a point-in-time snapshot of the block device. This is much faster and more reliable than copying files manually. Ensure your storage class supports the VolumeSnapshot capability before committing to this path.
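For Velero to take CSI snapshots, the cluster needs a VolumeSnapshotClass that Velero is permitted to select, which it discovers by label. The driver name below is a placeholder and must match whatever CSI driver backs your StorageClass:

```yaml
# VolumeSnapshotClass that Velero can use for CSI snapshots.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: velero-snapclass
  labels:
    velero.io/csi-volumesnapshot-class: "true"  # lets Velero select this class
driver: ebs.csi.aws.com     # placeholder: must match your StorageClass's CSI driver
deletionPolicy: Retain      # keep underlying snapshots even if the object is deleted
```

Setting `deletionPolicy: Retain` is a conservative choice for backup use cases, since it prevents the storage-level snapshot from being garbage-collected along with the Kubernetes object.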
Step 3: Automate with Schedules
Manual backups are a recipe for disaster. You must use Velero’s scheduling capabilities to automate the process. However, don’t just set a single “daily” schedule. A sophisticated strategy uses a tiered approach:
- High-frequency snapshots: For critical databases, every 15 to 30 minutes.
- Daily full backups: Capturing the entire namespace configuration.
- Weekly long-term archives: For compliance and historical auditing.
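The tiered approach above might be expressed as Velero Schedule resources along these lines. The cron expressions, namespace names, and retention periods are illustrative, not prescriptive:

```yaml
# Tier 1: high-frequency snapshots of a critical database namespace.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: db-frequent
  namespace: velero
spec:
  schedule: "*/30 * * * *"        # every 30 minutes
  template:
    includedNamespaces: ["payments-db"]   # hypothetical namespace
    ttl: 48h0m0s                  # keep only two days of these
---
# Tier 2: daily full backup of the whole cluster, retained ~30 days.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-full
  namespace: velero
spec:
  schedule: "0 1 * * *"           # 01:00 every day
  template:
    includedNamespaces: ["*"]
    ttl: 720h0m0s
---
# Tier 3: weekly long-term archive for compliance and auditing.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: weekly-archive
  namespace: velero
spec:
  schedule: "0 2 * * 0"           # 02:00 every Sunday
  template:
    includedNamespaces: ["*"]
    ttl: 2160h0m0s                # roughly 90 days
```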
Step 4: The “Restore Test” Mandate
The most common mistake in disaster recovery is assuming a backup works because the job finished with “Success.” A backup is only as good as your ability to restore it. You must implement a “Chaos Engineering” practice where you periodically spin up a completely isolated “test” cluster and attempt to restore a random production namespace into it. If you cannot restore your data within your defined Recovery Time Objective (RTO), your backup strategy has failed.
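A restore drill can use Velero’s namespace mapping so the restored copy never collides with production. In the sketch below, the backup name is a hypothetical example of the timestamped names a scheduled backup produces, and the namespaces are placeholders:

```yaml
# Restore a production namespace into an isolated "drill" namespace.
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-drill-payments
  namespace: velero
spec:
  backupName: daily-full-20260401010000  # hypothetical scheduled-backup name
  includedNamespaces:
    - payments                  # hypothetical production namespace
  namespaceMapping:
    payments: payments-drill    # restore into an isolated namespace instead
  restorePVs: true              # also recreate persistent volumes from snapshots
```

Running this against a disposable test cluster, then timing how long the application takes to become healthy, gives you a measured RTO rather than an assumed one.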
Broadcom’s Strategy: Investment, Not Divestiture
One of the most interesting aspects of the recent announcement is the clarification provided by Dilpreet Bindra, a Senior Director of Engineering at Broadcom. There has been significant speculation that Broadcom might be “walking away” from Velero to focus on their core commercial products. Bindra has been very clear: this is an investment in their VKS (VMware Kubernetes Service) and IaaS (Infrastructure as a Service) story.
This is a sophisticated business move. By donating the project to the CNCF, Broadcom is actually increasing the value of their own commercial offerings. If Velero becomes the industry standard, then Broadcom’s platforms—which are designed to run Velero seamlessly—become more attractive to enterprise customers. They are essentially building a larger “ocean” for their “boats” to sail in. They want to be the builders of the ecosystem, not just the owners of a single tool.
Furthermore, Broadcom has announced parallel efforts with the etcd community. etcd is the “brain” of Kubernetes: the distributed key-value store that holds the entire cluster state. By working on diagnosis and recovery tooling for etcd, Broadcom is attacking the problem of Kubernetes resilience from both the application layer (Velero) and the core orchestration layer (etcd). This holistic approach suggests they are doubling down on the stability of the entire cloud-native stack.
The Future of the Kubernetes Ecosystem
As organizations scale their cloud-native workloads, the focus is inevitably shifting from simple orchestration to long-term resilience and data management. We are moving out of the “experimental” phase of Kubernetes and into the “industrial” phase. In this new era, the ability to recover from a catastrophic failure is just as important as the ability to deploy a new feature.
The move of Velero into the CNCF Sandbox is a bellwether for the rest of the industry. It signals that the most critical components of the cloud-native stack are moving toward a model of shared stewardship. This reduces the “single point of failure” risk for the entire industry. When a tool becomes a community standard, it gains a level of scrutiny, innovation, and longevity that no single company can provide on its own.
For the individual engineer, this means more tools, better documentation, and a more stable career path. For the enterprise, it means a more predictable and secure way to manage the most valuable asset they own: their data. The transition of Velero is not just a change in a GitHub repository; it is a maturation of the entire Kubernetes movement.
The shift toward CNCF governance for Velero marks a significant milestone in the maturation of cloud-native infrastructure, ensuring that the tools we rely on for data integrity remain robust, neutral, and community-driven for years to come.





