Cloudflare Ships Dynamic Workflows: Durable Execution for Tenants

Platform engineers have long faced a frustrating trade-off. You want to give each customer their own customized workflow logic, but your durable execution engine demands that every workflow class be baked into the deployment. Change one tenant’s pipeline, and you redeploy the entire application. Cloudflare’s new Dynamic Workflows library shatters that constraint. It lets every tenant, agent, or request run its own durable execution code at runtime without requiring you to know what that code looks like ahead of time. The implications for multi-tenant platforms are enormous.

What Makes Dynamic Workflows Different

Before Dynamic Workflows, Cloudflare Workflows required a one-to-one relationship between bindings and workflow classes. If you wanted to support ten different pipeline shapes, you needed ten separate bindings, all defined at deploy time. That worked fine for single-tenant applications. But for platforms serving hundreds or thousands of tenants, it created a maintenance nightmare. Every new workflow variation meant another deployment cycle.

Dynamic Workflows removes that constraint entirely. The library, which is MIT-licensed and roughly 300 lines of TypeScript, extends Cloudflare’s durable execution engine so that workflow code can differ per tenant, per agent, or per request at runtime. When a tenant calls env.WORKFLOWS.create(), it looks like a normal Workflow binding. Behind the scenes, a Worker Loader wraps the call with tenant metadata, the engine persists the payload, and when execution resumes seconds, hours, or days later, the metadata routes back to the correct tenant’s code.

What makes this genuinely novel is that all the standard durable execution semantics remain intact. Workflow IDs, pause and resume, retries, hibernation, step.sleep('24 hours'), and step.waitForEvent() all work exactly as they did before. The only difference is that the code being executed is resolved dynamically rather than statically at deploy time.
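To make the semantics concrete, here is a sketch of what a tenant workflow looks like. The Step and WorkflowEvent types below are illustrative mocks that mirror the step.do, step.sleep, and step.waitForEvent calls named above, not Cloudflare’s actual WorkflowEntrypoint API; the invoice payload is an invented example.

```typescript
// Illustrative mock types, not Cloudflare's real Workflows API.
interface WorkflowEvent { payload: { invoiceId: string } }

interface Step {
  do<T>(name: string, fn: () => Promise<T>): Promise<T>; // durable, retried step
  sleep(duration: string): Promise<void>;                // hibernate until the timer fires
  waitForEvent(name: string): Promise<unknown>;          // pause until an external event arrives
}

class ApprovalWorkflow {
  async run(event: WorkflowEvent, step: Step): Promise<string> {
    // Each step.do result is persisted; after a crash the engine replays from history
    // instead of re-running completed steps.
    const invoice = await step.do('fetch-invoice', async () => ({
      id: event.payload.invoiceId,
      total: 1200,
    }));

    await step.sleep('24 hours');                 // hibernates, then resumes with the same state
    await step.waitForEvent('manager-approved');  // human-in-the-loop pause

    return step.do('mark-approved', async () => `approved:${invoice.id}`);
  }
}
```

Whether this class was deployed statically or resolved dynamically per tenant, the run(event, step) contract is identical, which is why the engine does not need to care.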

How the Worker Loader Routes Execution Per Tenant

The architectural key is a Worker Loader that sits between the Workflows engine and each tenant’s code (a Dynamic Worker, in Cloudflare’s terminology). When the engine wakes up to execute a step, the Worker Loader uses the stored tenant metadata to fetch and dispatch execution to the right code. This indirection is what makes per-tenant customization possible without sacrificing durability.

The library exports three primitives: createDynamicWorkflowEntrypoint, DynamicWorkflowBinding, and wrapWorkflowBinding. You import these into your own Worker, and the createDynamicWorkflowEntrypoint function accepts a loader callback that receives the environment and metadata, then returns the appropriate tenant’s workflow entrypoint. The loader can fetch tenant code from any source — a database, object storage, or even an external API — and compile it on the fly.

A typical implementation defines a loadTenant function that retrieves or generates the tenant’s module code, wraps the binding with tenant context, and returns a stub that points to the correct TenantWorkflow class. The engine never knows it is talking to different code each time. It simply calls run(event, step) on whatever entrypoint the loader provides.
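The dispatch flow described above can be sketched as follows. The names createDynamicWorkflowEntrypoint, DynamicWorkflowBinding, and wrapWorkflowBinding come from the library, but this sketch mocks the library out entirely: the in-memory registry, the loadTenant signature, and the dispatch helper are assumptions standing in for a database-backed loader and the real engine.

```typescript
// Sketch only: the registry stands in for a database or object store holding
// each tenant's workflow code; `dispatch` stands in for the engine's call site.
type Entrypoint = { run(event: unknown, step: unknown): Promise<unknown> };
type Metadata = { tenantId: string };

const tenantRegistry = new Map<string, Entrypoint>([
  ['acme', { run: async () => 'acme pipeline ran' }],
  ['globex', { run: async () => 'globex pipeline ran' }],
]);

// The loader callback: given the stored metadata, return that tenant's entrypoint.
async function loadTenant(_env: unknown, metadata: Metadata): Promise<Entrypoint> {
  const entry = tenantRegistry.get(metadata.tenantId);
  if (!entry) throw new Error(`no workflow registered for tenant ${metadata.tenantId}`);
  return entry;
}

// The engine only ever sees run(event, step); it never knows which tenant's
// code it is invoking.
async function dispatch(metadata: Metadata, event: unknown, step: unknown) {
  const entrypoint = await loadTenant(undefined, metadata);
  return entrypoint.run(event, step);
}
```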

Real-World Use Cases for Dynamic Workflows

The abstraction is elegant, but the real value emerges when you map it to concrete platform scenarios. Each use case reveals a different facet of what runtime-dynamic durable execution enables.

Multi-Tenant SaaS Platforms

Imagine you operate a SaaS product where each customer needs a slightly different approval workflow. One tenant wants a three-step chain with manager approval, finance sign-off, and CEO escalation. Another wants a simple two-step process with only team lead approval. A third needs a complex branch that forks based on invoice value. Before Dynamic Workflows, you would either build a monolithic workflow that handles every permutation or deploy separate workflow classes for each customer. Both approaches scale poorly.

With Dynamic Workflows, you let tenants upload their own workflow logic. Your platform stores each tenant’s code separately, and the Worker Loader dispatches to the correct version at runtime. When a tenant updates their workflow, the change takes effect immediately for new executions without any redeployment of your host application. Existing runs continue using the code version they started with, which you can manage through metadata tagging.

AI Application Platforms

Consider a platform where AI agents generate TypeScript code for each tenant. An AI writes a durable plan that includes steps like calling an external API, waiting for a human response, and writing results to a database. The plan is different for every tenant because every use case is different. Dynamic Workflows makes this pattern natural rather than forced.

The AI generates a WorkflowEntrypoint class with a custom run(event, step) function. That code gets stored as the tenant’s workflow definition. When the platform calls create(), the Worker Loader fetches the AI-generated code and hands it to the engine. The engine executes each step with full durable semantics, including hibernation during approval waits and automatic retries on transient failures. The AI never needs to think about infrastructure. It just writes the plan.
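One way to picture the store-then-compile step: the generated plan lives as a string, and the loader turns it into a callable run function on demand. This is a simplified sketch, not the library’s actual compilation path — new Function stands in for whatever module-loading mechanism the Worker Loader uses, and the generated body below is an invented stand-in for real AI output.

```typescript
// A stored, AI-generated plan body (invented example). In a real platform this
// string would come from a database or object store.
const generatedPlan = `
  return (async () => {
    const data = await step.do('call-api', async () => ({ score: 0.92 }));
    if (data.score > 0.9) {
      await step.do('write-result', async () => 'stored');
    }
    return data.score;
  })();
`;

type Step = { do<T>(name: string, fn: () => Promise<T>): Promise<T> };

// Compile the stored string into a callable run(event, step) function.
// `new Function` is a stand-in for the real module-compilation step.
function compilePlan(source: string): (event: unknown, step: Step) => Promise<number> {
  return new Function('event', 'step', source) as (event: unknown, step: Step) => Promise<number>;
}
```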

CI/CD Pipelines

This is where the architectural implications become most concrete. A CI/CD product where each repository has its own pipeline definition is a textbook case for Dynamic Workflows. The pipeline code lives in the customer’s repository as a TypeScript WorkflowEntrypoint. The platform loads it dynamically, and each step runs with full durable execution semantics.

Cloudflare’s blog post walks through a complete pipeline built on four primitives working together. Artifacts provides Git-native versioned storage with lazy tree hydration and instant fork per CI run. Dynamic Workers run lightweight steps like linting and typechecking in sandboxed isolates that boot in milliseconds. Dynamic Workflows holds the run together with durable, retryable steps that hibernate for free during approval waits. Sandboxes handle the heavy steps that need a full operating system, with snapshot-based warm starts in seconds.

Traditional CI burns a minute or more before any actual work begins. It allocates a VM, pulls a base image, clones the repo, and installs dependencies. The Dynamic Workflows stack skips all of that ceremony because the repository does not move and the compute comes to it. The code stays in place, and the execution engine reaches out to run it.

Agent Platforms and Durable Plans

Agent SDKs are another natural fit. Each agent writes its own durable plan as a run(event, step) function. The plan can include branching logic, external API calls, human-in-the-loop approval steps, and conditional retries. Dynamic Workflows makes each agent’s plan a first-class durable execution that survives crashes and network interruptions.

For agent platforms specifically, the benefit is that agents can define their own state machines without platform engineers needing to pre-deploy every possible shape. The agent writes the plan, the platform stores it, and the engine executes it reliably. This collapses the infrastructure cost of maintaining a library of predefined workflow templates because tenants define their own.

The Broader Platform Thesis

Dynamic Workflows is not an isolated feature. It is part of a deliberate platform strategy that Cloudflare has been building for years. Dynamic Workers solved the compute layer for multi-tenant dynamic code. Durable Object Facets solved the storage layer by giving each dynamically-loaded application its own isolated SQLite database. Dynamic Workflows now solves the durable execution layer.

The company has stated explicitly that every binding Workers currently exposes is heading for a dynamic counterpart. Queues, caches, databases, AI bindings, and MCP servers will all eventually be dispatchable per tenant, per agent, per request. If realized, this vision means platform engineers can stop thinking about binding management altogether. One binding serves many shapes. The infrastructure adapts to the tenant, not the other way around.

For teams scaling to hundreds of distinct workflow types, removing the one-binding constraint is transformative. It eliminates deployment complexity as a function of workflow variety. You add a new workflow shape by storing new code, not by modifying your deployment pipeline. The operational overhead of supporting many workflow types drops dramatically.

Practical Considerations for Platform Engineers

Any powerful abstraction raises practical questions. Platform engineers evaluating Dynamic Workflows need to think through security, debugging, versioning, and cost before committing to the pattern.

Security and Isolation

What if one tenant’s workflow enters an infinite loop when a step wakes up? Cloudflare’s execution model already includes timeout limits and resource caps per step, but Dynamic Workflows adds a new surface area. Tenants upload arbitrary code that the engine executes with your credentials. The Worker Loader must validate, sandbox, and isolate each tenant’s code to prevent one tenant from consuming resources meant for another or accessing data they should not see.

Cloudflare’s approach uses sandboxed isolates that boot in milliseconds and have no access to the host environment beyond what the binding explicitly provides. The Dynamic Worker pattern ensures each tenant’s code runs in its own context with its own environment variables and bindings. Still, platform engineers should implement their own validation layer that checks tenant code for obvious issues before storing it, especially when tenants write code directly rather than through a constrained DSL.
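A validation layer of the kind suggested above might look like the following. Everything here is illustrative: the size cap, the banned-pattern list, and the entrypoint check are assumptions, and a production platform would parse the code properly rather than pattern-match, and enforce an allowlist of capabilities at runtime.

```typescript
// Purely illustrative pre-storage checks for uploaded tenant code.
const MAX_SOURCE_BYTES = 64 * 1024; // hypothetical size cap
const BANNED_PATTERNS = [/\beval\s*\(/, /\bprocess\.env\b/, /\brequire\s*\(/];

function validateTenantCode(source: string): { ok: boolean; reason?: string } {
  if (new TextEncoder().encode(source).length > MAX_SOURCE_BYTES) {
    return { ok: false, reason: 'source exceeds size limit' };
  }
  for (const pattern of BANNED_PATTERNS) {
    if (pattern.test(source)) {
      return { ok: false, reason: `banned pattern: ${pattern}` };
    }
  }
  if (!source.includes('run(')) {
    return { ok: false, reason: 'no run(event, step) entrypoint found' };
  }
  return { ok: true };
}
```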

Debugging Across Tenant-Specific Code

How do you debug a workflow that is different per tenant, especially if the tenant wrote the code? Traditional debugging tools assume a fixed codebase. Dynamic Workflows introduces variability at the execution level. When a step fails, the error trace points to tenant-specific code that you may not have written or reviewed.

Cloudflare’s answer is structured logging and metadata propagation. Each execution carries tenant context that flows through every step. You can aggregate logs by tenant, by workflow version, or by error type. The Worker Loader can inject tenant identifiers into every log line automatically. For deeper debugging, you can replay a failed workflow step with the exact tenant code version that was active when the failure occurred. Version-stamping each stored workflow definition is essential for this capability.
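The metadata-propagating logger described above can be sketched in a few lines. The field names (tenantId, workflowVersion, runId) are assumptions for illustration, not a Cloudflare API.

```typescript
// Every line a tenant's workflow emits is stamped with its context,
// so logs aggregate cleanly by tenant, version, or error type.
type TenantContext = { tenantId: string; workflowVersion: string; runId: string };

function makeTenantLogger(
  ctx: TenantContext,
  sink: (line: string) => void = console.log,
) {
  return (level: 'info' | 'error', message: string) => {
    // Structured JSON lines; the sink could be any log pipeline.
    sink(JSON.stringify({ ...ctx, level, message, ts: new Date().toISOString() }));
  };
}
```

The Worker Loader would construct one of these per activation and hand it to the tenant’s code, so tenants never have to remember to tag their own logs.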

Versioning Long-Running Workflows

How do you manage versioning when tenants update their workflow code while existing runs are still alive? A durable execution may run for days or weeks, with steps that hibernate between activations. If the tenant updates their code mid-execution, the engine needs to know which version to use for subsequent steps.

The recommended pattern is to snapshot the workflow code version at creation time and store it as part of the workflow metadata. The Worker Loader then retrieves that specific version for every step activation, not the latest version. This ensures deterministic replay and avoids the confusing scenario where a step resumes with logic that did not exist when the workflow started. Tenants can deploy new versions freely without worrying about breaking in-flight executions.
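The pinning pattern reduces to two operations: snapshot the current version into the run’s metadata at create() time, and resolve that pinned version on every activation. The store shape and metadata fields below are assumptions for illustration.

```typescript
// version -> source, per tenant. In practice this would be durable storage.
type VersionedCode = Map<string, string>;

const store = new Map<string, VersionedCode>([
  ['acme', new Map([['v1', 'plan-v1'], ['v2', 'plan-v2']])],
]);
const latestVersion = new Map<string, string>([['acme', 'v1']]);

// At create() time: snapshot the tenant's current version into the metadata.
function createRunMetadata(tenantId: string) {
  return { tenantId, pinnedVersion: latestVersion.get(tenantId)! };
}

// At every step activation: resolve the pinned version, never the latest.
function resolveCode(meta: { tenantId: string; pinnedVersion: string }): string {
  return store.get(meta.tenantId)!.get(meta.pinnedVersion)!;
}
```

In-flight runs keep replaying against the code they started with, while new runs pick up whatever the tenant most recently deployed.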

Cost Implications

What are the cost implications of storing many small workflow definitions versus a single large class? The trade-off is nuanced. Storing many small definitions increases storage overhead and may increase the number of Worker Loader invocations. But it reduces the complexity of each individual workflow, which can lower execution costs because simpler workflows use fewer compute resources per step.

For platforms with hundreds or thousands of tenants, the per-tenant storage cost is negligible compared to the operational savings from eliminating deployment cycles. Each workflow definition is just a small TypeScript module. The real cost driver is the number of step executions, not the number of definitions. Dynamic Workflows shifts the cost model from deploy-time overhead to runtime granularity, which is almost always cheaper at scale.

How Dynamic Workflows Compares to Alternatives

The competitive context for per-tenant durable execution is thin. Temporal and Inngest are the most prominent alternatives, but neither offers dynamic per-tenant code loading with the isolate-level sandboxing that Cloudflare provides. Temporal requires you to register workflow types with the server before execution, and while you can register many types, each one is still a static class in your deployment. Inngest uses a function-based model that is more flexible, but it still expects you to define all functions ahead of time rather than loading them dynamically per tenant.

Cloudflare’s approach is different because the code itself is resolved at runtime, not at deploy time. The Worker Loader pattern means the engine never needs to know the full set of possible workflow shapes. It discovers them on demand. This makes Dynamic Workflows uniquely suited for platforms where tenants define their own logic, such as AI application platforms, agent SDKs, and custom workflow builders.

The MIT license on the library is another differentiator. It invites community contributions and alternative implementations beyond Cloudflare’s own infrastructure. You could theoretically adapt the pattern to run on other durable execution engines, though the tight integration with Cloudflare’s Worker runtime and Durable Objects makes that non-trivial.

Dynamic Workflows reopens the question of how to securely isolate multi-tenant code execution at scale. Every platform engineer building a multi-tenant product should evaluate whether their current durable execution strategy is flexible enough for the next generation of use cases. The answer, increasingly, is that deploy-time binding is a bottleneck that runtime-dynamic resolution removes.
