Stronger Models Lose by 21% on Hybrid Data Tasks

When it comes to building AI agents, data teams often face a common challenge: questions that require joining structured data with unstructured content. These hybrid data tasks can be the Achilles’ heel of single-turn Retrieval-Augmented Generation (RAG) systems, which struggle to handle queries that mix precise structured filters with open-ended semantic searches.

Hybrid Data Tasks: The Unsolved Problem

Imagine a question like “Which of our products have had declining sales over the past three months, and what potentially related issues are brought up in customer reviews on various seller sites?” Answering it requires combining structured sales data from a warehouse with unstructured review text spread across seller sites. Single-turn RAG systems, which retrieve context with a single query before generating an answer, often fail to deliver accurate results in such cases.

Why Single-Turn RAG Fails

The core issue with single-turn RAG is its inability to split a complex query into manageable parts, route each part to the right data source, and combine the results. In the example above, a good answer needs a SQL query to find the products with declining sales and a search query to surface review sentiment about those products. A single-turn RAG system gets one retrieval pass, so it cannot issue both, let alone merge their results. This limitation makes it difficult for RAG systems to handle hybrid data tasks effectively.
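To make the split, route, and combine pattern concrete, here is a minimal Python sketch of the example question. It assumes a hypothetical `sales` table (built in an in-memory SQLite database) and a small hypothetical list of reviews; all data and function names are illustrative, and a plain product filter stands in for real semantic search. Instead of one broad retrieval, the sketch runs one precise SQL filter, then one search per flagged product, and merges the two.

```python
import sqlite3

# Hypothetical warehouse stand-in: monthly unit sales per product (months 1..3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, month INTEGER, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("kettle", 1, 120), ("kettle", 2, 100), ("kettle", 3, 80),
        ("toaster", 1, 90), ("toaster", 2, 95), ("toaster", 3, 110),
    ],
)

# Hypothetical unstructured content: free-text reviews from seller sites.
REVIEWS = [
    {"product": "kettle", "site": "shop-a", "text": "Lid broke after a week."},
    {"product": "kettle", "site": "shop-b", "text": "Handle gets too hot."},
    {"product": "toaster", "site": "shop-a", "text": "Works great."},
]


def declining_products(conn: sqlite3.Connection) -> list[str]:
    """Structured half: a precise SQL filter for sales that fell month over month."""
    rows = conn.execute(
        """
        SELECT m1.product
        FROM sales m1
        JOIN sales m2 ON m2.product = m1.product AND m2.month = 2
        JOIN sales m3 ON m3.product = m1.product AND m3.month = 3
        WHERE m1.month = 1 AND m2.units < m1.units AND m3.units < m2.units
        """
    ).fetchall()
    return [product for (product,) in rows]


def search_reviews(product: str) -> list[str]:
    """Unstructured half: a simple product filter standing in for vector search."""
    return [r["text"] for r in REVIEWS if r["product"] == product]


def answer() -> dict[str, list[str]]:
    """Combine step: attach review evidence to each product the SQL step flagged."""
    return {p: search_reviews(p) for p in declining_products(conn)}


print(answer())
# {'kettle': ['Lid broke after a week.', 'Handle gets too hot.']}
```

The search step depends on the SQL result, which is exactly the kind of chaining a single retrieval pass cannot express.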

New Research from Databricks: A Breakthrough in Hybrid Data Tasks

Databricks’ AI research team recently tested a multi-step agentic approach against state-of-the-art single-turn RAG baselines across nine enterprise knowledge tasks. The results were striking: the researchers reported gains of 20% or more for the multi-step agent on Stanford’s STaRK benchmark suite, a widely used evaluation framework for semi-structured retrieval tasks.

The Performance Gap: An Architectural Problem

Databricks argues that the performance gap between single-turn RAG and multi-step agents on hybrid data tasks is an architectural problem, not a model quality problem. In other words, the issue lies not with the underlying models but with the system around them: a pipeline that retrieves once and answers once cannot split a question across data sources, however capable its model is.

The Supervisor Agent: A Production-Ready Solution

Databricks built the Supervisor Agent as the production implementation of this research approach, and its architecture illustrates why the gains are consistent across task types. The design rests on three core elements:

Parallel Tool Decomposition

Instead of issuing one broad query and hoping the results cover both structured and unstructured needs, the Supervisor Agent fires SQL and vector search calls simultaneously. This parallel step allows the agent to handle queries that cross data type boundaries without requiring the data to be normalized first.
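The parallel step can be sketched with Python’s standard `concurrent.futures` module. This is an illustration under stated assumptions, not the Supervisor Agent’s implementation: `run_sql` and `vector_search` are hypothetical tools that sleep to simulate latency, and the `monthly_sales` table in the query string is made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical tools: real ones would call a SQL warehouse and a vector index.
def run_sql(query: str) -> list[tuple[str, int]]:
    time.sleep(0.3)  # simulated warehouse latency
    return [("kettle", -40)]  # (product, change in units over the window)


def vector_search(query: str, k: int = 2) -> list[str]:
    time.sleep(0.3)  # simulated index latency
    corpus = ["Lid broke after a week.", "Handle gets too hot.", "Works great."]
    return corpus[:k]  # stand-in for the top-k nearest neighbours


def fan_out(sql: str, search: str) -> dict[str, list]:
    """Submit both sub-queries at once, then wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sql_future = pool.submit(run_sql, sql)
        search_future = pool.submit(vector_search, search)
        return {"sql": sql_future.result(), "search": search_future.result()}


start = time.perf_counter()
results = fan_out(
    "SELECT product, unit_change FROM monthly_sales WHERE unit_change < 0",
    "customer complaints about the kettle",
)
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Because both calls are in flight at the same time, total latency is close to that of the slower call rather than the sum of the two, and each tool receives a query shaped for its own data type.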

Self-Correction

When an initial retrieval attempt hits a dead end, the Supervisor Agent detects the failure, reformulates the query, and tries a different path. This self-correction lets the agent recover from unproductive retrievals on complex queries instead of returning an incomplete answer.
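A minimal sketch of that detect, reformulate, and retry loop, assuming a hypothetical exact-match index and a hand-written rewrite rule. Here a dead end simply means an empty result set; a production agent could use richer signals (for example, a model judging relevance), and the source does not describe the Supervisor Agent’s actual reformulation logic.

```python
# Hypothetical review index keyed by topic; a real agent would query a vector store.
REVIEW_INDEX = {
    "kettle defects": ["Lid broke after a week.", "Handle gets too hot."],
}
KNOWN_PRODUCTS = {"kettle", "toaster"}  # hypothetical product catalogue


def search(query: str) -> list[str]:
    """Stand-in retriever: exact topic lookup, so vague queries hit a dead end."""
    return REVIEW_INDEX.get(query, [])


def reformulations(query: str) -> list[str]:
    """Hypothetical rewrite rule: try narrower phrasings around the product name."""
    product = next((w for w in query.lower().split() if w in KNOWN_PRODUCTS), None)
    if product is None:
        return []  # nothing to anchor a rewrite on
    return [f"{product} complaints", f"{product} defects"]


def retrieve_with_correction(
    query: str, max_attempts: int = 3
) -> tuple[list[str], list[str]]:
    """Try the query; on an empty result, retry with reformulated queries."""
    tried: list[str] = []
    for candidate in ([query] + reformulations(query))[:max_attempts]:
        tried.append(candidate)
        hits = search(candidate)
        if hits:  # non-empty retrieval counts as success in this sketch
            return hits, tried
    return [], tried  # every path was a dead end


hits, tried = retrieve_with_correction("Why are kettle sales down")
print(tried)
# ['Why are kettle sales down', 'kettle complaints', 'kettle defects']
```

Capping `max_attempts` keeps a confused agent from looping indefinitely; returning the list of tried queries also makes the correction path easy to inspect.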

Declarative Configuration

The Supervisor Agent is not tuned to any specific dataset or task. Connecting it to a new data source is a matter of declaring that source in its configuration, so the agent can start routing queries to it without extensive retraining.
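Declarative configuration means data sources are described as data rather than wired in as code. The sketch below assumes a made-up JSON format with `name`, `kind`, and `description` fields; it is not the Supervisor Agent’s real configuration schema, only an illustration of why adding a source requires no retraining.

```python
import json
from dataclasses import dataclass

# Hypothetical config format: each entry declares a data source the agent may call.
CONFIG = """
{
  "tools": [
    {"name": "sales_sql", "kind": "sql", "description": "Monthly sales by product"},
    {"name": "reviews_search", "kind": "vector_search",
     "description": "Customer reviews from seller sites"}
  ]
}
"""


@dataclass(frozen=True)
class ToolSpec:
    name: str
    kind: str
    description: str


def load_tools(raw: str) -> dict[str, ToolSpec]:
    """Parse the declarative config into a name -> spec registry."""
    return {t["name"]: ToolSpec(**t) for t in json.loads(raw)["tools"]}


def add_source(raw: str, name: str, kind: str, description: str) -> str:
    """Connecting a new source is a config edit; no agent code or weights change."""
    cfg = json.loads(raw)
    cfg["tools"].append({"name": name, "kind": kind, "description": description})
    return json.dumps(cfg)


tools = load_tools(CONFIG)
updated = add_source(CONFIG, "support_tickets", "vector_search", "Support ticket text")
tools_after = load_tools(updated)
print(sorted(tools_after))
# ['reviews_search', 'sales_sql', 'support_tickets']
```

In this sketch the descriptions are what a routing step would read when deciding which source fits a sub-query, so a new entry becomes usable as soon as it is declared.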

Hybrid Data Tasks: The Practical Consequence

The performance gap between single-turn RAG and multi-step agents on hybrid data tasks has significant practical implications. As Michael Bendersky, research director at Databricks, noted, “RAG works, but it doesn’t scale.” For an agent to explain why sales are declining, it has to be able to see the tables and query the sales data directly, and a single-turn RAG pipeline is poorly equipped for that task.

Addressing the Class of Questions Enterprises Most Commonly Fail to Answer

The Supervisor Agent addresses the class of questions enterprises most commonly fail to answer with current agent architectures. By handling hybrid data tasks effectively, the Supervisor Agent can provide accurate and relevant information to support business decisions.