Table-augmented generation shows promise for complex dataset querying, outperforms text-to-SQL

AI has transformed the way companies work and interact with data. A few years ago, teams had to write SQL queries and code to extract useful information from large swathes of data. Today, all they have to do is type in a question. The underlying language model-powered systems do the rest of the job, allowing users to simply talk to their data and get the answer immediately.

The shift to systems that answer natural language questions over databases has been widespread, but these systems still have a key limitation: they cannot handle every kind of query. This is the problem researchers from UC Berkeley and Stanford are now trying to solve with a new approach called table-augmented generation, or TAG.

“It is a unified and general-purpose paradigm that represents a wide range of previously unexplored interactions between the language model (LM) and database and creates an exciting opportunity for leveraging the world knowledge and reasoning capabilities of LMs over data,” the UC Berkeley and Stanford researchers wrote in a paper detailing TAG.

How does table-augmented generation work?

Currently, when a user asks natural language questions over custom data sources, two main approaches come into play: text-to-SQL or retrieval-augmented generation (RAG). 

While both methods work well for many questions, users run into problems when questions grow complex and go beyond what the systems can handle. For instance, existing text-to-SQL methods, which convert a text prompt into a SQL query that a database can execute, focus only on natural language questions that can be expressed in relational algebra, a small subset of the questions users may want to ask. Similarly, RAG, another popular approach to working with data, considers only queries that can be answered with point lookups to one or a few data records within a database.

Both approaches often struggle with natural language queries that require semantic reasoning or world knowledge beyond what’s directly available in the data source.

“In particular, we noted that real business users’ questions often require sophisticated combinations of domain knowledge, world knowledge, exact computation, and semantic reasoning,” the researchers write. “Database systems provide (only) a source of domain knowledge through the up-to-date data they store, as well as exact computation at scale (which LMs are bad at).”

To address this gap, the group proposed TAG, a unified approach that uses a three-step model for conversational querying over databases. 

In the first step, query synthesis, an LM deduces which data is relevant to answering a question and translates the input into an executable query (not just SQL) for that database. Then, in the query execution step, the system uses the database engine to run that query over vast amounts of stored information and return the most relevant table.

Finally, the answer generation step kicks in and uses an LM over the computed data to generate a natural language answer to the user’s original question.
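To make the three steps concrete, here is a minimal Python sketch, assuming SQLite as the database and a hypothetical lm function (any callable that maps a prompt string to a completion string) standing in for the language model. The function names, prompts and toy movies table are illustrative assumptions, not the researchers' code; a canned stub_lm replaces a real model so the example runs offline.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical LM interface (an assumption, not the paper's API):
# a function that maps a prompt string to a completion string.
LM = Callable[[str], str]


def query_synthesis(lm: LM, question: str, schema: str) -> str:
    """Step 1: the LM translates the question into a query the database can run."""
    prompt = f"Schema:\n{schema}\nWrite one SQLite query that answers: {question}\nSQL:"
    return lm(prompt).strip()


def query_execution(conn: sqlite3.Connection, query: str) -> List[Tuple]:
    """Step 2: the database engine does the exact computation and returns a table."""
    return conn.execute(query).fetchall()


def answer_generation(lm: LM, question: str, table: List[Tuple]) -> str:
    """Step 3: the LM writes a natural language answer grounded in the computed table."""
    prompt = f"Question: {question}\nTable: {table!r}\nAnswer:"
    return lm(prompt).strip()


def tag(lm: LM, conn: sqlite3.Connection, question: str, schema: str) -> str:
    """Chain the three steps: synthesis, then execution, then generation."""
    query = query_synthesis(lm, question, schema)
    table = query_execution(conn, query)
    return answer_generation(lm, question, table)


def stub_lm(prompt: str) -> str:
    """Hypothetical offline stand-in for a real model, returning canned outputs."""
    if prompt.endswith("SQL:"):
        return "SELECT COUNT(*) FROM movies WHERE genre = 'Romance'"
    # Echo the table the database returned; a real LM would phrase an answer from it.
    table_text = prompt.split("Table: ", 1)[1].split("\nAnswer:", 1)[0]
    return f"Computed result: {table_text}"


# Made-up toy data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("A", "Romance", 100.0), ("B", "Romance", 50.0), ("C", "Drama", 80.0)],
)
schema = "movies(title TEXT, genre TEXT, revenue REAL)"
answer = tag(stub_lm, conn, "How many romance movies are there?", schema)
print(answer)  # Computed result: [(2,)]
```

Because the database does the counting, the number in the answer is exact; the LM only has to phrase it. In a real system, stub_lm would be replaced by a call to an actual model, and the synthesized query would not have to be SQL.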

With this approach, the LM’s reasoning capabilities are used in both the query synthesis and answer generation steps, while the database system’s query execution overcomes RAG’s inefficiency at computational tasks such as counting, math and filtering. This enables the system to answer complex questions that require semantic reasoning and world knowledge as well as domain knowledge.

For example, it could answer a question asking for a summary of the reviews of the highest-grossing romance movie considered a ‘classic’.

The question is challenging for traditional text-to-SQL and RAG systems because it requires the system not only to find the highest-grossing romance movie in a given database but also to determine, using world knowledge, whether it is a classic. With TAG’s three-step approach, the system would generate a query for the relevant movie data, execute that query with filters and an LM to produce a table of classic romance movies sorted by revenue, and finally summarize the reviews of the top-ranked movie in that table to give the desired answer.
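The movie example can be sketched the same way. In this hypothetical version, the world-knowledge check (“is this a classic?”) is exposed to SQLite as a user-defined function through sqlite3’s create_function, so the engine filters and sorts by revenue in a single query, and an LM-style step then summarizes the top movie’s reviews. The llm_is_classic and llm_summarize helpers are assumed stand-ins for real LM calls (a fixed set and a string join), and the movies and reviews are made-up toy data.

```python
import sqlite3
from typing import List

# Made-up toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, genre TEXT, revenue REAL);
CREATE TABLE reviews (movie_id INTEGER, text TEXT);
INSERT INTO movies VALUES
  (1, 'Old Flame', 'Romance', 200.0),
  (2, 'New Spark', 'Romance', 900.0),
  (3, 'Timeless Vow', 'Romance', 500.0),
  (4, 'Space Run', 'Action', 1200.0);
INSERT INTO reviews VALUES
  (3, 'A moving, beautifully shot love story.'),
  (3, 'The ending still holds up.'),
  (2, 'Fun but forgettable.');
""")

# Hypothetical world-knowledge check. A real system would prompt an LM here;
# a fixed set keeps the sketch deterministic and offline.
KNOWN_CLASSICS = {"Old Flame", "Timeless Vow"}


def llm_is_classic(title: str) -> int:
    return int(title in KNOWN_CLASSICS)


def llm_summarize(texts: List[str]) -> str:
    # Hypothetical stand-in for LM summarization: simply joins the reviews.
    return " / ".join(texts)


# Register the check as a one-argument SQL function so the engine can use it in WHERE.
conn.create_function("llm_is_classic", 1, llm_is_classic)

# Execution: exact filtering and sorting by the database, semantic filter by the "LM".
ranked = conn.execute(
    """
    SELECT id, title, revenue FROM movies
    WHERE genre = 'Romance' AND llm_is_classic(title) = 1
    ORDER BY revenue DESC
    """
).fetchall()

# Generation: summarize the reviews of the top-ranked classic.
top_id, top_title, _ = ranked[0]
reviews = [row[0] for row in conn.execute(
    "SELECT text FROM reviews WHERE movie_id = ? ORDER BY rowid", (top_id,)
)]
summary = llm_summarize(reviews)
print(top_title, "->", summary)
```

Note that New Spark, the highest-grossing romance title overall in the toy data, is dropped because the stand-in check does not consider it a classic; that semantic filter is the part a plain relational query cannot express on its own.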

Significant improvement in performance

To test the effectiveness of TAG, the researchers tapped BIRD, a dataset known for testing the text-to-SQL prowess of LMs, and enhanced it with questions requiring semantic reasoning or world knowledge (going beyond the information in the model’s data source). The modified benchmark was then used to see how hand-written TAG implementations fared against several baselines, including text-to-SQL and RAG.

The team found that none of the baselines exceeded 20% accuracy, while TAG did far better, reaching 40% accuracy or more.

“Our hand-written TAG baseline answers 55% of queries correctly overall, performing best on comparison queries with an exact match accuracy of 65%,” the authors noted. “The baseline performs consistently well with over 50% accuracy on all query types except ranking queries, due to the higher difficulty in ordering items exactly. Overall, this method gives us between a 20% to 65% accuracy improvement over the standard baselines.”
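To illustrate why ranking queries are harder under exact-match scoring, here is a small sketch that computes accuracy per query type, assuming exact match means the predicted answer must equal the gold answer exactly (for a ranking, the same items in the same order). The records are invented for illustration and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Made-up toy records: (query_type, predicted_answer, gold_answer).
# These are NOT results from the paper; they only illustrate the metric.
records = [
    ("comparison", "Movie X", "Movie X"),
    ("comparison", "Movie Y", "Movie Y"),
    ("ranking", ["A", "B", "C"], ["A", "B", "C"]),
    ("ranking", ["A", "C", "B"], ["A", "B", "C"]),  # right items, wrong order: a miss
]


def exact_match_accuracy(rows: Iterable[Tuple[str, object, object]]) -> Dict[str, float]:
    """Share of answers per query type that equal the gold answer exactly."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for qtype, pred, gold in rows:
        totals[qtype] += 1
        # List equality is element by element, so order matters for rankings.
        hits[qtype] += int(pred == gold)
    return {q: hits[q] / totals[q] for q in totals}


scores = exact_match_accuracy(records)
print(scores)  # {'comparison': 1.0, 'ranking': 0.5}
```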

Beyond this, the team found that the TAG implementations executed queries three times faster than the other baselines.

While the approach is new, the results indicate that it could give enterprises a way to unify AI and database capabilities to answer complex questions over structured data sources. This could help teams extract more value from their datasets without writing complex code.

That said, it is important to note that the work may need further refinement. The researchers have also called for more research into building efficient TAG systems and exploring the rich design space the approach offers. The code for the modified TAG benchmark has been released on GitHub to allow further experimentation.


