
The concept is that we split the workflow into two streams to optimize costs and stability, as proposed with the LATM architecture, with some additional enhancements for managing data and memories specific to Data Recipes …
Stream 1: Recipes Assistant
This stream uses LLM agents and more powerful models to generate code snippets (recipes) via a conversational interface. The LLM is instructed with details about data sources — API specifications and database schemas — so that the person creating recipes can more easily conversationally program new skills. Importantly, the process implements a review stage where generated code and results can be verified and modified by a human before being committed to memory. For the best code generation, this stream uses more powerful models and autonomous agents, incurring higher costs per request. However, traffic on this stream should be low, so costs are controlled.
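As a rough illustration of this stream (a sketch, not the actual DataKind implementation), the recipe-creation step might look like the following. The `call_llm` helper, the data-source context, and the recipe fields are all assumptions introduced for this example:

```python
# Sketch of the recipes assistant: a powerful LLM generates a recipe from
# data-source context, and a human reviews it before it enters memory.
# `call_llm` is a placeholder for whichever LLM client is actually used.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("Wire up your preferred LLM client here")

# Data-source details supplied to the model so it can write working code
# (hypothetical API and schema, for illustration only).
DATA_SOURCE_CONTEXT = """
API: GET /population?country={iso3} -> {"country": ..., "population": ...}
DB:  populations(country_iso3 TEXT, year INT, population BIGINT)
"""

def create_recipe(intent: str) -> dict:
    """Ask the LLM for a reusable code snippet (recipe) for a given intent."""
    system = (
        "You write small, self-contained Python functions that answer data questions.\n"
        f"Available data sources:\n{DATA_SOURCE_CONTEXT}"
    )
    code = call_llm(system, f"Write a function for: {intent}")
    return {"intent": intent, "code": code, "status": "pending_review"}

def review_and_commit(recipe: dict, approved: bool, memory: list) -> None:
    """Human-in-the-loop gate: only reviewed recipes are committed to memory."""
    recipe["status"] = "approved" if approved else "rejected"
    if approved:
        memory.append(recipe)
```

The important detail is the `pending_review` status: nothing reaches memory without a human approving it.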
Stream 2: Data Analysis Assistant
This stream is used by the broader group of end-users who are asking questions about data. The system checks memory to see if their request already exists as a fact, e.g. “What is the population of Mali?”. If not, it checks recipes to see if it has a skill to get the answer, e.g. ‘How to get the population of any country’. If no memory or skill exists, a request is sent to the recipes assistant queue for the recipe to be added. Ideally, the system would be pre-populated with recipes before launch, but the recipe library can actively grow over time based on user telemetry. Note that the end-user stream doesn’t generate code or queries on the fly and can therefore use less powerful LLMs, is more stable and secure, and incurs lower costs.
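To make that routing concrete, here is a minimal sketch of the lookup order described above. Keyword matching stands in for the semantic search a real system would use, and `run_recipe` naively executes the stored (already reviewed) code; all names are illustrative assumptions:

```python
# Sketch of the data analysis assistant's routing: fact memory first,
# then the recipe (skill) library, otherwise queue for the recipes assistant.

def find_matching_recipe(question: str, recipe_library: list):
    """Stand-in for semantic search + reranking over the recipe library."""
    for recipe in recipe_library:
        if all(word in question.lower() for word in recipe["keywords"]):
            return recipe
    return None

def run_recipe(recipe: dict, question: str):
    """Execute reviewed recipe code; assumes it defines an `answer(question)` function."""
    namespace: dict = {}
    exec(recipe["code"], namespace)
    return namespace["answer"](question)

def answer_request(question, fact_memory, recipe_library, recipe_queue):
    # 1. Exact fact already stored? e.g. "What is the population of Mali?"
    if question in fact_memory:
        return fact_memory[question]

    # 2. A skill that can compute the answer? e.g. "Get the population of any country"
    recipe = find_matching_recipe(question, recipe_library)
    if recipe is not None:
        result = run_recipe(recipe, question)
        fact_memory[question] = result  # cache the new fact for next time
        return result

    # 3. No memory or skill: queue the request for the recipes assistant stream
    recipe_queue.append(question)
    return "Request queued; a new recipe will be created and reviewed."
```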
Asynchronous Data Refresh
To improve response times for end-users, recipes are refreshed asynchronously where feasible. The recipe memory contains code that can be run on a set schedule. Recipes can be preemptively executed to prepopulate the system, for example, retrieving the total population of all countries before end-users have requested them. Also, cases that require aggregation across large volumes of data extracted from APIs can be run out-of-hours, mitigating, albeit partially, the limitation of aggregate queries using API data.
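One way such a refresh might be scheduled (a sketch only, assuming the third-party `schedule` package and reusing the hypothetical `run_recipe` helper from the earlier sketch):

```python
import time

import schedule  # third-party package: pip install schedule

def refresh_recipe(recipe: dict, fact_memory: dict) -> None:
    """Re-run a recipe out-of-hours and prepopulate facts before users ask."""
    for country in recipe.get("prepopulate_for", []):
        question = recipe["intent_template"].format(country=country)
        fact_memory[question] = run_recipe(recipe, question)  # from the earlier sketch

def schedule_refreshes(recipes: list, fact_memory: dict) -> None:
    # Heavy aggregation recipes run nightly, e.g. at 02:00, to keep memory warm.
    for recipe in recipes:
        if recipe.get("refreshable"):
            schedule.every().day.at("02:00").do(refresh_recipe, recipe, fact_memory)

    while True:
        schedule.run_pending()
        time.sleep(60)
```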
Memory Hierarchy — remembering skills as well as facts
The above implements a hierarchy of memory to save ‘facts’ which can be promoted to more general ‘skills’. Memory retrieval and promotion to recipes are achieved through a combination of semantic search and LLM reranking and transformation, for example prompting an LLM to generate a general intent and code, e.g. ‘Get total population for any country’, from a specific intent and code, e.g. ‘What is the total population of Mali?’.
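A sketch of that promotion step (again using a placeholder `call_llm`): the LLM is prompted to rewrite a specific intent and its code into a parameterized, general form, which then goes back through human review:

```python
def promote_to_recipe(fact_intent: str, fact_code: str, call_llm) -> dict:
    """Generalize a specific fact, e.g. 'What is the total population of Mali?',
    into a reusable skill, e.g. 'Get total population for any country'."""
    prompt = (
        "Rewrite the following question and code so they work for ANY country, "
        "taking the country as a parameter.\n\n"
        f"Question: {fact_intent}\n"
        f"Code:\n{fact_code}\n\n"
        "Return the generalized question on the first line, then the code."
    )
    response = call_llm("You generalize data queries into reusable functions.", prompt)
    general_intent, _, general_code = response.partition("\n")
    # Promoted recipes still pass through the human review stage before use.
    return {
        "intent": general_intent.strip(),
        "code": general_code.strip(),
        "status": "pending_review",
    }
```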
Moreover, by automatically including recipes as available functions to the code-generation LLM, its reusable toolkit grows such that new recipes are efficient and call prior recipes rather than generating all code from scratch.
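One plausible way to expose prior recipes to the code-generation LLM (an assumption for illustration, not a description of the DataKind implementation) is to serialize approved recipes as callable tools that accompany the prompt:

```python
def recipes_as_tools(recipe_library: list) -> list:
    """Describe approved recipes so the code-generation LLM can call them
    rather than regenerating their logic from scratch."""
    return [
        {
            "name": recipe["function_name"],   # e.g. "get_country_population"
            "description": recipe["intent"],   # e.g. "Get total population for any country"
            "code": recipe["code"],            # made importable by newly generated recipes
        }
        for recipe in recipe_library
        if recipe["status"] == "approved"
    ]
```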
By capturing data analysis requests from users and making these highly visible within the system, transparency is increased. LLM-generated code can be closely scrutinized, optimized, and adjusted, and answers produced by such code are well understood and reproducible. This helps reduce the uncertainty many LLM applications face around factual grounding and hallucination.
Another interesting aspect of this architecture is that it captures specific data analysis requirements and the frequency with which these are requested by users. This can be used to invest in more heavily utilized recipes, bringing benefits to end users. For example, if a recipe for generating a humanitarian response situation report is accessed frequently, the recipe code for that report can be improved proactively.
This approach opens up the possibility of a community-maintained library of data recipes spanning multiple domains — a Data Recipes Hub. Similar to code snippet websites that already exist, it would add the dimension of data as well as help users with creation by providing LLM-assisted conversational programming. Recipes could receive reputation points and other such social platform feedback.
As with any architecture, it may not work well in all situations. A big part of data recipes is geared towards reducing the costs and risks associated with creating code on the fly, and instead building a reusable library with more transparency and human-in-the-loop intervention. It will of course be the case that a user can request something new not already supported in the recipe library. We can build a queue for these requests to be processed, and by providing LLM-assisted programming expect development times to be reduced, but there will be a delay for the end-user. However, this is an acceptable trade-off in many situations where it is undesirable to let loose LLM-generated, unmoderated code.
Another thing to consider is the asynchronous refresh of recipes. Depending on the amount of data required, this may become costly. Also, this refresh may not work well in cases where the source data changes rapidly and users require that information very quickly. In such cases, the recipe would be run each time rather than the result being retrieved from memory.
The refresh mechanism should help with data aggregation tasks where data is sourced from APIs, but there remains the fact that the underlying raw data will be ingested as part of the recipe. This of course will not work well for large data volumes, but at least it limits ingestion to what users actually demand rather than attempting to ingest an entire remote dataset.
Finally, as with any ‘Chat with Data’ application, they are only ever going to be as good as the data they have access to. If the desired data doesn’t exist or is of low quality, then perceived performance will be poor. Moreover, inequity and bias are common in datasets, so it is important that a data audit is carried out before presenting insights to the user. This isn’t specific to Data Recipes of course, but it is one of the biggest challenges posed in operationalizing such techniques. Garbage in, garbage out!
The proposed architecture aims to address some of the challenges faced with LLM “Chat With Data” by being …
- Transparent — Recipes are highly visible and reviewed by a human before being promoted, mitigating issues around LLM hallucination and summarization
- Deterministic — Being code, they will produce the same results every time, unlike LLM summarization of data
- Performant — Implementing a memory that captures not only facts but skills, which can be refreshed asynchronously, improves response times
- Inexpensive — By structuring the workflow into two streams, the high-volume end-user stream can use lower-cost LLMs
- Secure — The main group of end-users does not trigger the generation and execution of code or queries on the fly, and any code undergoes human review for safety and accuracy
I will be posting a series of follow-up blog posts detailing the technical implementation of Data Recipes as we work through user testing at DataKind.
Large Language Models as Tool Makers, Cai et al., 2023.
Unless otherwise noted, all images are by the author.
Please like this article if so inclined, and I’d be delighted if you followed me! You can find more articles here.