Simplifying complicated data through conversation
TL;DR
In this article, we explore how to create a conversational AI agent using climate change data from the wonderful Probable Futures API and the new OpenAI Assistants API. The AI agent is able to answer questions about how climate change might affect a specified location and also perform basic data analysis. AI assistants can be well-suited to tasks like this, providing a promising channel for presenting complex data to non-technical users.
I was recently chatting with a neighbor about how climate change might affect us and how best to prepare homes for extreme weather events. There are some amazing websites that provide information related to this in map form, but I wondered if sometimes people might simply want to ask questions like “How will my home be affected by climate change?” and “What can I do about it?” and get a concise summary with recommendations on how to prepare. So I decided to explore some of the AI tools made available in the last few weeks.
AI agents powered by large language models like GPT-4 are emerging as a way for people to interact with documents and data through conversation. These agents interpret what the user is asking, call APIs and databases to retrieve data, and generate and run code to perform analysis before presenting results back to the user. Frameworks like LangChain and AutoGen are leading the way, providing patterns for easily implementing agents. Recently, OpenAI joined the party with their launch of GPTs as a no-code way to create agents, which I explored in this article. These are designed very well and open the way to a much wider audience, but they do have a few limitations. They require an API with an openapi.json specification, which means they don’t currently support standards such as GraphQL. They also don’t support the ability to register functions, which is to be expected for a no-code solution but can limit their capabilities.
Enter OpenAI’s other recent launch — the Assistants API.
The Assistants API (in beta) is a programmatic way to configure OpenAI Assistants, which supports functions, web browsing, and knowledge retrieval from uploaded documents. The functions are a big difference compared with GPTs, as they enable more complex interaction with external data sources. Functions are how Large Language Models (LLMs) like GPT-4 are made aware that some user input should result in a call to a code function. The LLM generates a response in JSON format with the exact parameters needed to call the function, which can then be executed locally. To see how they work in detail with OpenAI, see here.
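The local-execution step can be sketched with a small dispatcher that takes a tool-call payload of the shape the LLM emits (a function name plus JSON-encoded arguments) and routes it to a registered Python function. The function name and payload here are illustrative, not part of any API:

```python
import json

# A hypothetical local function the LLM can "call".
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping function names to local callables.
FUNCTIONS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> str:
    """Execute the function named in an LLM tool-call payload.

    The payload mirrors the shape of OpenAI tool calls: a function
    name plus a JSON string of arguments.
    """
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return FUNCTIONS[name](**args)

# Example payload, as the LLM might emit it.
payload = {"function": {"name": "get_weather", "arguments": '{"city": "Nairobi"}'}}
print(dispatch_tool_call(payload))  # Sunny in Nairobi
```

The key point is that the LLM never runs code itself; it only produces the name and arguments, and our code decides what actually gets executed.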
For us to be able to create an AI agent to help with preparing for climate change, we need a good source of climate change data and an API to extract that information. Any such resource must apply a rigorous approach to combining General Circulation Model (GCM) predictions.
Luckily, the folks at Probable Futures have done an amazing job!
Probable Futures is “A non-profit climate literacy initiative that makes practical tools, stories, and resources available online to everyone, everywhere.”, and they provide a series of maps and data based on the CORDEX-CORE framework, a standardization of climate model output from the REMO2015 and REGCM4 regional climate models. [Side note: I am not affiliated with Probable Futures]
Importantly, they provide a GraphQL API for accessing this data, which I could access after requesting an API key.
Based on the documentation, I created functions which I saved into a file assistant_tools.py
…
pf_api_url = "https://graphql.probablefutures.org"
pf_token_audience = "https://graphql.probablefutures.com"
pf_token_url = "https://probablefutures.us.auth0.com/oauth/token"

def get_pf_token():
    client_id = os.getenv("CLIENT_ID")
    client_secret = os.getenv("CLIENT_SECRET")
    response = requests.post(
        pf_token_url,
        json={
            "client_id": client_id,
            "client_secret": client_secret,
            "audience": pf_token_audience,
            "grant_type": "client_credentials",
        },
    )
    access_token = response.json()["access_token"]
    return access_token
def get_pf_data(address, country, warming_scenario="1.5"):
    variables = {}

    location = f"""
        country: "{country}"
        address: "{address}"
    """

    query = (
        """
        mutation {
            getDatasetStatistics(input: { """
        + location
        + """
                warmingScenario: \"""" + warming_scenario + """\"
            }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
        """
    )
    print(query)

    access_token = get_pf_token()
    url = pf_api_url + "/graphql"
    headers = {"Authorization": "Bearer " + access_token}
    response = requests.post(
        url, json={"query": query, "variables": variables}, headers=headers
    )
    return str(response.json())
I intentionally excluded datasetId in order to retrieve all indicators, so that the AI agent has a wide range of data to work with.
The API is powerful in that it accepts towns and cities as well as full addresses. For example …
get_pf_data(address="New Delhi", country="India", warming_scenario="1.5")
This returns a JSON record with climate change information for the location …
{'data': {'getDatasetStatistics': {'datasetStatisticsResponses': [{'datasetId': 40601, 'midValue': '17.0', 'name': 'Change in total annual precipitation', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40616, 'midValue': '14.0', 'name': 'Change in wettest 90 days', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40607, 'midValue': '19.0', 'name': 'Change in dry hot days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40614, 'midValue': '0.0', 'name': 'Change in snowy days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40612, 'midValue': '2.0', 'name': 'Change in frequency of “1-in-100-year” storm', 'unit': 'x as frequent', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40101, 'midValue': '28.0', 'name': 'Average temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40901, 'midValue': '4.0', 'name': 'Climate zones', 'unit': 'class', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {'climateZoneName': 'Dry semi-arid (or steppe) hot'}}, {'datasetId': 40613, 'midValue': '49.0', 'name': 'Change in precipitation “1-in-100-year” storm', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40701, 'midValue': '7.0', 'name': 'Likelihood of year-plus extreme drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40702, 'midValue': '30.0', 'name': 'Likelihood of year-plus drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40704, 'midValue': '5.0', 'name': 'Change in wildfire danger days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 
'longitude': 77.2, 'info': {}}, {'datasetId': 40703, 'midValue': '-0.2', 'name': 'Change in water balance', 'unit': 'z-score', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40201, 'midValue': '21.0', 'name': 'Average nighttime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40205, 'midValue': '0.0', 'name': 'Freezing days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40301, 'midValue': '71.0', 'name': 'Days above 26°C (78°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40302, 'midValue': '24.0', 'name': 'Days above 28°C (82°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40303, 'midValue': '2.0', 'name': 'Days above 30°C (86°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40102, 'midValue': '35.0', 'name': 'Average daytime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40103, 'midValue': '49.0', 'name': '10 hottest days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40104, 'midValue': '228.0', 'name': 'Days above 32°C (90°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40105, 'midValue': '187.0', 'name': 'Days above 35°C (95°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40106, 'midValue': '145.0', 'name': 'Days above 38°C (100°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40202, 'midValue': '0.0', 'name': 'Frost nights', 'unit': 'nights', 'warmingScenario': '1.5', 
'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40304, 'midValue': '0.0', 'name': 'Days above 32°C (90°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40305, 'midValue': '29.0', 'name': '10 hottest wet-bulb days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40203, 'midValue': '207.0', 'name': 'Nights above 20°C (68°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40204, 'midValue': '147.0', 'name': 'Nights above 25°C (77°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}]}}}
Next, we need to construct the AI assistant using the beta API. There are some good resources in the documentation and also the very useful OpenAI Cookbook. However, being so new and in beta, there isn’t much information around yet, so at times it was a bit of trial and error.
First, we need to configure the tools the assistant can use, such as the function to get climate change data. Following the documentation …
get_pf_data_schema = {
    "name": "get_pf_data",
    "parameters": {
        "type": "object",
        "properties": {
            "address": {
                "type": "string",
                "description": "The address of the location to get data for",
            },
            "country": {
                "type": "string",
                "description": "The country of the location to get data for",
            },
            "warming_scenario": {
                "type": "string",
                "enum": ["1.0", "1.5", "2.0", "2.5", "3.0"],
                "description": "The warming scenario to get data for. Default is 1.5",
            },
        },
        "required": ["address", "country"],
    },
    "description": """
        This is the API call to the Probable Futures API to get predicted climate change indicators for a location
    """,
}
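Since the model, not our code, fills in these parameters, it can be handy to validate its arguments against the schema before calling the real function. A minimal sketch (this helper is not part of the Assistants API; a production app might use the jsonschema library instead), with the relevant subset of the schema inlined so it runs standalone:

```python
# Subset of the function schema above, inlined for a standalone example.
schema = {
    "parameters": {
        "properties": {
            "address": {"type": "string"},
            "country": {"type": "string"},
            "warming_scenario": {
                "type": "string",
                "enum": ["1.0", "1.5", "2.0", "2.5", "3.0"],
            },
        },
        "required": ["address", "country"],
    },
}

def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems with model-supplied arguments.

    Checks only required keys and enum membership; an empty list means
    the arguments look safe to pass to the real function.
    """
    params = schema["parameters"]
    problems = [f"missing required: {k}" for k in params["required"] if k not in args]
    for key, value in args.items():
        allowed = params["properties"].get(key, {}).get("enum")
        if allowed and value not in allowed:
            problems.append(f"{key}={value!r} not in {allowed}")
    return problems

print(validate_args(schema, {"address": "Mombasa", "country": "Kenya"}))  # []
print(validate_args(schema, {"address": "Mombasa", "warming_scenario": "4.0"}))
```

This kind of guard is cheap insurance against the LLM inventing an unsupported warming scenario.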
You’ll notice we’ve provided text descriptions for each parameter in the function. From experimentation, these appear to be used by the agent when populating parameters, so take care to be as clear as possible and to note any idiosyncrasies so the LLM can adjust. From this we define the tools …
tools = [
    {
        "type": "function",
        "function": get_pf_data_schema,
    },
    {"type": "code_interpreter"},
]
You’ll notice I left code_interpreter in, giving the assistant the ability to run code needed for data analysis.
Next, we need to specify a set of user instructions (a system prompt). These are absolutely key in tailoring the assistant’s performance to our task. Based on some quick experimentation, I arrived at this set …
instructions = """
"Hello, Climate Change Assistant. You help people understand how climate change will affect their homes"
"You will use Probable Futures Data to predict climate change indicators for a location"
"You will summarize perfectly the returned data"
"You will also provide links to local resources and websites to help the user prepare for the predicted climate change"
"If you don't have enough address information, request it"
"You default to warming scenario of 1.5 if not specified, but ask if the user wants to try others after presenting results"
"Group results into categories"
"Always link to the probable futures website for the location using this URL and replacing LATITUDE and LONGITUDE with location values: https://probablefutures.org/maps/?selected_map=days_above_32c&map_version=latest&volume=heat&warming_scenario=1.5&map_projection=mercator#9.2/LATITUDE/LONGITUDE"
"GENERATE OUTPUT THAT IS CLEAR AND EASY TO UNDERSTAND FOR A NON-TECHNICAL USER"
"""
You can see I’ve added instructions for the assistant to provide resources such as websites to help users prepare for climate change. This is a bit ‘open’; for a production assistant we’d probably want tighter curation of this.
One wonderful thing that’s now possible is that we can also instruct regarding general tone, in the above case requesting that output is clear to a non-technical user. Obviously, all of this needs some systematic prompt engineering, but it’s interesting to note how we now ‘program’ partly through persuasion. 😊
OK, now that we have our tools and instructions, let’s create the assistant …
import os
import asyncio
import sys

from openai import AsyncOpenAI
from dotenv import load_dotenv

load_dotenv()

api_key = os.environ.get("OPENAI_API_KEY")
assistant_id = os.environ.get("ASSISTANT_ID")
model = os.environ.get("MODEL")

client = AsyncOpenAI(api_key=api_key)

name = "Climate Change Assistant"

try:
    my_assistant = await client.beta.assistants.retrieve(assistant_id)
    print("Updating existing assistant ...")
    assistant = await client.beta.assistants.update(
        assistant_id,
        name=name,
        instructions=instructions,
        tools=tools,
        model=model,
    )
except Exception:
    print("Creating assistant ...")
    assistant = await client.beta.assistants.create(
        name=name,
        instructions=instructions,
        tools=tools,
        model=model,
    )
    print(assistant)
    print("Now save the ID in your .env file")
The above assumes we have defined keys and our agent ID in a .env file. You’ll notice the code first checks whether the agent exists using the ASSISTANT_ID in the .env file and updates it if so; otherwise it creates a brand-new agent, and the generated ID must be copied to the .env file. Without this, I was creating a LOT of assistants!
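For reference, the .env file this code expects might look something like the following, with every value a placeholder (the variable names match those read via os.environ and os.getenv above):

```
OPENAI_API_KEY=sk-...
ASSISTANT_ID=asst_...
MODEL=gpt-4-1106-preview
CLIENT_ID=<probable-futures-client-id>
CLIENT_SECRET=<probable-futures-client-secret>
```

Remember to keep this file out of version control, since it holds secrets.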
Once the assistant is created, it becomes visible in the OpenAI user interface, where it can be tested in the Playground. Since most of the development and debugging related to function calls involved actually running code, I didn’t find the Playground super useful for this analysis, but it’s designed nicely and might be useful in other work.
For this analysis, I decided to use the new GPT-4 Turbo model by setting model to “gpt-4-1106-preview”.
We want to be able to create a full chatbot, so I started with this Chainlit cookbook example, adjusting it slightly to split agent code into a dedicated file and to access it via …
import assistant_tools as at
Chainlit is very concise and the user interface is easy to set up; you can find the code for the app here.
Putting it all together — see code here — we start the agent with a simple chainlit run app.py
…
Let’s ask about a location …
Note above that I intentionally misspelled Mombasa.
The agent then starts its work, calling the API and processing the JSON response (it took about 20 seconds) …
Based on our instructions, it then finishes off with …
But is it right?
Let’s call the API directly and review the output …
get_pf_data(address="Mombassa", country="Kenya", warming_scenario="1.5")
Which queries the API with …
mutation {
getDatasetStatistics(input: {
country: "Kenya"
address: "Mombassa"
warmingScenario: "1.5"
}) {
datasetStatisticsResponses{
datasetId
midValue
name
unit
warmingScenario
latitude
longitude
info
}
}
}
This provides the following (truncated to display just a few) …
{
"data": {
"getDatasetStatistics": {
"datasetStatisticsResponses": [
{
"datasetId": 40601,
"midValue": "30.0",
"name": "Change in total annual precipitation",
"unit": "mm",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40616,
"midValue": "70.0",
"name": "Change in wettest 90 days",
"unit": "mm",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40607,
"midValue": "21.0",
"name": "Change in dry hot days",
"unit": "days",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40614,
"midValue": "0.0",
"name": "Change in snowy days",
"unit": "days",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40612,
"midValue": "1.0",
"name": "Change in frequency of u201c1-in-100-yearu201d storm",
"unit": "x as frequent",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
}, ... etc
}
]
}
}
}
Spot-checking, it appears that the agent captured the data perfectly and presented an accurate summary to the user.
The AI agent can be improved through some instructions about how it presents information.
One of the instructions was to always generate a link to the map visualization back on the Probable Futures website, which when clicked goes to the right location …
Another instruction asked the agent to always prompt the user to try other warming scenarios. By default, the agent produces results for a predicted 1.5°C global increase in temperature, but we allow the user to explore other — and somewhat depressing — scenarios.
Since we gave the AI agent the code interpreter skill, it should be able to execute Python code to do basic data analysis. Let’s try that out.
First I asked how climate change would affect London and New York, to which the agent provided summaries. Then I asked …
This resulted in the agent using code interpreter to generate and run Python code to create a plot …
Not bad!
Using the Probable Futures API and an OpenAI assistant, we were able to create a conversational interface showing how people might be able to ask questions about climate change and get advice on how to prepare. The agent was able to make API calls as well as do some basic data analysis. This offers another channel for climate awareness, which may be more attractive to some non-technical users.
We could of course have developed a chatbot to determine intent/entities and code to handle the API, but that is more work and would need to be revisited for any API changes and whenever new APIs are added. Also, a Large Language Model agent does a good job of interpreting user input and summarization with very limited development, and takes things to another level in being able to run code and perform basic data analysis. Our particular use-case seems particularly well suited to an AI agent because the task is constrained in scope.
There are some challenges though: the technique is a bit slow (queries took about 20–30 seconds to complete). Also, LLM token costs weren’t analyzed for this article and may be prohibitive.
That said, the OpenAI Assistants API is in beta and the agent wasn’t tuned in any way, so with further work, extra functions for common tasks, performance and cost could likely be optimized for this exciting new technique.
This article is based on data and other content made available by Probable Futures, a Project of SouthCoast Community Foundation, and certain of that data may have been provided to Probable Futures by Woodwell Climate Research Center, Inc. or The Coordinated Regional climate Downscaling Experiment (CORDEX).
Code for this analysis can be found here.
You can find more of my articles here.