Use GPT AI to query a SQL database of options prices

June 17, 2023

I’ve been down the rabbit hole with LLMs for about three months now. One of the most exciting things I’ve found is LangChain which lets you build applications based on GPT AI. A few weeks ago I showed you how to read PDFs. Today I’m going to show you how to query a SQL database.

The plan?

First, you'll populate a SQL database with options data. Then you'll use OpenAI's GPT-3.5, a cutting-edge LLM, to craft SQL queries and extract answers from the data.


LangChain is a tool that allows users to build applications with LLMs. Using LangChain’s Tools and Agents, you can use LLMs to write complex SQL queries against relational databases. The key idea behind Agents is giving LLMs the ability to use tools in their workflow. This is where LangChain departs from ChatGPT.

LangChain uses GPT AI to interpret the query and return the desired results. It can be used to query a variety of databases, including relational databases, NoSQL databases, and graph databases.

By using LangChain Tools and Agents with SQL, you can streamline your data analysis process and make more informed decisions.

And the best part?

Today you’ll get the code to do it.

Imports and set up

We will use the Agents to interact with a small sample database of options. But first, you need to import pandas, the OpenBB SDK, and SQLAlchemy. You can use these to download and store the data. Then you’ll need the LangChain imports.

import os

import pandas as pd

from openbb_terminal.sdk import openbb
from sqlalchemy import create_engine

from langchain import OpenAI
from langchain.sql_database import SQLDatabase
from langchain.chains import SQLDatabaseChain
from langchain.agents import Tool, load_tools, initialize_agent

os.environ["OPENAI_API_KEY"] = "<your API key>"
engine = create_engine("sqlite:///:memory:")

In today’s issue, I show you how to use a SQLite database, but you can connect to and use any relational database, even remote ones.
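
Swapping databases is just a different SQLAlchemy connection URL. A quick sketch (the file name, host, and credentials below are placeholders, not part of this issue's setup):

```python
from sqlalchemy import create_engine

# In-memory SQLite, as used in this issue
engine = create_engine("sqlite:///:memory:")

# A SQLite file on disk instead of in-memory
file_engine = create_engine("sqlite:///options.db")

# Remote databases follow the same URL pattern:
#   dialect+driver://user:password@host:port/database
# e.g. (placeholder credentials; requires the psycopg2 driver installed):
# pg_engine = create_engine("postgresql+psycopg2://user:secret@db.example.com:5432/options")
```

Everything downstream (the SQLDatabase wrapper, chains, and agents) works the same regardless of which engine you pass in.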

Grab the data and create the table.

symbols = ["META", "AMZN", "AAPL", "NFLX", "GOOG"]

prices = openbb.stocks.ca.hist(symbols)  # last close prices, one column per symbol

chains = []
for symbol in symbols:
    chain = openbb.stocks.options.chains(symbol)
    chain["symbol"] = symbol
    chain["underlying_last"] = prices.iloc[-1][symbol]
    chains.append(chain)

options_chains = pd.concat(chains)

This code iterates through the FAANG stocks, grabbing price data and all the options chains. You need to get the price data separately because the options chains don’t include it.

Use pandas to store the data into the SQLite database.

options_chains.to_sql("options", con=engine, index=False)

Set up LangChain chains, tools, and agents

The first step is to create an instance of the LLM, a DB, and a SQL DB chain.

llm = OpenAI(temperature=0)
db = SQLDatabase(engine)
sql_chain = SQLDatabaseChain.from_llm(
    llm=llm,
    db=db,
    verbose=True
)
This shows the power of LangChain: you now have a Python object that represents OpenAI’s cutting-edge LLM.

Next, create the custom LangChain tool that uses the SQL Chain to build SQL queries.

sql_tool = Tool(
    name="Options DB",
    func=sql_chain.run,
    description="Query options data.",
)
Since LLMs are notoriously bad at basic math, LangChain has a built-in math tool that helps the LLM do arithmetic. Load it and append the SQL tool you just built.

tools = load_tools(["llm-math"], llm=llm)
tools.append(sql_tool)

Finally, initialize the LangChain agent.

zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=5,
)
This uses a zero-shot ReAct agent with no memory. That means it takes a single prompt and uses the frozen data in the LLM to respond. ReAct is a prompt engineering technique that alternates between “acting” and “reasoning.” You’ll see it do this in the output.
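
Conceptually, the agent loop looks something like this. This is a toy sketch of the ReAct idea, not LangChain’s actual implementation, and the stand-in tools are hypothetical:

```python
def toy_react_agent(question, tools, max_iterations=5):
    """Toy ReAct loop: alternate a 'reasoning' step that picks a tool
    with an 'acting' step that runs it, until an answer appears."""
    observation = question
    for _ in range(max_iterations):
        # Reasoning: decide which tool applies. A real agent asks the
        # LLM; here we use a crude keyword check for illustration.
        tool_name = "Options DB" if "option" in observation.lower() else "Calculator"
        # Acting: run the chosen tool and observe the result
        observation = tools[tool_name](observation)
        if observation.startswith("Final Answer:"):
            return observation
    return "Agent stopped after max_iterations."

# Stand-in tools so the sketch runs without an LLM or a database
tools = {
    "Options DB": lambda q: "Final Answer: 5 rows returned",
    "Calculator": lambda q: "Final Answer: 42",
}
```

The `max_iterations=5` argument to `initialize_agent` plays the same role as the loop bound here: it stops the agent from reasoning forever if it never reaches a final answer.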

Create a prompt to generate the SQL

To run the agent, create a prompt.

prompt = """
What are the last prices of 5 META call options with expiration
date greater than 60 days away and strike price within 5% of
the underlying price?

Create a list of the options and include the expiration date,
strike price, and last price. Use that list to create a table
using the following template:

Expiration    Strike    Price
expiration_date    strike    last
expiration_date    strike    last

If there are no results, print 'no results.'
"""

I ask the LLM for the last prices of 5 META call options with an expiration date greater than 60 days away and a strike price within 5% of the underlying. Then I instruct the LLM to format the response as a table.

Run the agent.

zero_shot_agent.run(prompt)

You will see the LangChain agent explain its reasoning while it constructs the SQL. In the output you can find the query:

FROM options
WHERE symbol = 'META'
AND "optionType" = 'call'
AND expiration > date('now', '+60 days')
AND strike
BETWEEN underlying_last * 0.95
AND underlying_last * 1.05

From there, the query is executed and the results are output in a table.
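
You can sanity-check the generated SQL yourself by running the same pattern against SQLite directly. A self-contained sketch with a toy `options` table (made-up rows and a far-future expiration, not real market data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE options (
        symbol TEXT, optionType TEXT, expiration TEXT,
        strike REAL, lastPrice REAL, underlying_last REAL
    )
""")
# Toy rows: one matching call, one with a too-far strike, one put
conn.executemany(
    "INSERT INTO options VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("META", "call", "2099-01-20", 300.0, 12.5, 300.0),
        ("META", "call", "2099-01-20", 400.0, 1.2, 300.0),
        ("META", "put",  "2099-01-20", 300.0, 10.0, 300.0),
    ],
)
# The same filter the agent generated: calls more than 60 days out,
# strike within 5% of the underlying
rows = conn.execute("""
    SELECT expiration, strike, lastPrice
    FROM options
    WHERE symbol = 'META'
      AND "optionType" = 'call'
      AND expiration > date('now', '+60 days')
      AND strike BETWEEN underlying_last * 0.95 AND underlying_last * 1.05
    LIMIT 5
""").fetchall()
```

Only the first toy row survives the filter: the second fails the strike band and the third is a put.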

Here’s another, more complex prompt.

prompt = """
What are the 5 options from each symbol with
expiration date between 40 and 60 days away, a strike
price within 5% of the underlying price, open interest
greater than 100, and the difference between the ask and
the bid less than 0.05?
"""

If you’re familiar with SQL, you might expect the SQL to include a GROUP BY clause, which it does not. This demonstrates one of the downsides of using LLMs to generate SQL. But with enough iterations, you will find the right prompt that generates the results you're looking for.
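
For reference, “5 from each symbol” is a per-group top-N, which in SQLite is usually written with a window function rather than GROUP BY. A hand-written sketch against a toy table (made-up columns and rows, kept minimal for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options (symbol TEXT, strike REAL, openInterest INTEGER)")
conn.executemany(
    "INSERT INTO options VALUES (?, ?, ?)",
    [("META", 300 + i, 200) for i in range(8)]
    + [("AAPL", 180 + i, 200) for i in range(8)],
)
# Per-group top-N: number the rows within each symbol, keep rank <= 5
rows = conn.execute("""
    SELECT symbol, strike FROM (
        SELECT symbol, strike,
               ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY strike) AS rn
        FROM options
        WHERE openInterest > 100
    )
    WHERE rn <= 5
""").fetchall()
```

Knowing what the correct query looks like makes it much easier to judge whether the agent’s generated SQL actually answers your prompt.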

There are two action steps today.

The first is to use the code to create a few separate tables with a common key among them. Then write prompts to ask questions that require joins and aggregates.
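
To seed the first exercise, you can split the data into two tables that share a `symbol` key and write prompts whose answers require joining them. A minimal sqlite3 sketch with toy tables and values (not real data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (symbol TEXT, close REAL);
    CREATE TABLE options (symbol TEXT, strike REAL, lastPrice REAL);
    INSERT INTO prices VALUES ('META', 300.0), ('AAPL', 185.0);
    INSERT INTO options VALUES ('META', 305.0, 10.0), ('AAPL', 190.0, 3.0);
""")
# A join + aggregate the agent would have to construct, e.g. for a prompt
# like "how many options per symbol, and how far are strikes from spot?"
rows = conn.execute("""
    SELECT o.symbol, COUNT(*) AS n, AVG(o.strike / p.close) AS avg_moneyness
    FROM options o
    JOIN prices p ON o.symbol = p.symbol
    GROUP BY o.symbol
""").fetchall()
```

Prompts against a schema like this force the agent to reason about the shared key, which is a good stress test of how well it infers relationships from table metadata.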

The second is to connect to a remote database with SQLAlchemy and use LangChain in the same way to query it.
