Analyze Pandas Dataframes With OpenAI and LlamaIndex
LlamaIndex is used to connect LLMs with external data. In this tutorial, we will show you how to use the OpenAI GPT-3 text-davinci-003 model to query structured data, and more specifically pandas dataframes.
You can install the LlamaIndex library with pip as follows:
pip install llama-index
Query Pandas Dataframes with LlamaIndex
The default model is text-davinci-003, and for this tutorial we will leave it as is. Before you start, you need to set your API key as an environment variable called OPENAI_API_KEY. Let's set the API key and load the required libraries:
import os
import pandas as pd
from llama_index.indices.struct_store import GPTPandasIndex
# My OpenAI Key (replace the placeholder with your own key)
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
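Hard-coding the key in a script makes it easy to leak (for example, by committing it). A minimal alternative sketch, assuming you have already exported the key in your shell, is to read it back from the environment and fail fast if it is missing. The helper name `require_openai_key` is hypothetical and is not part of LlamaIndex.

```python
import os


def require_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank.

    Hypothetical helper for illustration only; LlamaIndex does not provide it.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running the tutorial.")
    return key


if __name__ == "__main__":
    # Prints a short prefix only, so the full key never ends up in logs
    print(require_openai_key()[:3] + "...")
```

Calling it once at the top of the script surfaces a missing key immediately, instead of as an authentication error on the first query.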
For this tutorial, we will work with the famous iris dataset. We will load it from a URL:
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# using the attribute information as the column names
col_names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Class']
df = pd.read_csv(csv_url, names=col_names)
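The raw `iris.data` file has no header row, which is why the column names are passed through `names=`. The sketch below reproduces this offline on two rows written in the same comma-separated format as the UCI file (the rows are held in memory here, an assumption made only so the example runs without network access).

```python
import io

import pandas as pd

col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']

# Two rows in the same header-less, comma-separated format as iris.data
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)

# names= supplies the header; without it, pandas would use the first data row as column labels
df_sample = pd.read_csv(sample, names=col_names)
print(df_sample)
```

The four measurement columns are parsed as floats and `Class` as strings, which is the same structure the LLM will later see for the full dataset.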
Create an Index
We can create an index of our data using the GPTPandasIndex class:
index = GPTPandasIndex(df=df)
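Roughly speaking, this kind of pandas index does not answer questions by reading every row; it shows the LLM a preview of the dataframe (such as the output of `df.head()`) and asks it to write a pandas expression. The sketch below is a simplified, hypothetical prompt builder that illustrates this text-to-pandas idea; the function name and the prompt wording are assumptions, not LlamaIndex's actual template.

```python
import pandas as pd


def build_pandas_prompt(df: pd.DataFrame, question: str, n_preview: int = 5) -> str:
    """Build a text-to-pandas prompt (hypothetical wording, not LlamaIndex's template)."""
    # Only a small preview of the data is sent, not the whole dataframe
    preview = df.head(n_preview).to_string()
    return (
        "You are working with a pandas DataFrame in Python named `df`.\n"
        f"This is the result of `print(df.head({n_preview}))`:\n"
        f"{preview}\n\n"
        f"Question: {question}\n"
        "Answer with a single Python expression that uses `df`, and nothing else."
    )


df_demo = pd.DataFrame(
    {"Sepal_Length": [5.1, 7.0], "Class": ["Iris-setosa", "Iris-versicolor"]}
)
print(build_pandas_prompt(df_demo, "How many rows are in the dataset?"))
```

Sending only a preview keeps the prompt small regardless of how many rows the dataframe has; the trade-off is that the LLM must infer column meanings from names and a few sample values.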
Create a Query Engine and Run Queries
Now that we have built the index, we can create a query engine:
query_engine = index.as_query_engine()
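Once the LLM has returned an expression, the remaining step is to evaluate it against the dataframe. The hypothetical helper below sketches only that evaluation step: the expression string stands in for an LLM reply, and the name `run_pandas_expression` is an assumption for illustration. Emptying `__builtins__` narrows what the expression can reach, but it is not a security sandbox.

```python
import pandas as pd


def run_pandas_expression(expr: str, df: pd.DataFrame):
    """Evaluate a pandas expression with only `df` and `pd` in scope.

    Hypothetical helper. Removing builtins blocks names like len() or open(),
    but eval is NOT safe for untrusted input on sensitive systems.
    """
    return eval(expr, {"__builtins__": {}}, {"df": df, "pd": pd})


df_demo = pd.DataFrame(
    {
        "Sepal_Length": [5.1, 7.0, 6.3],
        "Class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    }
)

# Pretend this string came back from the LLM for "how many rows and columns?"
print(run_pandas_expression("df.shape", df_demo))  # (3, 2)
```

Because the answer ultimately comes from ordinary pandas code, you can check it directly: for the full iris dataframe loaded above, `df.shape` should be `(150, 5)`.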
Let’s run our first query by asking for the number of rows and columns.
response = query_engine.query("""Return how many rows and how many columns are in the dataset.\n