Analyze Pandas Dataframes With OpenAI and LlamaIndex

Did you know that you can pass a pandas DataFrame to OpenAI?

George Pipis


Image generated by DALL-E

LlamaIndex is a framework for connecting LLMs to external data. In this tutorial, we will show you how to use OpenAI's text-davinci-003 model (part of the GPT-3.5 series) to query structured data, specifically pandas DataFrames.


You can install the LlamaIndex library with pip:

pip install llama-index

Query Pandas Dataframes with LlamaIndex

The default model is text-davinci-003, and we will keep it for this tutorial. Before you start, you need to store your API key in an environment variable named OPENAI_API_KEY. Let's set the API key and load the required libraries:

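# My OpenAI key: set it before importing llama_index (replace the placeholder with your own key)
import os
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

import pandas as pd
# GPTPandasIndex is only available in older llama-index releases
from llama_index.indices.struct_store import GPTPandasIndex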
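Hard-coding the key in a script is easy to leak. A safer pattern is to export OPENAI_API_KEY in your shell and have the script fail early if it is missing. The following is a minimal sketch. The helper name `require_openai_key` is hypothetical, and only the variable name OPENAI_API_KEY comes from the text above:

```python
import os


def require_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var_name`, or raise if it is unset or blank.

    Hypothetical helper: it only checks the environment and does not
    validate the key against the OpenAI API.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running "
            f"(e.g. export {var_name}=...)"
        )
    return key
```

Calling `require_openai_key()` at the top of the script produces a clear error message up front. Without it, you would only see an authentication failure later, during the first query.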

Iris Dataset

For this tutorial, we will work with the well-known Iris dataset, which we load from a URL:

csv_url = ''

# use the attribute information as the column names
col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']
df = pd.read_csv(csv_url, names=col_names)
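Passing `names=col_names` implies that the CSV has no header row. When column names are supplied explicitly, `read_csv` treats every line, including the first, as data. The sketch below shows this without the download. It reads three Iris rows from an in-memory string, and the `io.StringIO` stand-in is an assumption for illustration only:

```python
import io

import pandas as pd

# Three sample rows in the headerless Iris CSV format (illustrative stand-in for the download).
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']

# With names= supplied, the first line is read as data rather than as a header.
sample_df = pd.read_csv(sample_csv, names=col_names)

print(sample_df.dtypes)
print(sample_df.head())
```

Without `names=`, pandas would consume the first data row as the header, and the DataFrame would have one row fewer and meaningless column labels.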


Create an Index

We can build an index over the DataFrame with GPTPandasIndex:

index = GPTPandasIndex(df=df)

Create a Query Engine and Run Queries

With the index built, we can create a query engine:

query_engine = index.as_query_engine()
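Roughly speaking, LlamaIndex's pandas querying works like this. It shows the LLM a preview of the DataFrame (the output of `df.head()`) along with your question, asks for a single pandas expression that answers it, and then evaluates that expression against `df`. The sketch below imitates that loop without calling OpenAI. `build_prompt`, `fake_llm` and `run_pandas_expression` are hypothetical names, the prompt wording is illustrative, and none of this is LlamaIndex's actual code:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal_Length": [5.1, 4.9, 4.7],
        "Sepal_Width": [3.5, 3.0, 3.2],
        "Petal_Length": [1.4, 1.4, 1.3],
        "Petal_Width": [0.2, 0.2, 0.2],
        "Class": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
    }
)


def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Describe the DataFrame to the model via its first rows, then ask the question."""
    return (
        "You are working with a pandas DataFrame named `df`.\n"
        f"Output of df.head():\n{df.head().to_string()}\n"
        f"Question: {question}\n"
        "Answer with a single Python expression that can be passed to eval()."
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the OpenAI call: always returns the same expression."""
    return "df.shape"


def run_pandas_expression(df: pd.DataFrame, expression: str):
    """Evaluate the model's expression with only `df` and `pd` in scope.

    Emptying __builtins__ limits accidents but is NOT a security sandbox;
    model output should still be treated as untrusted code.
    """
    return eval(expression, {"__builtins__": {}}, {"df": df, "pd": pd})


prompt = build_prompt(df, "How many rows and columns are in the dataset?")
answer = run_pandas_expression(df, fake_llm(prompt))
print(answer)
```

Because the final answer comes from evaluating generated code, the result is only as reliable as the expression the model writes. This is one reason to sanity-check answers against pandas directly.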

Let's run our first query, asking for the number of rows and columns:

response = query_engine.query("Return how many rows and how many columns are in the dataset.")
print(response)
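The answer comes back as generated text, so it is worth cross-checking it against `df.shape`. The standard Iris data has 150 rows and, with the column names above, 5 columns. The following is a minimal sketch of such a check. `answer_matches_shape` is a hypothetical helper, the small DataFrame is a stand-in for the full data, and the answer string is an illustrative input, not real model output:

```python
import re

import pandas as pd


def answer_matches_shape(response_text: str, df: pd.DataFrame) -> bool:
    """Check that the row and column counts appear, in that order, in the answer text.

    Hypothetical helper: it extracts integers with a regex, so it only works
    for answers that state the counts as digits, rows first.
    """
    numbers = [int(n) for n in re.findall(r"\d+", response_text)]
    rows, cols = df.shape
    return numbers[:2] == [rows, cols]


# Small stand-in DataFrame with the same columns as the Iris data.
df = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2, "Iris-setosa"], [4.9, 3.0, 1.4, 0.2, "Iris-setosa"]],
    columns=["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width", "Class"],
)

# Illustrative answer text; in the tutorial you would pass the text of `response`.
print(answer_matches_shape("The dataset has 2 rows and 5 columns.", df))
```

With the real data, `answer_matches_shape(str(response), df)` gives a quick pass/fail signal. The regex approach is deliberately simple and will reject answers that spell out numbers as words.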



George Pipis

Sr. Director, Data Scientist @ Persado | Co-founder of the Data Science blog: