Analyze Pandas Dataframes With OpenAI and LlamaIndex

Did you know that you can pass a pandas DataFrame to OpenAI?

George Pipis
3 min read · May 4, 2023
Image generated by DALL-E

LlamaIndex connects LLMs with external data. In this tutorial, we will show you how to use OpenAI's GPT-3 text-davinci-003 model to query structured data, specifically pandas DataFrames.

Installation

You can install the LlamaIndex library with pip:

pip install llama-index

Query Pandas Dataframes with LlamaIndex

The default model is text-davinci-003, and for this tutorial we will leave it as is. Before you start, you need to set your API key in an environment variable called OPENAI_API_KEY. Let's set the API key and load the required libraries:

# My OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"

import pandas as pd
from llama_index.indices.struct_store import GPTPandasIndex
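If you would rather not hard-code the key in a notebook, one option is to export OPENAI_API_KEY in your shell and only check for it in Python. The helper below is a minimal sketch of that idea; the function name `get_openai_key` is hypothetical and not part of LlamaIndex or the OpenAI client.

```python
import os


def get_openai_key(env=os.environ):
    """Return the OpenAI key from the given mapping, or fail with a clear message.

    `get_openai_key` is a hypothetical helper for this tutorial, not a library API.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the tutorial's placeholder string as "not set".
    if not key or key == "INSERT OPENAI KEY":
        raise RuntimeError("Set OPENAI_API_KEY before running the tutorial code.")
    return key


# Usage (raises RuntimeError if the variable is missing):
# api_key = get_openai_key()
```

Failing early like this tends to give a more readable error than the one you would get later from the first API call.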

Iris Dataset

For this tutorial, we will work with the famous iris dataset. We will load it from a URL:

csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

# using the attribute information as the column names…
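The iris.data file has no header row, so column names have to be supplied when reading it. The sketch below shows one way this could look, not necessarily the code used in the original post. The column names are an assumption paraphrased from the UCI attribute information, `load_iris` is a hypothetical helper, and a handful of rows are embedded inline so the example runs without network access.

```python
import io

import pandas as pd

# Assumed column names, paraphrased from the UCI attribute information.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# A few rows in the same headerless, comma-separated layout as iris.data,
# embedded here so the sketch runs offline.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_iris(source):
    """Read a headerless iris-style CSV from a path, URL, or file-like object."""
    return pd.read_csv(source, header=None, names=COLUMNS)


df = load_iris(io.StringIO(SAMPLE))
print(df.head())

# The kind of pandas expression a natural-language question such as
# "what is the average sepal length per class?" would map to.
print(df.groupby("class")["sepal_length"].mean())
```

To read the full dataset instead of the inline sample, the same helper can be called as `load_iris(csv_url)`.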

George Pipis

Sr. Director, Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/