Analyze Pandas Dataframes With OpenAI and LlamaIndex
Did you know that you can pass a pandas DataFrame into OpenAI?
LlamaIndex is used to connect LLMs with external data. In this tutorial, we will show you how to use the OpenAI GPT-3 text-davinci-003 model to query structured data, and more particularly pandas DataFrames.
Installation
Using pip, you can install the LlamaIndex library as follows:
pip install llama-index
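To confirm that the installation succeeded, you can read the installed version with the Python standard library. This is a minimal sketch; the helper name installed_version is our own and is not part of LlamaIndex:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="llama-index"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


# Prints a version string if llama-index is installed, otherwise None
print(installed_version())
```

Knowing the version is handy because import paths in a fast-moving library may differ between releases.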
Query Pandas Dataframes with LlamaIndex
The default model is text-davinci-003, and for this tutorial we will leave it as is. Before you start, you need to provide your API key as an environment variable called OPENAI_API_KEY. Let's set the API key and load the required libraries:
# My OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = "INSERT OPENAI KEY"
import pandas as pd
from llama_index.indices.struct_store import GPTPandasIndex
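If you prefer not to hard-code the key in your script, you can set OPENAI_API_KEY in your shell and only check for it in Python. The following is a minimal sketch under that assumption; the helper name require_api_key is hypothetical and is not part of LlamaIndex or the OpenAI library:

```python
import os

PLACEHOLDER = "INSERT OPENAI KEY"  # the placeholder string used in this tutorial


def require_api_key(name="OPENAI_API_KEY"):
    """Return the key stored in the environment, or raise a readable error."""
    key = os.environ.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or set it in os.environ first."
        )
    return key


# Usage (raises RuntimeError if the key is missing or still the placeholder):
# api_key = require_api_key()
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the first query.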
Iris Dataset
For this tutorial, we will work with the famous iris dataset. We will load it from a URL:
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# using the attribute information as the column names…