The GPT-4o model has vision capabilities that enable us to answer questions about the images. Using the ChatGPT interface, you can upload the image and ask a question:
Press enter or click to view image in full size
Python SKD
That’s cool, but it would be great if we could do the same task programmatically using the Python API. For this example, we assume that the user gets the images locally.
from openai import OpenAI
import base64 import requests import os # OpenAI API Key api_key = os.environ['OPENAI_API_KEY']
# Function to encode the image def encode_image(image_path): with open(image_path, "rb") as image_file: return base64.b64encode(image_file.read()).decode('utf-8')
# Path to your image image_path = "images/dogs/image_1.jpg"
# Getting the base64 string base64_image = encode_image(image_path)