How to read CSV in Pandas with Column Names or Regular Expressions

A practical example of how to load columns based on columns names or using regex

George Pipis

--

A common scenario is to work with massive datasets consisting of many columns which are not needed for our analysis. Usually, we tend to load the whole dataset and then keep the columns of our interest. Let’s see a more efficient way to load only the columns of our interest.

For this tutorial, we will work with the following dataset:

import pandas as pd
import re

pd.read_csv('example.csv')

With the following column names:

['campaign_id', 'user_id', 'cdate', 'att_gander', 'att_agegroup','variant', 'response']

How to read a CSV by keeping only specific column names

Assuming that we want to load only three columns, such as the user_id, the variant and the response. In the read_csv function, there is a parameter called usecols which will be used in all the examples below.

usecols : list-like or callable, optional
Return…

--

--

George Pipis

Sr. Director, Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/