Photo by aaron boris on Unsplash

PySpark DataFrames can contain columns whose values are arrays. Let’s look at an example of such an array column. First, we will load the CSV file from S3.

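from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# read the data from S3 (header=True uses the first row as column names)
df = spark.read.csv("s3://my-bucket/my_folder/my_file.csv", header=True)
# select the Row_Number and Category columns
df.select(['Row_Number', 'Category']).show(5)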

Photo by Pascal Müller on Unsplash

In Pandas, a newly added column is appended at the end of the DataFrame. Often, however, we need to place a column at a specific position. Let’s see how to do that in Pandas with the insert method.
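A minimal sketch of the difference, using a hypothetical toy DataFrame: plain assignment appends at the end, while DataFrame.insert(loc, column, value) places the column at integer position loc. Note that insert modifies the DataFrame in place and returns None.

```python
import pandas as pd

# Hypothetical toy DataFrame for illustration
df = pd.DataFrame({"a": [1, 2, 3], "c": [7, 8, 9]})

# Plain assignment appends the new column at the end
df["d"] = df["a"] * 10
print(df.columns.tolist())  # ['a', 'c', 'd']

# insert(loc, column, value) puts the column at integer position `loc`, in place
df.insert(1, "b", [4, 5, 6])
print(df.columns.tolist())  # ['a', 'b', 'c', 'd']

# With the default allow_duplicates=False, reusing an existing name raises ValueError
try:
    df.insert(0, "a", 0)
except ValueError as err:
    print("ValueError:", err)
```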

Insert a Pandas Column at the Beginning

Let’s see how we…
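For the beginning of the DataFrame, passing loc=0 to insert is enough. The sketch below uses hypothetical toy data; the second step, which moves an existing column to the front by combining pop and insert, is a common pattern offered here as an illustration rather than the article's own method.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"name": ["x", "y"], "score": [0.5, 0.9]})

# loc=0 makes the new column the first one
df.insert(0, "id", list(range(1, len(df) + 1)))
print(df.columns.tolist())  # ['id', 'name', 'score']

# To move an existing column to the front: pop() removes it and returns the Series,
# then insert() adds it back at position 0 under the same name
df.insert(0, "score", df.pop("score"))
print(df.columns.tolist())  # ['score', 'id', 'name']
```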

George Pipis

Sr. Director, Data Scientist @ Persado | Co-founder of the Data Science blog:

