
In PySpark data frames, we can have columns with arrays. Let’s see an example of an array column. First, we will load the CSV file from S3.

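from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# read the data from S3
df = spark.read.options(header=True).csv("s3://my-bucket/my_folder/my_file.csv")
# select the Row_Number and Category columns and show the first 5 rows
df.select(['Row_Number', 'Category']).show(5)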


In Pandas, a newly added column appears at the end of the data frame. However, we often need to place a column at a specific position. Let’s see how to do this in Pandas with the insert method.
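As a minimal sketch of the behaviour described above (the column names and values are hypothetical), plain assignment appends a column at the end, while DataFrame.insert(loc, column, value) places it at the integer position loc. insert modifies the data frame in place and returns None; it raises a ValueError if the column name already exists, unless allow_duplicates=True is passed.

```python
import pandas as pd

# Hypothetical sample data frame
df = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})

# Plain assignment appends the new column at the end
df["d"] = [7, 8, 9]

# loc=0 inserts the column at the beginning (in place)
df.insert(0, "a", [0, 0, 0])

# Insert at an arbitrary position: here, right before column "d"
df.insert(df.columns.get_loc("d"), "c2", df["c"] * 2)

print(df.columns.tolist())  # ['a', 'b', 'c', 'c2', 'd']
```

Looking up the position with df.columns.get_loc keeps the code correct even if the columns before the target change.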

Insert a Pandas Column at the Beginning

Let’s see how we…

George Pipis

Sr. Director, Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/
