An efficient way to create balanced datasets in Python

We have provided examples of how you can Resample Data By Groups in Python and how you do Undersampling by Groups in R. In this post, we will show an efficient way to create balanced datasets that takes more than one variable into consideration. Let’s start by creating our “unbalanced” dataset with the following characteristics:
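The idea can be sketched in plain Python under assumptions that are not taken from the original post: the column names `gender` and `country`, the group sizes, and the helper `balance_by_groups` are all hypothetical. The sketch undersamples every combination of the chosen variables down to the size of the smallest combination.

```python
import random
from collections import defaultdict


def balance_by_groups(rows, keys, seed=0):
    """Undersample so every combination of `keys` keeps the same number of rows."""
    groups = defaultdict(list)
    for row in rows:
        # Group rows by the tuple of values of all balancing variables
        groups[tuple(row[k] for k in keys)].append(row)
    n = min(len(g) for g in groups.values())  # size of the smallest group
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, n))  # sample without replacement
    return balanced


# Hypothetical unbalanced data: two variables with uneven combinations
rows = (
    [{"gender": "F", "country": "US"}] * 50
    + [{"gender": "M", "country": "US"}] * 20
    + [{"gender": "F", "country": "UK"}] * 10
    + [{"gender": "M", "country": "UK"}] * 30
)
balanced = balance_by_groups(rows, keys=["gender", "country"])
```

Because the smallest combination here has 10 rows, each of the four combinations contributes 10 rows to `balanced`. Undersampling discards data, so it suits cases where the smallest group is still large enough to be useful.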

Pair Coding Tools are really helpful for interviews if you want to assess the coding skills of the candidates


I work as a Data Scientist and I have interviewed many Data Scientists. Also, I have been interviewed several times by other Data Scientists for Data Science positions.

One of the most difficult parts of the interview process is assessing the coding skills of the candidate. Personally, I share a Technical Assessment with the candidates, which gives me the opportunity to assess their coding style and skills as well as their Data Science skills.

However, sometimes there is a need to assess candidates’ coding skills during the interview process. …

My experience with an Asynchronous Job Interview and why I believe that it will be the future

My Experience with Interviews

I have worked in big companies since 2007, and in my career I have had more than 1000 interviews. I have been interviewed by tech giants and market leaders like Google, Facebook, Spotify, PwC, etc., and as a freelancer I have been interviewed by companies and individuals, including students for tutoring, etc.

In addition, I have conducted many interviews in different roles, including phone screening, screening, peer assessment and hiring manager.

In my opinion, the interview process is very important, but also very time-consuming. I believe that now, due to the new norm of Covid-19, companies tend to be more…

A walk-through example of how you can use AWS Data Wrangler to interact with S3, Glue and Athena

In previous posts, we have provided examples of how to interact with AWS using Boto3, how to interact with S3 using the AWS CLI, how to work with Glue and how to run SQL on S3 files with AWS Athena.

Did you know that we can do all of the above using AWS Data Wrangler? Let’s provide some walk-through examples.

AWS Data Wrangler

AWS Data Wrangler is an AWS Professional Service open-source Python initiative that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services. …

10 Useful Snippet Code Tips in Python and R

We have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you have missed:


1. How to Get The Key of the Maximum Value in a Dictionary

d = {"a": 3, "b": 5, "c": 2}
max(d, key=d.get)

We get:

'b'

2. How to Sort a Dictionary by Values

Assume that we have the following dictionary and we want to sort it by values (assume that the values are numeric).

d = {"a": 3, "b": 5, "c": 2}
# sort it by value
dict(sorted(d.items(), key=lambda item: item[1]))

We get:

{'c': 2, 'a': 3, 'b': 5}

If we want to sort it in descending order:

dict(sorted(d.items(), key=lambda item: item[1], reverse=True))

We get:

{'b': 5, 'a': 3…

An example of how you can estimate the probability of each team to Win the Euro 2020

In a previous post, we built a Predictive Model based on the FIFA Ranking, assuming that the points follow a normal distribution. If we look closer at FIFA’s Ranking Model, we will see that it is based on the Elo system, where the expected result of the game can be extracted from the following formula:

Simulate the Final-16 Phase Based on the Expected Result
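
A minimal Monte Carlo sketch of such a simulation follows. It rests on several assumptions that are not from the original post: the team names and ratings are hypothetical, the Elo-style expected result (with a scaling constant of 600) is treated directly as a win probability so that draws are ignored, and the bracket is a list whose length is a power of two, with adjacent teams facing each other in every round.

```python
import random


def simulate_knockout(ratings, bracket, n_sims=10_000, seed=0):
    """Estimate each team's chance of winning a single-elimination bracket."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    wins = {team: 0 for team in bracket}
    for _ in range(n_sims):
        alive = list(bracket)
        while len(alive) > 1:
            next_round = []
            # Adjacent teams in the list play each other
            for a, b in zip(alive[0::2], alive[1::2]):
                # Elo-style expected result, used here as a win probability
                p_a = 1 / (10 ** (-(ratings[a] - ratings[b]) / 600) + 1)
                next_round.append(a if rng.random() < p_a else b)
            alive = next_round
        wins[alive[0]] += 1
    # Share of simulated tournaments won by each team
    return {team: w / n_sims for team, w in wins.items()}


# Hypothetical ratings and bracket, for illustration only
ratings = {"A": 1800, "B": 1700, "C": 1650, "D": 1500}
probs = simulate_knockout(ratings, ["A", "B", "C", "D"])
```

With real data, `ratings` would hold the FIFA points of the final 16 teams and `bracket` would list them in draw order.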

A walk-through example of how you can estimate the odds of Euro 2020 Games

We will provide an example of how you can estimate the outcome of a Euro 2020 Game based on the FIFA World Ranking. The current calculation method was applied on 10 June 2018 and is based on the Elo rating system: after each game, points are added to or subtracted from a team’s rating according to the formula:
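
As a hedged sketch, assuming the update takes the usual Elo form P = P_before + I * (W - W_e), where I is the importance of the match, W the actual result (1 for a win, 0.5 for a draw, 0 for a loss) and W_e the expected result, the calculation looks like this. The function name and the numbers in the example are hypothetical.

```python
def update_points(points_before, importance, result, expected):
    """Return the new rating: P = P_before + I * (W - W_e)."""
    return points_before + importance * (result - expected)


# Hypothetical example: a team on 1600 points wins (W = 1) a match with
# importance I = 25 in which its expected result was W_e = 0.6
new_points = update_points(1600, importance=25, result=1, expected=0.6)
```

The team gains points because it did better than expected; a loss in the same match would subtract I * W_e points instead.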

The Expected Result of a Game

The expected result of a Game is given by the following formula:
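
A minimal sketch, assuming the Elo-style form W_e = 1 / (10^(-dr/600) + 1), where dr is the difference between the two teams' ratings before the game. The function name, the `scale` parameter and the ratings in the example are illustrative assumptions.

```python
def expected_result(rating_team, rating_opponent, scale=600):
    """Elo-style expected result W_e = 1 / (10 ** (-dr / scale) + 1)."""
    dr = rating_team - rating_opponent  # rating difference before the game
    return 1 / (10 ** (-dr / scale) + 1)


even = expected_result(1600, 1600)       # equal ratings
favourite = expected_result(1700, 1500)  # team rated 200 points higher
```

Equal ratings give an expected result of 0.5, and the expected results of the two opponents always sum to 1.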

An example of object detection with bounding boxes in Python using the cvlib library

In this post, we will provide an example of object detection with bounding boxes in Python using cvlib, a simple, high-level, easy-to-use open-source Computer Vision library for Python.

How to install cvlib library

You can install cvlib with pip, provided that you have already installed OpenCV and TensorFlow; otherwise, you can install everything as follows:

pip install opencv-python tensorflow
pip install cvlib

Object Detection

The cvlib library has a function called detect_common_objects(), which returns the detected labels, the bounding box coordinates and the confidence scores for the detected…

What it takes to get $100 from a single story on Medium

A Few Words about me

I entered the Medium Partner Program in September 2020. Since then, I have published 153 stories. In this post, I would like to share some stats and thoughts on my most successful story.

My most successful story

Honestly, I cannot understand what makes a story successful. I have written ~150 stories, mainly about Data Science, and How to Run the Chi-Square Test in Python is the one with the highest daily traffic. If you ask my opinion, I would not have expected this story to be my best one, since I have written more interesting and advanced articles.

This story has generated $100 in…

An attempt to estimate the Winner of Euro 2020


We have reached the knock-out phase of Euro 2020 (or 2021), where the final 16 teams and their games are shown below:

George Pipis

Data Scientist @ Persado | Co-founder of the Data Science blog:
