How to solve the famous “Birthday Problem” numerically using Monte Carlo Simulation in R


In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. In a group of 23 people, the probability of a shared birthday exceeds 50%, while a group of 70 has a 99.9% chance of a shared birthday.

You can calculate the probability that at least two out of n people share a birthday explicitly with a closed-form formula. Let’s try to estimate these probabilities numerically by applying Monte Carlo simulation. …
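To make the idea concrete, here is a minimal sketch of the simulation, written in Python rather than the R used in the article. The function names, the number of trials, and the assumption of 365 equally likely birthdays are illustrative choices, not taken from the original post.

```python
import random


def exact_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def simulate_shared_birthday(n, trials=20_000, days=365, seed=0):
    """Monte Carlo estimate of the same probability (assumes uniform birthdays)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        # A set drops duplicates, so a shorter set means a shared birthday.
        if len(set(birthdays)) < n:
            hits += 1
    return hits / trials


for group_size in (10, 23, 50, 70):
    print(group_size,
          round(simulate_shared_birthday(group_size), 4),
          round(exact_shared_birthday(group_size), 4))
```

With enough trials, the simulated value should land close to the exact one, which is a handy sanity check on the simulation itself.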

A Docker Cookbook for beginners

Underworld Geodynamics Community

Get a list of all of the Docker commands:

docker -h

Management Commands:

  • builder Manage builds
  • config Manage Docker configs
  • container Manage containers
  • engine Manage the docker engine
  • image Manage images
  • network Manage networks
  • node Manage Swarm nodes
  • plugin Manage plugins
  • secret Manage Docker secrets
  • service Manage services
  • stack Manage Docker stacks
  • swarm Manage Swarm
  • system Manage Docker
  • trust Manage trust on Docker images
  • volume Manage volumes

docker image

  • build Build an image from a Dockerfile
  • history Show the history of an image
  • import Import the contents from a tarball to create a filesystem image
  • inspect Display detailed information on one or more images

Get Started with OpenCV in Python

In this post, we will provide some examples of what you can do with OpenCV.

Blending Images in OpenCV

We will give a walk-through example of how we can blend images using Python OpenCV. Below we show the target and the filter images.

Target Image

Filter Image

Hands-on example of how you can generate correlated data in R

Sometimes we need to generate correlated data for demonstration purposes, technical assessments, testing, etc. We have provided a walk-through example of how to generate correlated data in Python using the scikit-learn library. In R, as far as I know, there is no library dedicated to generating correlated data. For that reason, we will work with simulated data from the Multivariate Normal Distribution. I would suggest having a look at the variance-covariance matrix and the relationship between correlation and covariance.

Generate Correlated Data

We will generate 1000 observations from the Multivariate Normal Distribution of 3 Gaussians as follows:

  • V1~N(10,1), V2~N(5,1)…
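As a rough illustration of the idea, the sketch below draws correlated pairs for the first two variables only, in Python rather than R. The correlation value of 0.7 and the function names are assumptions made for this example; the original post's covariance matrix and its third variable are not reproduced here.

```python
import math
import random


def correlated_normal_pairs(n, mean1, mean2, sd1, sd2, rho, seed=0):
    """Draw n pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing z1 into the second variable induces the correlation
        # (this is the 2x2 Cholesky factor written out by hand).
        v1 = mean1 + sd1 * z1
        v2 = mean2 + sd2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((v1, v2))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# V1 ~ N(10, 1) and V2 ~ N(5, 1) as in the text; rho = 0.7 is hypothetical.
sample = correlated_normal_pairs(1000, 10.0, 5.0, 1.0, 1.0, rho=0.7)
v1_values = [a for a, _ in sample]
v2_values = [b for _, b in sample]
print(round(pearson(v1_values, v2_values), 3))
```

The sample correlation should come out near the requested value, with some noise from the finite sample size.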

A practical example of how to calculate the Power of Test in One and Two-Sided Hypothesis Testing with Binomial Distribution in R


In this tutorial, we will show how you can get the Power of Test when you apply Hypothesis Testing with Binomial Distribution. Before we provide the example, let’s recall what the Type I and Type II errors are.

Type I error

This is the probability of rejecting the null hypothesis, given that the null hypothesis is true. It is the level of significance α, which in statistics is usually set to 5%.

Type II error

This is the probability of failing to reject the null hypothesis, given that the null hypothesis is false. …
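The power of a test is one minus the Type II error probability. A minimal sketch of a one-sided power calculation for a binomial test follows, in Python rather than R. The sample size, the null and alternative proportions, and the function names are all hypothetical values chosen for illustration, not the ones used in the original tutorial.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


def one_sided_power(n, p0, p1, alpha=0.05):
    """Power of the test H0: p = p0 against H1: p = p1 > p0.

    Returns (critical value, actual Type I error, power).
    """
    # Smallest c such that rejecting when X >= c keeps the Type I error <= alpha.
    c = next(c for c in range(n + 2) if upper_tail(c, n, p0) <= alpha)
    size = upper_tail(c, n, p0)    # actual Type I error (at most alpha)
    power = upper_tail(c, n, p1)   # 1 - Type II error under p1
    return c, size, power


# Hypothetical setup: 100 trials, H0: p = 0.5 versus H1: p = 0.6.
critical, size, power = one_sided_power(n=100, p0=0.5, p1=0.6)
print(critical, round(size, 4), round(power, 4))
```

Because the binomial is discrete, the actual Type I error is usually somewhat below α. A two-sided version would split α across both tails in the same way.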

My story with a “Luddite” colleague at my first job in 2008


What is a Luddite?

A Luddite can be characterized as a person opposed to new technology or ways of working. The word comes from the Luddites, a secret oath-based organization of English textile workers in the 19th century, a radical faction that destroyed textile machinery as a form of protest.

My First Job

Back in 2008, I got my first job in the banking sector, in the Sales Department as a Sales Analyst. In our department, we were dealing with Car Loans, and some of the KPIs that we were monitoring were:

  • Number of Loan Applications from the Car Dealers
  • Number of Approved Loan Applications

An Example of how to Find the Number of Followers per Medium Publication

Medium publications are very important for Medium writers. As a writer, if you want to increase your audience, you should publish your stories in a Medium publication. As a rule of thumb, you should choose popular publications, and by popular we mean publications with many followers.

Thanks for the tip, but how can I see the number of followers of a Medium publication?

It is frustrating that there is no easy way to get the number of followers per publication. Let’s go to “Start it up”, which is the publication with the most followers. I type the URL “” and I land on this page:

From there, you cannot see the number of followers. A…

A walk-through tutorial of Docker Volumes


Docker has two main categories of data storage: persistent and non-persistent.

Persistent Data Storage

Persistent data is stored in volumes, which are decoupled from the containers.


  • Use a volume for persistent data: Create the volume first, then create your container.
  • Mounted to a directory in the container
  • Data is written to the volume
  • Deleting a container does not delete the volume
  • First-class citizens
  • Uses the local driver
  • Third-party drivers: Block storage, File storage, Object storage
  • Storage locations: Linux: /var/lib/docker/volumes/ , Windows: C:\ProgramData\Docker\volumes

Non-Persistent Data Storage


  • Local storage
  • Data that is ephemeral
  • Every container has it
  • Tied to the lifecycle of the container


Hands-on examples of checking the existence of files and directories

File cabinet
Photo by Maksym Kaharlytskyi on Unsplash.

When building data workflows and machine learning pipelines, we often check for the existence of specific files and directories (folders). In this article, we will provide some hands-on examples of how you can check for files or directories in R, Python, and Bash.
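For the Python side, a minimal sketch using the standard-library pathlib module could look like the following. The helper name describe_path is hypothetical, and the example works inside a temporary directory so that it can run anywhere; the file and folder names mirror the ones used in the R example.

```python
import tempfile
from pathlib import Path


def describe_path(path):
    """Return 'file', 'directory', or 'missing' for the given path."""
    p = Path(path)
    if p.is_file():
        return "file"
    if p.is_dir():
        return "directory"
    return "missing"


# Build a throwaway workspace so the example is self-contained.
with tempfile.TemporaryDirectory() as workspace:
    root = Path(workspace)
    (root / "myfile.txt").write_text("hello")
    (root / "my_test_folder").mkdir()

    print(describe_path(root / "myfile.txt"))
    print(describe_path(root / "my_test_folder"))
    print(describe_path(root / "no_such_file.txt"))
```

Path.exists() alone does not distinguish files from directories, which is why the sketch uses is_file() and is_dir().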

Check for the Existence of a File or Directory in R

For this example, we have created a file called myfile.txt and a directory called my_test_folder.

How to check if a file exists

We can easily check if a file exists with the file.exists() function from base R. Let's have a look at the following example:

if (file.exists("myfile.txt")) {
  print("The file exists")
} else {
  print("The file does not exist")
}

And we get:

Helpful code snippets in R and Python

Office space
Photo by Nastuh Abootalebi on Unsplash.

We have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you missed the previous installments:

Vol. 1:

Vol. 2:





1. How to get the mode from a list

Assume that we have the following list:

mylist = [1,1,1,2,2,3,3]

And we want to get the mode (i.e. the most frequent element). We can use the following trick, passing the list's count method as the key argument of max:

max(mylist, key = mylist.count)

And we get 1 since this was the mode in our list. In the case where there is a draw in the mode and you want to get…
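When several values are tied for the highest count, max returns only the first of them. One possible way to collect every tied value is sketched below with collections.Counter; the helper name all_modes is hypothetical and this is not necessarily the approach the original article goes on to describe.

```python
from collections import Counter


def all_modes(values):
    """Return every value that reaches the highest frequency, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


mylist = [1, 1, 1, 2, 2, 3, 3]
print(max(mylist, key=mylist.count))   # single mode, as in the text
print(all_modes(mylist))               # list containing every mode
print(all_modes([1, 1, 2, 2, 3]))      # two values tied for the mode
```

Counter makes a single pass over the data, whereas calling mylist.count for every element rescans the list each time, so the Counter version also scales better on long lists.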

George Pipis

Data Scientist @ Persado | Co-founder of the Data Science blog:
