Some more helpful code snippets in R and Python

Photo by Patrick Hendry on Unsplash.

I have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you missed Vol. 1, you can have a look below:


1. How to create files from Jupyter

While working with the Jupyter Notebook, sometimes you need to create a file (e.g. a .py file). Let’s see how we can do it via Jupyter Notebook.

To write a file, put %%writefile followed by the file name on the first line of a Jupyter cell and then write the file’s contents below it. For example, the cell below creates a new file called myfile.py:

%%writefile myfile.py
def my_function():
    print("Hello from a function")

If we…
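Outside a notebook the %%writefile magic is not available, so here is a minimal sketch of the same result in plain Python: write the source text to myfile.py and import it. The temporary directory and the variable names (source, target, myfile) are assumptions made for illustration, not part of the original article.

```python
import importlib.util
import tempfile
from pathlib import Path

# The same content the %%writefile cell would put into myfile.py
source = 'def my_function():\n    print("Hello from a function")\n'

# Assumption: write into a throwaway directory so nothing is overwritten
target = Path(tempfile.mkdtemp()) / "myfile.py"
target.write_text(source)

# Load the freshly written file as a module and call the function
spec = importlib.util.spec_from_file_location("myfile", target)
myfile = importlib.util.module_from_spec(spec)
spec.loader.exec_module(myfile)
myfile.my_function()
```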

A practical example of Logistic Regression when we are dealing with summary data instead of binary (0,1)


We will provide an example of how you can run a logistic regression in R when the data are grouped. Let’s generate some random sample data of 200 observations.

library(tibble)

# 200 simulated observations: two categorical predictors and a binary response
df <- tibble(Gender = as.factor(sample(c("m", "f"), 200, replace = TRUE, prob = c(0.6, 0.4))),
             Age_Group = as.factor(sample(c("[<30]", "[30-65]", "[65+]"), 200, replace = TRUE, prob = c(0.3, 0.6, 0.1))),
             Response = rbinom(200, 1, prob = 0.2))


# A tibble: 200 x 3
  Gender Age_Group Response
  <fct>  <fct>        <int>
1 f      [65+]            0
2 m      [30-65]          0
3 m      [65+]            0
4 m…
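The excerpt stops before the grouping step, so the following is only a sketch, in Python rather than R, of what "summary data" could look like: the 200 binary rows collapsed into one row per (Gender, Age_Group) cell with a count of successes and failures. The seed, the variable names, and the [successes, trials] layout are assumptions made for illustration.

```python
import random
from collections import defaultdict

random.seed(0)  # assumption: fixed seed so the sketch is repeatable

n = 200
# Simulate binary (0/1) rows with the same probabilities as the R snippet
rows = [
    (
        random.choices(["m", "f"], weights=[0.6, 0.4])[0],
        random.choices(["[<30]", "[30-65]", "[65+]"], weights=[0.3, 0.6, 0.1])[0],
        1 if random.random() < 0.2 else 0,
    )
    for _ in range(n)
]

# Collapse to grouped form: one entry per cell holding [successes, trials]
grouped = defaultdict(lambda: [0, 0])
for gender, age_group, response in rows:
    grouped[(gender, age_group)][0] += response
    grouped[(gender, age_group)][1] += 1

for key in sorted(grouped):
    successes, trials = grouped[key]
    print(key, "successes:", successes, "failures:", trials - successes)
```

A grouped logistic regression is then fitted on the success and failure counts per cell instead of on the individual 0/1 rows.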

Helpful code snippets in R and Python

Wooden handcraft house sign reading “10.”
Photo by Markus Spiske on Unsplash


As data scientists, we love to do our job efficiently without reinventing the wheel. Tips-and-tricks articles provide snippets of code for common tasks in the data science world. In this article, we’ll cover mainly Python and R, as well as other tips in Unix, Excel, Git, Docker, Google Spreadsheets, etc.

Don’t miss the Tips and Tricks Vol.2


1. How to sort a list of tuples by element

Let’s say I have the following list:

l = [(1,2), (4,6), (5,1), (1,0)]


[(1, 2), (4, 6), (5, 1), (1, 0)]

And I want to sort it by the second element of the tuple:

sorted(l, key=lambda t: t[1])


[(1, 0)…
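A short sketch of the same idea with two variations that are not in the original: operator.itemgetter as an alternative to the lambda, and a composite key for breaking ties. The variable names are illustrative.

```python
from operator import itemgetter

l = [(1, 2), (4, 6), (5, 1), (1, 0)]

# Sort by the second element, ascending (same result as the lambda version)
by_second = sorted(l, key=itemgetter(1))
print(by_second)  # [(1, 0), (5, 1), (1, 2), (4, 6)]

# Sort by the second element, descending
by_second_desc = sorted(l, key=itemgetter(1), reverse=True)
print(by_second_desc)  # [(4, 6), (1, 2), (5, 1), (1, 0)]

# Sort by the first element ascending, then by the second descending
by_first_then_second_desc = sorted(l, key=lambda t: (t[0], -t[1]))
print(by_first_then_second_desc)  # [(1, 2), (1, 0), (4, 6), (5, 1)]
```

Note that sorted returns a new list and leaves l unchanged; l.sort(key=...) sorts in place.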

A practical example of how you can split a Userbase using the modulo function in R

In many cases, there is a need to split a userbase into 2 or more buckets. For example:

  • UCG: To quantify and evaluate the performance of promotional campaigns, many companies create a Universal Control Group (UCG), which is a random sample of the userbase that does not receive any offer or message.
  • Bucketize: For testing purposes, it is common to split the userbase into buckets so that they can be compared over the long term.
  • Samples for Machine Learning: A userbase can become too large for a machine learning model to run…
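The modulo idea behind these use cases can be sketched in a few lines (in Python here, although the article works in R). The sketch assumes integer user IDs that are unrelated to user behaviour; the function name assign_bucket, the ten buckets, and the choice of bucket 0 as the UCG are all hypothetical.

```python
from collections import Counter


def assign_bucket(user_id: int, n_buckets: int = 10) -> int:
    """Map a user ID to a bucket in 0..n_buckets-1 using the remainder."""
    return user_id % n_buckets


# Assumption: 1,000 users with sequential integer IDs
user_ids = range(1, 1001)

# Every user always lands in the same bucket, so the split is stable over time
bucket_sizes = Counter(assign_bucket(u) for u in user_ids)
print(sorted(bucket_sizes.items()))

# Hypothetical choice: bucket 0 is held out as the Universal Control Group
ucg = [u for u in user_ids if assign_bucket(u) == 0]
print(len(ucg))  # 100
```

Because the bucket depends only on the ID, the assignment needs no lookup table and can be recomputed anywhere the ID is available.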

A quick walk-through example for sharing your Notebooks

Image of a whale
Image on Unsplash

As data scientists, we want our work to be reproducible, meaning that when we share our analysis, everyone should be able to re-run it and get the same results. This is not always easy, since we are dealing with different operating systems (macOS, Windows, Linux) and different versions of programming languages and packages. That is why we encourage you to work with virtual environments such as conda environments. A more robust alternative to conda environments is to work with Docker.

Scenario: We have run an analysis using Python Jupyter Notebooks on our own data, and we want to share this…

Deploy SSH keys correctly

USB key
Photo by Brina Blum on Unsplash.

When working with Git and GitHub, you can connect over HTTPS or SSH. Today, we will provide a tutorial on how you can deploy an SSH key to your GitHub repository.


Programming, R

A hands-on tutorial of how to share your Python Flask APIs with R Shiny

As a data scientist, you may work in both R and Python, and it is common to prefer one language over the other for specific tasks. For example, you may prefer R for statistics, data cleansing, and data visualization, and Python for NLP tasks and deep learning. Also, when it comes to RESTful APIs, Python Flask APIs have an advantage over R Plumber and RestRserve.

The Scenario

Assume that you have built a model in Python and on top of that, you have built a Flask API. Regarding the UI, you prefer to work with Shiny. So, the…

Building and deploying a Python Flask API on AWS Elastic Beanstalk

Diagram explaining how Elastic Beanstalk works
Photo from AWS.

In previous articles, we provided examples of how to build a Flask REST API, how to build and deploy a machine learning web app, and how to deploy a Flask API with DigitalOcean. Today, we will provide a hands-on example of how to deploy Flask applications that serve machine learning models on AWS with Elastic Beanstalk.

Use Case

Let’s assume that you work as a data scientist and you have built a machine learning model that you want to share with other people. The most common way to share your model is with a Flask RESTful API, but you will need a server…

Statistics, R

A practical example of Rolling Regression in R with applications in Pairs Trading


In a previous post, we provided an example of rolling regression in Python to get the market beta coefficient. We also provided an example of pairs trading in R. In this post, we will provide an example of rolling regression in R using the rollRegres package: getting the beta coefficient between two co-integrated stocks over a rolling window of n observations.

What is a Rolling Regression

A rolling regression is simply a regression that is re-estimated within a moving window of fixed size. Assume that we have 5 observations and a rolling window of 3 observations. …
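To make the 5-observation, 3-window case concrete, here is a minimal sketch in plain Python (not the rollRegres package the article uses). It fits a one-variable least-squares slope on each window; the data values and the helper names ols_slope and rolling_slopes are made up for illustration.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a single window."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def rolling_slopes(x, y, window):
    """One slope per window: observations 1-3, then 2-4, then 3-5, and so on."""
    return [
        ols_slope(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


# Hypothetical data: 5 observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9]

slopes = rolling_slopes(x, y, window=3)
print(len(slopes))  # 3 windows: observations 1-3, 2-4, and 3-5
print(slopes)
```

With n observations and a window of w, this yields n - w + 1 coefficient estimates, one per window position.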

Machine Learning

Example of getting the Market Beta Coefficients of stocks by running rolling regression in Python

Image on Unsplash

In finance, the market beta β measures how an asset moves relative to the market. It is a popular measure of a stock’s contribution to the risk of the market portfolio, which is why it is also referred to as an asset’s non-diversifiable risk, its systematic risk, market risk, or hedge ratio.

Interpretation of Market Beta

The weighted average of all market-betas with respect to the market index is 1.

  • Beta>1: If a stock has a beta above 1, then it means that its return, on average, moves more than 1 to 1 with the return of the index
  • Beta<1: If a stock has…
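The interpretation above can be checked numerically with the usual definition of beta as the covariance of the stock's returns with the market's returns, divided by the variance of the market's returns. The sketch below uses made-up returns for three hypothetical stocks and assumes the market index is their fixed-weight portfolio; under that assumption the weighted average of the betas comes out as 1.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance of two equally long return series."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def market_beta(stock, market):
    """Beta = Cov(stock, market) / Var(market)."""
    return covariance(stock, market) / covariance(market, market)


# Hypothetical index weights and per-period returns for three stocks
weights = [0.5, 0.3, 0.2]
stock_returns = [
    [0.010, -0.020, 0.015, 0.005, -0.010],
    [0.030, -0.050, 0.040, 0.010, -0.020],
    [0.002, -0.004, 0.001, 0.003, 0.000],
]

# Assumption: the market return is the weighted sum of the stock returns
market = [
    sum(w * r[t] for w, r in zip(weights, stock_returns))
    for t in range(len(stock_returns[0]))
]

betas = [market_beta(r, market) for r in stock_returns]
weighted_average = sum(w * b for w, b in zip(weights, betas))

print(betas)
print(weighted_average)  # equals 1 up to floating-point rounding
```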

George Pipis

Data Scientist @ Persado | Co-founder of the Data Science blog:
