Programming

A practical example of how you can construct diversified portfolios that minimize risk using Python and SciPy

Image by Unsplash

We will show how you can build a diversified portfolio that satisfies specific constraints. For this tutorial, we will build a portfolio that minimizes the risk.

So the first thing to do is to get the stock prices programmatically using Python.

How to Download the Stock Prices using Python

We will work with the yfinance package, which you can install using pip install yfinance --upgrade --no-cache-dir. You will need to get the symbol of each stock. You can find the mapping between NASDAQ stocks and symbols in this csv file.

For this tutorial, we will assume that we are dealing with the following 10 stocks and we try to minimize the portfolio risk. …
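As a rough sketch of the optimization step, the snippet below finds the long-only weights that minimize portfolio variance with SciPy. It rests on several assumptions: the returns are synthetic random numbers rather than prices downloaded with yfinance, variance is taken as the risk measure, and the names returns, cov and portfolio_variance are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic daily returns for 4 hypothetical assets (NOT real market data).
rng = np.random.default_rng(42)
n_assets = 4
returns = rng.normal(loc=0.0005, scale=[0.010, 0.015, 0.020, 0.012],
                     size=(500, n_assets))

# Sample covariance matrix of the returns (one column per asset).
cov = np.cov(returns, rowvar=False)


def portfolio_variance(weights, cov):
    """Variance of a portfolio with the given weights."""
    return float(weights @ cov @ weights)


# Constraints: the weights sum to 1 and each weight stays in [0, 1].
constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
bounds = [(0.0, 1.0)] * n_assets
w0 = np.full(n_assets, 1.0 / n_assets)  # start from equal weights

result = minimize(portfolio_variance, w0, args=(cov,), method="SLSQP",
                  bounds=bounds, constraints=constraints,
                  options={"ftol": 1e-12})
weights = result.x

print("weights:", np.round(weights, 4))
print("variance:", portfolio_variance(weights, cov))
```

The tight ftol is a deliberate choice: daily variances are tiny numbers, so the default tolerance could stop the solver too early. With real data, returns would instead be computed from the downloaded prices.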


A practical example of how you can calculate Card Probabilities with Monte Carlo Simulation and Numerically

Photo by Amanda Jones on Unsplash

We are going to show how we can estimate card probabilities by applying Monte Carlo Simulation and how we can solve them numerically in Python. The first thing that we need to do is to create a deck of 52 cards.

How to Generate a Deck of Cards

import itertools, random

# make a deck of cards
deck = list(itertools.product(['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K'],['Spade','Heart','Diamond','Club']))
deck

And we get:

[('A', 'Spade'),
('A', 'Heart'),
('A', 'Diamond'),
('A', 'Club'),
('2', 'Spade'),
('2', 'Heart'),
('2', 'Diamond'),
('2', 'Club'),
('3', 'Spade'),
('3', 'Heart'),
('3', 'Diamond'),
('3', 'Club'),
('4', 'Spade'),
('4', 'Heart'),
('4', 'Diamond'),
('4', 'Club'),
('5', 'Spade'),
('5', 'Heart'),
...
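
With the deck in place, a Monte Carlo estimate can be compared against an exact answer. The sketch below is one possible illustration, not necessarily the example used in the post: it estimates the probability that two cards drawn without replacement are both aces. The function name estimate_two_aces, the seed and the number of trials are assumptions.

```python
import itertools
import random
from fractions import Fraction

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['Spade', 'Heart', 'Diamond', 'Club']
deck = list(itertools.product(ranks, suits))


def estimate_two_aces(n_trials, seed=0):
    """Estimate P(both cards are aces) by repeatedly drawing 2 cards."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(n_trials):
        hand = rng.sample(deck, 2)  # draw without replacement
        if all(rank == 'A' for rank, _suit in hand):
            hits += 1
    return hits / n_trials


# Exact probability: 4 aces out of 52 cards, then 3 aces out of 51.
exact = Fraction(4, 52) * Fraction(3, 51)

estimate = estimate_two_aces(200_000)
print("exact    :", float(exact))
print("estimate :", estimate)
```

The estimate should land close to the exact value of 1/221, and the gap shrinks as the number of trials grows.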



A detailed explanation of Simpson’s Paradox with concrete and reproducible examples

Back in 2001 when I entered university to study Statistics, our professor told us that:

Statistics is a perfect way to tell lies

This “quote” got my attention, and I totally agree with it. I have seen many statistical analyses that lead to a totally opposite statistical inference. Sometimes the misleading inference is on purpose, and sometimes it is because the analyst does not take all the parameters into consideration. A good example of misleading inference generated by misapplied statistics is Simpson’s Paradox, which we are going to explain with some examples.

Simpson’s Paradox

Simpson’s paradox is a phenomenon encountered in the field of probability and statistics in which a trend appears in different groups of data but disappears or reverses when we aggregate the data and treat it as a single group. Below we present reproducible examples of Simpson’s Paradox. …
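
To make the reversal concrete, here is a small sketch that uses the success counts widely quoted for the kidney-stone treatment comparison. They are used here purely for illustration and are not necessarily the data the post works with. Treatment A has the higher success rate in each stone-size group, yet treatment B looks better once the groups are pooled.

```python
# (successes, patients) per treatment and stone size; illustrative counts.
counts = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

# Success rate inside each (treatment, group) cell.
group_rates = {key: s / n for key, (s, n) in counts.items()}

# Success rate per treatment after pooling both groups together.
overall_rates = {}
for treatment in ("A", "B"):
    successes = sum(s for (t, _), (s, _n) in counts.items() if t == treatment)
    patients = sum(n for (t, _), (_s, n) in counts.items() if t == treatment)
    overall_rates[treatment] = successes / patients

for key, rate in group_rates.items():
    print(key, round(rate, 3))
print("overall", {t: round(r, 3) for t, r in overall_rates.items()})
```

The reversal happens because the two treatments were applied to very different mixes of small and large stones, so the pooled rates mostly reflect the group sizes.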


A gentle introduction to how you can check your Grammar and Spelling in Python

Screenshot from https://languagetool.org/

LanguageTool

LanguageTool is an open-source grammar tool, also known as the spellchecker for OpenOffice. This library allows you to detect grammar errors and spelling mistakes through a Python script or through a command-line interface. We will work with the language_tool_python Python package, which can be installed with the pip install language-tool-python command. By default, language_tool_python will download a LanguageTool server .jar and run it in the background to detect grammar errors locally. LanguageTool also offers a Public HTTP Proofreading API, which is supported as well, but it restricts the number of calls.

LanguageTool in Python

We will provide a practical example of how you can detect your grammar mistakes and also correct them. We will work with the following…


My story of becoming a Data Science Blogger

Image by Author

A programming language is as good as its community

My Background

My Studies

Back in 2001, I entered university to study Statistics. During my first year, I ran my first regression model in Minitab, and a year later I wrote my first lines of code in R / S-Plus. I still remember my frustration when trying to import data using the read.table command. At that time, Wikipedia did not exist and neither did StackOverflow🤨.

Once I received my degree in Statistics, I continued my studies in Financial Mathematics (MSc) and once I graduated in 2007, I started my professional career.

Ever since then, my philosophy has always been to try to learn something new constantly. I couldn’t imagine myself not studying anymore, so I decided to continue my studies starting with a BSc in Mathematics. At that time there weren’t any MOOCs like Coursera etc. …


Image By Predictive Hacks

Natural Language Processing

A High-Level Introduction to Word Embeddings in Plain English

A common representation of words

The most common representation of words in NLP tasks is the One Hot Encoding. Although this approach has been proven to be effective in many NLP models, it has some drawbacks:

  • The encodings are arbitrary.
  • This approach leads to data sparsity with many zeros.
  • It doesn’t provide any relation between words.

Below we can see an example of One Hot Encoding for the words “Cat” and “Dog”. As we can see, these two vectors are orthogonal, since their inner product is 0, and their Euclidean distance is the square root of 2. …
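
The two numbers quoted above are easy to check. The following minimal sketch assumes a hypothetical three-word vocabulary (only “Cat” and “Dog” come from the text) and a helper named one_hot that is ours, not taken from any library.

```python
import math

# Hypothetical vocabulary; only "Cat" and "Dog" appear in the text above.
vocabulary = ["Cat", "Dog", "Bird"]


def one_hot(word, vocabulary):
    """Return a vector with a 1 at the word's position and 0 elsewhere."""
    return [1 if w == word else 0 for w in vocabulary]


cat = one_hot("Cat", vocabulary)
dog = one_hot("Dog", vocabulary)

# The inner product of two different one-hot vectors is always 0.
inner_product = sum(a * b for a, b in zip(cat, dog))

# Their Euclidean distance is always sqrt(2), whatever the two words are.
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(cat, dog)))

print(cat, dog, inner_product, distance)
```

Because every pair of distinct words gets the same inner product and the same distance, the encoding carries no information about which words are similar.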


An example of how you can efficiently count the cumulative distinct values in R and Python


Sometimes there is a need to do a rolling count of the distinct values of a list/vector. In other words, we want to increase the count only when an element appears in our list/vector for the first time. Below is an example of how we can easily do it in R and Python.

Cumulative Count Distinct in R

# assume that this is our vector
x=c("e", "a","a","b","a","b","c", "d", "e")

# we apply the "cumsum(!duplicated(x))" command
data.frame(Vector=x,CumDistinct=cumsum(!duplicated(x)))

Cumulative Count Distinct in Python

import pandas as pd

df = pd.DataFrame({'mylist':["e", "a","a","b","a","b","c", "d", "e"]})
df['CumDistinct'] = (~df.mylist.duplicated()).cumsum()
df

We could also use apply over the row positions as follows (this assumes the default 0, 1, 2, … index):

df['CumDistinct'] = df.index.to_series().apply(lambda i: df.mylist.iloc[:i + 1].nunique())

Alternatively, we can use list comprehension as follows:

df = pd.DataFrame({'mylist':["e", "a","a","b","a","b","c", "d", "e"]})
df['CumDistinct']=[len(set(df['mylist'][:i])) for i,j in enumerate(df['mylist'], 1)]
df
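
The list comprehension rebuilds a set for every prefix, so its cost grows quadratically with the length of the list. As a hedged alternative for plain Python lists, the sketch below keeps one running set and makes a single pass. The function name cumulative_distinct is ours.

```python
def cumulative_distinct(values):
    """Return the running count of distinct values seen so far."""
    seen = set()
    counts = []
    for value in values:
        seen.add(value)           # adding an already-seen value changes nothing
        counts.append(len(seen))  # distinct values up to and including this one
    return counts


values = ["e", "a", "a", "b", "a", "b", "c", "d", "e"]
result = cumulative_distinct(values)
print(result)  # [1, 2, 2, 3, 3, 3, 4, 5, 5]
```

The output matches what the duplicated-based pandas version gives for the same list.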


Image on Predictive Hacks

How to Count the Consecutive Events in R and Python

When we are dealing with Financial Assets, Sports Analytics, Gambling Games, etc., there is usually a need to keep track of consecutive events, called streaks. For instance:

  • For how many consecutive days Stock X has closed with a positive sign
  • For how many games in a row Team A has scored at least one goal, and so on.

We will show how you can easily calculate the consecutive events in both R and Python.

Consecutive Events in R

Assume that there is a Roulette Wheel which returns Red (50%) and Black (50%). We are going to simulate N=1,000,000 spins and keep track of the streaks of Red and Black respectively. The R function which makes our life easier is rle, but if we want to track the running streak, then we also need to use the seq function. …
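
For comparison, the same idea can be sketched in plain Python, where itertools.groupby plays the role of rle. This is an illustration under our own assumptions rather than the post's code: the helper names run_lengths and running_streak, the seed, and the smaller number of simulated spins are all ours.

```python
import random
from itertools import groupby


def run_lengths(events):
    """Run-length encode a sequence: [(value, length of the run), ...]."""
    return [(value, len(list(group))) for value, group in groupby(events)]


def running_streak(events):
    """For every event, how long the current streak is at that point."""
    streaks = []
    for _value, length in run_lengths(events):
        streaks.extend(range(1, length + 1))
    return streaks


sample = ["Red", "Red", "Black", "Red", "Red", "Red"]
print(run_lengths(sample))     # [('Red', 2), ('Black', 1), ('Red', 3)]
print(running_streak(sample))  # [1, 2, 1, 1, 2, 3]

# Simulate a fair Red/Black wheel and look at the longest streak.
rng = random.Random(1)
spins = [rng.choice(["Red", "Black"]) for _ in range(10_000)]
longest = max(length for _value, length in run_lengths(spins))
print("longest streak:", longest)
```

The run lengths answer "how long was each streak", while the running streak answers "how long is the streak so far" at every spin.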


Image on Unsplash

Programming, R

An example of how you can easily build advanced stacked ensemble models in R using the H2O package

This post will show you how you can easily apply Stacked Ensemble Models in R using the H2O package. The models can handle both Classification and Regression problems. For this example, we will work on a classification problem using the Breast Cancer Wisconsin dataset, which can be found here.

Description of the Stacked Ensemble Models

The steps below describe the individual tasks involved in training and testing a Super Learner ensemble. H2O automates most of the steps below so that you can quickly and easily build ensembles of H2O models.

a) Set up the ensemble

  • Specify a list of L base algorithms (with a specific set of model parameters). …


A walk-through example of how you can reshape pandas data frames

Image on Unsplash

We will provide some examples of how we can reshape Pandas data frames based on our needs. We want to provide a concrete and reproducible example and for that reason, we assume that we are dealing with the following scenario.

We have a data frame with three columns:

  • ID: The UserID
  • Type: The type of product.
  • Value: The value of the product, like ‘H’ for High, ‘M’ for Medium and ‘L’ for Low

The data are in a long format, where each case is one row. Let’s create the data frame:

Create the Pandas Data Frame

import pandas as pd

df = pd.DataFrame({'ID':[1,1,1,1,2,2,3,3,3,4],
'Type':['A','B','C','E','D','A','E','B','C','A']…
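
Since the data frame above is cut off, the following sketch uses a separate, hypothetical long-format frame with the same three column names to illustrate one common reshape: pivoting from long to wide and melting back. The values in long_df are made up for illustration and do not come from the post.

```python
import pandas as pd

# Hypothetical long-format data: one row per (ID, Type) pair.
long_df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "Type": ["A", "B", "A", "C", "B"],
    "Value": ["H", "M", "L", "H", "M"],
})

# Long -> wide: one row per ID, one column per Type.
wide_df = long_df.pivot(index="ID", columns="Type", values="Value")
print(wide_df)

# Wide -> long again: drop the (ID, Type) pairs that had no value.
back_df = (
    wide_df.reset_index()
    .melt(id_vars="ID", var_name="Type", value_name="Value")
    .dropna(subset=["Value"])
)
print(back_df.sort_values(["ID", "Type"]))
```

Note that pivot raises an error when an (ID, Type) pair appears more than once; in that case pivot_table with an explicit aggregation function is the usual alternative.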

About

George Pipis

Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/
