# Portfolio Optimization in Python

## A practical example of how you can construct diversified portfolios that minimize risk using Python and SciPy

We will show how you can build a diversified portfolio that satisfies specific constraints. For this tutorial, we will build a portfolio that minimizes the risk.

So the first thing to do is to get the stock prices programmatically using Python.

We will work with the yfinance package, which you can install using `pip install yfinance --upgrade --no-cache-dir`. You will need the ticker symbol of each stock. You can find the mapping between NASDAQ stocks and symbols in this CSV file.

For this tutorial, we will assume that we are dealing with the following 10 stocks and that we want to minimize the portfolio risk. …
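The optimization step can be sketched with `scipy.optimize.minimize` and the SLSQP method: minimize the portfolio variance w^T Σ w subject to the weights summing to 1 and each weight lying in [0, 1] (a long-only portfolio). This is a minimal sketch under stated assumptions: the daily returns below are simulated, hypothetical data standing in for the 10 stocks (in practice you would compute them from the prices downloaded with yfinance), and they are expressed in percent so that the objective is not tiny relative to SLSQP's default tolerance.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: 500 days of simulated daily returns (in percent) for 10 assets,
# with volatilities increasing from 0.5% to 2% so the optimum is not trivial.
rng = np.random.default_rng(0)
n_days, n_assets = 500, 10
volatilities = np.linspace(0.5, 2.0, n_assets)
returns = rng.normal(loc=0.05, scale=volatilities, size=(n_days, n_assets))

# Sample covariance matrix: rows are days, columns are assets.
cov = np.cov(returns, rowvar=False)

def portfolio_variance(weights, cov):
    """Portfolio variance w^T * Cov * w."""
    return weights @ cov @ weights

# Constraints: fully invested (weights sum to 1) and long-only (0 <= w <= 1).
constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
bounds = [(0.0, 1.0)] * n_assets
w0 = np.full(n_assets, 1.0 / n_assets)  # start from the equally weighted portfolio

result = minimize(portfolio_variance, w0, args=(cov,), method='SLSQP',
                  bounds=bounds, constraints=constraints)
weights = result.x

print(np.round(weights, 3))
print("Equal-weight variance:", portfolio_variance(w0, cov))
print("Minimum variance:", portfolio_variance(weights, cov))
```

As expected for a minimum-variance portfolio, the optimizer puts most of the weight on the low-volatility assets.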

# Estimate Probabilities of Card Games

## A practical example of how you can calculate card probabilities both with Monte Carlo simulation and numerically

We are going to show how we can estimate card probabilities by applying Monte Carlo Simulation and how we can solve them numerically in Python. The first thing that we need to do is to create a deck of 52 cards.

## How to Generate a Deck of Cards

```python
import itertools, random

# make a deck of cards: every (rank, suit) pair
deck = list(itertools.product(['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K'],
                              ['Spade', 'Heart', 'Diamond', 'Club']))
deck
```

And we get:

```
[('A', 'Spade'), ('A', 'Heart'), ('A', 'Diamond'), ('A', 'Club'), ('2', 'Spade'), ('2', 'Heart'), ('2', 'Diamond'), ('2', 'Club'), ('3', 'Spade'), ('3', 'Heart'), ('3', 'Diamond'), ('3', 'Club'), ('4', 'Spade'), ('4', 'Heart'), ('4', 'Diamond'), ('4', 'Club'), ('5', 'Spade'), ('5', 'Heart'), ...
```

…
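To illustrate both approaches on the deck above, the sketch below estimates the probability that a 5-card hand contains at least one ace (an example question chosen for illustration, not necessarily one from the original article) by sampling random hands, and compares the estimate with the exact value 1 - C(48,5)/C(52,5). The helper names `has_ace` and `monte_carlo_at_least_one_ace` are hypothetical.

```python
import itertools
import random
from math import comb

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['Spade', 'Heart', 'Diamond', 'Club']
deck = list(itertools.product(ranks, suits))

def has_ace(hand):
    """True if at least one card in the hand is an ace."""
    return any(rank == 'A' for rank, _ in hand)

def monte_carlo_at_least_one_ace(deck, n_sims=100_000, seed=1):
    """Estimate P(at least one ace in 5 cards) by drawing hands without replacement."""
    rng = random.Random(seed)
    hits = sum(has_ace(rng.sample(deck, 5)) for _ in range(n_sims))
    return hits / n_sims

# Exact answer: 1 minus the probability that all 5 cards come from the 48 non-aces.
exact = 1 - comb(48, 5) / comb(52, 5)
estimate = monte_carlo_at_least_one_ace(deck)
print(f"Monte Carlo: {estimate:.4f}  exact: {exact:.4f}")
```

With 100,000 simulated hands the Monte Carlo estimate typically lands within about 0.005 of the exact value.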

## A detailed explanation of Simpson’s Paradox with concrete and reproducible examples

Back in 2001, when I entered university to study Statistics, our professor told us:

Statistics is a perfect way to tell lies

This “quote” caught my attention, and I totally agree with it. I have seen many statistical analyses reach completely opposite inferences. Sometimes the misleading inference is deliberate, and sometimes it happens because the analyst does not take all the relevant parameters into account. A good example of a misleading inference generated by misapplied statistics is Simpson’s Paradox, which we will explain with some examples.

Simpson’s paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when we aggregate the data and treat it as a single group. Below we present reproducible examples of Simpson’s Paradox. …
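As a minimal sketch of the reversal described above, the snippet below uses the widely cited kidney-stone treatment figures (successful treatments out of patients, split by stone size). These are not the author's examples, which are elided here; the function names are our own.

```python
# (successes, patients) per treatment and stone size
data = {
    'A': {'small': (81, 87), 'large': (192, 263)},
    'B': {'small': (234, 270), 'large': (55, 80)},
}

def rates_by_group(data):
    """Success rate of each treatment within each stone-size group."""
    return {treatment: {group: s / n for group, (s, n) in groups.items()}
            for treatment, groups in data.items()}

def overall_rates(data):
    """Success rate of each treatment after pooling the groups."""
    result = {}
    for treatment, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        patients = sum(n for _, n in groups.values())
        result[treatment] = successes / patients
    return result

by_group = rates_by_group(data)
overall = overall_rates(data)
print(by_group)  # A beats B for small stones and for large stones...
print(overall)   # ...yet B beats A once the groups are aggregated
```

Treatment A wins in both groups but loses overall because it was given mostly to the harder, large-stone cases, which drags its pooled rate down.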

# LanguageTool

LanguageTool is an open-source grammar tool, also known as the spell checker for OpenOffice. This library allows you to detect grammar errors and spelling mistakes through a Python script or through a command-line interface. We will work with the `language_tool_python` package, which can be installed with the `pip install language-tool-python` command. By default, `language_tool_python` downloads a LanguageTool server `.jar` and runs it in the background to detect grammar errors locally. LanguageTool also offers a public HTTP Proofreading API, which is supported as well, although the number of calls is limited.

# LanguageTool in Python

We will provide a practical example of how you can detect your grammar mistakes and also correct them. We will work with the following…

# My Journey as a Data Science Blogger

## My story of becoming a Data Science Blogger

A programming language is as good as its community

# My Background

## My Studies

Back in 2001, I entered university to study Statistics. During my first year, I ran my first regression model in Minitab, and a year later I wrote my first lines of code in R / S-Plus. I still remember my frustration trying to import data with the `read.table` command. At that time, neither Wikipedia nor StackOverflow existed🤨.

After receiving my degree in Statistics, I continued with an MSc in Financial Mathematics, and after graduating in 2007, I started my professional career.

Ever since, my philosophy has been to keep learning something new. I couldn’t imagine myself not studying anymore, so I decided to continue my studies, starting with a BSc in Mathematics. At that time, there were no MOOCs such as Coursera. …

# A common representation of words

The most common representation of words in NLP tasks is One Hot Encoding. Although this approach has proven effective in many NLP models, it has some drawbacks:

• The encodings are arbitrary.
• This approach leads to data sparsity with many zeros.
• It doesn’t provide any relation between words.

Below we can see an example of One Hot Encoding for the words “Cat” and “Dog”. As we can see, these two vectors are orthogonal, since their inner product is 0, and their Euclidean distance is the square root of 2. …
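The two properties above are easy to check numerically. The sketch below builds one-hot vectors over a toy vocabulary (the word "Mouse" is a hypothetical extra entry) and computes the inner product and Euclidean distance between “Cat” and “Dog”; the same values hold for any pair of distinct words, which is exactly why one-hot vectors carry no notion of similarity.

```python
import numpy as np

vocabulary = ['Cat', 'Dog', 'Mouse']  # toy vocabulary; 'Mouse' is hypothetical

def one_hot(word, vocabulary):
    """Vector of zeros with a single 1 at the word's index in the vocabulary."""
    vector = np.zeros(len(vocabulary))
    vector[vocabulary.index(word)] = 1.0
    return vector

cat = one_hot('Cat', vocabulary)
dog = one_hot('Dog', vocabulary)

inner_product = float(cat @ dog)               # 0.0: the vectors are orthogonal
distance = float(np.linalg.norm(cat - dog))    # sqrt(2) for any two distinct words
print(cat, dog, inner_product, distance)
```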

# Cumulative Count Distinct Values

## An example of how you can efficiently count the cumulative distinct values in R and Python

Sometimes we need a rolling count of the distinct values of a list/vector. In other words, the running count increases only when an element appears for the first time. Below is an example of how we can easily do it in R and Python.

## Cumulative Count Distinct in R

```r
# assume that this is our vector
x <- c("e", "a", "a", "b", "a", "b", "c", "d", "e")
# we apply the "cumsum(!duplicated(x))" command
data.frame(Vector = x, CumDistinct = cumsum(!duplicated(x)))
```

## Cumulative Count Distinct in Python

```python
import pandas as pd

df = pd.DataFrame({'mylist': ["e", "a", "a", "b", "a", "b", "c", "d", "e"]})
# a value counts as new the first time it appears (duplicated() is False)
df['CumDistinct'] = (~df.mylist.duplicated()).cumsum()
df
```

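We could also use `apply`, although it is slower because it recounts the distinct values of the prefix for every row:

```python
# for each row position i, count the distinct values in rows 0..i
df['CumDistinct'] = df.index.to_series().apply(lambda i: df.mylist.iloc[:i + 1].nunique())
```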

Alternatively, we can use list comprehension as follows:

```python
df = pd.DataFrame({'mylist': ["e", "a", "a", "b", "a", "b", "c", "d", "e"]})
# size of the set of values in the first i rows, for i = 1..len(df)
df['CumDistinct'] = [len(set(df['mylist'].iloc[:i])) for i in range(1, len(df) + 1)]
df
```

# Count of the Consecutive Events

## How to Count the Consecutive Events in R and Python

When dealing with financial assets, sports analytics, gambling games, etc., we often need to keep track of consecutive events, called streaks. For instance:

• For how many consecutive days Stock X has closed with a positive sign
• For how many games in a row Team A has scored at least one goal, and so on.

We will show how you can easily calculate the consecutive events in both R and Python.

# Consecutive Events in R

Assume a roulette wheel that returns Red (50%) or Black (50%). We are going to simulate N = 1,000,000 spins and keep track of the Red and Black streaks. The R function that makes our life easier is `rle`, but if we want to track the running streak, we also need the `seq` function. …
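Since the article covers both R and Python, here is a minimal Python sketch of the same roulette simulation. `run_lengths` mimics R's `rle` using `itertools.groupby`, and `running_streak` plays the role of the `rle` + `seq` combination by reporting the length of the current streak at every spin. The function names and the random seed are our own assumptions.

```python
import random
from itertools import groupby

def run_lengths(outcomes):
    """Like R's rle(): a list of (value, length) pairs for each run."""
    return [(value, len(list(group))) for value, group in groupby(outcomes)]

def running_streak(outcomes):
    """Length of the current streak at each position (1 when the outcome changes)."""
    streaks = []
    for i, outcome in enumerate(outcomes):
        if i > 0 and outcome == outcomes[i - 1]:
            streaks.append(streaks[-1] + 1)
        else:
            streaks.append(1)
    return streaks

# Simulate N spins of a wheel with Red and Black at 50% each.
rng = random.Random(2024)
N = 1_000_000
spins = rng.choices(["Red", "Black"], k=N)

runs = run_lengths(spins)
streaks = running_streak(spins)
longest = {colour: max(length for value, length in runs if value == colour)
           for colour in ("Red", "Black")}
print(longest)  # longest Red and Black streaks in this simulation
```

For example, `running_streak(['R', 'R', 'B', 'R', 'R', 'R'])` returns `[1, 2, 1, 1, 2, 3]`.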

# How To Build Stacked Ensemble Models In R

## An example of how you can easily build advanced stacked ensemble models in R using the H2O package

This post will show you how you can easily apply Stacked Ensemble Models in R using the H2O package. The models can handle both classification and regression problems. For this example, we will tackle a classification problem using the Breast Cancer Wisconsin dataset, which can be found here.

# Description of the Stacked Ensemble Models

The steps below describe the individual tasks involved in training and testing a Super Learner ensemble. H2O automates most of the steps below so that you can quickly and easily build ensembles of H2O models.

a) Set up the ensemble

• Specify a list of L base algorithms (with a specific set of model parameters). …
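For readers working in Python, the same Super Learner idea (base learners whose cross-validated predictions feed a metalearner) can be sketched with scikit-learn's `StackingClassifier`. This is a swapped-in library, not the H2O workflow described in this post; the choice of base learners, metalearner and split is our own assumption. scikit-learn's `load_breast_cancer` ships a copy of the Breast Cancer Wisconsin (Diagnostic) dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

# Base learners (the "L base algorithms"); KNN needs scaled features.
base_learners = [
    ('rf', RandomForestClassifier(n_estimators=200, random_state=1)),
    ('knn', make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]

# The metalearner is trained on 5-fold cross-validated predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"Test accuracy of the stacked ensemble: {accuracy:.3f}")
```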

# Reshape Pandas Data Frames

## A walk-through example of how you can reshape pandas data frames

We will provide some examples of how to reshape Pandas data frames to suit our needs. To keep the example concrete and reproducible, we assume the following scenario.

We have a data frame with three columns:

• ID: The UserID
• Type: The type of product.
• Value: The value of the product, like ‘H’ for High, ‘M’ for Medium and ‘L’ for Low

The data are in a long format, where each case is one row. Let’s create the data frame:

# Create the Pandas Data Frame

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3, 3, 3, 4],
                   'Type': ['A', 'B', 'C', 'E', 'D', 'A', 'E', 'B', 'C', 'A']…
```
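A typical reshape of this long data is to pivot it to a wide format (one row per ID, one column per Type) and then melt it back. The sketch below reuses the `ID` and `Type` columns shown above, but the `Value` column is elided in the original, so the values here are hypothetical placeholders. Because each (ID, Type) pair occurs only once, `DataFrame.pivot` works without an aggregation function.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':   [1, 1, 1, 1, 2, 2, 3, 3, 3, 4],
    'Type': ['A', 'B', 'C', 'E', 'D', 'A', 'E', 'B', 'C', 'A'],
    # hypothetical values: the original Value column is not shown
    'Value': ['H', 'M', 'L', 'H', 'M', 'L', 'L', 'H', 'M', 'H'],
})

# Long -> wide: missing (ID, Type) combinations become NaN.
wide = df.pivot(index='ID', columns='Type', values='Value')
print(wide)

# Wide -> long: melt the Type columns back into rows and drop the NaN cells.
long_again = (wide.reset_index()
              .melt(id_vars='ID', var_name='Type', value_name='Value')
              .dropna()
              .sort_values(['ID', 'Type'])
              .reset_index(drop=True))
print(long_again)
```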