We will show how you can build a diversified portfolio that satisfies specific constraints. For this tutorial, we will build a portfolio that **minimizes the risk**.

So the first thing to do is to get the stock prices programmatically using Python.

We will work with the `yfinance` package, which you can install using `pip install yfinance --upgrade --no-cache-dir`

You will need the ticker symbol of each stock. You can find the mapping between NASDAQ stocks and symbols in this csv file.

For this tutorial, we will assume that we are dealing with the following 10 stocks and we try to minimize the portfolio risk. …
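Minimizing portfolio risk amounts to minimizing the portfolio variance subject to the weights summing to one. As a minimal sketch of that optimization step — using a synthetic covariance matrix in place of one estimated from downloaded prices, and `scipy.optimize.minimize` as one possible solver — it could look like this:

```python
import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
n = 10  # ten stocks, as above

# synthetic covariance matrix standing in for the sample covariance of returns
A = np.random.randn(n, n)
cov = A @ A.T / n + np.eye(n) * 0.01

def portfolio_variance(w):
    # w' * Sigma * w
    return w @ cov @ w

# constraints: weights sum to 1; bounds: no short selling
cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1},)
bounds = [(0, 1)] * n
w0 = np.ones(n) / n  # start from the equal-weight portfolio

res = minimize(portfolio_variance, w0, bounds=bounds, constraints=cons)
weights = res.x
```

In practice, `cov` would be replaced by the sample covariance of the daily returns computed from the `yfinance` price history.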

We are going to show how we can estimate card probabilities numerically in **Python** by applying Monte Carlo simulation. The first thing that we need to do is to create a deck of 52 cards.

**How to Generate a Deck of Cards**

```python
import itertools, random

# make a deck of cards
deck = list(itertools.product(['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K'],
                              ['Spade', 'Heart', 'Diamond', 'Club']))
deck
```

And we get:

```
[('A', 'Spade'),
 ('A', 'Heart'),
 ('A', 'Diamond'),
 ('A', 'Club'),
 ('2', 'Spade'),
 ('2', 'Heart'),
 ('2', 'Diamond'),
 ('2', 'Club'),
 ('3', 'Spade'),
 ('3', 'Heart'),
 ('3', 'Diamond'),
 ('3', 'Club'),
 ('4', 'Spade'),
 ('4', 'Heart'),
 ('4', 'Diamond'),
 ('4', 'Club'),
 ('5', 'Spade'),
 ('5', 'Heart'),
 ...
```
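To give a flavor of the Monte Carlo step itself, here is a minimal sketch that estimates one card probability — the chance that a two-card draw contains two aces. The specific event is our choice for illustration, not necessarily the article's:

```python
import itertools, random

random.seed(1)
deck = list(itertools.product(['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K'],
                              ['Spade', 'Heart', 'Diamond', 'Club']))

N = 200_000
hits = 0
for _ in range(N):
    hand = random.sample(deck, 2)        # draw two cards without replacement
    if all(rank == 'A' for rank, suit in hand):
        hits += 1

estimate = hits / N
# exact value for comparison: (4/52) * (3/51) ≈ 0.00452
```

With N this large, the estimate lands very close to the exact combinatorial answer.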

Back in 2001 when I entered university to study Statistics, our professor told us that:

> “Statistics is a perfect way to tell lies”

This “quote” caught my attention, and I totally agree with it. I have seen many statistical analyses reach completely opposite inferences; sometimes the misleading inference is deliberate, and sometimes it happens because the analyst does not take all the parameters into consideration. A good example of misleading inference generated by misapplied statistics is Simpson’s Paradox, which we are going to explain with some examples.

Simpson’s paradox is a phenomenon encountered in the field of probability and statistics in which a trend appears in different groups of data but disappears or reverses when we aggregate the data and treat it as a single group. Below we will present reproducible examples of Simpson’s Paradox. …
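As one such reproducible example — a sketch using the classic kidney-stone treatment numbers, not necessarily the article's own data — the paradox shows up in a few lines of pandas:

```python
import pandas as pd

# treatment A has the higher success rate in BOTH subgroups,
# yet the lower success rate after aggregation (Simpson's paradox)
data = pd.DataFrame({
    'treatment': ['A', 'A', 'B', 'B'],
    'group':     ['small', 'large', 'small', 'large'],
    'success':   [81, 192, 234, 55],
    'total':     [87, 263, 270, 80],
})
data['rate'] = data['success'] / data['total']

# aggregate over the groups: the trend reverses
overall = data.groupby('treatment')[['success', 'total']].sum()
overall['rate'] = overall['success'] / overall['total']
```

Here A wins within each group (93% vs 87% for small, 73% vs 69% for large), but B wins overall (83% vs 78%), because the group sizes differ between treatments.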

LanguageTool is an open-source grammar tool, also known as the spellchecker for OpenOffice. This library allows you to detect grammar errors and spelling mistakes through a Python script or through a command-line interface. We will work with the `language_tool_python` package, which can be installed with the `pip install language-tool-python` command. By default, `language_tool_python` will download a LanguageTool server `.jar` file and run it in the background to detect grammar errors locally. However, LanguageTool also offers a Public HTTP Proofreading API, which is supported as well, although there is a restriction on the number of calls.

We will provide a practical example of how you can detect your grammar mistakes and also correct them. We will work with the following…

“A programming language is as good as its community”

Back in 2001, I entered university to study Statistics. During my first year, I ran my first regression model in Minitab, and a year later I wrote my first lines of code in R / S-Plus. I still remember my frustration importing data with the `read.table` command. At that time, Wikipedia did not exist yet, and neither did StackOverflow 🤨.

Once I received my degree in Statistics, I continued my studies in Financial Mathematics (MSc) and once I graduated in 2007, I started my professional career.

Ever since then, my philosophy has always been to try to learn something new constantly. I couldn’t imagine myself not studying anymore, so I decided to continue my studies starting with a BSc in Mathematics. At that time there weren’t any MOOCs like Coursera etc. …

The most common representation of words in NLP tasks is the One Hot Encoding. Although this approach has been proven to be effective in many NLP models, it has some drawbacks:

- The encodings are arbitrary.
- This approach leads to data sparsity with many zeros.
- It doesn’t provide any relation between words.

Below we can see an example of One Hot Encoding for the words “Cat” and “Dog”. As we can see, these two vectors are independent since their inner product is 0, and their Euclidean distance is the square root of 2. …
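A minimal sketch of that observation — assuming a two-word vocabulary, so that “Cat” and “Dog” map to the standard basis vectors — confirms the inner product and the Euclidean distance:

```python
import numpy as np

# hypothetical one-hot vectors for "Cat" and "Dog" in a two-word vocabulary
cat = np.array([1, 0])
dog = np.array([0, 1])

inner = cat @ dog                  # inner product: 0, the vectors are orthogonal
dist = np.linalg.norm(cat - dog)   # Euclidean distance: sqrt(2)
```

With a realistic vocabulary of tens of thousands of words, every pair of one-hot vectors behaves exactly like this — orthogonal and equidistant — which is why the encoding carries no relation between words.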

Sometimes there is a need to do a rolling count of the distinct values of a list/vector. In other words, we want to count only each **new** element that appears in our list/vector. Below is an example of how we can easily do it in R and Python.

```r
# assume that this is our vector
x <- c("e", "a", "a", "b", "a", "b", "c", "d", "e")

# we apply the "cumsum(!duplicated(x))" command
data.frame(Vector = x, CumDistinct = cumsum(!duplicated(x)))
```

```python
import pandas as pd

df = pd.DataFrame({'mylist': ["e", "a", "a", "b", "a", "b", "c", "d", "e"]})
df['CumDistinct'] = (~df.mylist.duplicated()).cumsum()
df
```

We could also use `apply` on the index, counting the distinct values seen up to each position:

`df['CumDistinct'] = df.index.to_series().apply(lambda i: df.mylist[:i+1].nunique())`

Alternatively, we can use list comprehension as follows:

```python
df = pd.DataFrame({'mylist': ["e", "a", "a", "b", "a", "b", "c", "d", "e"]})
df['CumDistinct'] = [len(set(df['mylist'][:i])) for i, j in enumerate(df['mylist'], 1)]
df
```

When we are dealing with financial assets, sports analytics, gambling games, etc., there is usually a need to keep track of consecutive events, called **streaks**. For instance:

- For how many consecutive days **Stock X** has closed with a positive sign
- For how many games in a row **Team A** has scored at least one goal, and so on

We will show how you can easily calculate the consecutive events in both R and Python.

Assume that there is a roulette wheel which returns Red (50%) and Black (50%). We are going to simulate N=1,000,000 rolls and keep track of the streaks of Red and Black respectively. The R function which makes our life easier is **rle**, but if we want to track the running streak, then we also need to use the **seq** function. …
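In Python, `itertools.groupby` plays the role of R's **rle**. As a minimal sketch of the simulation — using N=100,000 instead of 1,000,000 to keep the run short:

```python
import random
from itertools import groupby

random.seed(42)
N = 100_000
rolls = [random.choice(["Red", "Black"]) for _ in range(N)]

# rle-style run-length encoding: (color, length) of each streak
streaks = [(color, len(list(run))) for color, run in groupby(rolls)]

# e.g. the longest streak observed in the simulation
longest = max(length for _, length in streaks)
```

The streak lengths always partition the N rolls exactly, and a running streak counter can be recovered by expanding each `(color, length)` pair back into `1, 2, ..., length`.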

This post will show you how you can easily apply Stacked Ensemble Models in R using the H2O package. The models can handle both **Classification** and **Regression** problems. For this example, we will tackle a classification problem using the Breast Cancer Wisconsin dataset, which can be found here.

The steps below describe the individual tasks involved in training and testing a Super Learner ensemble. H2O automates most of the steps below so that you can quickly and easily build **ensembles** of H2O models.

**a) Set up the ensemble**

- Specify a list of L base algorithms (with a specific set of model parameters). …

We will provide some examples of how we can reshape Pandas data frames based on our needs. We want to provide a concrete and reproducible example and for that reason, we assume that we are dealing with the following scenario.

We have a data frame of three columns such as:

- **ID**: The UserID
- **Type**: The type of product
- **Value**: The value of the product, like ‘H’ for High, ‘M’ for Medium and ‘L’ for Low

The data are in a long format, where each case is one row. Let’s create the data frame:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3, 3, 3, 4],
                   'Type': ['A', 'B', 'C', 'E', 'D', 'A', 'E', 'B', 'C', 'A'], …
```
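Since the snippet above is truncated, here is a hypothetical completion — the `Value` entries and the smaller ID list are invented for illustration — showing one common long-to-wide reshape with `pivot`:

```python
import pandas as pd

# hypothetical data in long format: one (ID, Type, Value) case per row
df = pd.DataFrame({'ID':    [1, 1, 2, 2, 3],
                   'Type':  ['A', 'B', 'A', 'C', 'B'],
                   'Value': ['H', 'M', 'L', 'H', 'M']})

# long -> wide: one row per ID, one column per Type, Value in the cells
wide = df.pivot(index='ID', columns='Type', values='Value')
```

`pivot` requires each (ID, Type) pair to appear at most once; when there are duplicates, `pivot_table` with an aggregation function is the usual alternative.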
