I have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you missed Vol. 1, you can have a look below:
While working with the Jupyter Notebook, sometimes you need to create a file (e.g. a .py file). Let’s see how we can do it via Jupyter Notebook.
To write a file, we can simply type %%writefile myfile within a Jupyter cell and then start writing the file. For example, the command below will create a new file called myfile.py:
%%writefile myfile.py
def my_function():
    print("Hello from a function")
If we…
A practical example of Logistic Regression when we are dealing with summary data instead of binary (0,1)
We will show how you can run a logistic regression in R when the data are grouped. Let’s generate some random sample data of 200 observations.
library(tidyverse)
set.seed(5)
df <- tibble(Gender = as.factor(sample(c("m","f"), 200, replace = TRUE, prob = c(0.6,0.4))),
             Age_Group = as.factor(sample(c("[<30]","[30-65]", "[65+]"), 200, replace = TRUE, prob = c(0.3,0.6,0.1))),
             Response = rbinom(200, 1, prob = 0.2))
df
Output:
# A tibble: 200 x 3
  Gender Age_Group Response
  <fct>  <fct>        <int>
1 f      [65+]            0
2 m      [30-65]          0
3 m      [65+]            0
4 m…
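To make "summary data" concrete, here is a minimal Python sketch of the aggregation step: it collapses individual 0/1 rows into one row per Gender and Age_Group with counts of successes and failures, which is the grouped form a binomial logistic regression can be fitted on. The simulated data only mimic the R sample above and will not reproduce its values; the names `rows` and `grouped` are illustrative assumptions.

```python
import random
from collections import defaultdict

random.seed(5)  # arbitrary seed; values will differ from the R output above

# Simulate 200 individual-level observations with a binary response.
rows = [
    (
        random.choices(["m", "f"], weights=[0.6, 0.4])[0],
        random.choices(["[<30]", "[30-65]", "[65+]"], weights=[0.3, 0.6, 0.1])[0],
        1 if random.random() < 0.2 else 0,
    )
    for _ in range(200)
]

# Collapse to one entry per (Gender, Age_Group): [successes, failures].
grouped = defaultdict(lambda: [0, 0])
for gender, age_group, response in rows:
    grouped[(gender, age_group)][0 if response == 1 else 1] += 1

for (gender, age_group), (successes, failures) in sorted(grouped.items()):
    print(gender, age_group, successes, failures)
```

Each printed line is one summary row; the successes and failures across all rows add up to the original 200 observations.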
As data scientists, we love to do our job efficiently without reinventing the wheel. Tips-and-tricks articles provide snippets of code for common tasks in the data science world. In this article, we’ll cover mainly Python and R, as well as other tips in Unix, Excel, Git, Docker, Google Spreadsheets, etc.
Don’t miss the Tips and Tricks Vol.2
Let’s say I have the following list:
l = [(1,2), (4,6), (5,1), (1,0)]
l
Output:
[(1, 2), (4, 6), (5, 1), (1, 0)]
And I want to sort it by the second element of the tuple:
sorted(l, key=lambda t: t[1])
Output:
[(1, 0)…
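As a small variation on the same idea, the sketch below uses `operator.itemgetter` from the standard library instead of a lambda, and also shows a descending sort and an in-place sort. The variable names other than `l` are illustrative.

```python
from operator import itemgetter

l = [(1, 2), (4, 6), (5, 1), (1, 0)]

# Same result as the lambda version: sort by the second element of each tuple.
by_second = sorted(l, key=itemgetter(1))

# Descending order on the same key.
by_second_desc = sorted(l, key=itemgetter(1), reverse=True)

# list.sort() sorts in place and returns None, so work on a copy here.
in_place = list(l)
in_place.sort(key=itemgetter(1))

print(by_second)
print(by_second_desc)
print(in_place == by_second)
```

`sorted` returns a new list and leaves `l` unchanged, while `list.sort` modifies the list it is called on.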
In many cases, there is a need to split a userbase into 2 or more buckets. For example:
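One common way to split a userbase into buckets is to hash each user id, so that the same user always lands in the same bucket without storing any assignment. The sketch below is only an illustration of that approach and not necessarily the method the article goes on to describe; the function `assign_bucket`, the `salt` value and the user ids are all hypothetical.

```python
import hashlib


def assign_bucket(user_id: str, n_buckets: int = 2, salt: str = "experiment_1") -> int:
    """Map a user id to a stable bucket index in [0, n_buckets)."""
    # The salt lets different experiments produce independent splits.
    digest = hashlib.md5(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


users = [f"user_{i}" for i in range(10)]  # made-up ids for illustration
buckets = {user: assign_bucket(user) for user in users}
print(buckets)
```

Because the assignment depends only on the id and the salt, re-running the code gives the same split every time.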
As data scientists, we want our work to be reproducible, meaning that when we share our analysis everyone should be able to re-run it and come up with the same results. This is not always easy, since we are dealing with different operating systems (macOS, Windows, Linux) and different programming language versions and packages. That is why we encourage you to work with virtual environments like conda environments. A more robust alternative to conda environments is to work with Docker.
Scenario: We have run an analysis using Python Jupyter Notebooks on our own data, and we want to share this…
When working with Git and GitHub, you can interact with HTTPS or SSH. Today, we will provide a tutorial on how you can deploy an SSH key to your GitHub repository.
As a Data Scientist, you may work in both R and Python, and it is common to prefer one language over the other for specific tasks. For example, you may prefer R for Statistics, Data Cleansing and Data Visualizations, and Python for NLP tasks and Deep Learning. Also, when it comes to RESTful APIs, Python Flask APIs have an advantage over R Plumber and RestRserve.
Assume that you have built a model in Python and on top of that, you have built a Flask API. Regarding the UI, you prefer to work with Shiny. So, the…
In previous articles, we provided examples of how to build a Flask REST API, how to build and deploy a machine learning web app, and how to deploy a Flask API with DigitalOcean. Today, we will provide a hands-on example of how to deploy Flask applications of machine learning models on AWS with Elastic Beanstalk.
Let’s assume that you work as a data scientist and you built a machine learning model that you want to share with other people. The most common way to share your model is with a Flask RESTful API, but you will need a server…
In a previous post, we provided an example of Rolling Regression in Python to get the market beta coefficient. We also provided an example of pairs trading in R. In this post, we will provide an example of rolling regression in R working with the rollRegres package, getting the beta coefficient between two co-integrated stocks in a rolling window of n observations.
The rolling regression is simply a regression that is re-estimated within a moving window. Assume that we have 5 observations and a rolling window of 3 observations. …
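The mechanics of a rolling window can be sketched in a few lines of plain Python. This is an illustration only, not the rollRegres implementation: it fits a simple one-variable OLS line on each window, and the helper names and the data values are made up. With 5 observations and a window of 3, it produces three fits (observations 1 to 3, 2 to 4 and 3 to 5).

```python
def ols_slope_intercept(x, y):
    """Simple linear regression of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rolling_regression(x, y, window):
    """Fit one regression per consecutive window of the given size."""
    return [
        ols_slope_intercept(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


# Hypothetical data: 5 observations, rolling window of 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fits = rolling_regression(x, y, window=3)
for start, (slope, intercept) in enumerate(fits, start=1):
    print(f"observations {start}-{start + 2}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Each window reuses the same estimator, so the sequence of slopes shows how the relationship between the two series changes over time.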
In finance, a measure of asset movements against the market is the market beta β. It is a popular measure of the contribution of a stock to the risk of the market portfolio, which is why it is referred to as an asset’s non-diversifiable risk, its systematic risk, market risk, or hedge ratio.
The weighted average of all market-betas with respect to the market index is 1.
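The statement about the weighted average can be checked numerically. The sketch below assumes the usual definition of beta as the covariance of the asset's returns with the market's returns divided by the variance of the market's returns, and assumes the market index is exactly the weighted portfolio of the assets. The returns are simulated and the weights are made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equally long series."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def beta(asset_returns, market_returns):
    """Market beta: Cov(asset, market) / Var(market)."""
    return covariance(asset_returns, market_returns) / covariance(market_returns, market_returns)


random.seed(0)
n_periods = 250
weights = [0.5, 0.3, 0.2]  # hypothetical index weights, summing to 1

# Simulated returns for three assets with different volatilities.
asset_returns = [
    [random.gauss(0.0005, 0.01 * (k + 1)) for _ in range(n_periods)]
    for k in range(3)
]

# The market index is the weighted portfolio of the assets (assumption).
market_returns = [
    sum(w * r[t] for w, r in zip(weights, asset_returns))
    for t in range(n_periods)
]

betas = [beta(r, market_returns) for r in asset_returns]
weighted_avg_beta = sum(w * b for w, b in zip(weights, betas))

print(betas)
print(weighted_avg_beta)
```

The result follows from the linearity of covariance: the weighted sum of the assets' covariances with the market is the covariance of the market with itself, so the weighted average beta equals 1 up to floating-point error.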
Data Scientist @ Persado | Co-founder of the Data Science blog: https://predictivehacks.com/