In a previous post, we provided an example of how to load data from S3 to Snowflake. Data scientists and data engineers are very familiar with Python and pandas DataFrames, so it is essential to be able to connect Snowflake with Python. In this tutorial, we will show you how to pull data from Snowflake into your local Python environment.
For this tutorial, we have created a database called GPIPIS_DB where there is a table called
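A minimal sketch of the connection, assuming the snowflake-connector-python package; the credentials, the schema PUBLIC, and the table name MY_TABLE below are placeholders, not values from the post.

```python
# Sketch: read a Snowflake table into a pandas DataFrame.
# PUBLIC / MY_TABLE and the credentials are placeholder assumptions.

def build_query(database: str, schema: str, table: str) -> str:
    """Compose a fully qualified SELECT over a Snowflake table."""
    return f"SELECT * FROM {database}.{schema}.{table}"

def fetch_table(conn, database: str, schema: str, table: str):
    """Run the query on an open Snowflake connection, return a pandas DataFrame."""
    cur = conn.cursor()
    try:
        cur.execute(build_query(database, schema, table))
        return cur.fetch_pandas_all()  # needs the pandas/pyarrow extras installed
    finally:
        cur.close()

# Usage (requires credentials):
# import snowflake.connector
# conn = snowflake.connector.connect(user="...", password="...", account="...")
# df = fetch_table(conn, "GPIPIS_DB", "PUBLIC", "MY_TABLE")
```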
In this tutorial, we will show you how to create several tables in Redshift Spectrum from data stored in S3. Finally, we will run queries against the tables that we have created. Note that Redshift Spectrum is similar to Athena, since both services run SQL queries directly on data stored in S3.
The first thing we need to do is go to Amazon Redshift and create a cluster. In my case, the Redshift cluster is already running.
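As a sketch of the DDL that the tutorial builds up to: Spectrum needs an external schema pointing at a data catalog, plus an external table over the S3 files. The schema name, IAM role ARN, columns, and bucket path below are illustrative assumptions, composed here as Python helpers.

```python
# Sketch: compose the Redshift Spectrum DDL statements.
# All names, ARNs and paths passed in are placeholders.

def create_external_schema_sql(schema: str, database: str, iam_role: str) -> str:
    """External schema backed by the Glue data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema} "
        f"FROM DATA CATALOG DATABASE '{database}' "
        f"IAM_ROLE '{iam_role}' "
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS"
    )

def create_external_table_sql(schema: str, table: str, columns, s3_path: str) -> str:
    """External table over delimited text files in S3; columns is a list of (name, type)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE "
        f"LOCATION '{s3_path}'"
    )
```

These strings would then be executed against the cluster with any SQL client (psycopg2, the query editor, etc.).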
Let’s see how to create an EMR cluster on AWS. Assuming that you have the required access (IAM roles), we follow these steps.
Once you have logged in to the console, go to Services and search for EMR.
Once you select EMR, you can click “Create cluster”.
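The same console steps can be done programmatically with Boto3's EMR client. A minimal sketch follows; the cluster name, release label, instance types, and counts are assumptions, not values from the post (the default EMR roles are assumed to exist).

```python
# Sketch: launch an EMR cluster with boto3's run_job_flow.
# Instance types, counts and the release label are placeholder choices.

def build_cluster_config(name: str, release: str = "emr-6.9.0") -> dict:
    """Return the keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

# Usage (requires AWS credentials and the IAM roles mentioned above):
# import boto3
# emr = boto3.client("emr")
# response = emr.run_job_flow(**build_cluster_config("my-demo-cluster"))
```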
We have provided an example of how to query S3 objects with S3 Select via the console. In this post, we will show you how to filter large data files using S3 Select via the Boto3 SDK.
Assume that we have a large file (CSV, TXT, JSON, gzip-compressed, etc.) stored in S3, and we want to filter it based on some criteria, for example, to get specific rows and/or specific columns. Let’s see how we can do it with S3 Select using Boto3.
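A minimal sketch of the Boto3 call, assuming a CSV file with a header row; the bucket name, key, and SQL expression shown in the usage note are placeholders.

```python
# Sketch: filter an S3 object server-side with select_object_content.
# Bucket, key and the SQL expression are placeholder assumptions.

def build_select_params(bucket: str, key: str, expression: str) -> dict:
    """Assemble the arguments for s3.select_object_content on a CSV file."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }

def collect_records(event_stream) -> str:
    """Concatenate the Records payloads from the S3 Select event stream."""
    chunks = [e["Records"]["Payload"] for e in event_stream if "Records" in e]
    return b"".join(chunks).decode("utf-8")

# Usage (requires credentials):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.select_object_content(
#     **build_select_params("my-bucket", "data.csv",
#                           "SELECT s.col1, s.col2 FROM s3object s WHERE s.col1 = 'x'"))
# print(collect_records(resp["Payload"]))
```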
We have provided examples of how to interact with S3 using AWS CLI. In this post, we will show how we can synchronize our local directory with S3. Assume that in our local directory we have the following files.
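The synchronization itself is one `aws s3 sync` invocation; as a sketch, it can be composed and run from Python like this (the bucket name is a placeholder, and running it requires a configured AWS CLI).

```python
# Sketch: build and run the `aws s3 sync` command from Python.
# The bucket passed in is a placeholder assumption.
import subprocess

def build_sync_command(local_dir: str, bucket: str, delete: bool = False):
    """Build the argument list; --delete also removes remote files absent locally."""
    cmd = ["aws", "s3", "sync", local_dir, f"s3://{bucket}"]
    if delete:
        cmd.append("--delete")
    return cmd

# Usage (requires the AWS CLI and credentials):
# subprocess.run(build_sync_command(".", "my-bucket"), check=True)
```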
We will provide a walk-through tutorial of the “Data Science Pipeline” that can be used as a guide for Data Science Projects. We will consider the following phases:
Regression models involve the following components:
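As a minimal illustration of the standard ingredients of a regression model, a response, predictors, estimated coefficients, and residual error, here is an ordinary least squares fit on synthetic data (the data and coefficients are made up for the sketch).

```python
# Sketch: OLS in plain numpy on synthetic data.
# beta_true and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])  # intercept + one predictor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(0, 0.1, 50)                 # response with noise

# least-squares estimate of the coefficients
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat                               # what the model leaves unexplained
```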
We can analyze different scientific studies that address the same question by applying a meta-analysis. The assumption is that every individual study contains some degree of error. For example, a study could compare the mortality rates of two treatments for a specific disease. The goal is to obtain pooled summary estimates from the individual studies by taking into account the heterogeneity among them. The aggregated data from the individual studies leads to higher statistical power.
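The simplest pooling scheme is the fixed-effect model, which weights each study's estimate by the inverse of its variance; a minimal sketch with illustrative numbers:

```python
# Sketch: fixed-effect meta-analysis via inverse-variance weighting.
# The estimates and variances below are made-up illustrative values.
import math

def pooled_estimate(estimates, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return mean, se

mean, se = pooled_estimate([0.2, 0.5], [0.04, 0.04])
# with equal variances the pooled mean is the simple average
```

A random-effects model would additionally estimate the between-study variance before weighting, which is the usual way to account for heterogeneity.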
I am a Millennial, born in the 80s, so I do not belong to the group of people who are used to sending voice messages via WhatsApp, Viber, etc. My generation still prefers to send text messages. However, in my network, there are people who choose to communicate with me via voice messages, which I do not find convenient, especially when the messages are loooong.