Domino in 10 minutes

In this article, we'll show you how to get started with Domino in less than 10 minutes! We'll use some data about the demographics of New York City.

Step 1

Download this CSV file and this Python script, mean_pop.py, which calculates the mean of whichever column you choose in the CSV file.
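If you're curious what a script like this does, here is a minimal sketch. It is not the actual mean_pop.py: the function name column_mean, the CSV_PATH constant, and the choice to skip blank cells are all assumptions made for illustration, and it uses only the Python standard library.

```python
import csv
import sys

# Assumption: the CSV file sits next to the script under this name.
CSV_PATH = "Demographic_Statistics_By_Zip_Code.csv"


def column_mean(csv_path, column):
    """Return the mean of the numeric values in one named CSV column."""
    values = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            cell = (row[column] or "").strip()
            if cell:  # skip blank cells instead of failing on them
                values.append(float(cell))
    if not values:
        raise ValueError(f"no numeric values in column {column!r}")
    return sum(values) / len(values)


def main(argv):
    # Expect exactly one argument: the column name, quoted if it has spaces.
    if len(argv) != 2:
        print('usage: mean_pop.py "COLUMN NAME"')
        return
    print(column_mean(CSV_PATH, argv[1]))


if __name__ == "__main__":
    main(sys.argv)
```

The column name is quoted on the command line because it contains a space; without the quotes, the shell would pass it as two separate arguments.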

Step 2

Next, create a new project and upload these files to Domino. 

 

Step 3

Now that you have these files in Domino, go to the "Runs" tab and start a new run using mean_pop.py. Use the command-line argument "PERCENT FEMALE" to calculate the mean value for that column.

mean_pop.py "PERCENT FEMALE"
 

The result is that, on average, 24% of the people recorded in each zip code are female. That's unexpectedly low, so let's dig deeper in an interactive Jupyter session.

Step 4

Copy/paste these lines of code to follow along with the video below:

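import pandas as pd
df = pd.read_csv('Demographic_Statistics_By_Zip_Code.csv')
# Mean number of women and men counted per zip code.
# print() shows both values even when the lines run in a single cell.
print(df['COUNT FEMALE'].mean())
print(df['COUNT MALE'].mean())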
 

This says that, on average, they sampled 7 women and 10 men in each zip code. That's a tiny sample relative to the population of New York City, so we can't trust the 24% figure from Step 3. We need to find a different data set.
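To see how thin the sample is, you could also count the zip codes with nobody sampled and compute the pooled share of women (all women divided by all people counted). The sketch below is illustrative only: the helper name summarize_gender is hypothetical, and it assumes that COUNT FEMALE plus COUNT MALE accounts for everyone sampled in a zip code.

```python
import pandas as pd


def summarize_gender(df):
    """Summarize sample sizes from the COUNT FEMALE and COUNT MALE columns."""
    female = df["COUNT FEMALE"]
    male = df["COUNT MALE"]
    # Assumption: women plus men is everyone sampled in a zip code.
    total = female + male
    grand_total = total.sum()
    return {
        "zip_codes": len(df),
        "zip_codes_with_no_one_sampled": int((total == 0).sum()),
        # Pooled share weights each person equally, unlike the mean of
        # per-zip-code percentages, which weights each zip code equally.
        "pooled_share_female": (
            female.sum() / grand_total if grand_total else float("nan")
        ),
    }
```

In the Jupyter session you would call it as summarize_gender(df). If many zip codes have no one sampled, the mean of the per-zip-code percentages says little about the city as a whole.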

Don't forget to name and save your Jupyter session! When you hit "Stop", we'll sync the results back to Domino. Then it's back to the drawing board to find a decent data set. Ahh ... the life of a data scientist!

Step 5

You can review the results in the Runs dashboard, and even leave a comment to remind your future self why you didn't use this data set.

 