
Datasets

We host a few datasets here that we use in our videos just to make it easy to get started.



monopoly.csv Properties of the properties in the popular board game Monopoly.

Download it here:

wget https://calmcode.io/datasets/monopoly.csv
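If you would rather stay in Python than use `wget`, here is a minimal standard-library sketch that downloads any of the CSV files on this page and parses it into a list of dictionaries. The helper names `parse_csv` and `fetch_csv` are our own, and the sketch assumes the files are UTF-8 encoded. Every value comes back as a string, so you still need to convert numeric columns yourself.

```python
import csv
import io
import urllib.request


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv(url):
    """Download a CSV file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the file is UTF-8 encoded.
        text = response.read().decode("utf-8")
    return parse_csv(text)


# Usage (needs network access):
# rows = fetch_csv("https://calmcode.io/datasets/monopoly.csv")
# print(rows[0])
```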


pokemon.json A lightly nested data structure with information on imaginary pets.

Originally found on Kaggle, it is served here as a JSON file. Download it here:

wget https://calmcode.io/datasets/pokemon.json
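Because the data is lightly nested, it often helps to flatten it before analysis. The exact layout of `pokemon.json` is not described here, so this sketch assumes the file holds a list of JSON objects whose values may themselves be objects. `flatten` and `load_flat` are hypothetical helpers that join nested keys with a dot; if you already use pandas, `pandas.json_normalize` does a similar job.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with joined keys.

    Lists are left as-is; only nested dicts are expanded.
    """
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep=sep))
        else:
            flat[new_key] = value
    return flat


def load_flat(path):
    """Load a JSON file assumed to hold a list of objects and flatten each one."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [flatten(item) for item in data]


# Usage, after downloading the file with wget:
# rows = load_flat("pokemon.json")
```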


smoking.csv Is smoking good for your health? Can we lie with statistics?

Originally found in an R package, it is served here as a CSV file. Download it here:

wget https://calmcode.io/datasets/smoking.csv

The dataset has the following three columns.

  1. outcome: whether the person was still alive after 20 years
  2. smoker: whether the person was a smoker
  3. age: the age of the person
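One way to see how statistics can mislead here is to compare survival rates for smokers and non-smokers overall, and then again within age bands; the two views can tell different stories because age is related to both smoking and survival. This is a sketch under assumptions: we assume, as in the original R data, that `outcome` holds `"Alive"` or `"Dead"`, and the 20-year band width in the hypothetical `age_band` helper is an arbitrary choice.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the CSV into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def survival_rates(rows, group_fn):
    """Fraction of people alive per group.

    Assumption: `outcome` holds "Alive" or "Dead"; adjust if the file differs.
    """
    alive = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = group_fn(row)
        total[key] += 1
        alive[key] += row["outcome"] == "Alive"
    return {key: alive[key] / total[key] for key in total}


def age_band(row, width=20):
    """Bucket an age into a band label, e.g. 45 -> '40-59' with width 20."""
    start = int(float(row["age"])) // width * width
    return f"{start}-{start + width - 1}"


# rows = load_rows("smoking.csv")
# Overall comparison, smokers vs non-smokers:
# print(survival_rates(rows, lambda r: r["smoker"]))
# The same comparison within each age band:
# print(survival_rates(rows, lambda r: (age_band(r), r["smoker"])))
```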

sleep_deprived_coding.csv What effect does sleep deprivation have on programming?

This dataset is part of a research paper that can be found here. We only consider a subset of the original dataset from the paper, for the sake of calm, and we renamed a few columns. The data comes from a test at a university where students had to implement a `PigLatin` class in 90 minutes. Some students deprived themselves of sleep the night before the exercise. A set of unit tests was given to the students to pass, while another set of unit tests was held back. We also know the GPA of each student. Download it here:

wget https://calmcode.io/datasets/sleep_deprived_coding.csv

The dataset has the following columns.

  1. id: the id of the person in the trial
  2. gpa: grade point average; in Italy this lies between 18 and 30
  3. sleep: the level of sleep the person got
  4. passed_unit_tests: how many of the held-back unit tests passed
  5. passed_asserts: how many of the known unit tests passed
  6. tackled_user_stories: how many user stories were tackled
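A natural first look is to average the test results per value of the `sleep` column. The exact values of that column are not listed above, so this sketch simply groups by whatever values appear. `mean_by_group` is a hypothetical helper, and keep in mind that a raw comparison of means ignores differences in GPA between the groups.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(path):
    """Read the CSV into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def mean_by_group(rows, group_col, value_col):
    """Average of a numeric column for every value of a grouping column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {key: mean(values) for key, values in groups.items()}


# rows = load_rows("sleep_deprived_coding.csv")
# Held-back tests vs known tests, split by sleep level:
# print(mean_by_group(rows, "sleep", "passed_unit_tests"))
# print(mean_by_group(rows, "sleep", "passed_asserts"))
```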

bigmac.csv How does inflation compare when you measure burgers?

This dataset was generated by The Economist and was originally retrieved from this repo. It contains prices over time of the popular McDonald's Big Mac sandwich. Because the burger is sold around the world, it serves as an interesting economic indicator.

You can download the dataset here or fetch it via:

wget https://calmcode.io/datasets/bigmac.csv

The dataset has the following columns.

  1. date: the date of the measurement
  2. currency_code: code for the currency in the country
  3. name: name of the country
  4. local_price: the price of a Big Mac in the local currency
  5. dollar_ex: the exchange rate between the local currency and the dollar
  6. dollar_price: the price of the Big Mac after converting to dollars
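The column list suggests that `dollar_price` is `local_price` converted with `dollar_ex`. The sketch below assumes `dollar_ex` is quoted as units of local currency per US dollar (so the conversion is a division) and that US rows carry `currency_code` `USD`; check both against the file before relying on it. `raw_valuation` is a hypothetical helper that compares each country's dollar price to the US price on the same date, which is the basic idea behind measuring with burgers.

```python
import csv


def load_rows(path):
    """Read the CSV into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def to_dollars(local_price, dollar_ex):
    """Convert a local price to US dollars.

    Assumption: dollar_ex is units of local currency per one US dollar.
    """
    return local_price / dollar_ex


def raw_valuation(rows, date):
    """Relative price vs the US on one date, e.g. -0.2 means 20% cheaper.

    Assumption: the US row has currency_code "USD"; raises StopIteration
    if no such row exists for the given date.
    """
    on_date = [r for r in rows if r["date"] == date]
    us_price = next(float(r["dollar_price"]) for r in on_date if r["currency_code"] == "USD")
    return {r["name"]: float(r["dollar_price"]) / us_price - 1 for r in on_date}


# rows = load_rows("bigmac.csv")
# print(raw_valuation(rows, rows[-1]["date"]))
```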

birthdays.csv A dataset that contains the number of births per state across the United States. It was originally found in the mosaic package in R.

Download it here:

wget https://calmcode.io/datasets/birthdays.csv

The dataset has the following columns.

  1. state: the state where the birth happened
  2. year: the year of the birth
  3. month: the month of the year
  4. day: the day of the month
  5. date: the date of the birth
  6. wday: the day of the week
  7. births: the number of births
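Since each row holds a count for one state on one day, most questions start by summing `births` over some grouping, for example per day of the week or per state. This is a minimal sketch; `total_births` is a hypothetical helper, and it parses `births` via `float` in case the counts are stored with a decimal point.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the CSV into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def total_births(rows, group_col):
    """Sum the births column for every value of a grouping column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += int(float(row["births"]))
    return dict(totals)


# rows = load_rows("birthdays.csv")
# print(total_births(rows, "wday"))   # births per day of the week
# print(total_births(rows, "state"))  # births per state
```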


stigler.csv A dataset that contains the nutritional values of household commodities. The dataset was used in the "Stigler" problem, where we try to figure out how to minimise the cost of food while still getting enough nutrients. It was originally found on the documentation page of OR-Tools.

Download it here:

wget https://calmcode.io/datasets/stigler.csv

The dataset has the following columns.

  1. commodity: name of the commodity
  2. unit: the measure unit of the commodity
  3. price_cents: the 1939 price in cents
  4. calories: calories in the unit of the commodity
  5. protein_g: grams of protein in the unit of the commodity
  6. calcium_g: grams of calcium in the unit of the commodity
  7. iron_mg: milligrams of iron in the unit of the commodity
  8. vitamin_a_iu: vitamin A in the unit of the commodity
  9. vitamin_b1_mg: milligrams of vitamin B1 in the unit of the commodity
  10. vitamin_b2_mg: milligrams of vitamin B2 in the unit of the commodity
  11. niacin_mg: milligrams of niacin in the unit of the commodity
  12. vitamin_c_mg: milligrams of vitamin C in the unit of the commodity
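Solving the Stigler problem properly needs a linear programming solver, such as the one in OR-Tools where the data was found. As a smaller first step, this sketch only evaluates a candidate diet: given how many units of each commodity you buy, it totals the cost and reports which nutrients fall short. It follows the column descriptions above (nutrients per unit, price in cents). `evaluate_diet` is a hypothetical helper, the nutrient minimums are values you supply yourself, and empty cells are treated as zero.

```python
import csv

NUTRIENTS = [
    "calories", "protein_g", "calcium_g", "iron_mg", "vitamin_a_iu",
    "vitamin_b1_mg", "vitamin_b2_mg", "niacin_mg", "vitamin_c_mg",
]


def load_rows(path):
    """Read the CSV into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def evaluate_diet(rows, amounts, minimums):
    """Return (cost_in_cents, shortfalls) for a candidate diet.

    amounts maps commodity name -> number of units bought.
    minimums maps nutrient column -> required total (you supply these).
    shortfalls maps each unmet nutrient -> how much is missing.
    """
    by_name = {row["commodity"]: row for row in rows}
    cost = 0.0
    totals = {n: 0.0 for n in NUTRIENTS}
    for name, qty in amounts.items():
        row = by_name[name]
        cost += qty * float(row["price_cents"] or 0)
        for n in NUTRIENTS:
            # Empty cells are treated as zero.
            totals[n] += qty * float(row[n] or 0)
    shortfalls = {n: minimums[n] - totals[n] for n in minimums if totals[n] < minimums[n]}
    return cost, shortfalls


# rows = load_rows("stigler.csv")
# cost, missing = evaluate_diet(rows, {rows[0]["commodity"]: 1.0}, {"calories": 3.0})
```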


stocks.csv A subset of stock information downloaded from Yahoo Finance using the `yfinance` package.

Download it here:

wget https://calmcode.io/datasets/stocks.csv

The dataset has the following columns.

  1. Date: the date
  2. MSFT: the stock price for MSFT
  3. KLM: the stock price for KLM
  4. ING: the stock price for ING
  5. MOS: the stock price for MOS
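Stock prices are usually compared through their day-to-day relative changes rather than their raw levels. This sketch computes those returns for one ticker column. It assumes the `Date` column is in ISO format (so sorting the strings sorts the dates) and that a missing price shows up as an empty cell; `load_sorted`, `column`, and `pct_change` are hypothetical helpers.

```python
import csv


def load_sorted(path):
    """Read the CSV and sort rows by Date (assumes ISO dates like 2020-01-31)."""
    with open(path, newline="", encoding="utf-8") as f:
        return sorted(csv.DictReader(f), key=lambda r: r["Date"])


def column(rows, name):
    """Extract a numeric column; empty cells become None."""
    return [float(r[name]) if r[name] else None for r in rows]


def pct_change(prices):
    """Relative change between consecutive prices; None where it is undefined."""
    returns = []
    for prev, curr in zip(prices, prices[1:]):
        if prev is None or curr is None or prev == 0:
            returns.append(None)
        else:
            returns.append(curr / prev - 1)
    return returns


# rows = load_sorted("stocks.csv")
# msft_returns = pct_change(column(rows, "MSFT"))
```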


drawndata1.csv A dataset created with drawdata.

Download it here:

wget https://calmcode.io/datasets/drawndata1.csv

The dataset has the following columns.

  1. x: floating point x value
  2. y: floating point y value
  3. z: a class we'd like to predict
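Since `z` is the class we'd like to predict from `x` and `y`, a quick baseline helps before reaching for a real model. This sketch loads the points and fits a nearest-centroid classifier, which assigns each point the class whose average point is closest. Nearest-centroid is our own choice of a simple baseline, not something the dataset prescribes, and `load_xy`, `fit_centroids`, and `predict` are hypothetical helpers. `drawndata2.csv` has the same columns, so the sketch applies to either file.

```python
import csv
from collections import defaultdict


def load_xy(path):
    """Read x, y as float feature pairs and z as a list of class labels."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    points = [(float(r["x"]), float(r["y"])) for r in rows]
    labels = [r["z"] for r in rows]
    return points, labels


def fit_centroids(points, labels):
    """Mean (x, y) point for every class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in zip(points, labels):
        s = sums[label]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Class of the closest centroid by squared Euclidean distance."""
    px, py = point
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - px) ** 2 + (centroids[c][1] - py) ** 2,
    )


# points, labels = load_xy("drawndata1.csv")
# centroids = fit_centroids(points, labels)
# preds = [predict(centroids, p) for p in points]
# accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
```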


drawndata2.csv A dataset created with drawdata.

Download it here:

wget https://calmcode.io/datasets/drawndata2.csv

The dataset has the following columns.

  1. x: floating point x value
  2. y: floating point y value
  3. z: a class we'd like to predict