Pandas is great at dealing with datetimes. Before you can use all the available timeseries features though, you'll need to cast strings into datetime objects that pandas can deal with.
Installation
In this series of videos we'll be using pandas version 1.3.4. You can install this version in your notebook via:
%pip install pandas==1.3.4
Checking the Types
Let's first load a CSV file that contains a date column.
import pandas as pd
df = pd.read_csv("https://calmcode.io/datasets/birthdays.csv")
When you check the types, you can confirm that the "date" column is not a datetime-compatible column.
df.dtypes
This command returns:
state object
year int64
month int64
day int64
date object
wday object
births int64
dtype: object
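To see why the type matters, here is a minimal sketch of what happens when you reach for a datetime feature on a column that still holds strings. The `small` dataframe and its values are made up for illustration and only stand in for the birthdays data.

```python
import pandas as pd

# Made-up rows standing in for the birthdays data, so this runs offline.
small = pd.DataFrame({"date": ["1969-01-01", "1969-01-02"], "births": [100, 200]})

# The .dt accessor only works on datetime-like columns.
# On a column of plain strings, pandas raises an AttributeError.
try:
    small["date"].dt.year
    has_dt_accessor = True
except AttributeError:
    has_dt_accessor = False

print(has_dt_accessor)
```

The string column has no usable `.dt` accessor, which is why the cast comes first.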
At the moment the "date" column is of type "object". Let's change that.
Example: convert string with pd.to_datetime
You can convert strings to pandas compatible dates via:
df.assign(date=lambda d: pd.to_datetime(d['date']))
This changes the types!
df.assign(date=lambda d: pd.to_datetime(d['date'])).dtypes
This is the result.
state object
year int64
month int64
day int64
date datetime64[ns]
wday object
births int64
dtype: object
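One thing to keep in mind is that `.assign` returns a new dataframe and leaves the original alone, so you need to keep the result if you want to use it later. The sketch below shows this on a small made-up dataframe (`small` and its values are illustrative, not part of the birthdays data) and then uses the `.dt` accessor on the converted column.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Made-up rows standing in for the birthdays data, so this runs offline.
small = pd.DataFrame({
    "date": ["1969-01-01", "1969-01-02", "1969-01-03"],
    "births": [100, 200, 300],
})

# Keep the result: assign does not modify `small` in place.
converted = small.assign(date=lambda d: pd.to_datetime(d["date"]))

print(is_datetime64_any_dtype(small["date"]))      # original column is untouched
print(is_datetime64_any_dtype(converted["date"]))  # new column is a datetime

# Datetime features are now available through the .dt accessor.
weekdays = converted["date"].dt.day_name()
print(weekdays.tolist())
```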
Speedup by setting the format
We typically recommend setting the format upfront when you're casting to a datetime. It's usually much faster, although you may not notice unless you're dealing with a big dataframe.
Here's an example of casting to a datetime with a format.
df.assign(date=lambda d: pd.to_datetime(d['date'], format="%Y-%m-%d"))
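If you'd like to check the difference on your own machine, a rough timing sketch could look like the one below. It generates its own date strings rather than using the birthdays file, and the row count of 50,000 is an arbitrary choice. How big the gap is depends on your pandas version and your data; newer versions infer the format from the first value, so the difference may be small there.

```python
import time
import pandas as pd

# Generate date strings in the same "%Y-%m-%d" layout as the example above.
dates = pd.date_range("1969-01-01", periods=50_000, freq="D")
strings = pd.Series(dates.strftime("%Y-%m-%d"))

# Let pandas work out the format on its own.
start = time.perf_counter()
inferred = pd.to_datetime(strings)
t_inferred = time.perf_counter() - start

# Tell pandas the format upfront.
start = time.perf_counter()
explicit = pd.to_datetime(strings, format="%Y-%m-%d")
t_explicit = time.perf_counter() - start

print(f"without format: {t_inferred:.4f}s")
print(f"with format:    {t_explicit:.4f}s")
```

Both calls should produce the same datetimes; only the time it takes to get there differs. A single run like this is noisy, so repeat it a few times before drawing conclusions.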