Pandas can convert datetime frequencies by resampling. This lets you turn, say, per-second data into five-minute data, which is a very common task when you're dealing with timeseries. So let's explore how this works.

## Timeseries

Let's start by making a subset of our data.

```
subset_df = (df
    .assign(date=lambda d: pd.to_datetime(d['date'], format="%Y-%m-%d"))
    [['state', 'date', 'births']]
    .loc[lambda d: d['state'] == 'CA'])
```

This dataframe represents the births in the state of California. Here are the first five rows.

state | date | births |
---|---|---|
CA | 1969-01-01 00:00:00 | 824 |
CA | 1969-01-02 00:00:00 | 816 |
CA | 1969-01-03 00:00:00 | 940 |
CA | 1969-01-04 00:00:00 | 906 |
CA | 1969-01-05 00:00:00 | 804 |

If you're interested in calculating aggregates here, you *could* generate
a grouping feature, like the year, pass it to a group-by and aggregate. There is
a more convenient method though, which involves using the `.resample` method.
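As a sketch of that grouping-feature approach, here is a runnable example on a small made-up dataframe (the column names mirror the ones in this document, but the numbers are invented):

```python
import pandas as pd

# Synthetic daily birth counts (made-up numbers).
df = pd.DataFrame({
    "date": pd.to_datetime(["1969-01-01", "1969-06-15", "1970-03-10", "1970-11-02"]),
    "births": [824, 816, 940, 906],
})

# Derive a grouping feature (the year) and aggregate with a plain group-by.
per_year = (df
    .assign(year=lambda d: d["date"].dt.year)
    .groupby("year")["births"]
    .sum())
print(per_year)
```

This works, but it requires inventing a new column for every time window you care about; `.resample` expresses the window directly.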

## Using `resample`

To use `.resample()` you'll need to make sure that the dataframe has a
datetime index first. Then you'll be able to call resample, which
acts kind of like a group-by but has a convenient string syntax to declare
time windows. After that you'll be able to call an aggregation method to summarise
the data.

### Calculate the sum per year.

This example calculates the total births per year in California.

```
subset_df.set_index('date').resample('Y').sum()
```

date | births |
---|---|
1969-12-31 00:00:00 | 352858 |
1970-12-31 00:00:00 | 362682 |
1971-12-31 00:00:00 | 329816 |
1972-12-31 00:00:00 | 306538 |
1973-12-31 00:00:00 | 298062 |

### Calculate the average per two weeks.

This example calculates the average births every two weeks in California. Note that we're able to use a number here to indicate we're dealing with two weeks instead of one!

```
subset_df.set_index('date').resample('2W').mean()
```

date | births |
---|---|
1969-01-05 00:00:00 | 858 |
1969-01-19 00:00:00 | 924.429 |
1969-02-02 00:00:00 | 913.143 |
1969-02-16 00:00:00 | 923.714 |
1969-03-02 00:00:00 | 879.867 |
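To see the window-widening behaviour in isolation, here is a minimal sketch on a synthetic daily series (all values constant, so the two-week averages are trivially easy to check):

```python
import pandas as pd

# Synthetic daily series: 28 days of the constant value 100.0.
idx = pd.date_range("1969-01-01", periods=28, freq="D")
daily = pd.Series(100.0, index=idx)

# A number in front of the offset widens the window: '2W' = two-week bins.
biweekly = daily.resample("2W").mean()
print(biweekly)
```

Since every day holds 100.0, every two-week average is 100.0 as well, regardless of how the bin edges fall.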

## Together with Groupby

In the previous example we calculated statistics, but only
for the state of California. We can combine `.resample`
with `.groupby`
to calculate these numbers for all states. If you do this, remember that you can
only use `.resample()` after `.groupby()`, not the other way around.

```
(df[['state', 'date', 'births']]
    .assign(date=lambda d: pd.to_datetime(d['date'], format='%Y-%m-%d'))
    .set_index('date')
    .groupby('state')
    .resample('2W').sum())
```

The top 5 rows of this query look like:

state | date | births |
---|---|---|
AK | 1969-01-05 00:00:00 | 88 |
AK | 1969-01-19 00:00:00 | 246 |
AK | 1969-02-02 00:00:00 | 246 |
AK | 1969-02-16 00:00:00 | 244 |
AK | 1969-03-02 00:00:00 | 260 |

The bottom 5 of this table look like:

state | date | births |
---|---|---|
WY | 1988-11-13 00:00:00 | 243 |
WY | 1988-11-27 00:00:00 | 223 |
WY | 1988-12-11 00:00:00 | 271 |
WY | 1988-12-25 00:00:00 | 238 |
WY | 1989-01-08 00:00:00 | 114 |
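Note that the result of a groupby-plus-resample carries a `(state, date)` MultiIndex. A small sketch on made-up data shows how `.reset_index()` turns it back into ordinary columns:

```python
import pandas as pd

# Tiny made-up two-state dataset.
df = pd.DataFrame({
    "state": ["AK", "AK", "WY", "WY"],
    "date": pd.to_datetime(["1969-01-01", "1969-01-08", "1969-01-01", "1969-01-08"]),
    "births": [10, 20, 30, 40],
})

result = (df
    .set_index("date")
    .groupby("state")["births"]
    .resample("2W")
    .sum())

# The result carries a (state, date) MultiIndex; reset_index turns it
# back into ordinary columns.
flat = result.reset_index()
print(flat)
```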

## Offsets in Resample

There are many ways to group dates together using the string-like syntax in resample. The table below shows a subset of all possible offsets.

Alias | Description |
---|---|
B | business day frequency |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
Q | quarter end frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |

This table represents a subset of all offsets; the full list can be found in the pandas documentation.
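As a quick sketch, here is one of these aliases in action on a synthetic minute-level series; a number in front of the alias widens the window, just like `2W` did earlier:

```python
import pandas as pd

# One synthetic hour of per-minute readings, each with the value 1.
idx = pd.date_range("2000-01-01", periods=60, freq="min")
per_minute = pd.Series(1, index=idx)

# 'min' is minutely frequency; '15min' bins the data into quarter hours.
quarter_hourly = per_minute.resample("15min").sum()
print(quarter_hourly)
```

Each quarter-hour bin contains fifteen one-minute readings, so each sum is 15.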

## Summary Methods in Resample

Typically, you'll be interested in calculating the `.mean()` or the `.sum()`. It's
good to know, though, that many other methods are also supported.

Method | Description |
---|---|
`.count()` | Compute count of group, excluding missing values. |
`.nunique()` | Return number of unique elements in the group. |
`.first()` | Compute first of group values. |
`.last()` | Compute last of group values. |
`.max()` | Compute max of group values. |
`.mean()` | Compute mean of groups, excluding missing values. |
`.median()` | Compute median of groups, excluding missing values. |
`.min()` | Compute min of group values. |
`.size()` | Compute group sizes. |
`.sum()` | Compute sum of group values. |
`.quantile(q)` | Return value at the given quantile. |

A full list of descriptive statistics can be found on the pandas documentation.

#### Caveat

There is also a more general `.agg()` method, similar to the one you may know
from `.groupby`-objects, but there are subtle differences to be aware of.
One of the main differences is that named aggregations were not yet implemented
at the time of writing this document.
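For completeness, here is a minimal sketch of `.agg()` on a resampler using a synthetic series, passing a list of function names to compute several statistics per window at once:

```python
import pandas as pd

# Ten synthetic daily values starting on a Wednesday (1969-01-01).
idx = pd.date_range("1969-01-01", periods=10, freq="D")
births = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], index=idx)

# A list of function names yields one column per statistic per window.
stats = births.resample("W").agg(["sum", "mean", "max"])
print(stats)
```

The first weekly bin ends on Sunday 1969-01-05 and covers the first five values; the second covers the remaining five.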