The numbers don’t lie, which is why they are so useful in motivating exercise.
Like many people, I use a Garmin sports watch to monitor how much time — spoiler: not enough — I spend directing my body to its evolved purpose of moving about.
Apropos of this, below are instructions for wrangling exported CSV data from Garmin Connect into sensible daily summaries.
To download CSV data from Garmin Connect:
- Sign in to your Garmin Connect account
- Click the arrow icon in the upper-left corner to expand the navigation bar
- Click Activities
- Click All Activities
- Click Export to CSV located on the right side under the activity list
- Save the file
The default filename is Activities.csv. I am running JupyterLab, and since this file is the only CSV in the current working directory, grabbing the file name is easy:
r = !ls *.csv
file_name = r[0]
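The `!ls` shell magic only works inside IPython or Jupyter. Outside a notebook, a portable sketch of the same lookup might look like this (`find_single_csv` is a hypothetical helper, not part of the original workflow):

```python
from pathlib import Path

# Hypothetical helper: mirrors `!ls *.csv` followed by taking the first
# (and only) result, but works in any plain Python script.
def find_single_csv(directory: str = '.') -> str:
    """Return the name of the lone CSV file in `directory`."""
    matches = sorted(Path(directory).glob('*.csv'))
    if len(matches) != 1:
        raise FileNotFoundError(f'expected exactly one CSV, found {len(matches)}')
    return matches[0].name
```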
With the file name in hand, the usual imports come next, followed by a quick look at the column names:
import pandas as pd
import numpy as np
import datetime
cols = ', '.join(pd.read_csv(file_name).columns)
print(cols)
The CSV has 31 columns of varying levels of relevance:
Activity Type, Date, Favorite, Title, Distance, Calories, Time, Avg HR, Max HR, Aerobic TE, Avg Run Cadence, Max Run Cadence, Avg Speed, Max Speed, Elev Gain, Elev Loss, Avg Stride Length, Avg Vertical Ratio, Avg Vertical Oscillation, Avg Ground Contact Time, Avg GCT Balance, Training Stress Score®, Grit, Flow, Bottom Time, Min Temp, Surface Interval, Decompression, Best Lap Time, Number of Laps, Max Temp
I am interested in daily aggregates of only the core metrics, primarily distance and time.
The first step is to read the file, do a minimal clean, and create date strings to use as the key for the groupby:
# '--' marks missing values; commas are thousands separators
df = pd.read_csv(file_name).replace('--', np.nan).replace(',', '', regex=True)
df['Date'] = pd.to_datetime(df['Date'])
df['Time'] = pd.to_datetime(df['Time'])  # parsed as datetimes for now
df['day'] = df['Date'].dt.strftime('%Y-%m-%d')
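A toy illustration of what the cleaning step is doing, on a few made-up values: `'--'` marks missing data in the export, and comma thousands separators would otherwise block numeric conversion.

```python
import numpy as np
import pandas as pd

# Made-up sample values standing in for the raw export.
raw = pd.Series(['--', '1,234', '361'])
cleaned = raw.replace('--', np.nan).replace(',', '', regex=True)
numbers = pd.to_numeric(cleaned)  # NaN, 1234.0, 361.0
```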
The time deltas are also fixed up so that they can be summed:
df['Time'] = (
    (df['Time'].dt.hour * 60)
    + df['Time'].dt.minute
    + (df['Time'].dt.second / 60)
).map(lambda x: datetime.timedelta(minutes=x))
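An equivalent and arguably simpler route, assuming the Time column holds `H:MM:SS` strings, is `pd.to_timedelta`, which parses them directly with no datetime round-trip:

```python
import pandas as pd

# Two sample durations in the export's H:MM:SS format.
times = pd.Series(['0:33:44', '0:26:35'])
deltas = pd.to_timedelta(times)
total = deltas.sum()  # Timedelta('0 days 01:00:19')
```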
Dummy values are added so the activities — in this case running or cycling — can be summed:
df = pd.concat([df, pd.get_dummies(df['Activity Type'])], axis=1)
for col in ['Calories', 'Elev Gain', 'Elev Loss']:
    df[col] = pd.to_numeric(df[col])
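A toy version of the dummy-variable step, on made-up activity rows: each activity type becomes a 0/1 column, so a later daily sum counts how many activities of each type occurred.

```python
import pandas as pd

# Made-up activity log: two runs and one ride.
toy = pd.DataFrame({'Activity Type': ['Running', 'Cycling', 'Running']})
dummies = pd.get_dummies(toy['Activity Type'])
counts = dummies.sum()  # Cycling: 1, Running: 2
```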
Once all that is done we can aggregate for daily sums:
cols_select = ['Time', 'Distance', 'Calories', 'Cycling', 'Running']
df.groupby('day').agg({col: 'sum' for col in cols_select})
Yielding:
day | Time | Distance | Calories | Cycling | Running |
---|---|---|---|---|---|
2020-04-24 | 0:33:44 | 3 | 361 | 0 | 1 |
2020-04-25 | 0:26:35 | 2.46 | 291 | 0 | 1 |
2020-04-27 | 0:37:47 | 5.87 | 274 | 1 | 0 |
2020-04-29 | 0:25:49 | 2.22 | 253 | 0 | 1 |
2020-05-01 | 0:39:29 | 5.85 | 314 | 1 | 0 |
2020-05-02 | 0:20:25 | 1.63 | 172 | 0 | 1 |
2020-05-03 | 0:42:31 | 4.92 | 313 | 1 | 0 |
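The whole aggregation can be sketched end to end on a few hypothetical rows standing in for the cleaned export (two activities on the first day, to show them being combined):

```python
import pandas as pd

# Hypothetical cleaned rows; values are illustrative, not real export data.
demo = pd.DataFrame({
    'day': ['2020-04-24', '2020-04-24', '2020-04-25'],
    'Distance': [3.0, 2.0, 2.46],
    'Calories': [361, 100, 291],
    'Cycling': [0, 1, 0],
    'Running': [1, 0, 1],
})
daily = demo.groupby('day').agg(
    {c: 'sum' for c in ['Distance', 'Calories', 'Cycling', 'Running']}
)
# daily.loc['2020-04-24', 'Distance'] == 5.0
```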