The pandemic has changed our mobility patterns. In this blog post, we work with time series data collected by Google, drilled down to Toronto, Ontario (home of the CN Tower and also Penny Analytics). Google publishes the data to give public health officials insight into the impact of COVID restrictions and into whether those restrictions are being heeded. It is based on smartphone location data. The data source is here.
The dataset starts on 15 February 2020 and runs to the end of 2020, so it includes about 300 records, one for each day. There are six key metrics:
- retail_and_recreation_percent_change_from_baseline
- grocery_and_pharmacy_percent_change_from_baseline
- parks_percent_change_from_baseline
- transit_stations_percent_change_from_baseline
- workplaces_percent_change_from_baseline
- residential_percent_change_from_baseline
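For readers who want to reproduce the drill-down, here is a minimal sketch of pulling one region out of the mobility CSV. It assumes the file has `sub_region_2` and `date` columns and that Toronto is labelled "Toronto Division"; please check both against your own download. The helper `load_region` and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import date

METRICS = [
    "retail_and_recreation_percent_change_from_baseline",
    "grocery_and_pharmacy_percent_change_from_baseline",
    "parks_percent_change_from_baseline",
    "transit_stations_percent_change_from_baseline",
    "workplaces_percent_change_from_baseline",
    "residential_percent_change_from_baseline",
]


def load_region(csv_file, region, metrics=METRICS):
    """Return one dict per day for the given sub_region_2, sorted by date.

    Blank metric cells become None. Column names are assumptions.
    """
    rows = []
    for rec in csv.DictReader(csv_file):
        if rec["sub_region_2"] != region:
            continue
        row = {"date": date.fromisoformat(rec["date"])}
        for m in metrics:
            row[m] = float(rec[m]) if rec[m] != "" else None
        rows.append(row)
    rows.sort(key=lambda r: r["date"])
    return rows


def make_sample():
    """Build a tiny in-memory CSV with made-up values in the assumed layout."""
    header = ["sub_region_1", "sub_region_2", "date"] + METRICS
    data = [
        ["Ontario", "Toronto Division", "2020-02-16", 5, 2, 10, 1, 0, -1],
        ["Ontario", "Toronto Division", "2020-02-15", 3, 1, 12, 2, 1, 0],
        ["Ontario", "Ottawa Division", "2020-02-15", 1, 0, "", 0, 0, 0],
    ]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(data)
    buf.seek(0)
    return buf


toronto = load_region(make_sample(), "Toronto Division")
print(len(toronto), toronto[0]["date"])
```

With a real download, you would pass an open file handle instead of `make_sample()`.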
The data is not completely raw: Google has already created baselines from the first five weeks of 2020. So, we can think of these numbers as deviations from the expected pattern.
We uploaded the file to the Penny Analytics website and downloaded our data profiling report, which is a free part of the service. The report gives an overview of the data, so that you can get a good understanding of it and fix any issues before proceeding to outlier detection. Here is the data profiling report:
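To make "percent change from baseline" concrete, here is a rough sketch of the calculation. It assumes the baseline is the median value for each day of the week over 3 January to 6 February 2020, which is how we understand Google's published description; treat the window as an assumption. Google does not publish raw visit counts, so the counts below are made up, and the function names are ours.

```python
from datetime import date, timedelta
from statistics import median

# Assumed baseline window (five weeks, so each weekday appears five times).
BASELINE_START = date(2020, 1, 3)
BASELINE_END = date(2020, 2, 6)


def weekday_baseline(daily_counts, start=BASELINE_START, end=BASELINE_END):
    """Median count per weekday (0 = Monday) over the baseline window."""
    by_weekday = {}
    for day, count in daily_counts.items():
        if start <= day <= end:
            by_weekday.setdefault(day.weekday(), []).append(count)
    return {wd: median(vals) for wd, vals in by_weekday.items()}


def percent_change(daily_counts, baseline):
    """Percent deviation of each day from the baseline for its weekday."""
    changes = {}
    for day, count in daily_counts.items():
        base = baseline[day.weekday()]
        changes[day] = round(100 * (count - base) / base)
    return changes


# Made-up visit counts: 1000 on weekdays, 600 on weekends in the baseline window.
counts = {}
day = BASELINE_START
while day <= BASELINE_END:
    counts[day] = 600 if day.weekday() >= 5 else 1000
    day += timedelta(days=1)

# Made-up counts for three later days.
counts[date(2020, 3, 13)] = 1300  # a Friday
counts[date(2020, 3, 14)] = 600   # a Saturday
counts[date(2020, 3, 16)] = 500   # a Monday

baseline = weekday_baseline(counts)
changes = percent_change(counts, baseline)
print(changes[date(2020, 3, 13)], changes[date(2020, 3, 14)], changes[date(2020, 3, 16)])
```

Because the baseline is per weekday, a quiet Saturday is compared with other Saturdays rather than with a busy Friday. A baseline like this knows about weekdays, but not about calendar events.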
Google_mobility_2020_Tor_profiling_report.html
(To enable all features of the data profiling report, including toggle details, you will need to download it and open the downloaded copy.)
Next, we submitted the dataset for outlier detection. In the outlier results, the outliers are ranked from 1 to 10, with 1 being the biggest outlier. Since this data is a time series, it is natural to view the data in date order. We have used Tableau Public for the visualization:
In the visualization we have annotated the six outliers that correspond to public holidays. Google’s baselining did not stretch as far as knowing Canadian holidays! This leaves four outliers, all clustered in early to mid-March, around the onset of the pandemic. We can look at the reason codes for these outliers:
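The holiday check itself is easy to script. Below is a small sketch that splits a list of outlier dates into those that fall on an Ontario public holiday and those that do not. The holiday table is typed in by hand for 2020. The two holiday dates in the sample list are placeholders chosen for illustration, not the dates from our results; only the four March dates come from this post.

```python
from datetime import date

# Ontario public holidays from mid-February 2020 onwards (hand-entered).
ONTARIO_HOLIDAYS_2020 = {
    date(2020, 2, 17): "Family Day",
    date(2020, 4, 10): "Good Friday",
    date(2020, 5, 18): "Victoria Day",
    date(2020, 7, 1): "Canada Day",
    date(2020, 9, 7): "Labour Day",
    date(2020, 10, 12): "Thanksgiving",
    date(2020, 12, 25): "Christmas Day",
    date(2020, 12, 26): "Boxing Day",
}


def split_by_holiday(outlier_dates, holidays=ONTARIO_HOLIDAYS_2020):
    """Return (explained, unexplained): holiday outliers with names, and the rest."""
    explained = {d: holidays[d] for d in outlier_dates if d in holidays}
    unexplained = sorted(d for d in outlier_dates if d not in holidays)
    return explained, unexplained


# Sample input: the March dates are from this post, the holidays are placeholders.
sample_outliers = [
    date(2020, 3, 7),
    date(2020, 3, 8),
    date(2020, 3, 12),
    date(2020, 3, 13),
    date(2020, 4, 10),   # placeholder
    date(2020, 12, 25),  # placeholder
]

explained, unexplained = split_by_holiday(sample_outliers)
for d, name in sorted(explained.items()):
    print(d, name)
print("Still unexplained:", [d.isoformat() for d in unexplained])
```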
The reason code of “grocery_and_pharmacy_percent_change_from_baseline high” for 12 and 13 March makes sense: it reflects panic buying and hoarding of goods. We are not so sure about the other two reason codes, but the system was able to detect that something had changed around 7 and 8 March.
What if your dataset contained several time series? Our Google mobility example contains six time series columns, but our outlier detection service can process up to 40. Needless to say, 40 time series are really hard to visualize, even with a great tool like Tableau. But remember, our system finds the records/dates that are outliers and gives the reasons why.
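To illustrate the idea of ranking dates across several columns and attaching a reason, here is a deliberately simple sketch. It is not the method used by the Penny Analytics service. It swaps in a generic robust z-score (median and median absolute deviation) per column, scores each date by its most extreme column, and reports that column as the reason. The column names, dates and values are made up.

```python
from statistics import median


def robust_z(values):
    """Robust z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def rank_outliers(records, columns, top_n=10):
    """Rank records by their most extreme column; return (rank, date, reason)."""
    scores = {c: robust_z([r[c] for r in records]) for c in columns}
    ranked = []
    for i, rec in enumerate(records):
        # The column that deviates most for this record becomes the reason.
        col = max(columns, key=lambda c: abs(scores[c][i]))
        z = scores[col][i]
        reason = f"{col} {'high' if z > 0 else 'low'}"
        ranked.append((abs(z), rec["date"], reason))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return [(rank, d, reason)
            for rank, (_, d, reason) in enumerate(ranked[:top_n], start=1)]


# Made-up data: two columns over seven days.
grocery = [1, 2, 0, 1, 40, 2, 1]
transit = [0, -1, 1, 0, -2, -30, 1]
records = [
    {"date": f"2020-03-{9 + i:02d}", "grocery": g, "transit": t}
    for i, (g, t) in enumerate(zip(grocery, transit))
]

top = rank_outliers(records, ["grocery", "transit"], top_n=2)
for rank, day, reason in top:
    print(rank, day, reason)
```

The same loop works for 6 or 40 columns, which is the point: you read a short ranked list of dates and reasons instead of trying to look at every chart.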
If your use case is for outlier detection in time series, you should also read our NYC taxi blog and put the NYC taxi dataset through our free trial to see how our process works.
The Penny Analytics outlier file for the Google mobility dataset can be downloaded from here:
Google_mobility_2020_Tor_penny_outliers_2507_235
Also, here is the link to our site in Tableau Public.