There are many possible date formats and some of them are ambiguous. To avoid any confusion, we recommend your dates include all four digits of the year (2001, 2011 etc.) and spell out the month (Jan, Feb, Mar etc.).

Ambiguous dates in a CSV file can be fixed by opening the file in Excel. Select the date column, go to number formats, choose an unambiguous format, and save the file again as CSV. An example of an unambiguous format is:

“02-Feb-2008 1:00:00 AM”

This can be achieved by using the following custom format: “dd-mmm-yyyy h:mm:ss AM/PM”. If your date does not include a time, you can use “dd-mmm-yyyy” instead. You can check that this worked correctly by opening your new CSV file in a text editor such as Notepad. (If you open it again in Excel, it may look as if nothing has changed.)
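
If you would rather script the conversion than use Excel, the same idea can be sketched in Python using only the standard library. This is an illustration, not part of our service: the function names, the column name "when", and the sample rows are all made up, and you must supply the source format yourself because the script cannot guess whether the day or the month comes first.

```python
import csv
import io
from datetime import datetime

# Month abbreviations are spelled out here so the output does not
# depend on the computer's locale settings.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def to_unambiguous(text, source_format):
    """Rewrite one date string in the style dd-mmm-yyyy h:mm:ss AM/PM."""
    parsed = datetime.strptime(text, source_format)
    hour12 = parsed.hour % 12 or 12              # 0 becomes 12, 13 becomes 1
    suffix = "AM" if parsed.hour < 12 else "PM"
    return (f"{parsed.day:02d}-{MONTHS[parsed.month - 1]}-{parsed.year:04d} "
            f"{hour12}:{parsed.minute:02d}:{parsed.second:02d} {suffix}")


def rewrite_date_column(src, dst, column, source_format):
    """Copy CSV rows from src to dst, reformatting one date column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = to_unambiguous(row[column], source_format)
        writer.writerow(row)


# In-memory example data (hypothetical); real use would open files instead.
source = io.StringIO("id,when\n1,01-02-08 01:00:00\n2,25-12-08 13:30:00\n")
result = io.StringIO()
rewrite_date_column(source, result, "when", "%d-%m-%y %H:%M:%S")
print(result.getvalue())
```

As with the Excel route, open the rewritten file in a text editor to confirm that every date now has a spelled-out month and a four-digit year.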

If your uploaded data file contains dates, please review the data profiling report we provide. Go to variables > your date variable > toggle details > histogram and check that the distribution of dates is what you expect.
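
If you want a rough version of the same check before uploading, a short Python sketch like the one below counts how many dates fall in each month. It is only an illustration and does not reproduce our profiling report; the function name and the sample dates are made up.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(dates):
    """Count how many dates fall in each (year, month) bucket."""
    return Counter((d.year, d.month) for d in dates)


# Hypothetical sample data; replace with the dates parsed from your file.
dates = [
    datetime(2008, 1, 2),
    datetime(2008, 2, 1),
    datetime(2008, 2, 15),
    datetime(2008, 12, 24),
]

counts = monthly_counts(dates)

# Print a simple text histogram, one row per month that has data.
for (year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d} {'#' * n}")
```

Months with far more or far fewer dates than you expect can be a sign that the day and month were swapped when the file was written.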

Dates are important to outlier detection because they provide context, e.g. there is more shopping in December than in other months. Dates also help us identify time series, which require their own set of algorithms.

If you provide dates that are ambiguous, there is a risk that we will misinterpret them. An example is “01-02-08 01:00:00”. We will assume this is 1 February 2008 at 1:00 AM; that is, we assume that the first number in the date is the day and the last number is the year. If AM or PM is not specified, we assume a 24-hour clock. Please remove this risk by formatting your dates unambiguously.
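
To see why this example is risky, the short Python sketch below reads the same text under three different conventions. It uses Python's standard `datetime.strptime` purely as an illustration; it is not the parser we use, and only the first reading matches the assumption described above.

```python
from datetime import datetime

text = "01-02-08 01:00:00"

# The reading described above: day first, year last, 24-hour clock.
day_first = datetime.strptime(text, "%d-%m-%y %H:%M:%S")

# Two other readings that a different tool or reader might choose.
month_first = datetime.strptime(text, "%m-%d-%y %H:%M:%S")
year_first = datetime.strptime(text, "%y-%m-%d %H:%M:%S")

for label, value in [("day first", day_first),
                     ("month first", month_first),
                     ("year first", year_first)]:
    print(f"{label:12} {value.isoformat()}")
```

All three readings are valid dates, so nothing in the text itself reveals which one was intended. Spelling out the month and using a four-digit year removes the question.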

Copyright © 2020 Penny Analytics Limited All rights reserved.