Here, we give you an idea what your results file might look like and how to make it more visual in Excel.
For this example, we are using the free trial dataset “Fisher iris data”. This is a classic dataset that contains three species of iris flowers and their measurements, in only 150 records. This file contains a unique integer rownumber in its first column.
The file is small and is eligible for the standard option, in which case the results file will look like this:
If instead, the user chooses the high volume/high security option, their results file will look like this:
Here, the user will need to append the results file to their original data file to create a results file that looks like that of the standard option.
These results are rather plain looking, but here we will provide some tips on making them more visual in Excel.
The iris data is an example of standard dataset i.e. it is not a time series. For datasets like this, we recommend filtering the data in Excel and sorting on the outlier score, which is a rank. Ranks of 1 are the biggest outliers.
If you want to go further, you can use conditional formatting. You do not have to do this, but you may be interested if you are an “Excel enthusiast”. Here is our method:
First, we save the data as an .xlsm file. Then, go to developer->visualbasic->insert module and paste the following code:
Public Function highlight_reasons(column_heading, reason_code) As String If InStr(reason_code, column_heading & " high") > 0 Then highlight_reasons = "high" ElseIf InStr(reason_code, column_heading & " low") > 0 Then highlight_reasons = "low" ElseIf InStr(reason_code, column_heading & " unexpected") > 0 Then highlight_reasons = "unexpected" End If End Function
Now press the save button and exit from the developer section. You should now be back on your data worksheet, and if you go into functions you should be able to find a user defined function called highlight_reasons.
If Excel asks you whether to enable macros, the answer is yes.
Our method is to make a second set of columns on the right hand side of the original columns. We then apply this function to the new columns taking care to use absolute references correctly.
The formula can then be copied to fill the new columns with the diagnostic values.
Then we conditionally format the original columns based on the new columns.
Finally, we hide the new columns from view.
Time series example
For this example, we are going to use the CAD exchange rate dataset. This time series example tracks three exchange rates: the euro, the pound and the US dollar.
Once again, outlier scores of 1 are the biggest outliers. In this case, the data has an order and we want to keep that.
As before, you can use conditional formatting and if you are an “Excel enthusiast” you might want to try this. Our method is the same as for the standard example above – create the.xlsm file, create the custom function and make a second set of columns on the right hand side of the original columns. We use the highlight_reasons function above to fill the new columns with the diagnostic values, following a similar approach to the standard file example (above).
Next we add three more columns on the right hand side and, using a formula, make the values either 1 or 0 depending on whether that column is an outlier or not.
Then we add a final column named “Outliers” and write a formula so that it shows the exchange rate if the record is an outlier, otherwise it is blank. This formula looks a little complicated but that is only because it is possible for more than one column to be an outlier at the same time.
To create a chart, first select the date column and the three original columns. You should get a line chart that looks like this:
Then go to select data and add another series. Add the outliers column. Right hand click this series only and change the series chart type to scatter chart. You should now have a chart that looks like this:
Now, in this dataset example all the column metrics were on similar scale, but is life always like that? What if, instead, we had a dataset with an additional metric such as Canada’s GDP? In that case, we would need to index the data. Here, we have made a copy of our worksheet and renamed it CAD_indexed. In this new worksheet, we changed the formula for the exchange rates to give us an index number starting at 100.
Then we also made a copy of our chart tab and went into its select data->data source and changed the two references to the source worksheet to point to CAD_indexed instead.
This is what our chart looks like now: