Worked examples – Penny Analytics

Here, we give you an idea what your results file might look like and how to make it more visual in Excel.

Standard example

For this example, we are using the free trial dataset “Fisher iris data”. This is a classic dataset that contains three species of iris flowers and their measurements, in only 150 records. This file contains a unique integer rownumber in its first column.

The file is small and is eligible for the standard option, in which case the results file will look like this:

Fisher iris dataset with Penny Analytics results, standard option

If instead, the user chooses the high volume/high security option, their results file will look like this:

Fisher iris dataset with Penny Analytics results, high volume/high security option

Here, the user will need to append the results file to their original data file to create a results file that looks like that of the standard option.

These results are rather plain looking, but here we will provide some tips on making them more visual in Excel.

The iris data is an example of standard dataset i.e. it is not a time series. For datasets like this, we recommend filtering the data in Excel and sorting on the outlier score, which is a rank. Ranks of 1 are the biggest outliers.

Fisher iris dataset with Penny Analytics results, filtered and sorted in Excel

If you want to go further, you can use conditional formatting. You do not have to do this, but you may be interested if you are an “Excel enthusiast”. Here is our method:

First, we save the data as an .xlsm file. Then, go to developer->visualbasic->insert module and paste the following code:

Public Function highlight_reasons(column_heading, reason_code) As String

If InStr(reason_code, column_heading & " high") > 0 Then
highlight_reasons = "high"
ElseIf InStr(reason_code, column_heading & " low") > 0 Then
highlight_reasons = "low"
ElseIf InStr(reason_code, column_heading & " unexpected") > 0 Then
highlight_reasons = "unexpected"
End If

End Function

Now press the save button and exit from the developer section. You should now be back on your data worksheet, and if you go into functions you should be able to find a user defined function called highlight_reasons.

If Excel asks you whether to enable macros, the answer is yes.

Our method is to make a second set of columns on the right hand side of the original columns. We then apply this function to the new columns taking care to use absolute references correctly.

Fisher iris dataset with Penny Analytics results, preparation for conditional formatting in Excel

The formula can then be copied to fill the new columns with the diagnostic values.

Then we conditionally format the original columns based on the new columns.

Fisher iris dataset with Penny Analytics results, conditionally formatted in Excel

Finally, we hide the new columns from view.

Fisher iris dataset with Penny Analytics results, hide columns in Excel

Time series example

For this example, we are going to use the CAD exchange rate dataset. This time series example tracks three exchange rates: the euro, the pound and the US dollar.

Once again, outlier scores of 1 are the biggest outliers. In this case, the data has an order and we want to keep that.

Canadian exchange rate dataset with Penny Analytics results

As before, you can use conditional formatting and if you are an “Excel enthusiast” you might want to try this. Our method is the same as for the standard example above – create the.xlsm file, create the custom function and make a second set of columns on the right hand side of the original columns. We use the highlight_reasons function above to fill the new columns with the diagnostic values, following a similar approach to the standard file example (above).

Canadian exchange rate dataset with Penny Analytics results, preparation for conditional formatting in Excel

Next we add three more columns on the right hand side and, using a formula, make the values either 1 or 0 depending on whether that column is an outlier or not.

Canadian exchange rate dataset with Penny Analytics results, preparation for charting Excel

Then we add a final column named “Outliers” and write a formula so that it shows the exchange rate if the record is an outlier, otherwise it is blank. This formula looks a little complicated but that is only because it is possible for more than one column to be an outlier at the same time.

Canadian exchange rate dataset with Penny Analytics results, preparation for charting in Excel

To create a chart, first select the date column and the three original columns. You should get a line chart that looks like this:

Canadian exchange rate dataset with Penny Analytics results, initial chart in Excel

Then go to select data and add another series. Add the outliers column. Right hand click this series only and change the series chart type to scatter chart. You should now have a chart that looks like this:

Canadian exchange rate dataset with Penny Analytics results, chart in Excel with outliers marked

Now, in this dataset example all the column metrics were on similar scale, but is life always like that? What if, instead, we had a dataset with an additional metric such as Canada’s GDP? In that case, we would need to index the data. Here, we have made a copy of our worksheet and renamed it CAD_indexed. In this new worksheet, we changed the formula for the exchange rates to give us an index number starting at 100.

Canadian exchange rate dataset with Penny Analytics results, indexing the data to deal with metrics on different scales

Then we also made a copy of our chart tab and went into its select data->data source and changed the two references to the source worksheet to point to CAD_indexed instead.

Canadian exchange rate dataset with Penny Analytics results, changing the Excel chart references to point to indexed data

This is what our chart looks like now:

Canadian exchange rate dataset with Penny Analytics results, chart in Excel with data indexed and outliers marked