At Penny Analytics, we do outlier detection: we find unusual records in the datasets our customers upload to our website. We may be from the same part of the world as Malcolm Gladwell, but the outliers that interest us are the ones in large datasets.
So, what is the meaning of an outlier? And is it the same as an anomaly?
From a language point of view, an outlier and an anomaly are the same thing; one word is derived from Old English and the other from Latin and Greek. Although we are based in North America, we still have a taste for the simpler Old English words. (Another English word is penny, an unpretentious coin if there ever was one.)
From an academic point of view, an outlier is a genuine data point that is just far from the norm (think: Dionne quintuplets) whereas an anomaly is a data point generated by a different process (think: multiple births resulting from fertility treatments). But even academics use the terms interchangeably, because in practice the impact of an anomaly vs. an outlier is often the same.
But what is anomaly detection for business?
If academics use the terms interchangeably, does the commercial world do the same? Not quite, and this is where common business usage comes in. Anomaly detection software typically refers to software that analyzes streams of time series data in near real time. The idea is to find events that require action. A classic application is the monitoring of IT logs. Let’s say your company has several machines and processes running continuously. If one of your machines goes offline or one of the processes fails, that’s when you want an alert to go out. In fact, if one of your machines slows right down or a process is taking much longer than usual, that’s also when you want an alert to go out. The need for alerts has always been there, but machine learning models can now make the alerting more sophisticated, so that you are not waking up your support people in the middle of the night for a false alarm and are showing them only the most important issues.
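To make the "taking much longer than usual" idea concrete, here is a minimal sketch in Python. It is not how any particular product works, and it is far simpler than the machine learning models mentioned above. It assumes you have a list of run durations in seconds for one process, and it flags a run when it is much slower than the median of the few runs before it. The function name `flag_slow_runs` and the parameters `window`, `threshold`, and `min_scale` are our own illustrative choices, not part of any library.

```python
from statistics import median


def flag_slow_runs(durations, window=5, threshold=3.5, min_scale=1.0):
    """Return the indices of runs that are unusually slow.

    Each run is compared with the `window` runs before it, using a
    modified z-score built from the median and the median absolute
    deviation (MAD). All parameter names and defaults are hypothetical.
    """
    flagged = []
    for i in range(window, len(durations)):
        history = durations[i - window:i]
        med = median(history)
        mad = median([abs(x - med) for x in history])
        # Floor the scale so a perfectly steady history does not
        # turn a tiny wobble into an alert (or divide by zero).
        scale = max(mad, min_scale)
        score = 0.6745 * (durations[i] - med) / scale
        # Only slow runs are flagged; unusually fast runs are ignored.
        if score > threshold:
            flagged.append(i)
    return flagged


# Hypothetical durations in seconds; the seventh run (index 6) is slow.
run_times = [10, 11, 10, 12, 11, 10, 45, 11, 10, 12]
print(flag_slow_runs(run_times))
```

Using the median rather than the mean keeps one slow run from distorting the baseline for the runs that follow it, which is one simple way to cut down on false alarms.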
That’s another reason we at Penny Analytics position ourselves as outlier detectors: although we can find anomalies in time series data (as well as multivariate data), we do not offer real-time alerts. The market for real-time alerts contains several players; their products are impressive and the companies themselves are well funded. Adobe, Anodot, Avora, and Azure are some examples, but there are more letters in the alphabet than A! We think it would be a project in itself for you to evaluate both your readiness for this kind of software and the players. In future blog posts, we plan to touch on these topics, but for now, we would like to leave you with this thought:
Before you get wrapped up in a big procurement project, wouldn’t it be valuable to gain some experience in anomaly detection using your own data, today? If the answer is yes, then try our online outlier detection service. The results we provide will serve as a benchmark, and what you learn from them will help you navigate the anomaly detection software space.