On a podcast the other day, the hosts were discussing the world’s strangest languages. Apparently, the strangest of all is Chalcatongo Mixtec, spoken by 6000 people in Mexico. What makes it strange is that there is no such thing as a question: everything is a direction. This may account for why the population is small, since human relations have got to be challenging without questions. English was also discussed as one of the more unusual languages.

Anyway, we decided to see what the Penny Analytics outlier detection system would come up with. We sourced our dataset from Kaggle. The dataset itself is a bit unusual because almost all the variables are categorical and there are many missing values. Our only difficulty was that the dataset had 202 columns and our column limit is 200, so we removed the first three columns, which were just labels, leaving 199. We are glad to report that Chalcatongo Mixtec and English were both identified as outliers (in the top 3% of records), ranking 50th and 33rd respectively. But according to our results, German is the biggest outlier. We’ll take that away and think about it in the biergarten.
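For anyone who wants to repeat the column trimming, here is a minimal sketch of that step using only the Python standard library. It is not the code we ran. The helper names `drop_leading_columns` and `top_fraction_size` are made up for illustration, and a tiny synthetic table stands in for the real Kaggle file. Only the figures (202 columns, a limit of 200, 2679 records, top 3%) come from the post.

```python
import csv
import io
import math

COLUMN_LIMIT = 200  # column limit quoted in the post


def drop_leading_columns(rows, n_drop):
    """Return a copy of rows with the first n_drop fields removed from each row."""
    return [row[n_drop:] for row in rows]


def top_fraction_size(n_records, fraction):
    """Number of records inside the top `fraction` of a ranking, rounded down."""
    return math.floor(n_records * fraction)


# Synthetic stand-in for the real file: a header plus two data rows, 202 columns each.
header = [f"col{i}" for i in range(202)]
buffer = io.StringIO()
csv.writer(buffer).writerows([header, ["x"] * 202, ["y"] * 202])
buffer.seek(0)

rows = list(csv.reader(buffer))
trimmed = drop_leading_columns(rows, 3)  # the first three columns are just labels
assert len(trimmed[0]) <= COLUMN_LIMIT

# How many of the 2679 records make up the top 3%?
cutoff = top_fraction_size(2679, 0.03)
print(len(trimmed[0]), cutoff)
```

With these numbers the trimmed table has 199 columns, and the top 3% of 2679 records is the first 80 ranks, so ranks 33 and 50 both sit comfortably inside it.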

Original dataset (2679 records, 202 columns):

World atlas of languages on Kaggle

Our data profiling report:

language_firsthreecolsremoved_profiling_report

(To enable all features of the data profiling report including toggle details, you will need to download it and open it from there.)

Our results file:

language_firsthreecolsremoved_penny_outliers_1494_374

Copyright © 2020 Penny Analytics Limited All rights reserved.