
I was completely shocked today when I learned of Benford's Law. I'm sure there are plenty of nerds out there who already know about it, which makes this blasé to you all, but this was the first I'd heard of it. So I just had to put it to the test, and it worked.
Benford's Law, also known as the "first-digit law", states that in many naturally occurring sets of numbers the first digit is predictable. Like magic, I tell you! The number 1 shows up as the first digit more often than any other, and each following digit gets less likely, on a nice slope, as you move towards 9.
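For the curious, the usual statement of the law is that a leading digit d shows up with probability log10(1 + 1/d). Here is a quick Python sketch that prints those expected percentages. The function name benford_expected is just something I made up for illustration.

```python
import math

def benford_expected():
    """Return {digit: probability} for leading digits 1-9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine values sum to log10(10) = 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the expected share of each leading digit as a percentage.
for digit, p in benford_expected().items():
    print(f"{digit}: {p:.1%}")
```

The digit 1 comes out at about 30.1% and the digit 9 at about 4.6%, which is the slope described above.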
For example, were you to look at the heights of the world's tallest buildings, the first digit would be 1 about 30% of the time. And it's true! The same holds even if you're looking at the populations of the world's countries.
This law works so well it's used for fraud detection. Evidence based on it has even been admitted in US criminal cases. It's apparently easy to detect human fraud when working with numbers because we tend to spread the first digits of our "fake" numbers evenly across the possible digits.
I decided to test this witch magic for myself with some data I already have (I'm a database guy, so I've got data all over the place to play with).
The data I selected was Field Services Technician Job Time. I totaled the job time spent in every ZIP Code over the last year, broken out by the category of work performed.
I then extracted just the first digit of the TotalTime, counted the occurrences of each digit, and calculated each digit's share of the total.
I ended up with a nice Benford curve, with the number 1 showing up as the first digit 27% of the time.
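I did my version in the database, but the same steps can be sketched in Python roughly like this. Treat it as an illustration only: the function names, the skip_zero switch, and the sample values are all made up, and none of this is my actual job-time data.

```python
from collections import Counter

def first_digit(value, skip_zero=False):
    """First digit of a number as an int; with skip_zero, the first non-zero digit.

    Returns None when there is no usable digit (e.g. exactly 0 with skip_zero).
    """
    # Fixed-point formatting avoids scientific notation; 10 decimals is an arbitrary cutoff.
    digits = [int(c) for c in f"{abs(value):.10f}" if c.isdigit()]
    if skip_zero:
        digits = [d for d in digits if d != 0]
    return digits[0] if digits else None

def digit_distribution(values, skip_zero=False):
    """Share of values starting with each digit, as {digit: fraction}."""
    counts = Counter(first_digit(v, skip_zero) for v in values)
    counts.pop(None, None)  # ignore values that had no usable digit
    total = sum(counts.values())
    return {d: counts[d] / total for d in sorted(counts)}

# Hypothetical totals, standing in for a TotalTime column.
sample = [12.5, 1.75, 0.5, 130.0, 2.25, 19.0, 3.5, 0.25]
for digit, share in digit_distribution(sample).items():
    print(f"{digit}: {share:.1%}")
```

Calling digit_distribution(sample, skip_zero=True) instead takes the first non-zero digit, so 0.5 counts as a 5 rather than a 0.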
My possible first digits did include 0 as well, since I didn't feel like rounding up to whole numbers first. I'm not sure what Benford's Law says about the occurrence of zero, but given that my curve matches a Benford pattern, I'll bet the occurrence of my zero is in line with an expected result of 17% (given a data set where zeros can occur, which would not happen when you look at, say, building heights). I would imagine that if you were looking for fraud within certain datasets, zero might make for an interesting peek, though I'm sure you can just skip past it, grab the first non-zero digit, and be done with it.
Look at what I found in my data!

I'd say Benford's Law works like magic, right before my very eyes!
It is important to note that Benford's Law obviously doesn't apply to everything. You need a good sample size, and the numbers need room to spread out over a wide range. For instance, if you're looking at the distribution of the first digit for a set of numbers between 20 and 80... well, I bet you can guess the probability of a 1 showing up in that set. :)
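To make the 20-to-80 point concrete, here is a small Python sketch comparing that narrow range with a set that spans many orders of magnitude. The powers of two are my own stand-in for "wide-ranging data", not something from my tests above, and the function name leading_one_share is made up.

```python
def leading_one_share(numbers):
    """Fraction of positive integers whose first digit is 1."""
    numbers = list(numbers)
    return sum(str(n)[0] == "1" for n in numbers) / len(numbers)

narrow = range(20, 81)               # every whole number from 20 to 80
wide = [2 ** k for k in range(100)]  # powers of two, from 1 up to a 30-digit number

print(f"20 to 80:      {leading_one_share(narrow):.0%}")
print(f"powers of two: {leading_one_share(wide):.0%}")
```

The narrow range gives 0%, since nothing between 20 and 80 starts with a 1, while the powers of two land at 30%, right around the Benford figure.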
Just for giggles, I ran another test of the Law against another data set I ripped from Wikipedia: total exports by country.




