Probability Distributions in Real Life
Stats Analysis 3: What are the different probability distributions? Where is each of them used, and what are their respective expected values and variances?
Required Readings
Table of Contents:
How Probability Distributions are used in Real Life
Normal/Gaussian
Exponential
Lognormal/Galton
Poisson
Binomial/Bernoulli
1 - How Probability Distributions are used in Real Life
A probability distribution is a mathematical function that tells us how likely each possible outcome is. In the DS world, you do not need to memorize the function itself, but you do need to know the distributions that commonly show up in real life.
Here is an example of a real life distribution. If we look at the histogram, we can see that it matches the fitted kernel density estimate (a smooth, nonparametric curve) quite well.
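The source does not say how its kernel curve was produced, so the following is only a minimal sketch of a Gaussian kernel density estimate, assuming a Gaussian kernel and Scott's rule of thumb for the bandwidth. The helper `kde_density` is hypothetical (it is not `scipy.stats.gaussian_kde`), and the height data is synthetic. The idea is to place a small normal "bump" on every observation and average them into one smooth curve.

```python
import numpy as np

def kde_density(samples, grid, bandwidth=None):
    """Hypothetical helper: Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Assumption: Scott's rule of thumb in one dimension, h = sigma * n^(-1/5)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance between every grid point and every sample
    u = (grid[:, None] - samples[None, :]) / bandwidth
    # One standard normal bump per sample, evaluated on the grid
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the curve is a density
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
data = rng.normal(loc=170, scale=10, size=1_000)  # synthetic heights in cm
grid = np.linspace(100, 240, 1_401)
density = kde_density(data, grid)

# A proper density integrates to about 1 (simple Riemann sum over the grid)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Smaller bandwidths follow the histogram more closely but get noisy; larger ones smooth away real features.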
Here are some of the most common probability distributions you will encounter, and the domains where they are used.
2 - Normal/Gaussian
Histogram
Real Life Scenarios
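Many Machine Learning methods rely on the normal distribution in one way or another, for example through the assumption of normally distributed errors in linear regression. Measurements taken across a large population often follow an approximately normal distribution. Here are some common quantities that are reasonably well modeled by the normal distribution:
Height: most men are around 5 feet 10 inches tall
IQ: scores are scaled so that the average IQ is 100
Stock Market Returns: volatility is measured as the standard deviation of returns
Income Distribution: the average adult earns around 50k USD (in practice incomes are right-skewed, so the lognormal distribution is often a better fit)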
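To show how a normal model turns into concrete probabilities, here is a small sketch for the IQ example. It assumes the common convention of a standard deviation of 15 IQ points; `normal_cdf` is a hypothetical helper built on the standard library's `math.erf`.

```python
import math

def normal_cdf(x, mu, sigma):
    """Hypothetical helper: P(X <= x) for X ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 100, 15  # assumed IQ scale: mean 100, standard deviation 15

# Share of people within one standard deviation of the mean (85 to 115)
within_one_sd = normal_cdf(115, mu, sigma) - normal_cdf(85, mu, sigma)
# Share of people more than two standard deviations above the mean
above_130 = 1 - normal_cdf(130, mu, sigma)

print(f"85-115: {within_one_sd:.3f}")   # roughly 68%
print(f"above 130: {above_130:.4f}")    # roughly 2.3%
```

This is the familiar 68-95-99.7 rule: about 68% of values fall within one standard deviation of the mean.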
R/Python
In R, you can use the rnorm(n, mean, sd) function to generate n random numbers from the normal distribution.
In Python, we can use the random.normal(loc, scale, size) function from the numpy library to do the same.
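A minimal Python sketch, assuming NumPy's newer `Generator` interface (`np.random.default_rng().normal` takes the same `loc`, `scale`, `size` arguments as `np.random.normal`). The mean and standard deviation are hypothetical height values in centimeters. It also checks the normal distribution's moments: E[X] = μ and Var[X] = σ².

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded NumPy random Generator
mu, sigma = 178.0, 7.0  # hypothetical mean and SD of adult male height (cm)

samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# For Normal(mu, sigma^2): E[X] = mu and Var[X] = sigma^2
print(samples.mean())  # close to 178
print(samples.var())   # close to 49
```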
3 - Exponential
Histogram
Real Life Scenarios
The exponential distribution puts most of its probability on small values and has a long tail of rare, large values. The basic idea is lots of small numbers and very few large ones. It has a single parameter, the rate λ, which is the average number of events per unit of time.
Here are some real life scenarios:
Amount of money customers spend
Waiting time modelling
Failure prediction modelling
The amount of time until event X occurs
R/Python
In R, we can use the rexp(n, rate) function to generate samples from the exponential distribution.
In Python, we can use numpy's random.exponential(scale, size) function to do the same. Note that numpy takes the scale 1/λ, not the rate λ.
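A short sketch of the scale-versus-rate difference, assuming a hypothetical rate of 0.5 customers per minute. It also checks the exponential moments, E[X] = 1/λ and Var[X] = 1/λ², and the "lots of small numbers" shape: about 63% of samples (1 − e⁻¹) fall below the mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
rate = 0.5  # lambda: hypothetical 0.5 customers per minute

# NumPy is parameterized by scale = 1 / rate, whereas R's rexp() takes the rate
waits = rng.exponential(scale=1 / rate, size=100_000)

# For Exponential(rate): E[X] = 1/rate and Var[X] = 1/rate^2
print(waits.mean())  # close to 2.0 minutes
print(waits.var())   # close to 4.0

# Most waits are short: P(X < mean) = 1 - e^(-1), roughly 0.632
print((waits < 1 / rate).mean())
```

Passing the rate where NumPy expects the scale is a common bug: here it would make the average wait 0.5 minutes instead of 2.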
4 - Lognormal/Galton
Histogram
Real Life Scenarios
The lognormal distribution is closely related to the normal distribution: a variable X is lognormal when log(X) is normally distributed. Unlike a normal distribution, however, the lognormal is not symmetric: it only takes positive values and is right skewed.
Here are some real life examples of where this distribution comes in handy:
Size of rainfall droplets
Amount of gas in a reserve
Product reliability analysis (time until failure)
R/Python
In R, we can use the rlnorm(n, meanlog, sdlog) function to generate samples from the lognormal distribution.
In Python, we can use the random.lognormal(mean, sigma, size) function from the numpy library. In both languages the two parameters are the mean and standard deviation of log(X), not of X itself.
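A minimal sketch with hypothetical parameters μ = 0 and σ = 0.5 for the underlying normal. It checks that log(X) is normal and compares the sample moments with the lognormal formulas E[X] = exp(μ + σ²/2) and Var[X] = (exp(σ²) − 1)·exp(2μ + σ²).

```python
import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma = 0.0, 0.5  # hypothetical parameters of the underlying normal, i.e. of log(X)

x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

# log(X) is normal with the given mu and sigma
print(np.log(x).mean(), np.log(x).std())  # close to 0.0 and 0.5

# Moments of X itself
expected_mean = np.exp(mu + sigma ** 2 / 2)
expected_var = (np.exp(sigma ** 2) - 1) * np.exp(2 * mu + sigma ** 2)
print(x.mean(), expected_mean)  # both about 1.13
print(x.var(), expected_var)    # both about 0.36

# Right skew: the mean sits above the median, exp(mu) = 1.0
print(np.median(x))
```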
5 - Poisson
Histogram
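You can actually use the exponential distribution and the Poisson distribution hand in hand: if the waiting times between events are exponentially distributed with rate λ, then the number of events in one unit of time follows a Poisson distribution with mean λ.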
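One way to see this link is to simulate it. The sketch below, which assumes a hypothetical rate of 5 events per day, draws exponential gaps between events, places the events on a timeline, and counts how many land in each day. If the connection holds, the daily counts should have mean and variance both close to λ, which is the signature of a Poisson distribution.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
rate = 5.0       # lambda: hypothetical average number of events per day
n_days = 20_000

# One long stream of events whose gaps are Exponential(rate); draw 20% extra
# gaps so the stream comfortably covers all n_days
gaps = rng.exponential(scale=1 / rate, size=int(rate * n_days * 1.2))
arrival_times = np.cumsum(gaps)
assert arrival_times[-1] > n_days  # guard: every day is covered

# Count the events falling in each whole day [d, d + 1)
in_range = arrival_times[arrival_times < n_days]
counts = np.bincount(in_range.astype(int), minlength=n_days)

# Daily counts behave like Poisson(rate): mean and variance both about 5
print(counts.mean(), counts.var())
```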
Real Life Scenarios
The Poisson distribution models the number of events that occur in a fixed time period, given the average rate λ of those events. For example, if you live in LA, you might assume there are around 5 robberies per day. We can map this out via a Poisson process with λ = 5 and build the distribution on a per-day basis. Here are some more real life examples of the Poisson process:
Number of scam calls you get per hour
Number of customers arriving per minute
Number of website visitors per day
Number of ISP internet network failures per week
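For the robbery example with λ = 5 per day, the Poisson probability mass function P(X = k) = λᵏe^(−λ)/k! gives the chance of seeing exactly k events in a day. This is a small sketch using only the standard library; `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """Hypothetical helper: P(X = k) for X ~ Poisson(lam), i.e. lam^k * e^(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 5  # average of 5 events per day, as in the example above

p_exactly_5 = poisson_pmf(5, lam)
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # k = 0, 1, 2

print(round(p_exactly_5, 4))  # about 0.175
print(round(p_at_most_2, 4))  # about 0.125
```

Even though 5 is the average, a day with exactly 5 events happens less than one day in five.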
R/Python
In R, we can use the rpois(n, lambda) function to generate samples from the Poisson distribution.
In Python, we use the random.poisson(lam, size) function from the numpy library.
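A minimal sampling sketch using NumPy's `Generator.poisson`, with a hypothetical λ of 5 events per day. A Poisson distribution has E[X] = Var[X] = λ, and the probability of a day with zero events is e^(−λ).

```python
import numpy as np

rng = np.random.default_rng(seed=3)
lam = 5.0  # hypothetical average number of events per day

daily_counts = rng.poisson(lam=lam, size=100_000)  # one simulated count per day

# For Poisson(lam): E[X] = lam and Var[X] = lam
print(daily_counts.mean(), daily_counts.var())  # both close to 5

# Share of days with no events at all, compared with e^(-lam) (about 0.0067)
print((daily_counts == 0).mean(), np.exp(-lam))
```

When real count data has a variance much larger than its mean, a plain Poisson model is usually too simple.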
6 - Binomial/Bernoulli
Histogram
Real Life Scenarios
The binomial distribution models the number of successes in a fixed number n of independent trials, each with the same success probability p. For example, it gives the probability of getting exactly 5 heads when you flip a coin 20 times. If we flip the coin (run the experiment) only once, it is called a Bernoulli distribution. Here are some more real life examples of where this is used:
Number of patients who experience side effects from a medicine
Number of fraudulent transactions
Customer product returns per month
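The coin example can be computed directly from the binomial probability mass function P(X = k) = C(n, k)·pᵏ·(1 − p)^(n − k). This sketch uses only the standard library (`math.comb` needs Python 3.8+); `binomial_pmf` is a hypothetical helper name.

```python
import math

def binomial_pmf(k, n, p):
    """Hypothetical helper: P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 5 heads in 20 flips of a fair coin: C(20, 5) / 2^20
p_five_heads = binomial_pmf(5, 20, 0.5)
print(p_five_heads)  # about 0.0148

# Bernoulli case (a single flip, n = 1): P(heads) = p
print(binomial_pmf(1, 1, 0.5))  # 0.5
```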
R/Python
In R, you can use the rbinom(n, size, prob) function to generate samples from the binomial distribution.
In Python, you can use the random.binomial(n, p, size) function from the numpy library to do the same thing. Setting the number of trials to 1 gives Bernoulli samples.
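A minimal sampling sketch using NumPy's `Generator.binomial`, assuming the 20-flip fair-coin experiment from above. It checks the binomial moments E[X] = np and Var[X] = np(1 − p), and draws Bernoulli samples by setting the number of trials to 1.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
n, p = 20, 0.5  # 20 coin flips per experiment, fair coin

heads = rng.binomial(n=n, p=p, size=100_000)  # heads count in each experiment

# For Binomial(n, p): E[X] = n*p and Var[X] = n*p*(1 - p)
print(heads.mean(), heads.var())  # close to 10 and 5

# A Bernoulli draw is a binomial with a single trial: each value is 0 or 1
flips = rng.binomial(n=1, p=p, size=10)
print(flips)
```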