Suppose I have two coins and I flip both of them.  The possible combinations are two heads, two tails, or one of each.  Together, these combinations make up a sample space: the set of everything that can happen.

Now let’s take this a step further.  Taking the above demonstration, we want to determine the probability that each combination would occur.

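Assuming the coins are fair and the flips are independent, each of the four ordered outcomes (HH, HT, TH, TT) has probability 0.25.  Since "one of each" covers two of those outcomes, we derive the following probabilities: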

  • P(\text{two heads}) = 0.25
  • P(\text{two tails}) = 0.25
  • P(\text{one of each}) = 0.5
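
As a quick sanity check, the three probabilities above can be reproduced by enumerating the sample space.  This is only a minimal sketch that assumes fair, independent coins; the names `outcomes`, `label`, and `probabilities` are made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: both coins are fair, so every ordered outcome is equally likely.
outcomes = list(product("HT", repeat=2))  # (H,H), (H,T), (T,H), (T,T)

def label(outcome):
    """Collapse an ordered outcome into the unordered combination it belongs to."""
    heads = outcome.count("H")
    return {2: "two heads", 0: "two tails"}.get(heads, "one of each")

# Count how many ordered outcomes fall into each combination.
counts = Counter(label(o) for o in outcomes)
probabilities = {name: Fraction(n, len(outcomes)) for name, n in counts.items()}

for name, p in probabilities.items():
    print(f"P({name}) = {float(p)}")
```

"One of each" comes out twice as likely because two ordered outcomes (heads then tails, tails then heads) map to it.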

Together, these probabilities form a probability distribution.

What is a probability distribution?

A probability distribution assigns a probability to every possible occurrence within a sample space.  A probability distribution can either be discrete (the occurrences can be counted, whether there are finitely many of them or an endless list such as 0, 1, 2, ...) or continuous (the occurrences fall anywhere along a continuous range of values).

To specify the probability of a specific outcome, however, we have to refer to another concept called random variables.

What is a random variable?

A random variable assigns a numerical value to each possible outcome in a sample space, for example the number of heads in the two coin flips.  Random variables are often denoted the same way that you would denote a variable in a mathematical equation, e.g. X.

Whether a random variable is discrete or continuous is determined by the values it can take, and therefore by the type of probability distribution that it follows.
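
One way to picture a random variable in code, reusing the two-coin example, is as a function that turns each outcome into a number.  The sketch below assumes fair coins and picks "number of heads" as the random variable; `X`, `sample_space`, and `pmf` are names chosen for the illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin flips, assuming fair and independent coins.
sample_space = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: the number of heads in an outcome."""
    return outcome.count("H")

# Build the distribution of X by adding up the probability of every
# outcome that maps to the same value.
pmf = {}
for outcome in sample_space:
    value = X(outcome)
    pmf[value] = pmf.get(value, Fraction(0)) + Fraction(1, len(sample_space))

for value in sorted(pmf):
    print(f"P(X = {value}) = {pmf[value]}")
```

Because X can only take the values 0, 1, and 2, it is a discrete random variable.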

What makes a probability distribution valid?

Determining whether a distribution is valid depends on whether it’s discrete or continuous.

In a discrete distribution, every probability lies between zero and one, and all of the probabilities sum to one:

P(S) = \displaystyle\sum_{x \in S}P(X = x)=1

In the case of a continuous distribution, the probability density function must be non-negative everywhere and integrate to one:

P(S) = \displaystyle\int_{-\infty}^\infty f(x)dx = 1
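
Both conditions can be checked numerically.  The following is a rough sketch rather than a rigorous test: `is_valid_pmf` and `integrate` are hypothetical helpers, the standard normal density is used only as an example, and the infinite integral is approximated over the range -10 to 10 with the trapezoidal rule.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Discrete case: every probability is in [0, 1] and they sum to one."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    return in_range and math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)

def standard_normal_pdf(x):
    """Density of a normal distribution with mean 0 and standard deviation 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

coin_pmf = {"two heads": 0.25, "two tails": 0.25, "one of each": 0.5}
print(is_valid_pmf(coin_pmf))

# Assumption: the tails beyond +/-10 are small enough to ignore.
area = integrate(standard_normal_pdf, -10.0, 10.0)
print(round(area, 6))
```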

Types of probability distributions

Depending on whether the distribution is discrete or continuous, there are several types that can be used, each having specific applications.

In the case of discrete probability distributions, the following can be used:

  • Binomial – Measures the probability of an event happening k times in n independent trials.
  • Bernoulli – Special case of Binomial, where there is a single trial (n = 1).
  • Poisson – Measures the probability of an event occurring a given number of times within a fixed interval.
  • Geometric – Measures the probability that an event first occurs after a certain number of tries.
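
For reference, the probability mass functions of the four discrete distributions above can be written in a few lines.  This is a sketch using the standard textbook formulas; the function names are made up here, and the geometric version assumes the convention that k counts the trial on which the first success happens (k = 1, 2, ...).

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """Binomial with a single trial: k is 1 (success) or 0 (failure)."""
    return binomial_pmf(k, 1, p)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval) when lam events are expected on average."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def geometric_pmf(k, p):
    """P(the first success happens on trial k), for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

# The two-coin example again: exactly one head in two fair flips.
print(binomial_pmf(1, 2, 0.5))
```

Note that the Poisson and Geometric distributions have infinitely many possible values of k, yet they are still discrete because those values can be counted.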

In the case of continuous probability distributions, we can resort to the following:

  • Uniform – Every value within an interval is equally likely, so the density is constant across that interval.  It’s been a while since I last used this distribution.
  • Normal (Gaussian) – Symmetrical around its mean.  A very important distribution, since many statistical concepts build on it.
  • Exponential – Used to model the amount of time between events.
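
The density functions of these three distributions can be sketched as follows.  These are the standard textbook formulas; the function and parameter names (`a`, `b`, `mu`, `sigma`, `rate`) are choices made for this illustration.

```python
import math

def uniform_pdf(x, a, b):
    """Constant density 1 / (b - a) inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Bell curve centred on mu with standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def exponential_pdf(x, rate):
    """Density of the waiting time between events occurring at the given rate."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

print(uniform_pdf(0.2, 0.0, 1.0), uniform_pdf(0.9, 0.0, 1.0))  # same height everywhere
print(normal_pdf(-1.0), normal_pdf(1.0))                        # symmetric around the mean
print(exponential_pdf(0.5, rate=2.0))
```

Keep in mind that a density is not a probability: its value at a point can exceed one, and probabilities come from integrating it over a range.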

There are additional distributions that were not listed, but the point here is to show the possible distributions that you can use.

Ties with Machine Learning

Now, some of you might be asking how all this relates to machine learning.  Generally, probability distributions are used in probability and statistics: given some context, we want to know the likelihood of an event happening.  Machine learning comprises many different types of models, many of which do not relate directly to probability.  Despite this, probability distributions do play a role in Naive Bayes.

Recall that Naive Bayes has several variations.  All of these variations use probability distributions to determine the probability of an event.  For example, in document classification, we want to know whether a document pertains to a specific concept.  We can use the words within the document to determine how likely it is to belong to a specific category.
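
To make the document classification example concrete, here is a toy sketch of the word-count (multinomial) flavor of Naive Bayes with add-one smoothing.  The training sentences and category names are invented for the illustration, and a real project would normally rely on an established library instead of hand-rolled code like this.

```python
import math
from collections import Counter

# Hypothetical toy training data: (category, document text).
training = [
    ("sports", "the team won the game"),
    ("sports", "great game and great score"),
    ("cooking", "bake the bread in the oven"),
    ("cooking", "the recipe needs fresh bread"),
]

word_counts = {}        # category -> Counter of word frequencies
doc_counts = Counter()  # category -> number of training documents
for category, text in training:
    doc_counts[category] += 1
    word_counts.setdefault(category, Counter()).update(text.split())

vocabulary = {word for counts in word_counts.values() for word in counts}

def log_posterior(text, category):
    """log P(category) plus the sum of log P(word | category), add-one smoothed."""
    counts = word_counts[category]
    total = sum(counts.values())
    score = math.log(doc_counts[category] / sum(doc_counts.values()))
    for word in text.split():
        score += math.log((counts[word] + 1) / (total + len(vocabulary)))
    return score

def classify(text):
    """Pick the category with the highest (unnormalized) log posterior."""
    return max(word_counts, key=lambda c: log_posterior(text, c))

print(classify("the team played a great game"))
print(classify("fresh bread from the oven"))
```

The scores are computed in log space so that multiplying many small probabilities does not underflow, and the smoothing keeps a word that never appeared in a category from forcing its probability to zero.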


While probability distributions don’t play a major role in machine learning, they do play a role when interpreting problems, determining likelihoods, and testing hypotheses.  That alone makes it a good idea to be familiar with them.

Have any questions or spotted an error?  Leave a comment down below.

Update: I reworded how to determine the validity of a continuous probability density function.