Let’s introduce examples of the geometric distribution and the negative binomial distribution.
Example
We roll two dice simultaneously. Let’s call this trial A. We represent both dice showing 1 as T and any other outcome as F.
We repeat trial A until T occurs and then stop at that point. Let’s call this trial B.
In trial B, we define R as the number of times F occurs, i.e., the number of times both dice don’t show 1.
We repeat trial B c times and take the average of R, which we denote as X.
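The procedure above can be sketched as a small simulation. This is only an illustrative sketch: the function names `trial_b` and `average_r`, the seed, and the choice of c=100 are assumptions made here, not part of the original example.

```python
import random

def trial_b(rng):
    """Repeat trial A (rolling two dice) until T occurs; return R, the number of Fs."""
    failures = 0
    while True:
        die1, die2 = rng.randint(1, 6), rng.randint(1, 6)
        if die1 == 1 and die2 == 1:  # T: both dice show 1
            return failures
        failures += 1                # F: any other outcome

def average_r(c, rng):
    """Repeat trial B c times and return the average of R, i.e. X."""
    return sum(trial_b(rng) for _ in range(c)) / c

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
x = average_r(100, rng)
print(x)
```

Each call to `average_r` produces one sample of X; calling it many times and tabulating the results would give an empirical version of the distribution discussed next.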
Probability of X
We write $P(c,X)$ for the probability that the average takes the value $X$ when trial B is repeated $c$ times.
Let’s consider the case when $c=1$. When we roll the dice once, the probability of T appearing is $\frac{1}{36}$, and the probability of F appearing is $\frac{35}{36}$. Since $P(1,X)$ is the probability of T appearing after $X$ consecutive Fs, it can be expressed as:
$$P(1,X)=\left(\frac{35}{36}\right)^{X}\frac{1}{36}$$
and this is the geometric distribution with respect to $X$ ($=cX$).
Let’s consider the general case when $c$ is not 1. We roll the dice a total of $(cX+c)$ times, producing $cX$ Fs and $c$ Ts. The number of possible outcomes is the number of ways to distribute the $cX$ Fs among the $c$ runs that precede each T, which is the combination with repetition ${}_{c}H_{cX}$ ($={}_{c+cX-1}C_{cX}={}_{c+cX-1}C_{c-1}$). In other words, since the last roll must be T, it is the number of ways to choose which $(c-1)$ of the first $(c+cX-1)$ rolls are T. Note that for $c>1$, $X$ can take non-integer values.
The probability is then:
$$P(c,X)={}_{c+cX-1}C_{c-1}\left(\frac{35}{36}\right)^{cX}\left(\frac{1}{36}\right)^{c}$$
and this is the negative binomial distribution with respect to $cX$. As for $X$, it is the average of $R$ over $c$ independent repetitions of trial B, so the central limit theorem applies.
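As a quick check of this formula, here is a minimal Python sketch that evaluates it exactly with rational arithmetic. The helper name `prob` and its parameterisation by the total number of Fs (the integer $cX$) are choices made for this illustration.

```python
from fractions import Fraction
from math import comb

P_T = Fraction(1, 36)  # probability of T on a single roll of the two dice
P_F = 1 - P_T          # probability of F

def prob(c, total_failures):
    """P(c, X), parameterised by the total number of Fs, cX (an integer)."""
    n = total_failures
    return comb(c + n - 1, c - 1) * P_F**n * P_T**c

# For c = 1 the formula reduces to the geometric form (35/36)^X * (1/36).
assert prob(1, 3) == Fraction(35, 36) ** 3 * Fraction(1, 36)

# For c = 2, X = 0.5 corresponds to cX = 1 failure in total.
print(prob(2, 1))

# Summed over cX = 0, 1, 2, ... the probabilities should come out very close to 1
# (the sum is truncated at 1500 terms; the remaining tail is negligible).
total = sum(float(prob(2, n)) for n in range(1500))
print(total)
```

Working with `Fraction` avoids rounding questions while checking the formula; for plotting, converting each value to `float` is enough.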
Graphing the Probability Distribution
Let’s display line graphs for $c=1,2,5,10,20,50,100$. In Excel or LibreOffice, you can use the COMBIN function to calculate combinations and create graphs.







Overlaying the raw graphs of $P(c,X)$ makes them difficult to read, so we instead overlay the graphs of $cP(c,X)$ for $c=1,2,5,10,20,50,100$. Note that only integer values of $X$ are plotted, so the graph for $c=100$ is slightly rough.

For $c=1$, the distribution was not only asymmetrical but also monotonically decreasing to the right. However, as $c$ increases, the distribution comes to resemble a normal distribution, with its peak settling at $X=35$.
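For readers who prefer a script to a spreadsheet, the following sketch computes the same plotted values, $cP(c,X)$ at integer $X$, and reports where each curve peaks. The function name `scaled_prob` and the plotting range 0 to 100 are assumptions made for illustration; drawing the graph itself is left to whatever plotting tool is at hand.

```python
from fractions import Fraction
from math import comb

def scaled_prob(c, x):
    """c * P(c, X) for an integer X, using the formula for P(c, X)."""
    n = c * x  # total number of Fs
    p = comb(c + n - 1, c - 1) * Fraction(35, 36) ** n * Fraction(1, 36) ** c
    return float(c * p)

xs = range(0, 101)  # integer values of X to plot; the upper limit is arbitrary
curves = {c: [scaled_prob(c, x) for x in xs] for c in (1, 2, 5, 10, 20, 50, 100)}

for c, ys in curves.items():
    peak_x = max(xs, key=lambda x: ys[x])
    print(f"c={c:3d}  peak at X={peak_x}  height={ys[peak_x]:.4f}")
```

Each entry of `curves` is one line of the overlaid graph. Because only integer $X$ is evaluated, the reported peak for small $c$ is the best integer point rather than the exact mode.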
Geometric Distribution
When $c=1$, it was the geometric distribution. The geometric distribution is expressed as $P(X)=p(1-p)^{X-1}$ for $X=1,2,3,\cdots$, with $E(X)=\frac{1}{p}$ and $V(X)=\frac{1-p}{p^2}$.
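In the above example, with $p=\frac{1}{36}$, $X$ counts the Fs before the first T rather than the total number of rolls, so $P(X)=p(1-p)^{X}$ for $X=0,1,2,\cdots$, which is the distribution above shifted by 1. The expected value is therefore $E(X)=\frac{1}{p}-1=\frac{1-p}{p}=35$, and the variance is unchanged by the shift: $V(X)=\frac{1-p}{p^2}=35\cdot 36=1260$.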
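The relationship between the two forms can be checked numerically. The sketch below sums both distributions over a truncated range (the cutoff `N` and the helper `moments` are assumptions for this illustration); it should print values close to 36 and 1260 for the number of rolls, and close to 35 and 1260 for the number of Fs.

```python
p = 1 / 36
N = 5000  # truncation point; the tail beyond it is negligible for p = 1/36

def moments(pmf, support):
    """Mean and variance of a distribution given by pmf over a finite support."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var

# Number of rolls up to and including the first T: X = 1, 2, 3, ...
mean_rolls, var_rolls = moments(lambda x: p * (1 - p) ** (x - 1), range(1, N + 1))

# Number of Fs before the first T: X = 0, 1, 2, ...
mean_fails, var_fails = moments(lambda x: p * (1 - p) ** x, range(0, N))

print(round(mean_rolls, 6), round(var_rolls, 6))
print(round(mean_fails, 6), round(var_fails, 6))
```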
Negative Binomial Distribution
Both for $c=1$ and for general $c$, the distribution was the negative binomial distribution. The negative binomial distribution for the number of failures $X$ ($=0,1,\cdots$) before the $k$th success is given by $P(X)={}_{k+X-1}C_{X}\,p^{k}(1-p)^{X}$, with $E(X)=\frac{k(1-p)}{p}$ and $V(X)=\frac{k(1-p)}{p^2}$.
In the above example, with $p=\frac{1}{36}$, and considering the number of failures $cX$ before the $c$th success, we can express $P(cX)={}_{c+cX-1}C_{cX}\,p^{c}(1-p)^{cX}$, so the expected value is $E(cX)=35c$, and the variance is $V(cX)=35\cdot 36\,c$. Consequently, $E(X)=35$ and $V(X)=\frac{35\cdot 36}{c}$. As observed in the graph, the expected value is indeed 35.
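These values can also be confirmed by summing the distribution directly. The following is a rough numerical sketch; the function name `average_moments`, the truncation at 5000 failures, and the particular values of $c$ tried are assumptions made here.

```python
from math import comb

p = 1 / 36

def average_moments(c, max_failures=5000):
    """Mean and variance of X = (total Fs) / c, summed over a truncated support."""
    probs = [comb(c + n - 1, n) * p**c * (1 - p) ** n for n in range(max_failures)]
    mean = sum((n / c) * q for n, q in enumerate(probs))
    var = sum((n / c - mean) ** 2 * q for n, q in enumerate(probs))
    return mean, var

for c in (1, 2, 5, 10):
    mean, var = average_moments(c)
    # The last column is the predicted variance 35 * 36 / c for comparison.
    print(c, round(mean, 4), round(var, 4), 35 * 36 / c)
```

The mean should stay near 35 for every $c$, while the variance shrinks in proportion to $1/c$, which is why the curves narrow around 35 as $c$ grows.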