Stochastic OR

1.3 Fighting the Exponential Distribution is Futile

Perhaps the mathematical arguments presented in the preceding section have not fully convinced you of the universality of the exponential distribution and its counterpart, the Poisson distribution. To enhance your understanding, let us use simulation to see how the exponential distribution emerges as a sort of law of nature.

The simulation is based on an internship of one of my master's students, whose task was to evaluate the scheduling system for patients with inflammatory bowel disease (IBD) at a hospital. Patients with IBD were categorized as regular or urgent. Regular patients required a check-up approximately every six months, whereas urgent patients needed to be seen within one week, that is, five working days.1 This procedure has been changed recently; however, the ideas of this case apply much more generally. Each consultation took 10 minutes. Immediately after a consultation, regular patients scheduled their next visit for approximately six months later. There were about 1000 patients in the system, and a working year consisted of 200 working days. Given that each patient was seen twice per year, a physician needed to consult with an average of 10 patients daily.2 Two visits per year per patient times 1000 patients, distributed over 200 days per year, gives 10 patients per day. The problem was to design a planning procedure flexible enough to meet the time constraints for the urgent patients without leading to idle time for the physicians. Of course, new patients entered the system, but for simplicity we assume that the patient population remains constant.
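As a quick check of this arithmetic (the numbers are those of the case; the snippet itself is just an illustration):

num_patients = 1000   # patients in the system
visits_per_year = 2   # a check-up roughly every six months
working_days = 200    # working days per year

print(num_patients * visits_per_year / working_days)  # 10.0 patients per day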

Planning regular patients six months in advance seems to lead to stable arrival patterns, but this is not true, as there are many practical disturbances. A significant number of patients rescheduled their appointments, for example, because of changed vacation plans. There were no-shows, other patients relocated to other cities, and some died. However, even without such practical problems, and even when each patient individually shows relatively predictable behavior, the simulation results below show that at the population level the inter-arrival times are well modeled as exponentially distributed rvs.

In conclusion, whether the planner plans six months in advance or does not plan at all, there will not be much difference in the variability of the number of patients arriving per day. In other words, the idea that making appointments half a year in advance substantially reduces variability is simply not true. The verdict must then be clear: when rules have no effect, it is best to drop them altogether. It is simply better to send patients a reminder for a new appointment three weeks before a new visit is due, rather than planning six months in advance. Far fewer replanning actions are needed, and the probability of no-shows decreases.

We now turn to developing the simulation to show how the exponential distribution arises 'out of thin air'. We focus only on regular patients and neglect the urgent patients, as this is a relatively small group. Assume that the hospital has just \(50\) regular patients. For ease, scale time so that each patient plans the next appointment uniformly in the interval \([0.9, 1.1]\), independent of previous appointments and of other patients.3 This way, the inter-arrival times between two consecutive appointments correspond to half a year (100 working days) plus or minus 10 days (observe that \([0.9, 1.1] \times 100 = [90, 110]\)).

The question of interest is to estimate the distribution of the inter-arrival times between two patients as seen by the planner of the hospital. Of course, with only one patient this is straightforward: the planner sees inter-arrival times \(\sim \Unif{[0.9, 1.1]}\). But what happens with many patients?

Fig. 1 provides the answer to this question, but we first discuss the code that produces this figure, as the code clarifies how to interpret the figure.4 If you don't understand certain Python commands, just consult the web or an LLM.

In Python, we nearly always start by loading a set of modules.5 Here and elsewhere, we place comments above the code to which they relate. We will not explain commands that are trivial to understand. Modules provide specific functionality not required by every Python program; hence, they should only be loaded when needed. For example, the numpy module offers support for numerical work, while matplotlib is a plotting library.

import numpy as np
import matplotlib.pyplot as plt

To create a pdf file with a plot that looks good in \LaTeX, we use the seaborn library and provide it with some extra options, such as good font sizes.6 Such configurations are not algorithmic, so you don't have to memorize them.7 While designing a figure, I comment out this code because running \LaTeX\/ slows down the script.

def initialize_plots():
    import seaborn as sns

    rc = {
        "text.usetex": True,
        "font.family": "fourier",
        "axes.labelsize": 10,
        "font.size": 10,
        "legend.fontsize": 8,
        "xtick.labelsize": 8,
        "ytick.labelsize": 8,
    }
    sns.set(style="whitegrid", rc=rc)


initialize_plots()

The simulation environment contains num_patients patients and runs for num_periods half-year periods. With default_rng we set up a random number generator, and by setting a seed8 Here the seed is \(2\), but any number would do. we ensure that we always obtain the same random deviates.9 A random deviate is a sample taken from a distribution; it is not the same as a rv.

num_periods = 10
num_patients = 50
rng = np.random.default_rng(2)

We next generate a number of inter-arrival times in the following way. The matrix \(X\) has num_patients rows and num_periods columns. Row \(k\) of \(X\) corresponds to the inter-arrival times of patient \(k\). Each inter-arrival time is \(\sim\Unif{[0.9, 1.1]}\). With these inter-arrival times, we use Eq. (1.2.2) to compute the arrival times for patient \(k\) as \(A_{i}^{k} = A_{i-1}^{k} + X_{i}^{k}\).

The cumsum of a matrix along axis=1 adds the elements one by one along each row.10 The cumsum of the sequence of numbers \(1, 3, 5\) is \(1, 4, 9\). Hence, if we want to implement \(A_{0}=0\), we just set \(X_{0}=0\) and keep \(X_{1}=3, X_{2}=5\), so that [0, 3, 5].cumsum() expands to [0, 3, 8]. As such, the first column of A would be equal to the first column of X. To ensure that the first column of A is zero, so that all patients start at time \(0\), we set the first column of X to \(0\). However, by setting the first column of X to zero, we effectively remove one arrival. Thus, we should generate num_periods + 1 inter-arrival times.

X = rng.uniform(low=0.9, high=1.1, size=(num_patients, num_periods + 1))
X[:, 0] = 0  # set first column of X to 0
A = X.cumsum(axis=1)

We need an environment to control the plots. The fig object can be thought of as a canvas on which we draw the graphs, and axes is a list of ax objects, where each ax represents a single graph within the figure. As is apparent from Fig. 1, we have a matrix of four ax objects. For easy reference in the code below, we flatten11 Ask an LLM for further explanation. the axes object and unpack it into the four separate objects ax1, \(\ldots\), ax4.

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(6, 3))
ax1, ax2, ax3, ax4 = axes.flatten()

Next, we build the upper left panel of Fig. 1. Each row contains a number of dots, and each dot corresponds to an arrival of a patient.12 The \(x\) coordinate of the \(i\)th dot of patient \(k\) is \(A_{i}^k\) and its \(y\) coordinate is \(k\). As we have \(50\) patients, we have \(50\) such rows of dots. The for loop writes the dots of each patient to the upper left panel (which is referenced by ax1). To beautify the plot, we give a few options to the plot command. We do not want the dots to be connected by lines, so we set the linestyle ls='None'.

ax1.set_xlim(0, 11)
for i in range(num_patients):
    x = A[i, 1:]
    y = i * np.ones(num_periods)
    ax1.plot(x, y, ls='None', marker="o", ms=0.2, c="k")

We plot the arrivals as seen by the planner as a jitter plot, that is, we add small random noise to the data points along the \(y\)-axis. By spreading out the points slightly, the jitter plot gives a clearer view of the distribution of the individual points. We plot the dots at \(y=0\) plus jitter distributed as \(\Unif{[-1/2, 1/2]}\). Without the jitter, we would just see small line segments instead of the somewhat rectangular shapes.

for i in range(num_patients):
    x = A[i, 1:]
    y = rng.uniform(-0.5, 0.5, size=num_periods)  # add jitter
    ax1.plot(x, y, ls='None', marker="o", ms=0.2, c="k")

Our next task is to estimate the density of inter-arrival times as observed by the planner and plot this in the right panel of Fig. 1. Recall that each row of the matrix \(A\) contains the arrival times of one specific patient. By flattening \(A\) we obtain all arrival times in a long list. By sorting, we obtain the arrival times in order of time. Finally, the inter-arrival times for the planner are the times between the elements of this list. In other words, if the arrival times \(\{A_{k}\}\) are known, the \(k\)th inter-arrival time is given by

\begin{equation*} X_k = A_k - A_{k-1}. \tag{1.3.1} \end{equation*}
A = np.sort(A.flatten())
X = A[1:] - A[:-1]
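As an aside, numpy's diff function computes exactly these successive differences, so the last line can also be written as a single call; a small check (not part of the original code):

print(np.allclose(X, np.diff(A)))  # True: np.diff(A) equals A[1:] - A[:-1]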

We assemble all inter-arrival times in \(X\) into 50 bins. On average, we have \(\lambda=50\) arrivals per unit time; thus, the mean inter-arrival time is \(1/\lambda = 1/50\). Next, recall that the pdf of the exponential distribution is \(\lambda e^{-\lambda x}\). To plot the pdf, we take as support for the plot \(5\) times the mean time between arrivals, that is, \(5/\lambda = 0.1\). We then plot the histogram of the inter-arrival times and the pdf on ax2, which corresponds to the upper right panel. The argument density=True normalizes the area of the bars in the histogram to one, enabling us to compare the histogram with the plot of the pdf.

bins = np.linspace(0, 5 / num_patients, 50)
exp_density = num_patients * np.exp(-num_patients * bins)
ax2.hist(X, bins, density=True, color="k")
ax2.plot(bins, exp_density, ":", c="k", lw=0.75, label="pdf")
ax2.legend()

What can we see in the upper panels of Fig. 1? In the upper left panel, the arrival times of each patient are at first tightly concentrated around \(1\), but as time progresses, the arrival times become more spread out. The jitter plot around \(y=0\) also demonstrates this: the rectangles become wider until they almost overlap. In the upper right panel, we see that, in comparison to the exponential density, there are relatively many short inter-arrival times. This is expected because all patients start at the same time, so arrival times will be concentrated for the first couple of visits.13 But even with these unrealistic starting times, the exponential distribution is not a bad fit. Of course, letting all patients start at the same time is not a good model for a hospital that has been serving patients for years.

To compensate for these unnatural start times, we set the initial arrival time of each patient as uniformly distributed on \([0,1]\). With this change, we run the above code again and plot the results in ax3 and ax4 to obtain the lower left and right panels of Fig. 1. Now we see that the jitter rectangles, that is, the arrivals as seen by the hospital, overlap right from the start, and the exponential distribution is an excellent fit.
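The code for the lower panels is not shown in the text; the following sketch gives one way to produce them, reusing the names defined above. The only real change is that the first column of X, that is, each patient's first arrival time, is now drawn from \(\Unif{[0,1]}\) instead of being set to zero.

# redo the simulation, now with uniform start times on [0, 1]
X = rng.uniform(low=0.9, high=1.1, size=(num_patients, num_periods + 1))
X[:, 0] = rng.uniform(low=0, high=1, size=num_patients)  # random start times
A = X.cumsum(axis=1)

# lower left panel: one row of dots per patient, plus the jitter plot at y = 0
ax3.set_xlim(0, 11)
for i in range(num_patients):
    y = i * np.ones(num_periods + 1)
    ax3.plot(A[i, :], y, ls='None', marker="o", ms=0.2, c="k")
    jitter = rng.uniform(-0.5, 0.5, size=num_periods + 1)
    ax3.plot(A[i, :], jitter, ls='None', marker="o", ms=0.2, c="k")

# lower right panel: histogram of the planner's inter-arrival times and the pdf
A = np.sort(A.flatten())
X = A[1:] - A[:-1]
ax4.hist(X, bins, density=True, color="k")
ax4.plot(bins, exp_density, ":", c="k", lw=0.75, label="pdf")
ax4.legend()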

The next code saves the figure. The tight_layout() function ensures that the axes labels are nicely formatted.

fig.tight_layout()
fig.savefig("/figures/IBD_exponential.pdf")
fig.savefig("/figures/IBD_exponential.png", dpi=300)
Figure 1: How the uniform distribution leads to the exponential distribution.

To conclude, if many patients arrive at the hospital and each contributes some small variation to the inter-arrival times, the hospital sees exponential inter-arrival times for the population. Clearly, this is not specific to hospitals: it applies to any system that serves a large pool of customers, each arriving with some variation. To avoid any confusion: here we modeled the inter-arrival times of a single patient as uniformly distributed, but the same phenomenon occurs when the inter-arrival times of a single patient follow other distributions.
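To see that the result does not hinge on the uniform distribution, here is a small variation (my own illustration, not part of the original code) in which each patient's inter-arrival times are triangularly distributed around \(1\); checking the mean of the pooled inter-arrival times (and plotting their histogram as above) shows the same behavior.

# variation: triangular inter-arrival times per patient
X = rng.triangular(left=0.8, mode=1.0, right=1.2, size=(num_patients, num_periods + 1))
X[:, 0] = rng.uniform(size=num_patients)  # uniform start times on [0, 1]
A = np.sort(X.cumsum(axis=1).flatten())
print(np.diff(A).mean(), 1 / num_patients)  # both close to 0.02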

TF 1.3.1

Consider the following chunk of code, where the \(i\)th row of \(X\) corresponds to the inter-arrival times of the \(i\)th patient to a hospital.

import numpy as np

rng = np.random.default_rng(3)
N, T = 5, 8
a, b = 0, 3
X = rng.uniform(low=a, high=b, size=(N, T))
X[:, 0] = 0
A = X.cumsum(axis=1)

Claim: the \(i\)-th row of \(A\) contains the \(T\) arrival times of the \(i\)-th patient.

Solution

False. The \(i\)th row of \(A\) contains only the first \(T-1\) arrival times of patient \(i\), because \(A[i, 0] = 0\). Hence, when \(T=8\), the last column \(A[:, T-1]\) corresponds to the \(7\)th arrival time of the patients.
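To convince yourself, you can print a row and the shape of A (a small check added here, not part of the exercise):

print(A[0])      # starts with 0.0, followed by the 7 = T - 1 arrival times
print(A.shape)   # (5, 8): T columns, of which the first is identically zero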

TF 1.3.2

Claim: this program prints 22.

x = [3 * i + 1 for i in range(10)]
print(x[8])
Solution

False. The list comprehension gives \(x[i] = 3i + 1\), so \(x[8] = 25\); the program prints 25, not 22 (it is \(x[7]\) that equals 22).
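To check, run the program or inspect the neighboring entries (an added check, not part of the exercise):

x = [3 * i + 1 for i in range(10)]
print(x[8], x[7])  # 25 22: the program prints 25; it is x[7] that equals 22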

TF 1.3.3

Claim: the keys and the values of the dict \(a\) are 8, 13 and 18.

a = {}
start, thres = 3, 19
while True:
    start += 5
    if start >= thres:
        break
    a[start] = start

print(a.keys())
Solution

True. The loop repeatedly adds 5 to start, so the values 8, 13, and 18 are stored as keys (each with itself as value); when start reaches 23, the test \(23 \geq 19\) triggers the break, so nothing more is added.

TF 1.3.4

Claim: this program prints 64.

def doit(a):
    match a:
        case int():
            return 8 * a
        case float():
            return 9 * a
        case _:
            raise ValueError("Unknown type passed")

print(doit("8"))
Solution

False. The function is applied to the string '8', not to the number (integer) 8, so the match falls through to the default case and a ValueError is raised.

TF 1.3.5

Claim: given that the function uniform() returns a uniform random number on the interval \((0, 1)\), the following function generates a random variate with a Poisson distribution with parameter \(\lambda=\)labda.

def Pois(labda):
    R = 0
    T = - np.log(uniform())
    while T < labda:
        R += 1
        T += -np.log(uniform())
    return R
Solution

True. The solution follows directly from standard probability and Section 1.2. Take \(U \sim \Unif{(0,1)}\), and \(f(x) = -\frac{1}{\lambda}\log x\). Then,

\begin{equation*} \P{f(U) \leq a} \stackrel1= \P{U \geq f^{-1}(a)} \stackrel2= 1- f^{-1}(a) \stackrel3= 1- e^{-\lambda a}, \end{equation*}

where \(1\) follows from the fact that \(f\) is strictly decreasing, \(2\) from the general property of the uniform distribution \(\P{U\geq b} = 1 - b\), and \(3\) because \(f^{-1}(y) = e^{-\lambda y}\). Thus, if we define \(X:=f(U) = -\frac{1}{\lambda}\log U\), then \(X \sim \Exp{\lambda}\), that is, \(X\) is exponentially distributed.

Next, define

\begin{equation*} N(t) = \inf\{j : A_{j} \leq t < A_{j+1}\}, \end{equation*}

in words, \(N(t)\) is the index of the last arrival before or equal to \(t\). Moreover, when \(\{U_{i}\}\) is a sequence of iid uniform rvs on \((0,1)\) then the \(X_{i} = -\frac{1}{\lambda} \log U_{i}\) are iid \(\sim \Exp{\lambda}\). Since \(A_{k} = A_{k-1} + X_{k}\), the event

\begin{align*} \{N(t) = k \} &= \{A_k \leq t < A_{k+1}\} \\ &= \left\{\sum_{i}^{k} X_{i}\leq t < \sum_{i}^{k+1} X_{i}\right\} \\ &= \left\{-\frac{1}{\lambda}\sum_{i}^{k} \log U_{i}\leq t < -\frac{1}{\lambda}\sum_{i}^{k+1} \log U_{i}\right\}. \end{align*}

Now taking \(t=1\), we arrive at the equality:

\begin{align*} \{N(1) = k \} = \left\{-\frac{1}{\lambda}\sum_{i}^{k} \log U_{i}\leq 1 < -\frac{1}{\lambda}\sum_{i}^{k+1} \log U_{i}\right\}. \end{align*}

Since \(-\frac{1}{\lambda}\log U\sim \Exp{\lambda}\), the sum of \(k\) such rvs is \(\Gamma{k, \lambda}\), and, by Eq. (1.2.4) and Eq. (1.2.5),

\begin{align*} \P{N(1) = k} = \P{-\frac{1}{\lambda}\sum_{i}^{k} \log U_{i}\leq 1 < -\frac{1}{\lambda}\sum_{i}^{k+1} \log U_{i}} = \frac{\lambda^{k}}{k!} e^{-\lambda}. \end{align*}

The next algorithm implements the above.

def Pois(labda):
    R = 0
    T = - np.log(uniform()) / labda
    while T < 1:
        R += 1
        T += -np.log(uniform()) / labda
    return R

Here we divide each exponential deviate \(-\log U\), with \(U\) a drawing of uniform(), by \(\lambda\); this is numerically wasteful, which we can avoid by multiplying the threshold by \(\lambda\) instead, as in the code of the exercise.

Note that each uniform rv in the while loop must be a new drawing (i.e., realization of a rv). If not, we keep on adding the same number, which is wrong.
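A quick empirical check of the claim, assuming that uniform() is numpy's uniform random number generator, is to compare the sample mean and variance of many draws with \(\lambda\); for a Poisson rv both equal \(\lambda\).

import numpy as np

rng = np.random.default_rng(3)
uniform = rng.uniform  # the uniform() assumed in the exercise

def Pois(labda):
    R = 0
    T = -np.log(uniform())
    while T < labda:
        R += 1
        T += -np.log(uniform())
    return R

labda = 3.0
sample = [Pois(labda) for _ in range(10_000)]
print(np.mean(sample), np.var(sample))  # both close to labda = 3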

It remains to formalize the observations from the above plots. In fact, we should distinguish between two limiting regimes. Patients recurrently plan appointments, separated by roughly some mean period, here about half a year. The upper left panel in Fig. 1 suggests that if the number of appointments increases, the appointments of each patient eventually become uniformly distributed over the period (here half a year). The lower left panel suggests that if the appointments are uniformly distributed to begin with, they remain uniformly distributed. So, our first step is to prove the claim that, in the limit of many visits, the stochastically recurring arrival times of a single patient become uniformly distributed over the cycle. The second claim is that, in the limit of many patients, each arriving at a uniformly distributed time in an interval, the inter-arrival times of all patients together become, in a scaled sense, exponentially distributed.

The theorems and proofs are elegant and combine order statistics, the beta distribution, first-step analysis, and some infinitesimal calculus, which gives the reasoning a heuristic flavor. Finally, we discuss an important smoothing property of taking expectations; this idea is also relevant in data science, where moving averages are used. However, if you are not interested in probability theory, you can skip the rest of this section.

Let us first show that periodically planned arrivals, with some stochasticity, tend to become uniformly distributed over an interval. To formalize this claim, we consider a discrete-time model. There are on average \(C\) days between two successive visits of a patient (for the moment, the time of day is of no importance). For instance, let the inter-arrival times \(X_{i}\) be such that \(\P{X_{i}=C-1} = \P{X_{i}=C+1}=1/2\). In this case, if the process starts on day \(0\) and \(X_{1} = C-1\), the next visit will be on day \(C-1\), but if \(X_{1}=C+1\), the next visit will be on day \(1\) of the next cycle. Looking at the arrival process like this, the patient moves on a circle with states \(\{0, 1, \ldots, C-1\}\), and with every visit the patient moves with probability \(1/2\) one state to the left or to the right. Thus, when \(X=C-1\), we can just as well model this as \(X=-1\).14 There is a small subtlety when the cycle length \(C\) is even and \(\P{X=C-1} = \P{X=C+1} = 1/2\): the odd (even) days can then only be visited after an odd (even) number of visits. The problem is easily resolved by assuming, in addition, that \(C\) is odd or that \(\P{X = 0} > 0\).

Th. 1.3.1

Assume that the arrival times \(\{A_{i}\}\) live on the circle space \(\{0, 1, \ldots, C-1\}\), i.e., \(A_{i} = (A_{i-1} + X_{i}) \bmod C\). If \(\{X_{k}\}_{k=1}^{\infty}\) is a set of iid rvs such that \(\E{X_{k}} < \infty\) and \(\sup_{j}\P{X_{k} =j} < 1\), then \(\alpha_k(i) = \P{A_k=i}\), i.e., the probability that the \(k\)th arrival is on the \(i\)th day, converges to \(1/C\) as \(k\to\infty\).

Proof

We impose the condition \(\E{X_{k}}<\infty\) to prevent having to deal with \(A_{k} = \infty\) for some finite \(k\), and we need \(\sup_{j}\P{X_{k} =j} < 1\) to exclude trivial rvs.15 A rv \(X\) is trivial when \(\P{X=a} = 1\) for some \(a\).

For the proof, assume for the moment that \(\P{X_{k} = 1}=p \in (0, 1)\) and \(\P{X_{k}=-1} = q=1-p\), and let \(C\) be odd.14 By first-step analysis,

\begin{equation*} \alpha_{k+1}(i) = p \alpha_k(i-1) + q \alpha_k(i+1). \end{equation*}

Clearly, if \(\alpha_{k}(i)=1/C\) for all \(i\), then \(\alpha_{k+1}(i) = 1/C\) also. However, if not all \(\alpha_{k}(\cdot)\) are equal, there must be at least one state \(i\) such that \(\alpha_k(i)\) is strictly larger than at least one of its neighbors, that is, \(\alpha_{k}(i) \geq \max\{\alpha_k(i-1), \alpha_k(i+1)\}\) and \(\alpha_{k}(i) > \min\{\alpha_k(i-1),\alpha_k(i+1)\}\).16 Observe that, for this argument, it is necessary that \(C\) is finite. But then, on the next visit,

\begin{equation*} \alpha_{k+1}(i) = p \alpha_k(i-1) + q \alpha_k(i+1) < \alpha_k(i). \end{equation*}

In words, as long as not all \(\alpha_{k}(i)\) are equal for all \(i\), after each visit the maximal \(\alpha_{k}\) becomes lower, and by symmetry the minimal \(\alpha_{k}\) becomes higher. As the minimum and maximum cannot cross after any step, all \(\alpha_{k}(i)\) have to converge to the same value as \(k\to\infty\). Since there are \(C\) states, \(\lim_{k\to\infty}\alpha_{k}(i) = 1/C\) for all \(i\in \{0, 1, \ldots, C-1\}\).

The argument is easy to generalize. Let \(\P{X=j} = p_{j}\); by assumption, \(p_{j} < 1\) for all \(j\). Then, by first-step analysis, \(\alpha_{k+1}(i) = \sum_{j} p_{j} \alpha_k(i-j) < \alpha_k(i)\) whenever \(\alpha_k(i)\) is maximal, i.e., \(\alpha_k(i) \geq \alpha_{k}(\cdot)\), and there is at least one \(j\) with \(p_{j} > 0\) such that \(\alpha_k(i-j) < \alpha_k(i)\). And similarly for the minimum. In total, this implies that \(\alpha_k(i) \to 1/C\) for all \(i\) as \(k\to \infty\).

Clearly, the proof applies to any finite \(C=1, 2, \ldots\). To see how to generalize the argument to the circle \([0, 1)\) on the reals, consider a continuous function \(f\) on the unit circle \(C\). As the circle is compact and \(f\) continuous, \(f\) achieves its maximum \(f(\bar x)\) at some \(\bar x\), say. Suppose for some \(\epsilon>0\) there is a set \(A_{\epsilon}(\bar x) = \{x \in C: f(x) \leq f(\bar x) -\epsilon\}\) such that \(\P{A_{\epsilon}(\bar x )} > 0\). (If there is no such set for any \(\epsilon\), then \(f\) is a constant.) But then \(\E{f(\bar x+X)} \leq (f(\bar x) -\epsilon) \P{A_{\epsilon}(\bar x)} + f(\bar x) (1-\P{A_{\epsilon}(\bar x)}) < f(\bar x)\). Again, taking an average over \(f\) makes the maximum smaller and the minimum larger.

Applied to the hospital: after many visits of a patient, each roughly half a year apart, the visits will appear to be uniformly distributed on a `circle' of half-year length.
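To see Th. 1.3.1 at work numerically, the following sketch (an illustration I add here, not the text's code) iterates the first-step recursion \(\alpha_{k+1}(i) = p\,\alpha_k(i-1) + q\,\alpha_k(i+1)\) on a circle with \(C\) odd; the distribution flattens to \(1/C\).

import numpy as np

C, p = 5, 0.5        # odd cycle length; P{X = 1} = p, P{X = -1} = 1 - p
alpha = np.zeros(C)
alpha[0] = 1.0       # the first visit falls on day 0
for k in range(200):
    alpha = p * np.roll(alpha, 1) + (1 - p) * np.roll(alpha, -1)

print(alpha)         # all entries are close to 1 / C = 0.2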

There is one point that remains about the lower left panel of Fig. 1; recall that for this panel we started the simulation with uniformly distributed arrival times. What is the distribution of the arrival times on the circle if we start like this? Let us analyze this for the case in which the inter-arrival times are uniform on \([0,1]\) and we take the circle as \(C=[0, 1)\). Observe that when \(C=[0, 1)\), \(U+V \bmod 1\), that is, the sum of two iid uniform rvs on the circle, is just the fractional part of \(U + V\).

Th. 1.3.2

Let \(Z=\sum_{i}^{n} U_{i}\) for \(n\) iid rvs \(U_{i}\sim \Unif{[0,1)}\). The fractional part of \(Z\), i.e., \(Z-\lfloor Z \rfloor\), is uniformly distributed on \([0,1)\).17 \(\lfloor x \rfloor\) is the integer part of \(x\in \R\).

Proof

Realizing that the density of the sum of two uniform rvs on \([0,1]\) is a triangle with base \([0,2]\) and height \(1\),

\begin{align*} \P{U +V- \lfloor{U+V} \rfloor \leq x} &= \P{U +V\leq x} + \P{1\leq U +V\leq 1 + x} \\ &= x^{2}/2 + (1/2 - (1-x)^{2} /2) = x. \end{align*}

Therefore, the fractional part of the sum of two uniform rvs is uniform on \([0,1)\). Applying this result repeatedly to the terms of the sum \(Z\) proves the claim for general \(n\).
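Th. 1.3.2 is also easy to check numerically; this little experiment (again just an illustration) compares the empirical distribution of the fractional part of a sum of uniforms with the uniform cdf.

import numpy as np

rng = np.random.default_rng(1)
Z = rng.uniform(size=(100_000, 7)).sum(axis=1)   # sums of 7 iid uniforms
frac = Z - np.floor(Z)                           # fractional parts
print(np.mean(frac <= 0.25), np.mean(frac <= 0.75))  # close to 0.25 and 0.75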

We now deal with the case with many patients, that is, when the number \(n\) of patients goes to infinity. By the above results, we model the arrival times as iid uniform rvs \(\{U_k\}_{k=1}^{n}\) with \(U_{k}\sim \Unif{[0,1)}\). Recall that in the simulation we sorted the arrival times of all patients together to obtain the ordered sequence of arrivals as seen by the hospital. In probabilistic terms, the hospital sees the order statistics of \(\{U_k\}_{k=1}^{n}\). Write these as \(0=A^{n}_{0} <A^{n}_1 < A^n_2 < \cdots < A_n^{n}< A_{n+1}^{n} = 1\).18 We neglect the possibility of simultaneous arrivals; these have probability zero. Define the inter-arrival times as seen by the hospital as \(X_{k}^{n} = A_k^n - A_{k-1}^{n}\), \(k=1, \ldots, n+1\).

As the population size \(n\) increases, the size of the first gap, i.e., the properly scaled arrival time of the first patient, becomes increasingly like an exponentially distributed random variable.19 We need to scale with a factor \(n\) because the number of patients increases.

Lem. 1.3.3
\begin{equation*} \lim_{n\to\infty} \P{A^n_1 \leq x/n} = \lim_{n\to\infty} \P{X^n_1 \leq x/n} =1- e^{-x}. \end{equation*}
Proof

The probability that the smallest of \(n\) rvs is less than some \(x\) is the same as \(1\) minus the probability that all \(n\) rvs are larger than \(x\). Therefore,

\begin{align*} \P{ A^n_1 \leq x/n} &= 1 - \P{\min\{U_{1}, \cdots, U_{n}\} > x/n} = 1 - (1-x/n)^{n}, \end{align*}

because \(\P{U_{i} > x/n} = 1-x/n\), and the \(U_{i}\) are iid. The RHS converges to \(1-e^{-x}\) as \(n\to \infty\). Finally, from the definition, \(X_1^n = A_1^n - A_0^{n} = A_1^{n}\).

Next, we show that all the hospital inter-arrival times have the same density.

Lem. 1.3.4

For \(r\in (0, 1)\), the density of the \(k\)th inter-arrival time is \(f_{X_{k}^n}(r) = n (1-r)^{n-1}\), for \(k=1, \ldots, n+1\).

Proof

From the proof of the previous lemma we have \(\P{X_1^n \leq r} = 1- (1-r)^{n}\), therefore \(f_{X_{1}^{n}}(r) \d r = \P{X_1^n \in [r, r + \d r]} = n (1-r)^{n-1} \d r\). By symmetry, \(X_{n+1}^{n} \sim X_1^n\). It remains to deal with the intermediate inter-arrival times.

The probability of the event \(A_1^n < A_2^n < \cdots < A_{k-2}^n < x\), \(A_{k-1}^n \in [x, x+ \d x]\), \(A_k^n \in [y, y+\d y]\), \(y < A_{k+1}^{n} < \cdots < A_n^{n}\) is given by

\begin{align*} f_{A_{k-1}^n, A_k^n}(x, y) \d x \d y = \frac{n!}{(k-2)!(n-k)!} x^{k-2} (1-y)^{n-k} \d x \d y, \end{align*}

because \(k-2\) arrivals must occur before \(x\), \(n-k\) arrivals after \(y\), one arrival in \([x, x+\d x]\), and one in \([y, y+\d y]\). With this, the probability that \(X_k^{n} \in [r, r+\d r]\) becomes

\begin{align*} f_{X_{k}^n}(r) \d r = \d r \int_{0}^{1-r} f_{A_{k-1}^n, A_k^n}(x, x+r) \d x, \end{align*}

because \(X_{k}^{n}= A_{k}^{n } -A_{k-1}^{n}= r \implies y = A_{k}^{n } = A_{k-1}^{n} + r = x +r\), and \(x\) can lie anywhere in \([0, 1-r]\).

Substituting in this integral the density \(f_{A_{k-1}^n, A_k^n}\), but dropping the factorials for the moment for notational ease, we find

\begin{align*} \int_{0}^{1-r} x^{k-2} (1-r-x)^{n-k} \d x &= (1-r)^{n-2}\int_0^{1-r} \left(\frac{x}{1-r}\right)^{k-2} \left( \frac{1-r - x}{1-r}\right)^{n-k} \d x\\ &= (1-r)^{n-1}\int_0^1 u^{k-2}(1-u)^{n-k} \d u \\ &= (1-r)^{n-1} \frac{(k-2)!(n-k)!}{(n-1)!}, \end{align*}

where the second equality follows from the substitution \(u = x/(1-r)\), and the last integral is the normalization constant of the Beta distribution. Reinstating the factor \(n!/((k-2)!(n-k)!)\) that we dropped and canceling the factorials gives \(f_{X_{k}^{n}}(r) = n(1-r)^{n-1}\), which proves the claim.

Combining the above two lemmas leads straightaway to the next theorem.

Th. 1.3.5

Let \(n\) jobs arrive at a station such that the arrival times are iid rvs \(U_{k}\sim \Unif{[0,1]}\). Then the inter-arrival times \(\{X_{k}^{n}\}_{k=1}^{n+1}\) as seen by the server (the hospital) all have the same distribution, and \(\lim_{n\to\infty} \P{nX_{k}^{n}\leq x} = 1-e^{-x}\), i.e., the scaled inter-arrival times become exponentially distributed.

Proof

By Lem. 1.3.4, all rvs \(\{X_k^n\}_{k=1}^{n+1}\) have the same density \(n(1-r)^{n-1}\). By Lem. 1.3.3, \(\lim_{n\to\infty}\P{nX_k^n \leq x} = 1-e^{-x}\) for each \(k\).
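As a final numerical illustration (not part of the original text), we can check Th. 1.3.5 directly: sort \(n\) uniform arrival times, compute the gaps, scale them by \(n\), and compare with the \(\Exp{1}\) distribution.

import numpy as np

rng = np.random.default_rng(4)
n = 10_000
A = np.sort(rng.uniform(size=n))
X = np.diff(np.concatenate(([0.0], A, [1.0])))  # the n + 1 gaps X_k^n
print(np.mean(n * X <= 1), 1 - np.exp(-1))      # both close to 0.632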