5.2 Applications of the \(M(n)/M(n)/1\) queueing model
In this section we apply the tools developed above to a number of interesting queueing systems. First, we discuss how to use the \(M(n)/M(n)/1\) queue to organize a part of a production process. Second, we show how to plan the number of servers at a call center such that the average queue length remains small.
Riley is an operations manager at a factory that produces custom metal parts for various industries. Riley's factory has a single heat treatment furnace to temper the metal parts. Recently, Riley noticed an increase in customer orders, leading to long delays and dissatisfied clients.
- Should Riley sometimes block jobs to control the waiting time?
- Should Riley buy another oven to increase the capacity?
- If Riley bought a second oven, the utilization of both ovens may become quite low. However, as an oven is costly to heat up, it may be favorable to only switch on a second oven when queues are long.
To compare the effect of these different strategies, Riley hires us to model the operation of the oven with queueing theory. We consider the following metrics: average job waiting time, average number of jobs in the system, machine utilization, and job acceptance rate. For ease (at a later stage we can make more complicated models), we assume that jobs arrive as a Poisson process with rate \(\lambda\) jobs per hour, and that production times are \(\sim \Exp{\mu}\) with \(1/\mu = \E S\). We then implement the level-crossing relations in Python for the appropriate variations of the \(M(n)/M(n)/1\) queue.
The KPIs for the queueing models below can be computed with the same formulas; the models differ only in their level-crossing equations.
Thus, we can use an abstract base class, MnMn, to implement the common functionality.
The code below needs some explanation.
First, the implementation of the KPIs follows the discussion of Section 5.1.2.
Think about the sequence in which we compute the KPIs.
Second, by putting @abstractmethod above a method, we indicate that derived classes have to provide the implementation of that method.
Third, the computation of \(p(n) = \P{L=n}\) is perhaps slightly subtle.
Start with \(p(0) = 1\), and then compute, recursively, \(p(n)\), for \(n=1, 2, \ldots\) until \(p(n) < \epsilon\), for some small \(\epsilon\).
This idea is based on the assumption that, for some \(\eta<1\) and \(N\), \(\lambda(n)/\mu(n+1) < \eta\) whenever \(n > N\), so that \(p(n)\) decreases geometrically for \(n>N\).
Fourth, for the utilization we assume that the service rate is non-decreasing in the number of jobs \(n\) in the system (there is no practical reason to reduce service capacity when queue lengths become large), and we use Eq. (5.1.3) to compute the mean acceptance rate.
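To see the recursion of the third point in isolation, here is a minimal sketch for the \(M/M/1\) queue; the rates are illustrative.

```python
# Illustrative rates; any labda < mu works the same way.
labda, mu = 1.9, 2.0
eps = 1e-5  # truncation threshold

p = [1.0]  # start with p(0) = 1
while p[-1] > eps:
    p.append(p[-1] * labda / mu)  # level crossing: p(n + 1) = p(n) labda / mu

G = sum(p)  # normalization constant
p = [x / G for x in p]
print(p[0])  # close to 1 - labda / mu = 0.05 for the M/M/1 queue
```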
from abc import abstractmethod
from functools import cache

eps = 1e-5  # Threshold for convergence.


def normalize(v):
    "Normalize the array v such that its elements sum to 1."
    G = sum(v[n] for n in range(len(v)))  # normalization constant
    return [v[n] / G for n in range(len(v))]  # normalize


class MnMn:
    def __init__(self):
        self.compute_p()

    @abstractmethod
    def acceptance_rate(self, n):
        pass

    @abstractmethod
    def service_rate(self, n):
        pass

    def compute_p(self):
        # Recursively compute probabilities until convergence.
        p = [1]  # p[0] = 1
        n = 0
        while p[-1] > eps:
            p.append(p[n] * self.acceptance_rate(n) / self.service_rate(n + 1))
            n += 1
        self.p = normalize(p)

    @cache
    def mean_acceptance_rate(self):
        return sum(
            self.acceptance_rate(n) * self.p[n] for n in range(len(self.p))
        )

    def max_service_rate(self):
        # Assumes service rate is non-decreasing in queue length.
        return self.service_rate(len(self.p))

    def ES(self):
        return 1 / self.service_rate(1)

    @cache
    def EL(self):
        return sum(self.p[n] * n for n in range(len(self.p)))

    def EJ(self):
        return self.EL() / self.mean_acceptance_rate()

    def EW(self):
        return self.EJ() - self.ES()

    def EQ(self):
        return self.mean_acceptance_rate() * self.EW()

    def ELs(self):
        return self.EL() - self.EQ()

    def utilization(self):
        return self.mean_acceptance_rate() / self.max_service_rate()

    def loss(self):
        # self.labda is set by the derived classes.
        return 1 - self.mean_acceptance_rate() / self.labda

    def print_kpis(self):
        print(f"EW: {self.EW():0.2f}")
        print(f"EL: {self.EL():0.2f}")
        print(f"acceptance rate: {self.mean_acceptance_rate():0.2f}")
        print(f"utilization: {self.utilization():0.2f}")
        print(f"loss: {self.loss():0.2f}")
It is now easy to construct various queueing models because we can derive much functionality from this MnMn class.
We start with the level-crossing equations for the \(M/M/c\) queue.
class MMC(MnMn):
    def __init__(self, labda, mu, c):
        self.labda = labda
        self.mu = mu
        self.c = c
        super().__init__()

    def acceptance_rate(self, n):
        return self.labda

    def service_rate(self, n):
        return self.mu * min(n, self.c)
The \(M/M/1\) queue is the \(M/M/c\) queue with \(c=1\).
class MM1(MMC):
    def __init__(self, labda, mu):
        super().__init__(labda, mu, c=1)
For the \(M/M/c/K\) queue, we override the acceptance rate and the computation of p; the fraction of lost jobs then follows from the loss method inherited from MnMn.
class MMCK(MMC):
    def __init__(self, labda, mu, c, K):
        self.K = K
        super().__init__(labda, mu, c)

    def acceptance_rate(self, n):
        return self.labda * (n < self.K)

    def compute_p(self):
        p = [1] * (self.K + 1)
        for n in range(1, self.K + 1):
            p[n] = p[n - 1] * self.acceptance_rate(n - 1) / self.service_rate(n)
        self.p = normalize(p)
We compute the KPIs for the relevant queueing models.
We start with the base case.
labda = 1.9
mu = 2
mm1 = MM1(labda, mu)
mm1.print_kpis()
EW: 9.50
EL: 19.00
acceptance rate: 1.90
utilization: 0.95
loss: 0.00
Next, consider several variations of how to organize the process. Riley first considers reducing the inflow of jobs. What happens if we allow at most \(K=10\) jobs in the system?
mmck = MMCK(labda, mu, c=1, K=10)
mmck.print_kpis()
EW: 2.04
EL: 4.49
acceptance rate: 1.77
utilization: 0.88
loss: 0.07
This is interesting. The expected waiting time reduces by a factor of more than 4, which is considerable, while the fraction of lost jobs is \(\approx 7\%\), which does not seem large.
As a check, we verify that the KPIs converge to those of the \(M/M/1\) queue when \(K\) is large. The results confirm this.
mmck = MMCK(labda, mu, c=1, K=1000)
mmck.print_kpis()
EW: 9.50
EL: 19.00
acceptance rate: 1.90
utilization: 0.95
loss: 0.00
Riley's second idea is to buy a second oven. For this, we use the \(M/M/2\) queue.
mmc = MMC(labda, mu, c=2)
mmc.print_kpis()
EW: 0.15
EL: 1.23
acceptance rate: 1.90
utilization: 0.47
loss: 0.00
Now, \(\E \W\) is almost negligible, but oven utilization becomes low. As it is costly to keep both ovens at production temperature, this latter result is undesirable.
Next, we analyze a case in which the second oven switches on only when the system has \(m\) or more jobs.
class MMC_controlled(MMC):
    def __init__(self, labda, mu, m):
        self.m = m
        super().__init__(labda, mu, c=1)

    def service_rate(self, n):
        return self.mu * ((n < self.m) + 2 * (n >= self.m))

    def utilization_first(self):
        return 1 - self.p[0]

    def utilization_second(self):
        return 1 - sum(self.p[n] for n in range(self.m))
We can compare the usual KPIs, but now we should also include the fraction of time the second oven is busy. For the \(M/M/2\) queue, this fraction of time is \(\sum_{n=2}^{\infty} p(n)\). In the controlled \(M/M/2\), the second oven is used only when \(n\geq m\), so with \(m=5\) the fraction is \(\sum_{n=5}^{\infty} p(n)\).
mmc_c = MMC_controlled(labda, mu, m=5)
mmc_c.print_kpis()
print(f"{mmc_c.utilization_first()=:0.2f}")
print(f"{mmc_c.utilization_second()=:0.2f}")
EW: 0.79
EL: 2.46
acceptance rate: 1.90
utilization: 0.47
loss: 0.00
mmc_c.utilization_first()=0.81
mmc_c.utilization_second()=0.14
Clearly, \(\E \W\) increases a bit, while the second oven does not have to assist often.
To complete the analysis, we should make graphs of how the KPIs depend on the control parameter \(K\) or the switching level \(m\). With the code above, this is not difficult anymore. As our educational goal (demonstrating how to build and use the models) is achieved, we leave it to Riley to make the graphs.
Let us next discuss an example of employee planning at a call center. Here we keep the models simple, but we include all necessary steps to solve a realistic planning problem.
We start with specifying the arrival process. It is reasonable to model it as a Poisson process: there are many potential customers, each of whom calls with only a small probability at any given moment in time. Thus, we only have to characterize the arrival rate. Estimating this for a call center is easy because all calls are registered.
It is common to use a demand profile that shows the average number of customers arriving per hour. Then we model the arrivals as a Poisson process with an arrival rate that is constant during a certain hour, as specified by the demand profile.
It is also easy to find the service distribution; for ease, we model the service time distribution as exponential with mean \(1.5\) minutes.
For the service objective we prefer to keep the average number of jobs in the system such that \(\E \L \leq 5\). (Other KPIs could be used as well.)
Let us model the situation as an \(M/M/1\) queue with one server that works as fast as \(c\) single servers combined. Solving for \(c\) in the inequality \(\E \L \leq 5\) gives (use \(\E \L = \rho/(1-\rho) \implies \rho = \E \L/(\E \L + 1)\) and \(\rho=\lambda \E S /c\))
\begin{equation*} c \geq \frac{\E \L + 1}{\E \L} \lambda \E S = \frac 6 5 \lambda \E S \approx 0.03 \lambda, \end{equation*}where, with the above estimate, \(\E{S}=1.5/60\) hour. It is easy to apply this formula. For instance, the demand profile in Fig. 1 shows that \(\lambda= 120\) customers per hour between 12 and 13; hence we need \(c \geq 0.03 \cdot 120 = 3.6\), i.e., 4 employees for that period.
With this we can find a formula to convert the demand profile into a load profile to specify the minimal number of servers per hour needed to meet the service objective.
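As a sketch of this conversion (the demand profile below is invented for illustration, not actual call-center data):

```python
from math import ceil

ES = 1.5 / 60  # mean service time in hours

# Hypothetical demand profile: average number of arrivals per hour.
demand = [40, 80, 120, 90, 60]

# Load profile: c >= (6/5) labda ES servers per hour, rounded up.
load = [ceil(6 / 5 * lam * ES) for lam in demand]
print(load)  # [2, 3, 4, 3, 2]
```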
The last step in the planning process is to cover the load profile with service shifts. This is typically not easy since shifts have to satisfy all kinds of restrictions with respect to breaks and durations. Moreover, shifts can also have different costs: evening shifts are typically more expensive per hour.
The usual way to solve such covering problems is by means of an integer programming problem. For instance, suppose only 4 shift plans are available (a \(+\) indicates a working hour and \(-\) a break of an hour):
- Type 1: \(++-++\)
- Type 2: \(+++-+\)
- Type 3: \(++-+++\)
- Type 4: \(+++-++\)
For the optimization, let \(x_i\) be the number of shifts of type \(i\) and \(c_i\) the cost of this type. Then the problem is to solve \(\min \sum_i c_i x_i\), such that for all hours \(t\) the shifts cover the load, that is, \(\sum_i x_i \1{t \in s_i} \geq 0.03 \lambda_t\). (We write \(t\in s_i\) if hour \(t\) is covered by shift type \(i\).)
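A brute-force sketch of this optimization (the costs \(c_i\) and the load profile below are invented for illustration; the 5-hour shift types are padded with a trailing break to a common 6-hour horizon; a realistic instance would use an ILP solver):

```python
from itertools import product

# Shift plans from above: 1 = working hour, 0 = break.
shifts = [
    [1, 1, 0, 1, 1, 0],  # type 1: ++-++  (padded)
    [1, 1, 1, 0, 1, 0],  # type 2: +++-+  (padded)
    [1, 1, 0, 1, 1, 1],  # type 3: ++-+++
    [1, 1, 1, 0, 1, 1],  # type 4: +++-++
]
cost = [5, 5, 6, 6]            # invented cost per shift type
required = [2, 3, 4, 3, 4, 2]  # invented load profile: servers per hour

best, best_x = None, None
for x in product(range(6), repeat=4):  # x[i]: number of type-i shifts
    covers = all(
        sum(x[i] * shifts[i][t] for i in range(4)) >= required[t]
        for t in range(6)
    )
    if covers:
        c = sum(cost[i] * x[i] for i in range(4))
        if best is None or c < best:
            best, best_x = c, x
print(best, best_x)  # minimal total cost and one optimal shift mix
```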
Suppose \(\frac{\lambda}{\mu} = 2\). Claim: 3 is the minimum number of servers required to make the system stable.
Solution
True.
Claim: It is always better to have a single fast server than multiple slow servers which together have the same rate.
Solution
False. This depends on the objective. The service time at a slow server is longer. However, when multiple servers are present, if one server breaks down, service can still continue with the other servers, while when a single fast server fails, everything stops.
A repair/maintenance facility would like to determine how many employees should be working in its tool crib. The service time is exponential, with a mean of 4 minutes, and customers arrive as a Poisson process with a rate of 28 per hour. Claim: With one employee the system is not rate stable.
Solution
True.
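This can be verified with a one-line load computation:

```python
labda = 28   # arrivals per hour
ES = 4 / 60  # mean service time in hours
rho = labda * ES  # offered load for a single employee
print(rho > 1)  # the load exceeds 1, so the queue grows without bound
```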
Consider an \(M/M/c/K\) queue with \(\lambda > c\mu\), so that the queue would not be rate stable without blocking. Claim: \(\lim\limits_{t\to\infty}\mathbb{P}(L(t)=K)=1\).
Solution
False. Whenever a service completes while the system is full, the number of jobs drops to \(K-1\). Hence \(L(t) < K\) during a positive fraction of time, and the stationary probability of state \(K\) is strictly less than 1.
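A quick computation illustrates this for an overloaded \(M/M/1/K\) queue; the rates below are illustrative.

```python
labda, mu, K = 4.0, 2.0, 10
rho = labda / mu  # rho = 2: the queue is heavily overloaded

# Stationary distribution of the M/M/1/K queue: p(n) proportional to rho^n.
p = [rho**n for n in range(K + 1)]
G = sum(p)
p = [x / G for x in p]
print(p[K])  # about 0.5: well below 1, even though rho = 2
```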
Claim: For the \(M/M/c\) queue, adding a second server always halves the expected waiting time compared to the \(M/M/1\) queue with the same arrival and service rates.
Solution
False. Adding a second server reduces the waiting time, but not by exactly half. The reduction depends nonlinearly on \(\rho\) and is typically much more than a factor \(2\).
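To quantify the claim, a sketch using the closed-form \(M/M/1\) and \(M/M/2\) waiting-time formulas, with the base-case rates \(\lambda = 1.9\) and \(\mu = 2\):

```python
labda, mu = 1.9, 2.0

# M/M/1: EQ = rho^2 / (1 - rho), and EW = EQ / labda by Little's law.
rho1 = labda / mu
EW1 = rho1**2 / (1 - rho1) / labda

# M/M/2: with rho = labda / (2 mu), EQ = 2 rho^3 / (1 - rho^2).
rho2 = labda / (2 * mu)
EW2 = 2 * rho2**3 / (1 - rho2**2) / labda

print(EW1, EW2, EW1 / EW2)  # the ratio is about 65, far more than 2
```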
The next exercises require some modeling skills. Hence, they may be quite hard even though the formulas are simple.
In the town hall of the municipality, people can, for example, renew their driver's license or passport. For ease, assume that there is just one employee and that customer arrivals are not planned. Why is it reasonable to model this queueing system as an \(M/M/1\) queue?
Measurements show that \(\E \W = 6\) minutes and \(\lambda=5\) per minute. What are the service rate and the utilization? Calculate \(\E{\QQ}\) and \(\E \L\).
There should be a sufficient number of seats so that all waiting customers can sit down at least 90 percent of the time. How many seats should there minimally be? What would you advise the municipality?
Hint
\(\E{\QQ}\) follows right away from an application of Little's law. For the other quantities, we need to find \(\E S\). Use the expression for \(\E\W\) for the \(M/M/1\) queue to solve for \(\rho\). Then, since \(\lambda\) is known, \(\E S\) follows.
Solution
As people arrive without making appointments, it is reasonable to say that the inter-arrival times are memoryless. As for the services, the employee deals with many different questions, some are short, some are long. So, we also model service times as exponential.
With Little's law:
labda = 5. # per minute
EW = 6.
EQ = labda * EW
EQ
We can use the expression for \(\E\W\) to solve for \(\rho\), but we can also do a simple search.
from scipy.optimize import bisect

EW = 6.0

def find_W(rho):
    # Return the difference between EW(rho) and the target EW.
    ES = rho / labda
    return rho / (1 - rho) * ES - EW
rho = bisect(find_W, 0, 0.999)
rho
ES = rho / labda
ES
J = EW + ES
J
L = labda * J
L
Next, find the smallest \(n\) such that \(\sum_{j=0}^n p_j > 0.9\). Note that when the system contains \(n\) jobs, one customer is in service and stands at the desk of the employee, so only \(n-1\) customers need a seat.
n, p = 0, 1 - rho
total = p
while total <= 0.9:
    p *= rho
    total += p
    n += 1
total
n - 1
Note that when the while loop terminates, \(n\) is the smallest number of jobs such that \(\P{L \leq n} > 0.9\); since the customer in service does not need a seat, \(n-1\) seats suffice.
The average queue is huge: hire an extra employee.
The smallest barber shop in town works alone and has two chairs for waiting customers. Six jobs arrive per hour, and the barber needs 12 minutes for a haircut. We model this, for ease, as an \(M/M/1/3\) queue. Estimate \(\E{\QQ}\) and the average number of customers served per hour. Then, estimate \(\E{\QQ}\) when the barber hires another location that allows for 3 extra chairs. Quantify the effect on the number of customers served per hour.
Solution
This is an \(M/M/1/3\) queue: there is room for 1 customer in service and two in queue.
We import numpy and convert the lists to arrays to fix the output precision to 3, otherwise we get long floats in the output.
First the case with two waiting chairs, i.e., \(K=3\).
import numpy as np
np.set_printoptions(precision=3)
labda, mu, c = 6.0, 5.0, 1
rho = labda / mu
K = c + 2
p = np.array([rho ** n for n in range(K + 1)]) # states 0, 1, ..., K
G = sum(p)
p /= G # normalize
p
L = sum(n * p[n] for n in range(len(p)))
L
Q = sum(max(n - c, 0) * p[n] for n in range(len(p)))
Q
lost = labda * p[-1] # the last element of p
labda - lost # accepted, hence served
Now with three extra chairs.
K += 3
K
p = np.array([rho ** n for n in range(K + 1)]) # states 0, 1, ..., K
G = sum(p)
p /= G # normalize
lost = labda * p[-1] # the last element of p
labda - lost # accepted, hence served
We see that, since the server is overloaded, the acceptance rate is not much affected by increasing the number of chairs. We need an extra server.
Occasionally there is a pizza stand at the corner of the Verlengde Visserstraat and the Westersingel in Groningen.
As people (tend to) behave like sheep, the rate at which people join the queue depends on the actual queue length. When there is nobody waiting, people are somewhat hesitant to order ('when there are no people waiting, most likely the pizzas are not good'). However, when the queue is long, people prefer to go elsewhere.
Measurements show that the number of arrivals per hour is well captured by the array
labda = np.array([10.0, 10.0, 20.0, 20.0, 5.0]).
The pizza oven can contain two pizzas and it takes 2 minutes to bake a pizza. We model the system as an \(M/M/2\) queue with queue-length-dependent arrival rates.
Calculate the steady-state probabilities and from this the average arrival rate. Then, find \(\E \L\), \(\E{\QQ}\), \(\E \J\) and \(\E{\W}\).
Solution
The computations.
import numpy as np
np.set_printoptions(precision=3)
labda = np.array([10.0, 10.0, 20.0, 20.0, 5.0])
mu = np.array([0., 30., 60., 60., 60., 60.])
c = 2
p = np.ones_like(mu)
for i in range(len(labda)):
    p[i + 1] = labda[i] * p[i] / mu[i + 1]
p /= p.sum()
p
labdaBar = sum(labda[n] * p[n] for n in range(len(labda)))
labdaBar
L = sum(n * p[n] for n in range(len(p)))
L
Q = sum(max(n - c, 0) * p[n] for n in range(len(p)))
Q
J = L / labdaBar
J
W = Q / labdaBar
W
Fast food restaurants tend to fry patties before customers arrive. This way, they can serve customers faster: when a customer arrives while there is still an inventory of prepared patties, the server only has to put a patty in a bun, while otherwise the customer has to wait for the additional frying time of the patty. Assume the restaurant keeps at most 3 patties on stock. To keep the model simple, assume the restaurant has just one pan to fry a patty and the frying time \(S\sim \Exp{\mu}\) with \(\mu = 15\) per hour. Customers arrive as a Poisson process with rate \(\lambda=12\) per hour. When there is a prepared patty upon arrival, the customer is served immediately, otherwise the customer has to wait for a patty.
Explain how to model this system as an \(M/M/1\) system. Then compute the average number of patties on stock, the average number of people waiting, the average waiting time of a patty before it ends up in a bun, and the average time a person has to wait for a hamburger.
You should remember this problem. It is of fundamental importance in inventory control and known as an inventory system with backlogging; it is (implicitly) used by nearly any company that keeps inventory.
Hint
Let \(T\) be the number of patties waiting and \(G\) the number of persons waiting. There cannot be a patty and a person waiting at the same time, hence \(T\cdot G = 0\). Take \(L\) as the number of jobs in an \(M/M/1\) queue. If \(L=0\), there are 3 patties on stock. When \(L=1\), there is one patty less: \(T=2, G=0\); when \(L=2\), two patties less: \(T=1, G=0\); when \(L=3\), there are no patties on stock, but also no people waiting: \(T=0, G=0\). When \(L=4\), we have run out of patties and one person must be waiting: \(T=0, G=1\).
With this reasoning we see a pattern: at all moments in time, \(3=T+L - G\). Now map this system to an \(M/M/1\) queue.
Solution
The rv \(L\) behaves like the number of jobs in an \(M/M/1\) queue. If we have \(L\), then with the two properties \(T G = 0\) and \(3 = T +L - G\), we can retrieve (uniquely at all moments in time) the number of patties \(T\) on stock and the number of people waiting \(G\). In inventory terms, \(G\) represents the number of backlogged patties.
With this mapping, the expected number of patties on stock is \(\E T = \sum_{l=0}^3 (3-l)p(l)\).
labda = 12.0 # per hour
mu = 15.0 # per hour
rho = labda / mu
max_patties = 3
ET, p = 0, 1 - rho
for l in range(max_patties):
    ET += (max_patties - l) * p
    p *= rho
ET
For the expected number of people waiting, use that \(3=T+L-G\) always. Taking expectations, we see that \(3 = \E T + \E \L - \E G\), where \(\E G\) is the expected number of people.
EL = rho/(1-rho)
EG = ET + EL - max_patties
EG
Computing the waiting times is tricky, but both follow from Little's law with rate \(\lambda\): in steady state, patties are baked and consumed at rate \(\lambda\), and every customer is eventually served, so both the stock of patties and the group of waiting people have throughput \(\lambda\).
ET / labda # average waiting time of a patty on stock
EG / labda # average waiting time of a person
[Continuation of Ex 5.2.9] What would be the impact of allowing four patties on stock?
Hint
How is the invariant \(3 = T + L -G\) affected?
Solution
The invariant \(3 = T + L -G\) becomes \(4 = T + L -G\).
Estimating the impact of allowing maximally four patties is simple with the code we already have.
max_patties = 4
ET, p = 0, 1 - rho
for l in range(max_patties):
    ET += (max_patties - l) * p
    p *= rho
ET
EG = ET + EL - max_patties
EG
[Continuation of Ex 5.2.9] Suppose that the restaurant does not keep patties in stock. What is the expected waiting time for the customer?
Solution
The invariant \(3 = T + L -G\) becomes \(0 = T + L -G\). Since \(T\cdot G=0\), this implies that \(T=0\) and \(G=L\), hence \(\E G = \E \L\). By Little's law, the expected waiting time of a customer is \(\E G/\lambda = \E\L/\lambda\), i.e., the expected time in the system of the \(M/M/1\) queue.