Stochastic OR

4.3 Renewal Reward Theorem

In a one-dollar store, each item costs \(\$ 1\). If every minute a customer arrives and each customer buys exactly \(10\) items, this shop earns money at a rate of \(\$10\) per minute. In another one-dollar store, customers spend variable amounts, but on average \(\$10\) per customer, and customers come in irregularly, but the average inter-arrival time between two customers is \(1\) minute. The renewal reward theorem says that both shops have the same earning rate, even though the second shop perceives variability while the first does not. In this section, we prove this extremely useful theorem and show how to apply it to a famous inventory system.

We are given a sequence of strictly increasing epochs \(0=T_0 < T_1 < T_2 < \cdots\) such that \(T_k\to\infty\) as \(k\to\infty\), and a non-decreasing right-continuous (deterministic) function \(t\to R(t)\) with \(R(0)=0\). Let \(N(t) = \sum_{k=1}^{\infty} \1{T_k\leq t}\) count the number of epochs that occurred during \((0, t]\), and \(Z_k = R(T_k)-R(T_{k-1})\) be the increment of \(R(t)\) during \((T_{k-1}, T_{k}]\).1 Here \(Z_k\) is not necessarily an inter-arrival time between two jobs. Fig. 1 shows graphically how all concepts relate.

Th. 4.3.1

Assume that the limit \(\alpha=\lim_{t\to\infty}N(t)/t\) exists with \(0<\alpha<\infty\). Then, the limit \(R=\lim_{t\to\infty}R(t)/t\) exists iff \(Z = \lim_{n\to\infty} n^{-1}\sum_{k=1}^{n}Z_{k}\) exists, in which case \(R=\alpha Z\).

Proof

As \(T_{N(t)} \leq t < T_{N(t)+1}\) and \(R(\cdot)\) is non-decreasing,

\begin{equation*} R(T_{N(t)}) \leq R(t) \leq R(T_{N(t)+1}). \end{equation*}

Hence

\begin{align*} \frac{N(t)}{t} \frac{R(T_{N(t)})}{N(t)} &\leq \frac{R(t)} t \leq \frac{R(T_{N(t)+1})}{N(t)+1} \frac{N(t)+1}{t}. \end{align*}

Since \(R(T_{N(t)}) = \sum_{k=1}^{N(t)} Z_{k}\) and \(N(t)/t \to \alpha\), we conclude that \(R(t)/t \to R\) iff \((N(t))^{-1}\sum_{k=1}^{N(t)} Z_{k} \to Z\). By the same arguments as in Th. 4.1.1, if one of these limits exists, so does the other, completing the proof.

renewal_reward.png
Figure 1: The left panel provides a graphical `proof' of the renewal reward equation \(R=\alpha Z\). The right panel shows a sample path of an inventory system with constant production capacity and a fixed demand rate. Since the inventory operates under an \((s,S)\) policy, it moves from cycle to cycle.

The renewal reward theorem finds an easy application in the analysis of the Economic Production Quantity (EPQ) model. This is an inventory system in which demand arrives at a fixed rate \(d\) and a machine produces items, when on, at a constant rate \(r\geq d\).2 We need \(r\geq d\) for stability. The inventory is controlled by an \((s,S)\) policy, the lead time \(l=0\), the holding and backlogging cost are \(h\) and \(b\) per unit per time, and it costs \(K\) to switch on the machine. Fig. 1 shows a sample path of the inventory level. We use the renewal reward theorem (noting that the cumulative cost is non-decreasing) to find the policy parameters3 That is, the optimal \(s\) and \(S\). that minimize the long-run time-average cost. Clearly, the inventory level repeats over time, so it suffices to study just one cycle that starts and stops when the inventory level hits \(S\). From Fig. 1 it is simple to see that, respectively,

\begin{align*} t_1 &:= \frac{S-s}{d}, & t_2 &:= \frac{S-s}{r-d}, & T := t_1+t_{2} = \frac{S-s}{d}\frac{r}{r-d}, \end{align*}

are the time to empty the inventory from level \(S\) to \(s\) (after which production switches on), the time needed to replenish the inventory from level \(s\) to \(S\) (after which production switches off), and the cycle time.

Assume next that \(s\leq 0 < S\).4 Ex 4.3.8 asks you to generalize this assumption. Writing \(T_+\) (\(T_-\)) for the amount of time that the inventory is positive (negative) during a cycle, it follows that

\begin{align*} T_{+} &= \frac{S}{d} +\frac{S}{r-d}, & T_{-} &= \frac{-s}{d} + \frac{-s}{r-d}, & T&=T_{+} + T_{-}, \end{align*}

as \(s\leq 0\). Therefore,5 This also follows from the geometry of the triangles in the right panel of Fig. 1.

\begin{align*} \frac{T_+}T &= \frac{S}{S-s}, & \frac{T_-}T &= \frac{-s}{S-s}. \end{align*}

Since the holding and backlogging cost are proportional to the areas of the corresponding triangles, the average cost of one cycle becomes

\begin{align*} g &=\frac{K}{T} + b\frac{-s}{2} \frac{T_-} T + h\frac{S}{2} \frac{T_+}{T}. \end{align*}

Substituting the expressions for \(T_{-}, T_{+}\), and \(T\) in terms of \(s\) and \(S\), taking the partial derivatives with respect to \(s\) and \(S\) and solving for the optimal values by setting \(\partial g/\partial s = \partial g/\partial S = 0\) gives6 Interestingly, when \(r=d\), the cost \(g^{*}\) equals \(0\). Why is that so?

\begin{align*} g^{*} &= \sqrt{2Kd} \sqrt{\frac{hb}{h+b}}\sqrt{1-\frac d r}, & s^{*} &= -\frac{g^{*}}b, & S^{*} = \frac{g^{*}}{h}. \tag{4.3.1} \end{align*}

Take the limits \(r\to \infty\) and \(b\to\infty\) to obtain the Economic Order Quantity (EOQ) formula.7 See also Ex 5.8.7.

\begin{align*} g^{*}_{EOQ} &= \sqrt{2Kdh} & s^{*} &= 0, & Q^{*} = \sqrt{\frac{2Kd}{h}}, \end{align*}

where we write \(Q^{*}\) instead of \(S^{*}\) to indicate that we order in fixed quantities. Observe that the EOQ policy is simultaneously an \((s,S)\) policy, \((Q,r)\) policy, and \((T,S)\) policy.8 Recall the policies of Section 2.5.

An interesting extension is a system with constraints on cycle length and order size.9 Can you find an optimal policy?

TF 4.3.1

Consider a machine that fails regularly and needs to be repaired. Assuming that repairs take a negligible amount of time, the time between machine failures has density \(f_X\), and each repair costs \(c\). Claim: The renewal reward theorem tells us that the cost rate equals \(\frac{c}{\int_{0}^{\infty} x f(x) \d x}\).

Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

True. The cost increase per failure is \(c\). Thus, this \(c\) takes the role of \(Z\) in the renewal reward theorem. The time between the failures is \(\alpha= 1/\E X\), where \(X\) is the inter-arrival time between two failures. Therefore, \(R = c \alpha = c/ \E X\).

TF 4.3.2

For a \(G/G/1\) queue, write \(\E \U\) for the expected busy time and \(\E I\) for the expected idle time. Claim:

\begin{equation*} \rho = \E \U / (\E \U +\E I). \end{equation*}
Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

True. See Ex 4.3.7.

TF 4.3.3

An \((s,S)\) policy is a restocking policy where \(s\) denotes the size of replenishment orders and \(S\) represents the speed at which items are produced.

Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

False. In an \((s,S)\) policy, \(s\) denotes the reorder point and \(S\) the order-up-to level.

TF 4.3.4

Claim: In the renewal reward theorem \(R = \alpha Z\), the quantity \(\alpha\) must be the arrival rate \(\lambda\).

Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

False. The quantity \(\alpha\) is the rate at which the epochs \(\{T_k\}\) occur, which can be any renewal process, not necessarily the arrival process.

Ex 4.3.5

Use the renewal reward theorem to relate the load \(\rho=\lambda \E S\) to the limiting fraction of time the server of a stable \(G/G/1\) queue is busy.

Hint

Observe that \(R(t) = \int_0^t \1{\L(s)>0} \d s\) is the total time the server is busy during \([0,t]\), hence \(R(t)/t\) is the fraction of time busy. Then take \(T_k = D_k\) as the epochs at which to inspect \(R(t)\), and realize that \(R(D_{k}) = \sum_{i=1}^{k} S_{i}\).

Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

As \(k/T_{k} \to \delta\), it follows from the renewal reward theorem that \(R(t)/t \to \delta \E S\). By rate stability, \(\delta = \lambda\), so that \(\rho=\lambda \E S = \delta \E S = R\), from which the claim follows.

Ex 4.3.6

We can derive the relation between utilization and the fraction of busy time of the server of a \(G/G/1\) queue by considering the inequalities

\begin{equation*} \sum_{k=1}^{A(t)} S_k \geq \int_0^t \1{\L(s)>0} \d s \geq \sum_{k=1}^{D(t)} S_k. \tag{4.3.2} \end{equation*}

Explain the inequalities and show that, by taking appropriate limits, we obtain \(\lambda \E S \geq \rho \geq \delta \E S\).

Hint

For the direction of the inequalities Eq. (4.3.2), observe that \(t\) can lie halfway between a service interval, and \(A(t) \geq D(t)\). Divide by \(t\) in Eq. (4.3.2). Then for the LHS, multiply by \(A(t)/A(t)\) and take appropriate limits. Similar for the RHS, but multiply by \(D(t)/D(t)\).

Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

For the LHS, as \(A(t)\to \infty\) as \(t\to\infty\),

\begin{equation*} \lim_{t\to\infty} \frac 1 t\sum_{k=1}^{A(t)} S_k = \lim_{t\to\infty} \frac{A(t)}t \frac{1}{A(t)} \sum_{k=1}^{A(t)} S_k = \lim_{t\to\infty} \frac{A(t)}t \cdot \lim_{t\to\infty}\frac{1}{A(t)} \sum_{k=1}^{A(t)} S_k = \lambda \E S. \end{equation*}

Apply similar limits to the RHS. The limit of the middle term gives the fraction of time the server is busy.

Ex 4.3.7

Show for the \(G/G/1\) queue that \(\rho = \E \U / (\E \U + \E I)\) where \(\E \U\) is the expected busy time and \(\E I\) the expected idle time.

Hint

Take the process \(R(\cdot)\) such that the system earns money when the server is busy, and sample at moments when busy times start.

Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

Take \(R(t) = \int_0^t \1{\Ls(s)=1} \d s\), so that \(R(t)/t \to \rho\), i.e., the fraction of time there is someone on the server. (We could also take \(R(t) = \int_{0}^{t} \1{L(s) > 0}\d s\), because \(L_{s}(t) = 1 \iff L(t) > 0\).) Writing \(U_{k}\) for the \(k\)th busy time and \(I_{k}\) for the \(k\)th idle time, we sample \(R(\cdot)\) at moments \(T_{k} = \sum_{j=1}^{k} (I_j + U_{j})\), that is, we check \(R(\cdot)\) at the end of each busy period. Then, by the definition of \(R(\cdot)\), its increase between two sample moments is \(Z_{k} := R(T_{k}) - R\rb{T_{k-1}} = U_{k}\). The rate at which we sample the system is \(\alpha \approx N(t)/t \approx k/T_{k} \to 1/(\E I + \E \U)\) as \(k \to \infty\) (by the strong law). And thus, with the renewal reward theorem \(\rho = R = \alpha Z = \E \U /(\E I +\E \U)\).

Ex 4.3.8

For the EPQ model, take \(T\) to be the cycle time. Use elementary geometry to show that the average cost \(g=K/T + h(s+S)/2\) when \(0\leq s < S\), and \(g=K/T - b(s+S)/2\) when \(s<S \leq 0\).

Solution
Did you actually try? Maybe see the 'hints' above!:
Solution, for real

Suppose \(0\leq s < S\). The average holding cost for the rectangle with base \(T\) and height \(s\) is \(h T s\), i.e., \(h\) times the area of the rectangle. The cost of the triangular area with base \(T\) and height \(S-s\) is \(h T(S-s)/2\). Thus, the total cost of a cycle is \(K+ h T(S+s)/2\). Dividing by \(T\) gives the time-average cost.

The other case follows by the same reasoning.