Stochastic OR

5.10 Hitting problems with applications in single-item inventory control

In a single-item inventory under periodic review and controlled by an \((s,S)\) policy, every time the inventory position becomes \(s\) or lower, a replenishment up to level \(S\) is placed at cost \(K\). (If you don't like challenges, you can skip this section.) When the position is at \(x > s\), the inventory cost is \(L(x)\), see Eq. (5.9.1). Typically, we are interested in the long-run average cost when the position starts at \(S\) and the cycle stops once the position gets at or below \(s\). By the renewal reward theorem, it suffices to compute the expected cost \(v(s,S)\) and the expected duration \(T(s,S)\) of one cycle that starts at \(S\) and stops at some \(x \leq s\). The long-run average cost is then given by \(v(s,S)/T(s,S)\).

Both \(v(s,S)\) and \(T(s,S)\) can be found as the solution of a hitting problem. The stopping set is \(x \leq s\). When in state \(x>s\), the continuation (or running) cost for \(v\) is, as said, \(L(x)\), and \(1\) for \(T\), because it costs 1 period to stay 1 period in state \(x\). The stopping cost for \(v\) is the ordering cost \(K\), and it is \(0\) for \(T\), because once stopped, time no longer progresses.

Below we discuss a general algorithm to numerically solve such hitting problems.

A bit of theory helps to set up this algorithm. We start with a discrete Markov chain \(X = \{X_k, k \geq 0\}\) with a reasonably small state space, e.g., at most several thousand states (otherwise we should resort to other numerical methods). The state space is split into two disjoint sets \(C\) and \(D\). When \(X_k \in C\), the process continues, while if \(X_k \in D\), it stops. Define the stopping time \(\tau = \inf\set{ k : X_k \in D}\), and assume that the transition probabilities \(P(x,y)\) of \(X\) are such that the chain gets absorbed in \(D\) in finite expected time, that is, \(\EE{x}{\tau} < \infty\) for all possible initial states \(x\). We write \(\tilde L(x) = L(x) \1{x\in C} + K \1{x\in D}\) for the cost of spending a period in, or stopping in, state \(x\). The interesting question is: what is the expected cost until the process stops? (For suitable conditions on the cost function, see for instance Dynkin and Yushkevich 1968.)

Focus first on some state \(y\). When starting in \(x\), \(\EE{x}{\sum_{t=0}^{\tau} \1{X_{t} = y} \tilde L(y)}\) is the expected cost for spending time in, or hitting, state \(y\). With this, we write the total expected cost until stopping as

\begin{align*} v(x) = \sum_{y} \EE{x}{\sum_{t=0}^{\tau} \1{X_{t} = y} \tilde L(y)}. \end{align*}

By splitting the expectation and taking the summation over \(t\) out of the expectation we obtain a very useful expression for \(v\):

\begin{align*} v(x) &= \sum_{y} g(x, y) \tilde L(y), \end{align*}

where \(g\), known as Green's kernel, is given by

\begin{align*} g(x, y) = \sum_{t=0}^{\infty} \EE{x}{\1{X_{t} = y, t \leq \tau}}. \end{align*}

It represents the expected number of visits to \(y\) until absorption when the chain starts at \(x\). In terms of probabilities (use that \(\E{\1{A}} = \P A\) for the indicator of an event \(A\)), \(g\) can be written as

\begin{align*} g(x, y) &= \sum_{t=0}^{\infty} \PP{x}{X_{t} = y, t \leq \tau}. \tag{5.10.1} \end{align*}

Finally, if \(X\) can start at some state \(x\) with probability \(\mu_x\), the expected number of visits to \(y\) becomes

\begin{align*} g(y) = \sum_x \mu_x g(x,y). \tag{5.10.2} \end{align*}
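To make Eqs. (5.10.1) and (5.10.2) concrete, consider a hypothetical two-state example (not part of the inventory model): from state \(1\) the chain stays put with probability \(q\) and jumps to the absorbing state \(0\) with probability \(1-q\). Then \(\PP{1}{X_t = 1, t \leq \tau} = q^t\) and, for \(t \geq 1\), \(\PP{1}{X_t = 0, t \leq \tau} = (1-q)q^{t-1}\), so that \(g(1,1) = 1/(1-q)\) and \(g(1,0) = 1\). A quick numerical check of the truncated sums (the value of \(q\) is arbitrary):

```python
q = 0.6  # arbitrary staying probability

# truncate the sums of Eq. (5.10.1) at a large horizon
T_max = 200
g_11 = sum(q**t for t in range(T_max))  # expected visits to state 1
g_10 = sum((1 - q) * q ** (t - 1) for t in range(1, T_max))  # one visit to 0

print(g_11, 1 / (1 - q), g_10)
```

Both truncated sums agree with the closed forms \(1/(1-q)\) and \(1\) up to a negligible geometric tail.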

The following seemingly simple, and certainly short, algorithm computes \(g\) as defined in Eq. (5.10.2).

Python
from collections import defaultdict


def green(mu, P, is_stop, eps):
    g = defaultdict(float)
    while mu:
        for x in list(mu.keys()):
            g[x] += (p := mu[x])

            mu[x] = 0
            if is_stop(x):
                continue

            for y, pxy in P(x):
                mu[y] += p * pxy
        mu = defaultdict(float, {x: p for x, p in mu.items() if p > eps})
    return g

Before explaining how this works, we discuss this function's arguments. The dictionary mu maps a state \(x\) to the probability \(\mu_{x}\) that the chain starts in \(x\). For instance, when the process starts (almost) surely at \(X_0 = x\), then mu = {x: 1}. Next, P is a function that maps a state \(x\) to a list of pairs \((y, p_{x,y})\) where \(y\) is a state to which \(X\) can move from \(x\) and \(p_{x, y}\) is the probability that the transition from \(x\) to \(y\) happens. The function is_stop returns True when the chain should stop (is absorbed) at \(x\), and False otherwise. Finally, eps is a small number that prevents keeping states that have negligible probability of being accessed.
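As a quick sanity check of this calling convention, consider a made-up random walk (not the inventory model): from \(x > 0\) the chain stays put or steps down to \(x-1\), each with probability \(1/2\), and stops at \(0\). Each transient state should then be visited twice on average, and the stopping state exactly once. Repeating green so the sketch runs on its own:

```python
from collections import defaultdict


def green(mu, P, is_stop, eps):
    # repeated from above so this sketch is self-contained
    g = defaultdict(float)
    while mu:
        for x in list(mu.keys()):
            g[x] += (p := mu[x])
            mu[x] = 0
            if is_stop(x):
                continue
            for y, pxy in P(x):
                mu[y] += p * pxy
        mu = defaultdict(float, {x: p for x, p in mu.items() if p > eps})
    return g


P = lambda x: [(x, 0.5), (x - 1, 0.5)]  # stay put or step down
mu = defaultdict(float, {3: 1.0})  # start at state 3 with certainty
g = green(mu, P, lambda x: x <= 0, eps=1e-12)
print({x: round(p, 6) for x, p in sorted(g.items())})
# about two visits each to 1, 2, 3, and one visit to the stopping state 0
```

Note that mu must support missing keys (here a defaultdict), because the line mu[y] += p * pxy may touch states not seen before.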

So, intuitively speaking, how does this algorithm work? As long as not all probability mass has been absorbed (that is, as long as the dictionary mu is not empty), a for loop cycles over all states \(x\) seen so far. The dictionary g counts the expected number of visits to each state \(x\) and is updated in every round of the loop. We need to fix the range of the for loop by calling list(mu.keys()), because the keys of mu can change in the body of the loop. Next, mu[x] is the probability that \(x\) has been visited since the last pass of the for loop. In accordance with Eq. (5.10.2), we add this probability to the expected total number of visits g[x], and assign it to p for ease of reference.

We set mu[x] = 0 for two reasons. When the chain is absorbed (stopped) at \(x\), we continue with the next state to be checked in the for loop. In this case, the total probability mass should decrease by the absorbed probability \(p\), which is indeed achieved by setting mu[x] = 0. If the chain does not stop at \(x\), the probability \(p\) should be distributed over all states \(y\) to which \(X\) can transition. If \(y \neq x\), then mu[y] += p * pxy, because \(y\) could have been accessed from other states too. When \(y = x\), the probability of staying in \(x\) is \(p p_{xx}\). (Recall that the number of consecutive periods the chain stays in state \(x\) is geometrically distributed.) Since we earlier set mu[x] = 0, we can also write mu[x] += p * pxy for the state \(x\) itself. In other words, we can now use += for all states, instead of having to single out the state \(x\) with an if statement.

Once the for loop completes, we prune in the last line all states that have negligible probability of having been accessed.

There remains, however, a fundamental question: why is it allowed to change mu in the for loop itself, and does it really converge to the Green kernel we discussed above?

To really understand why the green function works, we need to discuss some further ideas of discrete Markov chains. Recall the interpretation of the Green kernel \(g(x,y)\) as the expected number of times \(y\) is visited up to absorption when starting at \(X_{0}=x\). Clearly, using the Markov property, \(g\) should satisfy the recursive expression

\begin{align*} g(x,y) &= \EE{x}{\1{X_{0}=y}} + \EE{x}{g(X_{1}, y)} \\ &\stackrel{1}= \1{x=y} + \sum_{z}\EE{x}{\1{X_{1}=z}} g(z, y) \\ &\stackrel{2}= \1{x=y} + \sum_{z} P(x, z) g(z, y); \end{align*}

in 1 we use that \(g(z, y)\) is a constant, hence can be taken out of the expectation, in 2 that \(\EE{x}{\1{X_1=z}} = P(x,z)\). Casting this in matrix format makes this structure clearer:

\begin{align*} G = I + P G. \tag{5.10.3} \end{align*}

Then, by using this recursive structure, and noting that \(X\) will be absorbed in some set \(D\) in expected finite time and \(P\) is finite so that \(P^t\) converges geometrically fast to \(0\) elementwise as \(t\to\infty\),

\begin{align*} G = I + P G = I + P(I + P G) = I + P + P^{2} G = \cdots = \sum_{t=0}^{n} P^{t} + P^{n+1}G \to \sum_{t=0}^{\infty} P^{t}. \end{align*}

And, if we define recursively \(\mu^{t} = \mu^{t-1} P = \mu P^{t}\), so that \(\mu_{y}^{t} = \sum_{x} \mu_{x} P^{t}(x,y)\),

\begin{align*} g(y) = \sum_{x} \mu_{x} \sum_{t=0}^{\infty} P^{t}(x, y) =\sum_{t=0}^{\infty} \mu_{y}^{t}. \end{align*}
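The limit \(G = \sum_{t} P^{t}\) and the relation of Eq. (5.10.3) can be verified numerically on a small substochastic matrix (a made-up three-state chain, kept in plain Python; transitions into the absorbing set simply do not appear in \(P\)):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# substochastic P on the continuation states 1, 2, 3 of a toy chain:
# stay put w.p. 0.5, step down w.p. 0.5; the mass leaving state 1
# for the absorbing set is dropped from P.
P = [[0.5, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
I = [[float(i == j) for j in range(3)] for i in range(3)]

G, Pt = [row[:] for row in I], [row[:] for row in I]  # G = sum_t P^t, truncated
for _ in range(200):
    Pt = matmul(Pt, P)
    G = [[G[i][j] + Pt[i][j] for j in range(3)] for i in range(3)]

# check Eq. (5.10.3): G = I + P G, up to the neglected tail P^{201}
PG = matmul(P, G)
residual = max(abs(G[i][j] - I[i][j] - PG[i][j]) for i in range(3) for j in range(3))
print(residual)  # tiny
```

For this chain, \(G[2][0] \approx 2\): starting from state 3, state 1 is visited twice on average before absorption, consistent with the geometric sojourn times.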

The advantage of this algorithm over solving \(G = I + PG\) with matrix methods is as follows. The latter is a matrix equation for which we need to specify the size of \(G\) in advance. However, the states that have a reasonable probability of being visited are not known beforehand. Thus, solving \(G=I+PG\) may include many superfluous states, which is numerically wasteful.

Let us develop a few intermediate algorithms that apply the above ideas to compute \(g(y)\). Note that mu is the distribution of the starting state \(X_0\). Skipping the standard imports, the key part updates \(g\) and \(\mu\) as \(g^{k+1} = g^{k} + \mu^{k}\) and \(\mu^{k+1} = \mu^{k} P\). Thus, after \(k\) iterations, g equals \(\sum_{t=0}^{k-1} \mu^{t}\) and mu equals \(\mu^k\). In the update of mu we prune states with negligible probability. Since \(\mu^k\) shrinks as \(k\) increases, eventually all elements of mu become less than eps, and the while loop terminates.

Python
g = defaultdict(float)
while mu:
    for x in mu: # update g
        g[x] += mu[x]

    new_mu = defaultdict(float)
    for x in mu:  # update mu
        if is_stop(x):
            continue # It seems cryptic to continue when stopping ...
        for y, pxy in P(x):
            new_mu[y] += mu[x] * pxy

    mu = {x: p for x, p in new_mu.items() if p > eps}

We can combine the updating of g and new_mu in one for loop because these dicts do not affect each other within the loop. Observe that the assignment p := mu[x] makes the code slightly faster since it avoids repeated dictionary lookups.

Python
g = defaultdict(float)
while mu:
    new_mu = defaultdict(float)
    for x in mu:
        g[x] += (p := mu[x])

        if is_stop(x):
            continue
        for y, pxy in P(x):
            new_mu[y] += p * pxy

    mu = {x: p for x, p in new_mu.items() if p > eps}
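Packaged as a function, this Jacobi-style scheme can be tried on a toy chain (the down-stepping walk below is made up for illustration, not part of the inventory model):

```python
from collections import defaultdict


def green_jacobi(mu, P, is_stop, eps):
    # the full vector mu^{k+1} = mu^k P is built before the next g update
    g = defaultdict(float)
    while mu:
        new_mu = defaultdict(float)
        for x in mu:
            g[x] += (p := mu[x])
            if is_stop(x):
                continue
            for y, pxy in P(x):
                new_mu[y] += p * pxy
        mu = {x: p for x, p in new_mu.items() if p > eps}
    return g


# toy chain: stay put or step down with equal probability, stop at 0
P = lambda x: [(x, 0.5), (x - 1, 0.5)]
g = green_jacobi({3: 1.0}, P, lambda x: x <= 0, eps=1e-12)
print({x: round(p, 6) for x, p in sorted(g.items())})
# roughly two visits each to 1, 2, 3, and one to the stopping state 0
```

Since new_mu is built from scratch each sweep, the initial mu here may be a plain dict.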

The next phase in the analysis of green is to see why it is allowed to update mu within the for loop, that is, why green gives the same value for \(g(y)\) as the above algorithms. Let us repeat here the core part of green.

Python
while mu:
    for x in list(mu.keys()):
        g[x] += (p := mu[x])

        mu[x] = 0
        if is_stop(x):
            continue

        for y, pxy in P(x):
            mu[y] += p * pxy

The difference with the previous algorithms is this. In the while loop of the previous algorithms, we update \(g^{k+1} = g^{k} + \mu^{k}\) first, and then we update \(\mu^{k+1} = \mu^{k} P\). In other words, the entire vector \(\mu^{k+1}\) is computed before adding it to \(g^{k}\) in the next pass of the while loop. However, in green, we update mu and g within the same iteration of the for loop. Note where we set mu[x] = 0: it should be placed before the stop condition, because when stopping, the chain is absorbed.

What exactly happens in green?

Th. 5.10.1

Let \(P\) be a matrix such that \((I - P) G = I\) has a solution \(G\), compare Eq. (5.10.3). If \(g^0=0\) and \(\mu^0 = \mu\), then \(g^k \to \mu G\) when the update rules of green

\begin{align*} p &= \mu_{x}^{k}, & g^{k+1} &= g^k + p \delta_x, & \mu^{k+1} &= \mu^k - p \delta_x + p \delta_x P, \end{align*}

are applied sequentially for all \(x\) in the support of \(\mu^k\).

Proof

We first show that the quantity \(H^{k} = g^{k} + \mu^{k}G\) remains the same for all \(k\geq 0\) under the update rules. Using substitution of the update rules,

\begin{align*} H^{k+1} &= g^{k+1} + \mu^{k+1} G = g^k + p \delta_x + \bigl(\mu^k - p \delta_x + p \delta_x P\bigr) G \\ &= g^k + p \delta_x + \mu^k G - p \delta_x (I - P) G \\ &= g^k + \mu^k G = H^{k}, \end{align*}

because \((I - P)G = I\), cf. Eq. (5.10.3). Then, by recursion, \(g^k + \mu^k G = g^0 + \mu^0 G = \mu G\) since \(g^0 = 0\) and \(\mu^0=\mu\). Since each accessible non-absorbing state is selected infinitely often by the update sweeps and the chain is absorbing, \(\mu^k \to 0\), hence \(g^k \to \mu G\).

Concluding, the iterations in the while and for loop over g and mu in green lead to precisely the same Green kernel as in the other algorithms.
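The invariance of \(H^k\) can also be traced numerically on a two-state example where \(G\) is known in closed form (my own check; the numbers are made up, and the absorbing state's row of \(P\) is zero, matching that green does not propagate stopped mass):

```python
# states ordered [0, 1]; 0 absorbs (zero row), 1 stays put w.p. 0.5
P = [[0.0, 0.0], [0.5, 0.5]]
G = [[1.0, 0.0], [1.0, 2.0]]  # the inverse of (I - P); see Eq. (5.10.3)

g, mu = [0.0, 0.0], [0.0, 1.0]  # g^0 = 0, mu^0 = delta_1


def H(g, mu):
    # H = g + mu G, with mu a row vector
    return [g[y] + sum(mu[x] * G[x][y] for x in range(2)) for y in range(2)]


H0 = H(g, mu)
for _ in range(50):  # sweep the update rules over the support of mu
    for x in range(2):
        if (p := mu[x]) == 0.0:
            continue
        g[x] += p
        mu[x] -= p
        for y in range(2):
            mu[y] += p * P[x][y]
        # the invariant holds after every single-state update
        assert all(abs(a - b) < 1e-9 for a, b in zip(H(g, mu), H0))

print(g)  # converges to mu^0 G = [1.0, 2.0]
```

The asserts confirm that \(H^k\) never changes, while g drifts toward \(\mu G\) as the remaining mass \(\mu^k\) dies out.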

We have some final remarks.

  • The advantage of green is that we do not have to make a guess of the states that will be visited by \(X\) with reasonable probability. When using matrix methods to compute \(G\) such that \(G(I-P) = I\), we need to expand \(P\) to many states that are irrelevant (up to eps).
  • In inventory control, the transitions in \(P\) often derive from a fixed demand list. In such cases, it is quite a bit faster not to pass \(P\) as a function, but to write something like for i, pi in D_items: mu[x - i] += p * pi.
  • The difference between the, say, regular approaches and green can be compared to a Jacobi versus Gauss-Seidel updating scheme. In my experiments with inventory control, the green algorithm converges in about half the time of the other schemes.

Let us demonstrate the green algorithm on \((s,S)\) inventory control. The parameters are provided in lighthouse_case.py.

Python
"""Lighthouse case: model parameters."""

from functools import cache

from functions import Min, Plus
from random_variable import NumericRV as RV

leadtime = 2  # lead time
h = 40 * 0.5 / 30  # daily holding cost
b = 100 * 0.2  # daily backlog cost
K = 50  # Order cost, used for sS and Qr
N = 100  # simulation duration

if N <= leadtime:
    print(
        f"The simulation duration {N=} is shorter than the lead time {leadtime=}."
    )
    quit()

# demand for one period
D = RV({0: 1 / 6, 1: 1 / 5, 2: 1 / 4, 3: 1 / 8, 4: 11 / 120, 5: 1 / 6})

# demand during leadtime
X = sum(D for _ in range(leadtime)) if leadtime > 0 else RV({0: 1})


def c(x):
    return b * Min(x) + h * Plus(x)


@cache
def L(x):
    return X.E(lambda i: c(x - i))
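The modules functions and random_variable are not listed in this section. A minimal stand-in (my own sketch, assuming that Plus(x) is the on-hand part \(x^+\), Min(x) the backlogged part \(x^-\), and that NumericRV is a finite pmf supporting convolution via + and expectations via E; the book's actual modules may differ) could look like:

```python
class RV:
    """Finite discrete random variable given by its pmf (sketch of NumericRV)."""

    def __init__(self, pmf):
        self.pmf = dict(pmf)

    def __add__(self, other):
        if isinstance(other, int):  # so that sum(...) starting at 0 works
            other = RV({other: 1.0})
        pmf = {}
        for i, p in self.pmf.items():  # convolution of independent rvs
            for j, q in other.pmf.items():
                pmf[i + j] = pmf.get(i + j, 0.0) + p * q
        return RV(pmf)

    __radd__ = __add__

    def __sub__(self, other):  # difference of independent rvs, e.g., D - R
        return self + RV({-j: q for j, q in other.pmf.items()})

    def E(self, f=lambda i: i):
        return sum(p * f(i) for i, p in self.pmf.items())

    def items(self):
        return self.pmf.items()


def Plus(x):  # assumed: the on-hand part x^+
    return max(x, 0)


def Min(x):  # assumed: the backlogged part x^-
    return max(-x, 0)


D = RV({0: 1 / 6, 1: 1 / 5, 2: 1 / 4, 3: 1 / 8, 4: 11 / 120, 5: 1 / 6})
X = sum(D for _ in range(2))  # demand during a lead time of 2 periods
print(X.E())  # twice the one-period mean demand
```

The \_\_radd\_\_ alias is what lets sum(D for \_ in range(leadtime)), which starts from the integer 0, work as in lighthouse_case.py.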

With these data, the rest follows from the above. Note that we replaced the function P by the fixed demand list D in the arguments of green, as this runs somewhat faster.

Python
from collections import defaultdict
from functools import cache

from pytictoc import TicToc
from random_variable import NumericRV as RV

from lighthouse_case import D, K, L


def green(mu, D, is_stop, eps):
    g = defaultdict(float)
    while mu:
        for x in list(mu.keys()):
            g[x] += (p := mu[x])
            mu[x] = 0

            if is_stop(x):
                continue

            for i, pi in D:
                mu[x - i] += p * pi
        mu = defaultdict(float, {x: p for x, p in mu.items() if p > eps})
    return g


def sS(s, S, eps=1e-12):
    mu = defaultdict(float)
    mu[S] = 1.0
    is_stop = lambda x: x <= s
    D_items = list(D.items())
    # P = lambda x: ((x - y, py) for y, py in D_items) # example P
    visits = green(mu, D_items, is_stop, eps=eps)

    t = sum(p for x, p in visits.items() if not is_stop(x))
    cost = sum(p * L(x) for x, p in visits.items() if not is_stop(x))
    cost += K * sum(p for x, p in visits.items() if is_stop(x))
    return cost / t


S_star, s_star = 20, 16
print(f"{sS(s_star, S_star)=:.4f}")
Python
sS(s_star, S_star)=31.5101

In a second example, we consider an inventory system that accepts returns. These returns can be seen as negative demand. In that case, the regular algorithm to compute the average cost under an \((s,S)\) policy no longer works, but green has no problems with such cases.

Python
from collections import defaultdict
from functools import cache

from random_variable import NumericRV as RV

from lighthouse_case import c, leadtime, D, K


R = RV({0: 1 / 3, 1: 1 / 3, 2: 1 / 4, 3: 1 / 8})
D -= R
X = sum(D for _ in range(leadtime)) if leadtime > 0 else RV({0: 1})


@cache
def L(x):
    return X.E(lambda i: c(x - i))


def sS(s, S, eps=1e-12):
    mu = defaultdict(float)
    mu[S] = 1.0
    is_stop = lambda x: x <= s
    D_items = list(D.items())
    visits = green(mu, D_items, is_stop, eps=eps)

    t = sum(p for x, p in visits.items() if not is_stop(x))
    cost = sum(p * L(x) for x, p in visits.items() if not is_stop(x))
    cost += K * sum(p for x, p in visits.items() if is_stop(x))
    return cost / t


S_star, s_star = 20, 16
print(f"{sS(s_star, S_star)=:.4f}")
Python
sS(s_star, S_star)=22.8449

We see that by accepting returns, the average cost decreases.

Opt 5.10.1

In the pruning step of green, why not prune like this

Python
mu = defaultdict(float, {x: p for x, p in mu.items() if p > eps and not is_stop(x)})
Hint

How do we handle states for which is_stop(x) returns True earlier in the algorithm?

Solution
Did you actually try? Maybe see the hint above!
Solution, for real

When a state \(x\) is in the stopping set, it has been visited once. We need to count this visit because there can be a cost associated with stopping at this \(x\).
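To see the effect concretely, one can run a variant of green whose prune also discards stopping states (the down-stepping walk below is made up for illustration): the one visit to the stopping state is then never counted.

```python
from collections import defaultdict


def green_pruned(mu, P, is_stop, eps):
    # identical to green, except that the prune also drops stopping states
    g = defaultdict(float)
    while mu:
        for x in list(mu.keys()):
            g[x] += (p := mu[x])
            mu[x] = 0
            if is_stop(x):
                continue
            for y, pxy in P(x):
                mu[y] += p * pxy
        mu = defaultdict(
            float, {x: p for x, p in mu.items() if p > eps and not is_stop(x)}
        )
    return g


P = lambda x: [(x, 0.5), (x - 1, 0.5)]  # stay put or step down
g = green_pruned(defaultdict(float, {3: 1.0}), P, lambda x: x <= 0, eps=1e-12)
print(0 in g)  # False: the visit to the stopping state 0 is never counted
```

The transient states still get their correct counts, but the absorbed mass, and hence any stopping cost such as \(K\), is lost.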