5.9 An algorithm for the \((s, S)\) and \((T, S)\) policies
We develop an algorithm to compute the long-run average cost of a single-item inventory system under periodic review, controlled by the \((s, S)\) or the \((T, S)\) policy. The importance of this algorithm lies in its underlying ideas: hitting (or stopping) problems, the renewal reward theorem, and dynamic programming. We start with the \((T, S)\) policy, move on to the \((s, S)\) policy, and then combine the two into the \((T, s, S)\) policy, to be introduced presently.
Recall that under the \((T, S)\) policy, we place a replenishment order every \(T\) periods such that the inventory position \(P\) equals \(S\) after the replenishment. When the inventory level \(I\) equals \(x\), the inventory cost is given by
\begin{align*} c(x) &= b x^{-} + h x^{+}. \end{align*}By conditioning on \(P=x\), and using that \(I=P-X\) (see Th. 5.7.3), it follows that
\begin{align*} L(x) = \E{c(I) | P=x} = \E{c(x-X)} = \sum_{k=0}^{\infty} f_{k} c(x-k) \tag{5.9.1} \end{align*}is the expected inventory cost per period when \(P=x\).
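For concreteness, Eq. (5.9.1) is easy to evaluate numerically once the pmf of the leadtime demand \(X\) is given. Here is a minimal sketch; the cost parameters and the pmf are hypothetical toy numbers, not taken from the text.

```python
# Hypothetical toy parameters: backlog cost b, holding cost h,
# and a pmf f for the leadtime demand X.
b, h = 5.0, 1.0
f = {0: 0.3, 1: 0.4, 2: 0.3}  # f[k] = P(X = k)

def c(x):
    # One-period inventory cost: c(x) = b x^- + h x^+.
    return b * max(-x, 0) + h * max(x, 0)

def L(x):
    # Eq. (5.9.1): L(x) = sum_k f_k c(x - k).
    return sum(p * c(x - k) for k, p in f.items())

print(L(1))  # 0.3 c(1) + 0.4 c(0) + 0.3 c(-1) = 0.3 + 0 + 1.5 = 1.8
```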
Let us write \(V(n, x)\) for the expected cost until reordering if we start at position \(P=x\) and there are \(n\), \(0\leq n \leq T\), periods remaining. By reasoning analogously as in Section 5.6, \(V(n, x)\) satisfies the recursion
\begin{align*} V(n, x) &= \begin{cases} L(x) + \E{V(n-1, x-D)}, &\quad n > 0, \\ K, & \quad n = 0. \end{cases} \end{align*}To see this, observe that when \(P=x\) and \(n>0\), we first pay \(L(x)\) for keeping inventory during one period; then a demand \(D\) arrives, and for the next period there is one period less to go until ordering. When \(n=0\), we pay the order cost \(K\) to replenish to \(S\). Note that this recursion always terminates, because \(V(n, \cdot)\) is expressed in terms of \(V(n-1, \cdot)\), so there is always one period less to go.
Since a (renewal) cycle starts at level \(S\), the expected cycle cost is \(V(T, S)\) and the expected cycle duration is \(T\) periods. The renewal reward theorem implies that the long-run average cost is given by \(V(T, S)/T\).
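To make the recursion tangible before turning to the class-based implementation, here is a minimal standalone sketch. It assumes zero leadtime (so that \(L(x) = c(x)\)) together with hypothetical toy costs and demand pmf; none of the numbers come from the text.

```python
from functools import cache

# Hypothetical toy parameters; zero leadtime, so L(x) = c(x).
b, h, K = 5.0, 1.0, 10.0
f = {0: 0.3, 1: 0.4, 2: 0.3}  # f[d] = P(D = d), one-period demand

def L(x):
    return b * max(-x, 0) + h * max(x, 0)

@cache
def V(n, x):
    # Expected cost with n periods to go until the next order moment.
    if n == 0:
        return K  # pay the order cost and replenish to S
    return L(x) + sum(p * V(n - 1, x - d) for d, p in f.items())

T, S = 3, 2
print(V(T, S) / T)  # long-run average cost under the (T, S) policy
```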
To facilitate the development of the other models, we introduce some important definitions from dynamic programming that are used in the above recursion: \(L(x)\) is called the running cost, \(K\) the hitting cost, \(T\) the hitting time, and \(\set{n : n=0}\) the hitting set. (We use the adjectives hitting and stopping interchangeably.)
With this terminology we can easily structure the analysis of the \((s,S)\) policy. In view of Eq. (2.5.3), when the inventory position \(P\) enters (or hits) the stopping set \(\set{i : i \leq s}\) we place an order and pay a stopping cost \(K\). When \(P>s\), we pay the running cost \(L(x)\) and continue for another period. Hence, the expected cost until stopping satisfies the recursion
\begin{align*} V(s, x) = \begin{cases} L(x) + \E{V(s, x-D)}, &\quad x > s, \\ K, & \quad x \leq s. \end{cases} \end{align*}For the hitting time we find
\begin{align*} T(s, x) = \begin{cases} 1 + \E{T(s, x-D)}, &\quad x > s, \\ 0, & \quad x \leq s, \end{cases} \end{align*}because the running cost is \(1\) if we continue for another period, and when the process hits the stopping set, the time stops counting, so the stopping cost is \(0\). Note that this recursion is structurally identical to that of \(V\).
To find an algorithm, we expand the recursion for \(V\) at \(x>s\) as
\begin{align*} V(s, x) = L(x) + \E{V(s, x-D)} = L(x) + \sum_{k=0}^{\infty} f_{k}V(s, x-k) = L(x) + f_{0} V(s, x) + f_{1} V(s, x-1) + \cdots. \end{align*}Clearly, \(V(s, x)\) appears at the LHS and the RHS, so \(V(s, x)\) is expressed in terms of itself; the recursion is circular, unlike the recursion for the \((T,S)\) policy. The way out is to isolate \(V(s, x)\) by rewriting this as
\begin{align*}\label{eq:54} (1-f_{0}) V(s, x) = L(x) + f_{1} V(s, x-1) + f_{2} V(s, x-2) + \cdots. \tag{5.9.2} \end{align*}Thus, assuming that \(f_0 < 1\) (this is not restrictive, because \(\P{D=0} = 1\) implies that there is no demand, in which case there is no need to keep inventory),
\begin{align*} V(s, x) = (1-f_{0})^{-1}\left(L(x) + \sum_{k=1}^{\infty} f_{k} V(s, x-k)\right). \tag{5.9.3} \end{align*}We can now use one algorithm to compute \(V\) and \(T\) by adjusting the running and stopping costs accordingly.
Finally, when the expected cycle cost \(V(s, S)\) and expected hitting time \(T(s,S)\) have been computed, the long-run average cost is \(V(s,S)/T(s,S)\).
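The recursions for \(V\) and \(T\) under the \((s,S)\) policy admit an equally short standalone sketch; the division by \(1-f_0\), as in Eq. (5.9.3), breaks the circularity. The pmf and costs are again hypothetical toy numbers, with zero leadtime so that \(L(x)=c(x)\).

```python
from functools import cache

b, h, K = 5.0, 1.0, 10.0
f = {0: 0.3, 1: 0.4, 2: 0.3}  # f[d] = P(D = d)

def L(x):
    return b * max(-x, 0) + h * max(x, 0)

@cache
def V(s, x):
    # Expected cost until the inventory position hits {i : i <= s}.
    if x <= s:
        return K
    tail = sum(p * V(s, x - d) for d, p in f.items() if d > 0)
    return (L(x) + tail) / (1 - f[0])

@cache
def T(s, x):
    # Expected number of periods until {i : i <= s} is hit.
    if x <= s:
        return 0.0
    tail = sum(p * T(s, x - d) for d, p in f.items() if d > 0)
    return (1.0 + tail) / (1 - f[0])

s, S = 0, 2
print(V(s, S) / T(s, S))  # long-run average cost under the (s, S) policy
```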
By combining the two policies above we obtain the \((T,s,S)\) policy. The recursion for \(V(n, s, x)\) specifies the behavior:
\begin{align*}\label{eq:ss9} V(n, s, x) = \begin{cases} L(x) + \E{V(n-1, s, x-D)}, &\quad x > s, \\ K, & \quad x \leq s \text{ or } n = 0. \end{cases} \tag{5.9.4} \end{align*}The structure of the recursion for \(T(n, s, x)\) is the same, but with running cost \(1\) and stopping cost \(0\).
To see how the \((s,S)\) and \((T,s,S)\) policies relate, realize that when \(T\) becomes very large, the set \(\set{i : i \leq s}\) will be hit with overwhelming probability long before the time-to-go has become \(0\). Therefore, the average cost under the \((s,S)\) policy must be \(V(s,S)/T(s,S) = \lim_{n\to\infty} V(n, s, S)/T(n, s, S)\). By similar reasoning, \(V(T,S)/T = \lim_{s\to-\infty} V(T, s, S)/T(T, s, S)\).
It is interesting to understand why we can obtain the closed-form expression \(T(q) = q/(\mu-\lambda)\) in Eq. (5.6.1), while for the \((s,S)\) and \((T,s,S)\) policies we need an algorithm. The difference is that in the earlier situation \(T(q) = 1 + \E{T(q+Y-1)}\), so there are possibly big steps upward, but only single steps downward, while in inventory control the inventory position can jump down by more than one unit when \(\P{D\geq 2} > 0\). Thus, the behavior of \(V\) and \(T\) just above \(s\) is more complicated. Moreover, in the queueing situation the queue length can never be negative, while in the inventory situation the level \(I\) can take negative values. As a consequence, the cost function \(L(x)\) is not linear (but still convex), hence the idea to substitute \(T(q) = \alpha q + \beta\) and solve for \(\alpha\) and \(\beta\) will not work in general.
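To see in a formula why linearity is essential, substitute the guess \(T(q) = \alpha q + \beta\) into a recursion of the form \(T(q) = 1 + \E{T(q+Y-1)}\) (a sketch in general terms, not tied to the exact setup of Section 5.6):
\begin{align*} \alpha q + \beta = 1 + \E{\alpha(q+Y-1) + \beta} = 1 + \alpha q + \alpha(\E{Y}-1) + \beta. \end{align*}The terms \(\alpha q\) and \(\beta\) cancel, leaving \(\alpha = 1/(1-\E{Y})\), after which a boundary condition fixes \(\beta\). This cancellation requires the guess to hold on all reachable states, which is exactly what a piecewise-linear \(L(x)\) destroys.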
Next, we note that deriving a recursive algorithm for the cost under the \((Q,r)\) policy seems not so attractive. To see why, observe that the stopping cost is given by \(V(r, x) = V(r, x + Q)\); that is, it is no longer a constant such as \(K\), but depends on other values of \(V\). Therefore, the simplifying trick that led to Eq. (5.9.3) cannot be applied here.
We now implement the algorithm for the \((T,s,S)\) and \((s,S)\) policies in Python.
By comparing the recursions, we see that both policies involve a running cost and a stopping cost for the expected cost and the expected time until the stopping set is hit.
However, the two policies differ in their stopping sets and the details of the cost computations.
We therefore put all common functionality in a base class and use a hitting_value method in the subclasses TsSPolicy and sSPolicy to cope with the differences.
For ease of understanding, we first discuss TsSPolicy; assume that the running cost and the stopping cost are already provided by the base class. The function is_stop implements the stopping criterion.
from functools import cache

from inventory_base import InventoryModelPrimitives


class TsSPolicy(InventoryModelPrimitives):
    def hitting_value(self, *, running_cost, stopping_cost):
        is_stop = lambda n, s, x: x <= s or n == 0

        @cache
        def v(T, s, S):
            if is_stop(T, s, S):
                return stopping_cost(S)
            return running_cost(S) + self.D.E(lambda i: v(T - 1, s, S - i))

        return v


class TSPolicy(TsSPolicy):
    "Long run average cost for a TS inventory system."

    def average_cost(self, T, S):
        return super().average_cost(T, -float("inf"), S)
In this code, the method hitting_value returns a function \(v\); pay attention here: the returned object is a function, not a number. This function, when applied to the arguments T, s, S, returns the number v(T, s, S). Work through the exercise on exp_apply at the end of this section to understand how this idea works in detail.
Similarly, for the \((s,S)\) policy we implement the recursive scheme of Eq. (5.9.3).
from functools import cache

from inventory_base import InventoryModelPrimitives


class sSPolicy(InventoryModelPrimitives):
    def hitting_value(self, *, running_cost, stopping_cost):
        is_stop = lambda s, x: x <= s

        @cache
        def v(s, S):
            if is_stop(s, S):
                return stopping_cost(S)
            res = running_cost(S)
            res += self.D.E(lambda i: 0.0 if i == 0 else v(s, S - i))
            return res / (1.0 - self.D.pmf(0))

        return v
Here is the base class. Once again, it is worth thinking about why we put the arguments (*args, **kwargs) at the end of the methods. (If *args and **kwargs are unfamiliar, consult the Python documentation or an LLM.)
from abc import ABC, abstractmethod
from functools import cache

from functions import Min, Plus
from random_variable import NumericRV as RV


class InventoryModelPrimitives(ABC):
    "Common functionality for sS and TsS inventory control systems."

    def __init__(self, D, leadtime, h, b, K):
        if D.pmf(0) == 1:
            raise ValueError("Invalid D: pmf(0) == 1")
        self.D = D
        self.leadtime = leadtime
        self.h = h
        self.b = b
        self.K = K
        self.X = sum(self.D for _ in range(leadtime)) if leadtime > 0 else RV({0: 1})

    def c(self, x):
        return self.b * Min(x) + self.h * Plus(x)

    @cache
    def L(self, x):
        return self.X.E(lambda i: self.c(x - i))

    @abstractmethod
    def hitting_value(self, *, running_cost, stopping_cost):
        raise NotImplementedError

    @cache
    def hitting_cost(self, *args, **kwargs):
        return self.hitting_value(
            running_cost=lambda x: self.L(x),
            stopping_cost=lambda x: self.K,
        )(*args, **kwargs)

    @cache
    def hitting_time(self, *args, **kwargs):
        return self.hitting_value(
            running_cost=lambda x: 1.0,
            stopping_cost=lambda x: 0.0,
        )(*args, **kwargs)

    def average_cost(self, *args, **kwargs):
        return self.hitting_cost(*args, **kwargs) / self.hitting_time(*args, **kwargs)
Let us examine a few numerical examples.
from lighthouse_case import D, K, b, h, leadtime
from TsS import TSPolicy, TsSPolicy
from ss import sSPolicy
sS = sSPolicy(D, leadtime, h, b, K)
s, S = 16, 20
print("sS policy")
print(f"hitting time: {sS.hitting_time(s, S):.2f}")
print(f"hitting cost: {sS.hitting_cost(s, S):.2f}")
print(f"average cost: {sS.average_cost(s, S):.2f}")
sS policy
hitting time: 2.29
hitting cost: 72.05
average cost: 31.51
We can check that for \(T\) large, the \((T,s,S)\) policy and the \((s,S)\) policy have the same average cost.
TsS = TsSPolicy(D, leadtime, h, b, K)
print("TsS policy")
print(f"average cost: {TsS.average_cost(T=1, s=s, S=S):.2f}")
print(f"average cost: {TsS.average_cost(T=2, s=s, S=S):.2f}")
print(f"average cost: {TsS.average_cost(T=10, s=s, S=S):.2f}")
TsS policy
average cost: 60.30
average cost: 38.60
average cost: 31.51
For the \((T,S)\) policy, setting \(T\) to a large value is costly, as expected.
TS = TSPolicy(D, leadtime, h, b, K)
print("TS policy")
print(f"average cost: {TS.average_cost(T=1, S=S):.2f}")
print(f"average cost: {TS.average_cost(T=2, S=S):.2f}")
print(f"average cost: {TS.average_cost(T=10, S=S):.2f}")
print(f"average cost: {TS.average_cost(T=20, S=S):.2f}")
TS policy
average cost: 60.30
average cost: 34.54
average cost: 35.84
average cost: 193.78
Claim: For the \((s,S)\) policy, the expected cost until stopping \(V(s,x)\) satisfies the same recursion as the one for the \((T,S)\) policy.
Solution
False. For the \((s,S)\) policy the recursion is circular in \(x\) and cannot be solved by backward induction in time, and therefore needs the extra factor \(1-f_0\); this is explicitly contrasted with the \((T,S)\) policy.
Claim: The next formula is a correct implementation to compute the expected time to hit \(s\) under the \((s,S)\) policy:
\begin{align*} \begin{cases} (1-f_{0})^{-1}(1 + \E{T(s, x-D)}), &\quad x > s, \\ 0, & \quad x \leq s, \end{cases} \end{align*}Solution
False. It is the time to hit the set \(\{i : i \leq s\}\), not the time to hit \(s\).
Claim: The next code can be used to compute the expected cost until stopping for the \((s,S)\) and the \((T,S)\) policy:
def hitting_cost(self, *args, **kwargs):
return self.hitting_value(
running_cost=lambda x: self.L(x),
stopping_cost=lambda x: self.K,
)(*args, **kwargs)
Solution
True.
Claim: This code correctly computes the expected cost until stopping for the \((s,S)\) policy:
def hitting_value(self, *, running_cost, stopping_cost):
    is_stop = lambda s, x: x <= s

    @cache
    def v(s, S):
        if is_stop(s, S):
            return stopping_cost(S)
        res = running_cost(S)
        res += self.D.E(lambda i: 0.0 if i == 0 else v(s, S - i))
        return res

    return v
Solution
False. See the comment in the code below for the offending line.
def hitting_value(self, *, running_cost, stopping_cost):
    is_stop = lambda s, x: x <= s

    @cache
    def v(s, S):
        if is_stop(s, S):
            return stopping_cost(S)
        res = running_cost(S)
        res += self.D.E(lambda i: 0.0 if i == 0 else v(s, S - i))
        return res  # this line is wrong: the division by 1 - f_0 is missing

    return v
Claim: When \(P=x\) and the leadtime \(\leadtime>0\), the inventory cost under the \((T,s,S)\) policy is given by
\begin{align*} c(x) = b x^{-} + h x^{+}. \end{align*}Solution
False. It should be \(L(x)\).
Consider the following functions to illustrate how functions like hitting_time and hitting_cost can be understood.
def exp_apply(exponent):
    def apply(base):
        return base ** exponent
    return apply


lion = exp_apply(2)
tiger = exp_apply(3)
Without running the code, what do you think will be printed by:
print(exp_apply(1)(9))
print(lion(5))
print(tiger(4))
Hint
Suppose we rename lion and tiger as follows:
square = exp_apply(2)
cube = exp_apply(3)
Solution
Observe that the function exp_apply returns an inner function apply.
This inner function captures the variable exponent from its surrounding scope.
As a result, the returned function remembers the value of exponent.
For example, in square = exp_apply(2), the call exp_apply(2) does not yet perform any power computation.
Instead, it returns a function, assigned to square, for which the exponent is fixed to 2.
Calling the returned function on a number completes the computation:
print(exp_apply(1)(9)) # 9
print(square(5)) # 25 = 5**2
print(cube(4)) # 64 = 4**3
In general terms: the outer function exp_apply specifies the general structure of a computation: it describes the pattern “take a number and raise it to a fixed exponent,” without yet committing to a particular exponent.
By calling the outer function and assigning its output to a new name, we specialize this general procedure.
The returned function has the same structure, but with the free parameter exponent bound to a concrete value.
In this way, we obtain a new, more specific function from a general computational template.
Look how elegantly the sqrt function can now be defined.
sqrt = exp_apply(0.5)
A compact version of exp_apply looks like this.
def exp_apply(exponent):
    return lambda base: base**exponent
Hopefully you find such ideas interesting: the concepts illustrated in this exercise are called functional programming, currying, and closures.
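As a closing aside, the standard library's functools.partial offers the same kind of specialization without writing the closure by hand. The function power below is a hypothetical stand-in for exp_apply with both arguments explicit:

```python
from functools import partial

def power(base, exponent):
    return base**exponent

# Fix the exponent, leaving the base free, just like exp_apply does.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print(square(5))  # 25
print(cube(4))    # 64
```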