Policy Improvement Algorithm - Defining Policies
Thus, the optimal policy is to use expert repair when the computer is down and to do nothing when the computer is up.
To draw the same conclusion with the policy improvement algorithm, let us formulate the problem formally as a Markov decision model. At each stage there are three possible decisions: (1) do nothing, (2) operator repair, and (3) expert repair. Management has declared that when the computer is down (state 1), some kind of repair must be carried out; that is, decision 1 (do nothing) may not be chosen in that state. To enforce this, an infinite cost is assigned to choosing decision 1 in state 1. Thus, the interesting policies are those that use decision 2 or decision 3 whenever the computer is down. Since any of the three decisions may be used when the computer is up, there are six such policies.
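The formulation above can be sketched in Python by enumerating every stationary policy, i.e. every pair (decision when up, decision when down), and discarding the pairs that incur the infinite cost. This is a minimal sketch under stated assumptions: numbering the up state 0 is our own choice (the text fixes only down = state 1), and the helpers `forbidden_penalty` and `feasible_policies` are hypothetical names. Because no repair costs are given in this part of the example, the sketch encodes only the infinite penalty and treats every allowed pair as costing 0.

```python
import math
from itertools import product

# Assumption: up = state 0. The text only states that "down" is state 1.
UP, DOWN = 0, 1
# Decisions numbered as in the text.
DO_NOTHING, OPERATOR_REPAIR, EXPERT_REPAIR = 1, 2, 3
DECISIONS = (DO_NOTHING, OPERATOR_REPAIR, EXPERT_REPAIR)


def forbidden_penalty(state: int, decision: int) -> float:
    """Hypothetical helper: the extra cost used to rule out a decision.

    Only management's rule is encoded; real repair costs are not modelled,
    so every allowed (state, decision) pair gets a penalty of 0.
    """
    if state == DOWN and decision == DO_NOTHING:
        return math.inf  # doing nothing while the computer is down is forbidden
    return 0.0


def feasible_policies() -> list[tuple[int, int]]:
    """Return every policy (decision when up, decision when down) with finite penalty."""
    policies = []
    for d_up, d_down in product(DECISIONS, repeat=2):
        penalty = forbidden_penalty(UP, d_up) + forbidden_penalty(DOWN, d_down)
        if math.isfinite(penalty):  # drop policies that carry the infinite cost
            policies.append((d_up, d_down))
    return policies


if __name__ == "__main__":
    for policy in feasible_policies():
        print(policy)
```

Of the nine combinations, the three that do nothing when the computer is down are eliminated by the infinite cost, leaving six.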
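Although the four policies that also repair the computer while it is up are feasible, they are clearly not optimal, since it makes little sense to spend money repairing a computer that is working (this should become obvious later). Thus, we need to decide between the two policies that do nothing when the computer is up: one uses operator repair when the computer is down, and the other uses expert repair. We arbitrarily choose one of these two as our initial policy.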
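The pruning argument can likewise be sketched: among the feasible policies, keep only those that do nothing while the computer is up. Note that this is the informal dominance argument from the text, not a step of the policy improvement algorithm itself. The helper `candidate_policies` and the `NAMES` table are hypothetical, and since the starting policy is arbitrary, the sketch simply takes the first candidate.

```python
from itertools import product

# Decisions numbered as in the text.
DO_NOTHING, OPERATOR_REPAIR, EXPERT_REPAIR = 1, 2, 3
DECISIONS = (DO_NOTHING, OPERATOR_REPAIR, EXPERT_REPAIR)
NAMES = {DO_NOTHING: "do nothing", OPERATOR_REPAIR: "operator repair",
         EXPERT_REPAIR: "expert repair"}


def candidate_policies() -> list[tuple[int, int]]:
    """Hypothetical helper: policies (decision when up, decision when down)
    that are feasible and never pay for a repair while the computer is up."""
    return [
        (d_up, d_down)
        for d_up, d_down in product(DECISIONS, repeat=2)
        if d_down != DO_NOTHING  # some repair is mandatory when the computer is down
        and d_up == DO_NOTHING   # repairing a working computer only adds cost
    ]


candidates = candidate_policies()
for d_up, d_down in candidates:
    print(f"up: {NAMES[d_up]:<15} down: {NAMES[d_down]}")

# The starting policy is arbitrary; either candidate can seed the algorithm.
initial_policy = candidates[0]
```

With the full cost and transition data of the model, the policy improvement algorithm started from either candidate should end at the expert-repair policy identified at the beginning of this section.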