Policy Improvement Algorithm - Introduction
To illustrate the policy improvement algorithm (average cost criterion) and demonstrate its interactive routine in your IOR Tutorial, consider Problem 16.6-1 from the Markov Chains chapter, reproduced below.
A computer is inspected at the end of every hour. It is found to be either working (up) or failed (down). If the computer is found to be up, the probability that it remains up for the next hour is 0.90. If it is found to be down, repair begins, and the repair may take more than an hour. Whenever the computer is down (regardless of how long it has been down), the probability that it is still down an hour later is 0.35.
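The problem as stated is a two-state Markov chain, so its long-run behavior can be checked by hand or with a few lines of code. The following is a minimal Python sketch, assuming states are indexed 0 = up and 1 = down (our labeling, not the book's). It builds the one-hour transition matrix from the two probabilities above and solves the steady-state balance equation. The helper names `transition_matrix`, `steady_state`, and `is_stationary` are hypothetical and are not part of IOR Tutorial.

```python
from fractions import Fraction

def transition_matrix(p_stay_up, p_stay_down):
    """One-hour transition matrix; rows/columns are 0 = up, 1 = down."""
    return [[p_stay_up, 1 - p_stay_up],
            [1 - p_stay_down, p_stay_down]]

def steady_state(P):
    """Steady-state probabilities of a two-state chain.

    Uses the balance equation pi_up * P(up->down) = pi_down * P(down->up)
    together with pi_up + pi_down = 1.
    """
    fail, repair = P[0][1], P[1][0]
    total = fail + repair
    return [repair / total, fail / total]

def is_stationary(pi, P):
    """Check pi * P == pi exactly (Fractions avoid rounding error)."""
    n = len(P)
    return all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j] for j in range(n))

# Data from the problem statement: P(up stays up) = 0.90, P(down stays down) = 0.35
P = transition_matrix(Fraction("0.90"), Fraction("0.35"))
pi = steady_state(P)
print(pi)                   # [Fraction(13, 15), Fraction(2, 15)]
print(is_stationary(pi, P)) # True
```

Under this repair behavior the computer is up about 13/15 ≈ 0.867 of the hours in the long run. Because the down-state probability is a parameter, the same helper can be reused for any repair action that changes only the second row of the matrix.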
Now suppose that the above description applies only to one available mode of repair ("Operator repair") and that another, more expensive mode ("Expert repair") is also available. Whenever the computer is down, the latter mode reduces the probability that it is still down an hour later to 0.10. The relevant data for a down computer are summarized below.