To illustrate the policy improvement algorithm (average cost criterion)
and to demonstrate its interactive routine in your IOR Tutorial, consider Problem 16.6-1
from the chapter on Markov chains, reproduced below.
A computer is inspected at the end of every hour. It is
found to be either working (up) or failed (down). If the computer is found
to be up, the probability that it remains up for the next hour is 0.90.
If it is down, repair action is taken, which may take more than an hour.
Whenever the computer is down (regardless of how long it has been down),
the probability that it is still down an hour later is 0.35.
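The statement above defines a two-state Markov chain. As a minimal sketch (the state labels and the helper `steady_state_two_state` are illustrative assumptions, not part of the IOR Tutorial), the following Python computes its steady-state probabilities from the transition matrix using exact fractions:

```python
from fractions import Fraction

# Illustrative state order: index 0 = up, index 1 = down.
# Transition matrix under operator repair, taken from the problem statement.
P_OPERATOR = [
    [Fraction(90, 100), Fraction(10, 100)],  # up -> up, up -> down
    [Fraction(65, 100), Fraction(35, 100)],  # down -> up, down -> down
]

def steady_state_two_state(P):
    """Return (pi_up, pi_down) for an ergodic two-state chain (hypothetical helper).

    Solving pi = pi P together with pi_up + pi_down = 1 gives
    pi_up = p_du / (p_ud + p_du) and pi_down = p_ud / (p_ud + p_du).
    """
    assert all(sum(row) == 1 for row in P), "each row must sum to 1"
    p_ud = P[0][1]  # probability of going from up to down
    p_du = P[1][0]  # probability of going from down to up
    total = p_ud + p_du
    return p_du / total, p_ud / total

pi_up, pi_down = steady_state_two_state(P_OPERATOR)
print(pi_up, pi_down)  # 13/15 2/15
```

Under operator repair alone, the computer is therefore down about 2/15 (roughly 13.3%) of the hours in the long run.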
Now suppose that the above description applies to only one available
mode of repair ("Operator repair"), and that a second, more expensive mode
("Expert repair") is also available. Whenever the computer is down, the
latter mode reduces the probability of still being down an hour later to
0.10. The relevant data for a down computer are summarized below.
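With two repair modes, the decision made in the down state selects which transition row applies, so each stationary policy induces its own Markov chain. The sketch below assumes the up-state behavior (fail with probability 0.10) is the same under both modes and uses a hypothetical helper `fraction_of_time_down`; it compares only the long-run fraction of hours the computer is down and ignores the cost data, which a full average-cost comparison also needs.

```python
from fractions import Fraction

# Probability that an up computer is found down an hour later (same for both modes).
UP_TO_DOWN = Fraction(10, 100)

# Probability of still being down an hour later, for each repair mode
# (values from the problem statement; the dict itself is illustrative).
STAY_DOWN = {
    "Operator repair": Fraction(35, 100),
    "Expert repair": Fraction(10, 100),
}

def fraction_of_time_down(stay_down):
    """Long-run probability of the down state when this repair mode is always used."""
    down_to_up = 1 - stay_down
    # Two-state steady state: pi_down = p_ud / (p_ud + p_du).
    return UP_TO_DOWN / (UP_TO_DOWN + down_to_up)

for mode, stay_down in STAY_DOWN.items():
    print(f"{mode}: {fraction_of_time_down(stay_down)}")
# Operator repair: 2/15
# Expert repair: 1/10
```

Expert repair lowers the long-run fraction of down hours from 2/15 (about 13.3%) to 1/10; whether that improvement justifies its higher cost is the question the policy improvement algorithm answers.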