To reach the same conclusion with the policy improvement algorithm,
let us formulate the problem formally as a Markov decision model. At each
stage, there are three possible decisions:
(1) do nothing, (2) operator repair, and (3) expert repair. Management
has declared that when the computer is down (state = 1), some kind of repair
must ensue (i.e., decision 1, do nothing, is not allowed there). To enforce
this, an infinite cost is assigned to choosing decision 1 in state
1. Thus, the interesting policies are