Old Policy:
 |
New Policy:
 |
After one iteration of the policy improvement algorithm,
the policy has been changed (i.e., improved) from using "Operator repair"
when the computer is down to using "Expert repair" when the computer is
down. |
Is the current solution optimal? |
We do not know, because the new policy is different from
the old policy. Thus, another iteration of the policy improvement algorithm
needs to be performed. |
Next we will determine the set of equations which need
to be solved for , ,
and . |