Policy Improvement Algorithm - Improving Policy
Shown below is the above expression evaluated for the state in which the computer is down, once for each possible decision. The cost and transition probability values used in these equations are highlighted at the bottom of the screen.
Among the possible decisions, "Expert repair" minimizes the expression, so it becomes the decision the new policy assigns to this state. That is, when the computer is down, the new policy is to use "Expert repair".
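The improvement step for a single state can be sketched in Python as follows. This is only an illustration, not the tutorial's own calculation: it assumes the expression has the average-cost form C_ik + sum_j p_ij(k) v_j - v_i, and every number, the state names "up" and "down", and the alternative decision "Basic repair" are hypothetical placeholders chosen so that "Expert repair" comes out as the minimizer.

```python
def improve_state(state, decisions, cost, trans, v):
    """Evaluate the improvement expression for each decision in one state.

    Assumed form (not confirmed by the source):
        cost[(state, k)] + sum_j trans[(state, k)][j] * v[j] - v[state]
    Returns the minimizing decision and the score of every decision.
    """
    scores = {}
    for k in decisions:
        expected_value = sum(p * v[j] for j, p in trans[(state, k)].items())
        scores[k] = cost[(state, k)] + expected_value - v[state]
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical values from a value-determination step.
v = {"up": 0.0, "down": 10.0}

# Hypothetical immediate costs for each decision in the "down" state.
cost = {
    ("down", "Basic repair"): 4.0,
    ("down", "Expert repair"): 6.0,
}

# Hypothetical transition probabilities under each decision.
trans = {
    ("down", "Basic repair"): {"up": 0.5, "down": 0.5},
    ("down", "Expert repair"): {"up": 1.0, "down": 0.0},
}

best, scores = improve_state(
    "down", ["Basic repair", "Expert repair"], cost, trans, v
)
print(best)    # decision chosen for the "down" state
print(scores)  # expression value for each decision
```

With these placeholder inputs the expression evaluates to -1.0 for "Basic repair" and -4.0 for "Expert repair", so the sketch selects "Expert repair" for the down state. Repeating the same call for every state would give the complete new policy.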