Policy Improvement Algorithm - Improving Policy
Shown below is the above expression evaluated for the state in which the computer is down, for each possible decision. The quantities used in these equations are highlighted at the bottom of the screen.
Among the possible decisions, one minimizes the expression and becomes the new decision for this state. That is, when the computer is down, the new policy is to use "Expert repair".
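
The improvement step for a single state can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the expression has the common form "immediate cost plus expected value of the next state", and the function name `improve_state`, the decision names other than "Expert repair", and all numbers are hypothetical, not taken from the original example.

```python
def improve_state(costs, transitions, values, discount=1.0):
    """Pick the decision that minimizes cost + expected future value.

    costs[d]       -- immediate cost of decision d (hypothetical data)
    transitions[d] -- probabilities of moving to each state under d
    values         -- current value estimate of each state
    discount       -- discount factor (assumed 1.0 here)
    """
    scores = {}
    for d in costs:
        # Expected value of the next state if decision d is taken.
        expected_future = sum(p * v for p, v in zip(transitions[d], values))
        scores[d] = costs[d] + discount * expected_future
    # The new policy uses the decision with the smallest score.
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative numbers only. State 0 = computer up, state 1 = computer down.
costs = {"No repair": 0.0, "Quick repair": 40.0, "Expert repair": 90.0}
transitions = {
    "No repair": [0.0, 1.0],
    "Quick repair": [0.5, 0.5],
    "Expert repair": [1.0, 0.0],
}
values = [0.0, 200.0]

best, scores = improve_state(costs, transitions, values)
print(best, scores)
```

With these made-up numbers, the expression is evaluated once per decision and the smallest result determines the new decision for the "computer down" state.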
