The solution to the above system of equations (with one of the
unknown values set equal to zero) is shown below.
The next step in policy improvement is to use the values
computed above for the current policy
to find an alternative policy
such that, for each state,
the decision chosen is the one that makes
the policy improvement expression a minimum. Thus, for each state,
that expression is evaluated for all values of the decision
(1, 2, and 3) in order to find the value
that minimizes it.
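
The improvement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the expression being minimized has the usual average-cost form (expected immediate cost, plus the probability-weighted values of the next states, minus the value of the current state). The function name improve_policy and all of the numbers are hypothetical, chosen for easy hand checking, and are not the data of the example in the text.

```python
def improve_policy(costs, trans, values):
    """Return, for each state, the decision that minimizes the test expression.

    costs[i][k]  -- expected immediate cost in state i under decision k
    trans[i][k]  -- list of transition probabilities from state i under k
    values[j]    -- values from the value-determination step
    (Assumed form of the expression: cost + sum_j p_ij(k) * v_j - v_i.)
    """
    new_policy = []
    for i in range(len(costs)):
        best_k, best_expr = None, float("inf")
        for k in sorted(costs[i]):  # only decisions listed for state i
            expected_next = sum(p * values[j] for j, p in enumerate(trans[i][k]))
            # Subtracting values[i] shifts every candidate equally, so it
            # does not change which decision attains the minimum.
            expr = costs[i][k] + expected_next - values[i]
            if expr < best_expr:  # strict '<' keeps the lowest k on ties
                best_k, best_expr = k, expr
        new_policy.append(best_k)
    return new_policy


# Hypothetical two-state problem with decisions 1, 2, and 3.
values = [4.0, 0.0]  # made-up values; the last one is set equal to zero
costs = [
    {1: 0.0, 2: 3.0, 3: 6.0},  # state 0
    {1: 5.0, 2: 2.0, 3: 1.0},  # state 1
]
trans = [
    {1: [0.5, 0.5], 2: [0.0, 1.0], 3: [1.0, 0.0]},  # state 0
    {1: [0.0, 1.0], 2: [0.5, 0.5], 3: [1.0, 0.0]},  # state 1
]

print(improve_policy(costs, trans, values))  # prints [1, 2]
```

For state 0 the three candidate values are -2, -1, and 6, so decision 1 is chosen; for state 1 they are 5, 4, and 5, so decision 2 is chosen. If the new policy is identical to the current one, the algorithm stops; otherwise the values are recomputed for the new policy and the step is repeated.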