Policy Improvement Algorithm - Solving Equations
The solution to the above system of equations (with $v_M(R)$, the value for the last state $M$, set equal to zero) is shown below.
The next step in policy improvement is to use the values of $v_i(R)$ computed above for the current policy $R$ to find an alternative policy $R'$ such that, for each state $i$, $d_i(R') = k$ is the decision that makes

$$C_{ik} + \sum_{j} p_{ij}(k)\, v_j(R) - v_i(R)$$

a minimum. Thus, for each state $i$, the above expression is evaluated for all values of $k$ (1, 2, and 3) in order to find the value of $k$ that minimizes the expression.
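The improvement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `improve_policy`, the two-state problem, and every cost, probability, and value in it are hypothetical and are not the numbers from this example. For each state, the sketch evaluates the expected immediate cost of a decision, plus the expected value of the next state, minus the value of the current state, and keeps the decision with the smallest result.

```python
def improve_policy(costs, trans, v):
    """One policy improvement step (illustrative sketch).

    costs[i][k] : expected immediate cost of decision k in state i
    trans[i][k] : list of transition probabilities p_ij(k), indexed by j
    v[j]        : values obtained from the current policy's equations
    Returns a dict mapping each state i to its minimizing decision k.
    """
    new_policy = {}
    for i in costs:
        best_k, best_val = None, float("inf")
        for k in costs[i]:  # only decisions listed for state i are tried
            expected_next = sum(p * v[j] for j, p in enumerate(trans[i][k]))
            val = costs[i][k] + expected_next - v[i]
            if val < best_val:  # ties keep the first decision found
                best_k, best_val = k, val
        new_policy[i] = best_k
    return new_policy


# Hypothetical data: states 0 and 1, decisions 1 and 2.
v = [5.0, 0.0]  # value of the last state set to zero
costs = {0: {1: 1.0, 2: 3.0}, 1: {1: 4.0, 2: 1.0}}
trans = {
    0: {1: [0.5, 0.5], 2: [0.2, 0.8]},
    1: {1: [0.0, 1.0], 2: [0.4, 0.6]},
}

new_policy = improve_policy(costs, trans, v)
print(new_policy)
```

If the returned policy is identical to the current one, the algorithm stops; otherwise the equations are solved again for the new policy and the step is repeated.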