
How is the optimal policy (pi*) of the Bellman equation described?

Asked: 2016-11-03 20:09:43

Tags: optimization machine-learning reinforcement-learning

I have looked for pi* in many resources, such as this link, but I cannot find a definition of pi*. Is V* the same as V_pi*?

Screenshot of the question

1 Answer:

Answer 0 (score: 2):

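π* denotes the "optimal policy". V* and Q* are the optimal value functions. An optimal policy follows from the optimal value functions: in each state, choose an action that maximizes Q*.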
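As a sketch in standard MDP notation, the relationship can be written out as below. The symbols P (transition probabilities), R (rewards), and γ (discount factor) are assumptions here, since the answer does not define them. The last equality addresses the question directly: the value function of an optimal policy is the optimal value function.

```latex
V^*(s) = \max_{\pi} V^{\pi}(s), \qquad
Q^*(s,a) = \sum_{s'} P(s' \mid s,a)\,\bigl[ R(s,a,s') + \gamma\, V^*(s') \bigr],

\pi^*(s) \in \arg\max_{a} Q^*(s,a), \qquad
V^{\pi^*}(s) = V^*(s) \quad \text{for all } s.
```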

See Section 4.6 at https://web.fe.up.pt/~eol/schaefer/diplom/ReinforcementLearning.htm
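To make the link between V*, Q*, and π* concrete, here is a minimal sketch in Python. It is not taken from the linked reference: the two-state MDP, its transition table `P`, reward table `R`, the discount `gamma`, and the helper names `value_iteration` and `evaluate_policy` are all hypothetical and chosen only for illustration. The sketch computes V* by value iteration, reads off π* greedily from Q*, and then evaluates π* on its own to check that V_pi* matches V*.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers are made up for illustration).
# P[a, s, s2] = probability of moving from state s to state s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.1, 0.9], [0.7, 0.3]],   # action 1
])
# R[a, s] = expected immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],   # action 0
    [0.0, 2.0],   # action 1
])
gamma = 0.9  # discount factor (assumed)


def value_iteration(P, R, gamma, tol=1e-10):
    """Iterate the Bellman optimality backup until V stops changing."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)          # Q[a, s] = R[a, s] + gamma * E[V(s2)]
        V_new = Q.max(axis=0)            # V(s) = max over actions of Q[a, s]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q
        V = V_new


def evaluate_policy(P, R, gamma, policy):
    """Solve the linear system V = R_pi + gamma * P_pi V for a fixed policy."""
    n = P.shape[1]
    states = np.arange(n)
    P_pi = P[policy, states]             # row s is P[policy[s], s, :]
    R_pi = R[policy, states]             # entry s is R[policy[s], s]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)


V_star, Q_star = value_iteration(P, R, gamma)
pi_star = Q_star.argmax(axis=0)          # greedy policy with respect to Q*
V_pi_star = evaluate_policy(P, R, gamma, pi_star)

print("V*     :", V_star)
print("pi*    :", pi_star)
print("V_pi*  :", V_pi_star)
```

The two printed value vectors should agree up to the iteration tolerance, which is the numerical counterpart of saying that V* and V_pi* are the same function.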