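I am trying to understand the code in the second part of this article (Q-learning + NN): https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0
1) Why do we train the network at all? Wouldn't it be easier to write targetQ[0, a[0]] directly into the weight matrix? 2) Why, after the training step, is W[s, a[0]] != targetQ[0, a[0]], so that the loss != 0?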
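To make question 2 concrete, here is a minimal NumPy sketch of how I understand a single training step. It assumes a one-hot state input, a single linear layer with no bias, a sum-of-squares loss, and plain gradient descent with a learning rate of 0.1. The state `s`, the action `a`, the target value 1.0, and the names `lr`, `grad`, `W_new`, `gap_before` and `gap_after` are made up for illustration; this is not the article's TensorFlow code.

```python
import numpy as np

n_states, n_actions = 16, 4
lr = 0.1  # assumed learning rate

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 0.01, size=(n_states, n_actions))  # weight matrix

s, a = 0, 2                      # hypothetical state and chosen action
x = np.eye(n_states)[s:s + 1]    # one-hot input, shape (1, n_states)

Qout = x @ W                     # with a one-hot input this is just row W[s]
targetQ = Qout.copy()
targetQ[0, a] = 1.0              # hypothetical target for the chosen action

# loss = sum((targetQ - Qout)^2); its gradient with respect to W
grad = -2.0 * x.T @ (targetQ - Qout)

# one plain gradient-descent step
W_new = W - lr * grad

gap_before = targetQ[0, a] - W[s, a]
gap_after = targetQ[0, a] - W_new[s, a]
print(gap_before, gap_after)
```

Under these assumptions a single step moves W[s, a] only a fraction 2 * lr of the way toward targetQ[0, a], so the two values still differ after the update and the loss is not zero. Writing the target straight into the matrix would correspond to 2 * lr = 1 in this sketch.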