This code:
R = ql.matrix([ [0,0,0,0,1,0],
[0,0,0,1,0,1],
[0,0,100,1,0,0],
[0,1,1,0,1,0],
[1,0,0,1,0,0],
[0,1,0,0,0,0] ])
is from:
R is defined as the "reward matrix for each state". What are the states and the rewards in this matrix?
# Reward for state 0
print('R[0,]:' , R[0,])
# Reward for state 1
print('R[1,]:' , R[1,])
prints:
R[0,]: [[0 0 0 0 1 0]]
R[1,]: [[0 0 0 1 0 1]]
Is [0 0 0 0 1 0] state 0, and is [0 0 0 1 0 1] state 1?
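
One possible reading, sketched below under assumptions that the post does not confirm: `ql` is NumPy imported under an alias, each row index is a current state, each column index is a candidate next state, and a nonzero entry is the reward for that move. The helper `rewards_from` is a hypothetical name introduced only for illustration, and the sketch uses a plain array rather than a matrix object so that a row indexes as a 1-D vector.

```python
import numpy as np

# Assumption: `ql` in the question is NumPy under an alias (import numpy as ql).
# A plain ndarray is used here instead of np.matrix so that R[state] is 1-D.
R = np.array([[0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 100, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0]])


def rewards_from(R, state):
    """Return {column_index: reward} for the nonzero entries in row `state`.

    Hypothetical helper: it assumes rows are current states and columns
    are candidate next states.
    """
    row = R[state]
    return {int(j): int(row[j]) for j in np.flatnonzero(row)}


# List the nonzero rewards row by row.
for s in range(R.shape[0]):
    print(f"state {s}: {rewards_from(R, s)}")
```

Under that reading, row 0 would be the rewards available from state 0 and row 1 those available from state 1, rather than each row being "the state" itself.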