Question

I have this code, and I can't figure out where the error comes from.

import numpy as np

boxes = (2, 2, 4, 2)                  # number of bins per state dimension
action = (0, 1)
num_a = 2
Q_table = np.zeros(boxes + (num_a,))  # shape (2, 2, 4, 2, 2)

# alpha, gamma, beta, R, s, pre_s, pre_a, resets and success are defined elsewhere in the program
if pre_a != -1:
    if s == -1:
        bestQ = 0
    else:
        bestQ = np.amax(Q_table[s])
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])

if s == -1:
    R = -100
    bestQ = 0
    print("failure")
    print(pre_s, pre_a)
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])
    print("RESETTING!!!!!")
    [pre_s, s, pre_a, a, x, x_dot, theta, theta_dot] = reset_cart(beta)
    resets = resets + 1
    success = 0
else:
    R = 10
    success = success + 1
    bestQ = np.amax(Q_table[s])
    #Q_table[s+(pre_a,)]+=alpha*(R+gamma*bestQ-Q_table[s+(pre_a,)])
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])

When I run it, I get the following error:

IndexError: index 2 is out of bounds for axis 0 with size 2

Sometimes the code runs fine, and sometimes this error pops up.

Can anyone help me debug this?
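
For reference, here is a minimal sketch of one possible cause. It assumes that `pre_s` is a tuple of four bin indices (the commented-out `s+(pre_a,)` line suggests states are tuples); the example states `ok_state` and `bad_state` are made up for illustration. Under that assumption, `Q_table[pre_s, pre_a]` may not address a single cell: NumPy reads the tuple as a list of indices into axis 0, which would explain why the error only shows up when one of the bin indices is 2 or 3.

```python
import numpy as np

boxes = (2, 2, 4, 2)
num_a = 2
Q_table = np.zeros(boxes + (num_a,))  # shape (2, 2, 4, 2, 2)

# Assumption: a state is a tuple of four bin indices, one per dimension.
ok_state = (1, 0, 1, 1)    # every entry is < 2
bad_state = (1, 0, 2, 1)   # third entry is 2, which is valid for the size-4 axis
pre_a = 1

# Q_table[state, action] treats the tuple as an array of indices into axis 0
# and the action as an index into axis 1, so it selects a block, not one cell.
block_shape = Q_table[ok_state, pre_a].shape

raised = False
try:
    Q_table[bad_state, pre_a]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Concatenating state and action into one flat tuple addresses a single cell.
Q_table[bad_state + (pre_a,)] += 0.5
cell = Q_table[bad_state + (pre_a,)]
print(block_shape, raised, cell)
```

If this matches the real program, the runs that "work" would still be updating a whole block of the table rather than one entry, so indexing with `pre_s + (pre_a,)` in every update line is probably worth trying.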

IndexError: index 2 is out of bounds for axis 0 with size 2 // Python 3 Qlearning

0 answers: