I used to have:
self.memory = np.zeros((MEMORY_CAPACITY, s_dim * 2 + a_dim + 1), dtype=np.float32)
But I need to store an additional variable, "done", in this memory, so I changed it to:
self.memory = np.zeros((MEMORY_CAPACITY, s_dim * 2 + a_dim + 2), dtype=np.float32)
I now write the "done" variable into the memory like this:
def store_transition(self, s, a, r, s_, done):
    transition = np.hstack((s, a, [r], s_, done))
    index = self.pointer % MEMORY_CAPACITY  # replace the old memory with new memory
    self.memory[index, :] = transition
Storing it works, but I also need to read it back out in another function:
indices = np.random.choice(MEMORY_CAPACITY, size=BATCH_SIZE)
bt = self.memory[indices, :]
bs = bt[:, :self.s_dim]
ba = bt[:, self.s_dim: self.s_dim + self.a_dim]
br = bt[:, -self.s_dim - 1: -self.s_dim]
bs_ = bt[:, -self.s_dim:]
bd = bt[:, here should be done]
So bd should contain the done variable. I think it should be:
bd = bt[:, -1:]
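But I am not sure.
Also, since each row of the array is now one column wider, some of the existing slice positions have to be adjusted, but I don't know which ones or how.
Can anyone help me?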
Jan
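Answer 0 (score: 0)
I'm not quite sure what you mean by the part "Also, ... some of the old ...".
But your numpy slicing syntax for bd works. See the following example: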
>>> x = np.random.randn(5, 6)
>>> x.shape
(5, 6)
>>> x
array([[-0.66028509, -0.03515113,  0.54097151,  1.64021491,  1.55407344,
        -1.88961789],
       [-0.73310028, -0.38558638,  0.33200719, -0.142615  ,  0.57087033,
        -0.67726621],
       [ 0.32542737, -1.13508259,  1.58907859,  0.94438687,  0.33949198,
         1.52579515],
       [ 0.59211854,  0.39976888,  0.13617402,  0.57993582, -0.25274804,
        -1.15533191],
       [ 0.21203948,  0.72443024, -1.74406077,  0.97494208,  0.12653774,
        -0.00668887]])
>>> x[:, :-1]
array([[-0.66028509, -0.03515113,  0.54097151,  1.64021491,  1.55407344],
       [-0.73310028, -0.38558638,  0.33200719, -0.142615  ,  0.57087033],
       [ 0.32542737, -1.13508259,  1.58907859,  0.94438687,  0.33949198],
       [ 0.59211854,  0.39976888,  0.13617402,  0.57993582, -0.25274804],
       [ 0.21203948,  0.72443024, -1.74406077,  0.97494208,  0.12653774]])
>>> x[:, :-1].shape
(5, 5)
>>> x[:, -1:]
array([[-1.88961789],
       [-0.67726621],
       [ 1.52579515],
       [-1.15533191],
       [-0.00668887]])
>>> x[:, -1:].shape
(5, 1)
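
As for the slices that might need to change: if each row is laid out as [s | a | r | s_ | done], as the np.hstack call in store_transition suggests, then the slices counted from the front (bs, ba) stay the same, while the ones counted from the end (br, bs_) have to move one column to the left. With the question's original negative indices, bs_ would pick up the done column and br would pick up the first element of s_. Below is a minimal sketch of the adjusted slicing. The sizes (s_dim = 3, a_dim = 2, capacity 10, batch 4) and the fill values are made up purely so each slice can be checked; it uses a plain array instead of self.memory.

```python
import numpy as np

# Hypothetical sizes, chosen only for this illustration.
s_dim, a_dim = 3, 2
MEMORY_CAPACITY, BATCH_SIZE = 10, 4

# Assumed row layout: [s (s_dim) | a (a_dim) | r (1) | s_ (s_dim) | done (1)]
memory = np.zeros((MEMORY_CAPACITY, s_dim * 2 + a_dim + 2), dtype=np.float32)

# Fill the buffer with recognizable values: s=1, a=2, r=3, s_=4, done=0 or 1.
for i in range(MEMORY_CAPACITY):
    s = np.full(s_dim, 1.0)
    a = np.full(a_dim, 2.0)
    r = 3.0
    s_ = np.full(s_dim, 4.0)
    done = float(i % 2)
    memory[i, :] = np.hstack((s, a, [r], s_, [done]))

indices = np.random.choice(MEMORY_CAPACITY, size=BATCH_SIZE)
bt = memory[indices, :]

bs = bt[:, :s_dim]                   # unchanged: counted from the front
ba = bt[:, s_dim:s_dim + a_dim]      # unchanged: counted from the front
br = bt[:, -s_dim - 2:-s_dim - 1]    # shifted one column left
bs_ = bt[:, -s_dim - 1:-1]           # shifted one column left, stops before done
bd = bt[:, -1:]                      # the new last column

print(bs.shape, ba.shape, br.shape, bs_.shape, bd.shape)
```

An alternative that avoids this kind of off-by-one is to index everything from the front, for example bt[:, s_dim + a_dim:s_dim + a_dim + 1] for the reward, so that appending further columns later does not disturb the existing slices.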