Question

我正在模拟零售商店的库存管理系统；因此，我有一个（15,15）的零矩阵，其中状态是行和动作列：

Q = np.matrix(np.zeros([15, 15]) )

具体来说，最低库存水平是0，最高库存水平是14，状态是当前库存水平和操作库存订单（数量）。

因此，我想用“ -1”代替零，其中状态和动作之和> 14：

print(final_Q)

#First row, from which I can order everything (since 0 + 14 == 14)
[[0 0   0   0   0   0   0   0   0   0   0   0   0   0   0]
#Second row, from which I can order max. 13 products (1 + 14 > 14)
[[0 0   0   0   0   0   0   0   0   0   0   0   0   0   -1]]
#Third row, from which the max is 12    
[[0 0   0   0   0   0   0   0   0   0   0   0   0   -1  -1]]

(...)

我尝试手动实现，但是如何自动获得最终矩阵？

Answer 1

# Q matrix
Q = np.matrix(np.zeros([15+1, 15+1] ))

# Create a diagonal of -1s
Q = Q[0:15][0:15]
il1 = np.tril_indices(15)
Q[il1] = -1
Q = np.rot90(Q)

# Adjust single values
Q[parameters["max_products"]-1][0, 1:] = Q[parameters["max_products"]][0, 1:]
Q = Q[:15, :]

这绝对不是计算有效的，但是可以。

Answer 2

Q = np.tril(-1*np.ones(15), -1)[:, ::-1]

>>> Q
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., 0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., 0., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1.,-1., -1.],
       [ 0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1.,-1., -1.],
       [ 0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,-1., -1.],
       [ 0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,-1., -1.],
       [ 0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,-1., -1.]])

为Q学习建立可用的动作矩阵

2 个答案: