numpy indexing: a fixed-length slice of each row, with a different starting column per row

Time: 2017-03-13 12:00:31

Tags: python numpy

I have a 2D array such as z and a 1D array holding the starting column position for each row, say starts. I also have a fixed row_length = 2.

import numpy as np

z = np.arange(35).reshape(5, -1)
# --> array([[ 0,  1,  2,  3,  4,  5,  6],
#            [ 7,  8,  9, 10, 11, 12, 13],
#            [14, 15, 16, 17, 18, 19, 20],
#            [21, 22, 23, 24, 25, 26, 27],
#            [28, 29, 30, 31, 32, 33, 34]])

starts = np.array([1,5,3,3,2])
row_length = 2

What I want is the result of this slow for loop, but computed faster if possible.

result = np.zeros(
    (z.shape[0], row_length), 
    dtype=z.dtype
)
for i in range(z.shape[0]):
    s = starts[i]
    result[i] = z[i, s:s+row_length]

So in this example, result should end up looking like this:

array([[ 1,  2],
       [12, 13],
       [17, 18],
       [24, 25],
       [30, 31]])

I can't seem to get this result with fancy indexing or np.take.

1 Answer:

Answer 0 (score: 1)

One approach is to build those indices from starts and row_length with broadcasted addition, then use NumPy's advanced indexing to extract all of those elements from the data array, like this:

idx = starts[:,None] + np.arange(row_length)
out = z[np.arange(idx.shape[0])[:,None], idx]
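
To make the shapes explicit, here is a minimal sketch that wraps the two lines above in a function and checks the result against the loop from the question. The name take_row_windows is hypothetical, chosen only for illustration. starts[:, None] has shape (5, 1) and np.arange(row_length) has shape (2,), so their sum broadcasts to a (5, 2) array of column indices; the (5, 1) array of row indices then broadcasts against it.

```python
import numpy as np

def take_row_windows(z, starts, row_length):
    """Hypothetical helper: z[i, starts[i]:starts[i] + row_length] for every row i."""
    starts = np.asarray(starts)
    # (n_rows, 1) + (row_length,) broadcasts to (n_rows, row_length)
    cols = starts[:, None] + np.arange(row_length)
    # (n_rows, 1) row indices broadcast against cols during indexing
    rows = np.arange(z.shape[0])[:, None]
    return z[rows, cols]

z = np.arange(35).reshape(5, -1)
starts = np.array([1, 5, 3, 3, 2])
row_length = 2

out = take_row_windows(z, starts, row_length)

# Reference result built with the per-row loop from the question
expected = np.stack([z[i, s:s + row_length] for i, s in enumerate(starts)])
print(np.array_equal(out, expected))
```

This assumes every window fits inside its row, i.e. starts[i] + row_length <= z.shape[1]; otherwise the advanced indexing raises an IndexError.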

Sample run:

In [197]: z
Out[197]: 
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])

In [198]: starts = np.array([1,5,3,3,2])

In [199]: row_length = 2

In [200]: idx = starts[:,None] + np.arange(row_length)

In [202]: z[np.arange(idx.shape[0])[:,None], idx]
Out[202]: 
array([[ 1,  2],
       [12, 13],
       [17, 18],
       [24, 25],
       [30, 31]])
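
As a possible alternative (not part of the original answer), the same index array can be passed to np.take_along_axis, which avoids building the row-index array by hand. This is a sketch that assumes NumPy 1.15 or later, where that function is available, and that every window fits inside its row.

```python
import numpy as np

z = np.arange(35).reshape(5, -1)
starts = np.array([1, 5, 3, 3, 2])
row_length = 2

# Column indices for each row, shape (5, 2)
idx = starts[:, None] + np.arange(row_length)

# Assumption: each window must lie fully inside the row
assert (starts >= 0).all() and (idx[:, -1] < z.shape[1]).all()

# Pick z[i, idx[i, j]] for every i, j along axis 1
out_alt = np.take_along_axis(z, idx, axis=1)
print(out_alt)
```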