Question

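Given two arrays, an input array and a repeats array, I want to get an array that repeats each element the specified number of times along a new dimension, padding the rest of each row out to the full width.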

import numpy as np

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
# I want final array to look like the following:
#[[1, 0, 0],
# [2, 2, 0],
# [3, 3, 0],
# [4, 4, 4],
# [5, 5, 5],
# [6, 0, 0]]

The problem is that I'm working with large datasets (around 10M elements), so a list comprehension is too slow. What is a fast way to do this?
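For reference, a pure-Python version of the kind the question calls too slow might look like the sketch below. It is only an assumed baseline for checking the vectorized answers against; `repeat_pad_loop` is a hypothetical name, and padding with 0 follows the expected output above.

```python
import numpy as np

def repeat_pad_loop(to_repeat, repeats):
    """Hypothetical reference implementation: one Python list per row."""
    width = int(repeats.max())  # longest row sets the padded width
    rows = [
        [v] * r + [0] * (width - r)  # r copies of v, then zero padding
        for v, r in zip(to_repeat.tolist(), repeats.tolist())
    ]
    return np.array(rows, dtype=to_repeat.dtype)

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
print(repeat_pad_loop(to_repeat, repeats))
```

Every row is built in the interpreter, which is why this scales poorly to ~10M elements.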

Answer 1

Here's one approach based on masking, building on this idea:

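# Mask: row i is True in its first repeats[i] columns
m = repeats[:,None] > np.arange(repeats.max())
out = np.zeros(m.shape,dtype=to_repeat.dtype)
# True cells are filled in row-major order, matching np.repeat's output order
out[m] = np.repeat(to_repeat,repeats)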
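The same masking approach, wrapped as a self-contained function so it can be run and checked on its own. The function name `repeat_pad_mask` is hypothetical; the body is the three lines above. It also works when some repeat counts are 0, since such rows get no True cells and stay all zeros.

```python
import numpy as np

def repeat_pad_mask(to_repeat, repeats):
    # (n, k) boolean mask comparing each count against column indices 0..k-1
    m = repeats[:, None] > np.arange(repeats.max())
    out = np.zeros(m.shape, dtype=to_repeat.dtype)
    # Boolean-mask assignment visits True cells in C (row-major) order,
    # so the flat np.repeat result is distributed row by row.
    out[m] = np.repeat(to_repeat, repeats)
    return out

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
print(repeat_pad_mask(to_repeat, repeats))
```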

Sample output:

In [44]: out
Out[44]: 
array([[1, 0, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 4],
       [5, 5, 5],
       [6, 0, 0]])

Or, using broadcasted multiplication:

In [67]: m*to_repeat[:,None]
Out[67]: 
array([[1, 0, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 4],
       [5, 5, 5],
       [6, 0, 0]])
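A self-contained sketch of the broadcasting variant (the name `repeat_pad_mul` is hypothetical). It relies on NumPy promoting the boolean mask to the integer dtype of `to_repeat`, so True/False act as 1/0, while the `(n, 1)` column `to_repeat[:, None]` is stretched across the mask's columns.

```python
import numpy as np

def repeat_pad_mul(to_repeat, repeats):
    # (n, k) bool mask, same construction as in the masking approach
    m = repeats[:, None] > np.arange(repeats.max())
    # bool * int -> int; the (n, 1) column broadcasts over k columns
    return m * to_repeat[:, None]

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
print(repeat_pad_mul(to_repeat, repeats))
```

Unlike the masking version, this needs no `np.repeat` call, but it pads with 0 only because multiplying by False yields 0.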

For large datasets, we can leverage multiple cores and use the numexpr module for the broadcasted multiplication, for better memory efficiency:

In [64]: import numexpr as ne

# Re-using mask `m` from previous method
In [65]: ne.evaluate('m*R',{'m':m,'R':to_repeat[:,None]})
Out[65]: 
array([[1, 0, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 4],
       [5, 5, 5],
       [6, 0, 0]])
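A minimal sketch packaging the numexpr step as a function, assuming numexpr may not be installed everywhere. The plain-NumPy fallback on `ImportError` is an addition of this sketch, not part of the answer, and `repeat_pad_ne` is a hypothetical name.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # assumption: fall back to NumPy if numexpr is missing
    ne = None

def repeat_pad_ne(to_repeat, repeats):
    m = repeats[:, None] > np.arange(repeats.max())  # (n, k) bool mask
    R = to_repeat[:, None]                           # (n, 1) column
    if ne is not None:
        # numexpr evaluates the expression in multithreaded blocks
        return ne.evaluate('m*R', {'m': m, 'R': R})
    return m * R  # same result via ordinary NumPy broadcasting

to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
print(repeat_pad_ne(to_repeat, repeats))
```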

How to repeat a numpy array along a new dimension with padding?
