Vectorized cumulative concatenation with NumPy arrays

Time: 2019-07-16 15:12:18

Tags: python-3.x numpy numpy-ndarray

I have a numpy array like this:

array([array([1]), array([2, 3]), array([4, 5, 6])], dtype=object)

I want to obtain an array that looks like this:

array([array([1]), array([1, 2, 3]), array([1, 2, 3, 4, 5, 6])], dtype=object)

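Basically, I am looking for a function analogous to np.cumsum that concatenates numpy arrays instead of summing them. How can I do this? Also, is it more time-efficient to keep the inner elements as numpy arrays rather than lists, or does it make no difference because the outer dtype is object either way? Could I speed things up by restricting the dtype in some way, for example: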

np.array([np.array([1]), np.array([2, 3]), np.array([4, 5, 6])], dtype=np.ndarray)

2 answers:

Answer 0: (score: 2)

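The following approach first concatenates everything and then slices the result. This means the data buffer is shared by all the partial arrays. Giving each partial array its own memory would require terabytes of RAM for the example below (depending on the dtype).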

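from timeit import timeit
import numpy as np

def cumconc(A):
    # Concatenate once; every returned array is a view into this buffer.
    total = np.concatenate(A)
    # Running end offset of each chunk within the concatenated buffer.
    ends = np.fromiter(map(len, A), int, len(A)).cumsum()
    # slice(j) is total[:j]; dtype=object because newer NumPy rejects ragged input without it.
    return np.array([*map(total.__getitem__, map(slice, ends))], dtype=object)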

Equivalent list comprehension:

    return np.array([total[:j] for j in np.cumsum([len(a) for a in A])], dtype=object)
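
As a small illustration of the shared-buffer point above, the following sketch checks that the prefix slices are views into the concatenated array. The variable names (`parts`, `shares`, `independent`) are only illustrative and not part of the answer; `np.shares_memory` is used to make the sharing visible.

```python
import numpy as np

# Small ragged input, same shape as the example in the question.
A = np.array([np.array([1]), np.array([2, 3]), np.array([4, 5, 6])], dtype=object)

total = np.concatenate(A)                  # one contiguous buffer
ends = np.cumsum([len(a) for a in A])      # running end offsets of the chunks
parts = [total[:j] for j in ends]          # basic slices are views, not copies

# Every prefix shares memory with the concatenated buffer.
shares = [np.shares_memory(p, total) for p in parts]

# Writing through one view is visible in all overlapping views.
parts[0][0] = 99
first_elements = [int(p[0]) for p in parts]

# Call .copy() on a slice when an independent array is needed.
independent = parts[1].copy()
independent[0] = -1                        # leaves total untouched
```

The practical consequence is that modifying one partial array in place also changes the others; copy a slice explicitly if that is not wanted.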

Example:

chunks = np.array([np.full(np.random.randint(20,61), i) for i in range(100000)], dtype=object)

chunks looks like this:

array([array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0]),
       array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
       array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2]),
       ...,
       array([99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
       99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
       99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
       99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
       99997, 99997, 99997, 99997, 99997, 99997, 99997]),
       array([99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
       99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
       99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
       99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
       99998, 99998, 99998, 99998, 99998]),
       array([99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999,
       99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999,
       99999, 99999, 99999])], dtype=object)

Applying the function:

cumconc(chunks)

Result:

array([array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0]),
       array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
       array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2]),
       ..., array([    0,     0,     0, ..., 99997, 99997, 99997]),
       array([    0,     0,     0, ..., 99998, 99998, 99998]),
       array([    0,     0,     0, ..., 99999, 99999, 99999])],
      dtype=object)

How fast is it?

timeit(lambda: cumconc(chunks), number=10)
# 0.8433913141489029
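
As a rough check of the memory claim in this answer, the sketch below estimates how many bytes the shared buffer needs compared with giving every partial array its own copy. It assumes 64-bit integers and draws its own chunk lengths from the same range as the example; the seed and the names `lengths`, `shared_bytes`, and `copied_bytes` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed
lengths = rng.integers(20, 61, size=100_000)    # same range as randint(20, 61)

itemsize = np.dtype(np.int64).itemsize          # assumes 64-bit integers

# Shared buffer: every chunk is stored exactly once.
shared_bytes = int(lengths.sum()) * itemsize

# Independent copies: the k-th result holds the first k chunks, so the
# total element count is the sum of the running totals.
copied_bytes = int(np.cumsum(lengths).sum()) * itemsize

print(f"shared buffer:   {shared_bytes / 1e6:.0f} MB")
print(f"separate copies: {copied_bytes / 1e12:.2f} TB")
```

With an average chunk length of about 40, the shared buffer comes to a few tens of megabytes, while separate copies would exceed a terabyte, which is consistent with the statement above.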

Answer 1: (score: 0)

You can use itertools.accumulate with a custom function based on np.concatenate to achieve this. However, I don't know how efficient it is.

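from itertools import accumulate
import numpy as np

n = np.array([np.array([1]), np.array([2, 3]), np.array([4, 5, 6])], dtype=object)
# Each step concatenates the running result with the next chunk.
np.array(list(accumulate(n, lambda x, y: np.concatenate([x, y]))), dtype=object)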

Out[1785]:
array([array([1]), array([1, 2, 3]), array([1, 2, 3, 4, 5, 6])],
      dtype=object)
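
To relate the two answers, here is a minimal sketch on the question's small input that checks both approaches yield the same values and shows one way they differ in memory behavior. The names `by_accumulate` and `by_slicing` are illustrative only.

```python
from itertools import accumulate
import numpy as np

chunks = [np.array([1]), np.array([2, 3]), np.array([4, 5, 6])]

# accumulate approach: each step concatenates the running result with the next chunk.
by_accumulate = list(accumulate(chunks, lambda x, y: np.concatenate([x, y])))

# Concatenate-then-slice approach: one concatenation, then prefix views.
total = np.concatenate(chunks)
by_slicing = [total[:j] for j in np.cumsum([len(c) for c in chunks])]

same_values = all(np.array_equal(a, b) for a, b in zip(by_accumulate, by_slicing))

# accumulate allocates a fresh array per step, so its results are independent.
copies_are_separate = not np.shares_memory(by_accumulate[1], by_accumulate[2])
views_are_shared = np.shares_memory(by_slicing[1], by_slicing[2])
```

Because every accumulate step copies the whole running prefix, the amount of data copied grows roughly quadratically with the number of chunks, whereas the slicing approach copies each element once.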