I have a numpy array like
array([array([1]), array([2, 3]), array([4, 5, 6])], dtype=object)
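I would like to get an array that looks like
array([array([1]), array([1, 2, 3]), array([1, 2, 3, 4, 5, 6])], dtype=object)
Basically, I'm looking for a function analogous to np.cumsum that concatenates cumulatively instead of summing, and that works on a numpy array of arrays.
How can I do this? Also, is it more time-efficient to have the inner elements be numpy arrays rather than lists, or does it make no difference since the dtype is object either way? Can I speed things up by restricting the dtype somehow, for example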
np.array([np.array([1]), np.array([2, 3]), np.array([4, 5, 6])], dtype=np.ndarray)
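Answer 0 (score: 2)
The following approach first concatenates everything and then slices out the prefixes. This means the data buffer is shared by all the partial arrays. For the 100,000-chunk example below, giving each partial array its own memory would require on the order of a terabyte of RAM (depending on the dtype).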
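from timeit import timeit
import numpy as np
def cumconc(A):
    # Concatenate once; every prefix returned below is a view into this buffer.
    total = np.concatenate(A)
    # dtype=object keeps the ragged result a 1-D array of arrays.
    return np.array([*map(total.__getitem__, map(slice, np.fromiter(map(len,A),int,len(A)).cumsum()))], dtype=object)
Equivalent list comprehension:
    return np.array([total[:j] for j in np.cumsum([len(a) for a in A])], dtype=object)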
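As a small illustration of the buffer sharing described above, here is a sketch on a toy input (the names `parts`, `total`, and `prefixes` are made up for this example). Each prefix is a view into the single concatenated array, so a write through one prefix is visible through the others:

```python
import numpy as np

# Toy input, assumed for illustration only.
parts = [np.array([1]), np.array([2, 3]), np.array([4, 5, 6])]

total = np.concatenate(parts)                # one buffer holding all values
ends = np.cumsum([len(p) for p in parts])    # end offset of each prefix
prefixes = [total[:j] for j in ends]         # basic slices are views, not copies

# Every prefix shares memory with the concatenated buffer.
all_shared = all(np.shares_memory(p, total) for p in prefixes)

# Writing through one prefix changes what the other prefixes see.
prefixes[0][0] = 100
first_of_last = prefixes[-1][0]

# Take an explicit copy when an independent array is needed.
independent = prefixes[1].copy()
independent[0] = -1
```

If the partial arrays will be modified independently later on, calling `.copy()` on the ones that need it is probably the safer choice, at the cost of the extra memory.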
Example:
chunks = np.array([np.full(np.random.randint(20,61), i) for i in range(100000)], dtype=object)
chunks
looks like
array([array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0]),
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2]),
...,
array([99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997, 99997,
99997, 99997, 99997, 99997, 99997, 99997, 99997]),
array([99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998, 99998,
99998, 99998, 99998, 99998, 99998]),
array([99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999,
99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999, 99999,
99999, 99999, 99999])], dtype=object)
Applying the function:
cumconc(chunks)
Result:
array([array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0]),
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2]),
..., array([ 0, 0, 0, ..., 99997, 99997, 99997]),
array([ 0, 0, 0, ..., 99998, 99998, 99998]),
array([ 0, 0, 0, ..., 99999, 99999, 99999])],
dtype=object)
How fast is it?
timeit(lambda: cumconc(chunks), number=10)
# 0.8433913141489029
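To make the memory remark at the top of this answer more concrete, the following back-of-the-envelope sketch compares the size of the one shared buffer with the total size needed if every prefix were stored as its own copy. It assumes 8-byte int64 elements and redraws chunk lengths from the same range as the example, so the exact figures will vary; the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed, for repeatability
n_chunks = 100_000
lengths = rng.integers(20, 61, size=n_chunks)   # same length range as the example
itemsize = np.dtype(np.int64).itemsize          # assumption: int64 elements

ends = np.cumsum(lengths, dtype=np.int64)       # length of each cumulative prefix
shared_bytes = int(ends[-1]) * itemsize         # one concatenated buffer
copied_bytes = int(ends.sum()) * itemsize       # every prefix stored separately

print(f"shared buffer:   {shared_bytes / 1e6:.0f} MB")
print(f"separate copies: {copied_bytes / 1e12:.2f} TB")
```

The shared buffer grows linearly with the number of chunks, while separate copies grow roughly quadratically, which is why the view-based result is practical here.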
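Answer 1 (score: 0)
You can achieve this by combining itertools.accumulate and np.concatenate with a custom function. However, I don't think it is efficient, since each step copies everything accumulated so far.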
from itertools import accumulate
import numpy as np
n = np.array([np.array([1]), np.array([2, 3]), np.array([4, 5, 6])], dtype=object)
np.array(list(accumulate(n, lambda x, y: np.concatenate([x, y]))), dtype=object)
Out[1785]:
array([array([1]), array([1, 2, 3]), array([1, 2, 3, 4, 5, 6])],
dtype=object)
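As a quick sanity check, the sketch below compares the accumulate-based version with a concatenate-then-slice version on the small input from the question. The variable names are made up for this example, and it only checks that the values agree; it says nothing about speed.

```python
from itertools import accumulate
import numpy as np

# Small input from the question, held in a plain list for simplicity.
parts = [np.array([1]), np.array([2, 3]), np.array([4, 5, 6])]

# accumulate-based version: each later step concatenates into a new array.
by_accumulate = list(accumulate(parts, lambda x, y: np.concatenate([x, y])))

# slice-based version: one concatenation, then prefix views.
total = np.concatenate(parts)
by_slicing = [total[:j] for j in np.cumsum([len(p) for p in parts])]

# Both approaches should yield the same values, prefix by prefix.
same = all(np.array_equal(a, b) for a, b in zip(by_accumulate, by_slicing))
print(same)
```

The difference is in ownership: the accumulate results after the first are independent arrays, whereas the sliced results all share one buffer.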