How can I efficiently vstack a sequence of large numpy array chunks?

Asked: 2016-07-31 03:34:02

Tags: python python-3.x numpy scipy

I am generating a sequence of numpy arrays as follows:

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

for i in chunker(X,10000):
    e = function(i)
    print('new matrix',e)

new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
...
new matrix (10000, 3208)

I want to vstack the n matrices above into a single matrix, so I tried the following:

    X = np.vstack(e)

However, when I print X, I get back:

new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
new matrix (10000, 3208)
...
new matrix (10000, 3208)

instead of a single new vstacked matrix. Any idea how to vstack this sequence of numpy arrays?

Update

Following jedward's answer, I edited my code as follows:

import numpy as np

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

for (r,i) in enumerate(chunker(X,10000)):
    e = function(i)
    print('new matrix',e)
    X[r,:] = e

print(X)

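1 Answer:

Answer 0 (score: 1)

One approach, though possibly not the most efficient, is to collect the chunks you want to stack in a list and then stack once, outside the loop.

For example: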

import numpy as np

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

# Some fake function (n.b. this is a silly way to reverse a list)
def function(arr):
    arr.reverse()
    return arr

# Generate fake X
X = list(range(100))

chunks = []
for i in chunker(X,10):
    e = function(i)
    print('new matrix',e)
    chunks.append(e)

merged = np.vstack(chunks)
print(merged)

Output:

new matrix [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
new matrix [19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
new matrix [29, 28, 27, 26, 25, 24, 23, 22, 21, 20]
new matrix [39, 38, 37, 36, 35, 34, 33, 32, 31, 30]
new matrix [49, 48, 47, 46, 45, 44, 43, 42, 41, 40]
new matrix [59, 58, 57, 56, 55, 54, 53, 52, 51, 50]
new matrix [69, 68, 67, 66, 65, 64, 63, 62, 61, 60]
new matrix [79, 78, 77, 76, 75, 74, 73, 72, 71, 70]
new matrix [89, 88, 87, 86, 85, 84, 83, 82, 81, 80]
new matrix [99, 98, 97, 96, 95, 94, 93, 92, 91, 90]
[[ 9  8  7  6  5  4  3  2  1  0]
 [19 18 17 16 15 14 13 12 11 10]
 [29 28 27 26 25 24 23 22 21 20]
 [39 38 37 36 35 34 33 32 31 30]
 [49 48 47 46 45 44 43 42 41 40]
 [59 58 57 56 55 54 53 52 51 50]
 [69 68 67 66 65 64 63 62 61 60]
 [79 78 77 76 75 74 73 72 71 70]
 [89 88 87 86 85 84 83 82 81 80]
 [99 98 97 96 95 94 93 92 91 90]]
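
The same pattern carries over to 2-D chunks like those in the question, where each chunk contributes many rows rather than one. A minimal sketch, assuming a hypothetical `process` function standing in for the question's `function` and small shapes in place of (10000, 3208):

```python
import numpy as np

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def process(block):
    # Hypothetical stand-in for the question's `function`:
    # returns one output row per input row.
    return block * 2

# Small 2-D input: 10 rows, 5 columns (an assumption for illustration)
X = np.arange(50).reshape(10, 5)

# Collect every processed chunk; the chunks have 4, 4 and 2 rows
chunks = [process(block) for block in chunker(X, 4)]

# Stack once, after the loop
merged = np.vstack(chunks)
print(merged.shape)
```

The last chunk may be shorter than the others; `np.vstack` only requires that all chunks have the same number of columns.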

Alternatively, without creating the intermediate list, you can stack inside the loop:

merged = np.zeros([0,10])
for i in chunker(X,10):
    e = function(i)
    print('new matrix',e)
    merged = np.vstack([merged, e])

print(merged)
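
One caveat with this variant: each call to `np.vstack` builds a new array and copies everything stacked so far, so the amount of copying grows with every iteration. A small sketch of that effect, using made-up rows rather than the chunks above (the `rows_copied` counter is only illustrative):

```python
import numpy as np

merged = np.zeros([0, 10])
rows_copied = 0

for k in range(10):
    e = np.arange(10) + 10 * k      # made-up row for illustration
    previous = merged
    merged = np.vstack([merged, e])
    # The result is a fresh array, not a view of the previous one
    assert not np.shares_memory(merged, previous)
    # Every row of the new array had to be written on this iteration
    rows_copied += merged.shape[0]

print(merged.shape, rows_copied)
```

Here 55 rows are written in total to build a 10-row array. With many large chunks that overhead can become noticeable, which is why stacking once from a list is usually the cheaper of the two.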

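The most efficient approach, however, is to initialize a numpy array before the loop and then set its rows inside the loop. You need to work out the dimensions of the final merged array first (here I simply create it as a 10x10 matrix because I know its size).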

merged = np.zeros([10,10])
for (r,i) in enumerate(chunker(X,10)):
    e = function(i)
    print('new matrix',e)
    merged[r,:] = e

print(merged)
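
When each chunk holds several rows, as in the question's (10000, 3208) blocks, `merged[r,:] = e` no longer fits, because `e` is a block rather than a single row. A hedged sketch of the same preallocation idea with a row slice per chunk, assuming a hypothetical `process` function, a column count known in advance, and a separate output array instead of overwriting `X`:

```python
import numpy as np

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def process(block):
    # Hypothetical stand-in for the question's `function`
    return block + 1

X = np.arange(50).reshape(10, 5)
chunk_size = 4
n_cols = 5   # assumed known in advance (3208 in the question)

# Allocate the full result once, before the loop
merged = np.empty((len(X), n_cols), dtype=X.dtype)

start = 0
for block in chunker(X, chunk_size):
    e = process(block)
    stop = start + e.shape[0]
    merged[start:stop, :] = e   # write a block of rows, not a single row
    start = stop

print(merged.shape)
```

Tracking `start` and `stop` from `e.shape[0]` handles a shorter final chunk without special-casing it.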