Question

I tried the approach outlined by Hpaulj in the following question, but it does not seem to work:

How to append many numpy files into one numpy file in python

Basically, I am iterating over a generator, making some changes to an array, and then trying to save the array from each iteration.

Here is my sample code:

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(filename, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

Here I go through 5 iterations, so I expect to save 5 different arrays.

I printed part of each array for debugging purposes:

[ 0.  0.  0.  0.  0.]
[ 0.          3.37349415  0.          0.          1.62561738]
[  0.          20.28489304   0.           0.           0.        ]
[ 0.  0.  0.  0.  0.]
[  0.          21.98013496   0.           0.           0.        ]

But when I try to load the arrays by calling np.load multiple times, as described in How to append many numpy files into one numpy file in python, I get an EOFError:

file = r'testing.npy'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr[0,0,0,0:5])
    arr = np.load(f)
    print(arr[0,0,0,0:5])

It outputs only the last array and then raises an EOFError:

[  0.          21.98013496   0.           0.           0.        ]
EOFError: Ran out of input

print(arr[0,0,0,0:5])

I expected all 5 arrays to be saved, but when I load from the saved .npy file multiple times, I only get the last array.

So how should I save new arrays and append them to the file?

EDIT: Testing with '.npz' also saves only the last array

filename = 'testing.npz'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.savez(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break


# loading

file = 'testing.npz'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr.keys())


>>>['arr_0']

Answer 1

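All of your calls to np.save use the filename, not the file handle. Since you never reuse the file handle, each save overwrites the file instead of appending the array to it.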

This should work:

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        # pass the open handle so each array is written after the previous one
        np.save(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break
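To check the round trip without a model, here is a minimal self-contained sketch. The random `batches`, the temporary path, and the `load_all` helper are illustrative assumptions, not part of the question's code. It writes several arrays through one open handle and reads them back by calling np.load on the same read handle until the end of the file is reached.

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Yield every array that was written back-to-back into `path`.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        # Each np.load call consumes one array from the handle, so keep
        # reading until the position reaches the end of the file.
        while f.tell() < size:
            yield np.load(f)


# Stand-ins for the model predictions (made-up data).
rng = np.random.default_rng(0)
batches = [rng.random((2, 3)) for _ in range(5)]

path = os.path.join(tempfile.mkdtemp(), 'testing.npy')
with open(path, 'wb') as f:
    for batch in batches:
        np.save(f, batch)  # pass the handle, not the filename

loaded = list(load_all(path))
print(len(loaded))
```

Because `load_all` is a generator, the arrays can also be processed one at a time instead of being collected into a list, which is where the memory advantage would come from.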

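While there may be advantages to storing multiple arrays in a single .npy file (I imagine it helps when memory is limited), .npy files are technically meant to store one single array. You can use .npz files (np.savez or np.savez_compressed) to store multiple arrays: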

filename = 'testing.npz'
predictions = []
for (x, _), index in zip(train_generator, range(5)):
    prediction = base_model.predict(x)
    predictions.append(prediction)
np.savez(filename, predictions) # will name it arr_0
# np.savez(filename, predictions=predictions) # would name it predictions
# np.savez(filename, *predictions) # would name it arr_0, arr_1, …, arr_4
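For the reading side, a small sketch of how the `*predictions` variant could be loaded back. The constant-filled arrays and the temporary path are assumptions made up for the example; only np.savez, np.load, and the `arr_N` naming come from the answer above.

```python
import os
import tempfile

import numpy as np

# Stand-ins for five prediction batches (made-up data).
predictions = [np.full((2, 2), i, dtype=float) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), 'testing.npz')

# Positional arrays are stored under the names arr_0, arr_1, ...
np.savez(path, *predictions)

# np.load returns a dict-like NpzFile; arrays are read when indexed.
with np.load(path) as data:
    names = sorted(data.files)
    restored = [data[f'arr_{i}'] for i in range(len(data.files))]

print(names)
```

Note that this route needs all arrays in memory when np.savez is called, unlike writing them one by one through an open .npy handle.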

Append numpy.arrays incrementally to a save file
