A memory-efficient way to stack arrays in Python

Time: 2017-06-13 08:32:10

Tags: python numpy out-of-memory memory-efficient

I have large arrays that I want to read in and stack, for example:

>>> import numpy as npy
>>> x = npy.arange(10).reshape(5, 2)
>>> y = npy.arange(10, 20).reshape(5, 2)
>>> z = npy.append(x, y)
>>> z
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

>>> z.reshape(2, x.shape[0], x.shape[1])
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9]],

       [[10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]]])

But the result keeps growing, and eventually the program dies with an out-of-memory error. The code is:

import numpy as npy
import rasterio

ndvi = npy.empty(0)  # accumulator for npy.append
for i in range(1, days + 1):
    with rasterio.open(directory + "B04_" + str(i) + ".jp2") as dataset:
        band_4 = dataset.read()[0]

    with rasterio.open(directory + "B08_" + str(i) + ".jp2") as dataset:
        band_8 = dataset.read()[0]

    day_ndvi = (band_8 - band_4) / (band_8 + band_4 + 0.0000001)
    ndvi = npy.append(ndvi, day_ndvi)

ndvi = ndvi.reshape(days, band_8.shape[0], band_8.shape[1])

What is the most memory-efficient way to read and append these arrays?
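One memory-friendly pattern worth noting here (a sketch, not from the original post; the shapes and random data are stand-ins for the real jp2 bands) is to preallocate the full output array once and write each day's NDVI into its slice, instead of letting `npy.append` copy the whole growing array on every iteration:

```python
import numpy as np

days = 3
rows, cols = 4, 5  # placeholders for the real band shape

# Allocate the result once; each day's NDVI is written into its own
# slice, so no per-iteration copy of the growing array is made.
ndvi = np.empty((days, rows, cols), dtype=np.float32)
for i in range(days):
    band_4 = np.random.rand(rows, cols)  # stands in for dataset.read()[0]
    band_8 = np.random.rand(rows, cols)
    ndvi[i] = (band_8 - band_4) / (band_8 + band_4 + 0.0000001)
```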

1 answer:

Answer 0 (score: 0)

Try Dask (http://dask.pydata.org/); it should solve your problem. The library can keep parts of your data on disk when it is too big to fit in memory.
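A minimal sketch of how this could look with `dask.array` (random arrays stand in for the jp2 bands, and the shape and per-day function are assumptions): `da.from_delayed` wraps each day's NDVI as a lazy chunk, `da.stack` combines them, and only the slices you actually `.compute()` are materialized in memory.

```python
import numpy as np
import dask.array as da
from dask import delayed

days = 3
shape = (100, 100)  # placeholder band shape

def ndvi_for_day(i):
    # Stand-in for reading bands 4 and 8 with rasterio.
    band_4 = np.random.rand(*shape)
    band_8 = np.random.rand(*shape)
    return (band_8 - band_4) / (band_8 + band_4 + 0.0000001)

# One lazy chunk per day: nothing is computed until .compute() is called.
lazy_days = [
    da.from_delayed(delayed(ndvi_for_day)(i), shape=shape, dtype=float)
    for i in range(1, days + 1)
]
ndvi = da.stack(lazy_days)  # logical shape (days, 100, 100), still lazy

first_day = ndvi[0].compute()  # materializes only one day's slice
```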