Improving the memory efficiency of a vectorized function

Asked: 2014-08-04 20:05:26

Tags: python arrays memory-management numpy

I have nine large arrays of floats (3000 by 3000), named g_pro_1_array, g_pro_2_array and so on (g_pro_X_array below).

A vectorized function inspects the corresponding cell in each array, adds the values up one by one, and as soon as the running total reaches 0.5 it returns a value from a "grain size" lookup table.

My trouble is that this is a very memory-intensive operation (it uses nearly 1 GB of RAM). Is there a more memory-efficient way to do this calculation?

Here is my code:

    grain_lookup = {"array_1": self.grain_size_1, "array_2": self.grain_size_2,
                    "array_3": self.grain_size_3, "array_4": self.grain_size_4,
                    "array_5": self.grain_size_5, "array_6": self.grain_size_6,
                    "array_7": self.grain_size_7, "array_8": self.grain_size_8,
                    "array_9": self.grain_size_9}

    # Create a function to look up the d50 grainsize
    def get_grain_50(a1, a2, a3, a4, a5, a6, a7, a8, a9):
        if a1 >= 0.5:
            return grain_lookup["array_1"]
        elif a1 + a2 >= 0.5:
            return grain_lookup["array_2"]
        elif a1 + a2 + a3 >= 0.5:
            return grain_lookup["array_3"]
        elif a1 + a2 + a3 + a4 >= 0.5:
            return grain_lookup["array_4"]
        elif a1 + a2 + a3 + a4 + a5 >= 0.5:
            return grain_lookup["array_5"]
        elif a1 + a2 + a3 + a4 + a5 + a6 >= 0.5:
            return grain_lookup["array_6"]
        elif a1 + a2 + a3 + a4 + a5 + a6 + a7 >= 0.5:
            return grain_lookup["array_7"]
        elif a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 >= 0.5:
            return grain_lookup["array_8"]
        elif a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 >= 0.5:
            return grain_lookup["array_9"]
        else:
            return -9999

    V_get_grain = np.vectorize(get_grain_50)

    d50 = np.empty_like(g_pro_1_array, dtype = float)

    d50 = V_get_grain(g_pro_1_array, g_pro_2_array, g_pro_3_array, g_pro_4_array, g_pro_5_array, g_pro_6_array, g_pro_7_array, g_pro_8_array, g_pro_9_array)

1 Answer:

Answer 0 (score: 1)

There is a trade-off to be made here between speed, memory and readability, and you have not really said which of them matters most to you. Still, it is reasonable to split the algorithm into two parts:

  • find how many images have to be added together before the 0.5 limit is reached (0..9, where 9 means the limit is never reached)

  • apply the lookup table
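For reference, here is a minimal sketch of the fully vectorized form of that split, built on a stacked cumulative sum; the temporary (9, 3000, 3000) array is what costs the roughly 640 MB mentioned below, and graintable is assumed to be the array-based lookup introduced in the code further down:

    import numpy as np

    def get_grain_50_cumsum(images, graintable):
        # stack the nine images into one (9, 3000, 3000) array and take the
        # cumulative sum along the first axis -- this temporary array is the
        # big memory cost (9 * 3000 * 3000 * 8 bytes, roughly 640 MB)
        csum = np.cumsum(np.asarray(images), axis=0)
        reached = csum >= 0.5
        # index of the first image at which the running sum reaches 0.5;
        # argmax returns 0 where no image ever reaches it, so fix those cells
        idx = reached.argmax(axis=0)
        idx[~reached.any(axis=0)] = len(images)   # 9 means "limit never reached"
        return graintable[idx]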

If you are worried about using that much memory (the cumsum approach needs about 640 MB), you can instead do the summation one image at a time:

    import numpy as np

    # the grain table must be an array, fill in the numbers you want
    graintable = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, -9999])

    def V_get_grain(*images):
        # create the cumulative sum buffer (all zeros at this point)
        csum = np.zeros_like(images[0])
        # create the counter for the number of images needed to reach .5
        cnt = np.zeros(images[0].shape, dtype='uint8')

        # iterate through the images:
        for img in images:
            # add the image into the cumulative sum buffer
            csum += img
            # add 1 to the counter wherever the running sum is still < .5
            cnt += csum < .5

        # now cnt has a number for each pixel:
        # 0: the first image >= .5
        # ...
        # 9: all images together < .5

        return graintable[cnt]

The cumulative sum needs 4 or 8 bytes per pixel (depending on which float type you use) and the counter needs 1 byte per pixel, so the working memory is one float image and one byte image rather than the full nine-image stack. It should also be reasonably fast (368 ms on my machine for nine 3000x3000 images with 8-byte floats). The function is called exactly like the one in the question, as in the sketch below.
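A quick usage sketch under the question's setup (the g_pro_X_array names are taken from the question, and the numbers in graintable above stand in for the real self.grain_size_X values):

    # about 72 MB for a float64 running sum plus 9 MB for the uint8 counter,
    # compared with roughly 648 MB for a stacked (9, 3000, 3000) cumsum
    d50 = V_get_grain(g_pro_1_array, g_pro_2_array, g_pro_3_array,
                      g_pro_4_array, g_pro_5_array, g_pro_6_array,
                      g_pro_7_array, g_pro_8_array, g_pro_9_array)
    # d50 holds one grain size per cell, or -9999 where the sum never reached 0.5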