Question

我使用Python并且数组的值为1.0,2.0,3.0,4.0,5.0,6.0，np.nan为NoData。

我想用一个值填充所有“nan”。该值应该是周围值的大部分。

例如：

在该示例中，“p”将呈现“nan”。其大多数邻居的值为1.因此，“nan”将被值1替换。

注意，由“nan”组成的孔的大小可以是1到5.例如（最大尺寸为5 nan）：

此处“nan”的孔具有以下周围值：

surrounding_values = [1,1,1,1,1,2,1,2,3,2,3,1,1,1] -> Majority = 1

我尝试了以下代码：

from sklearn.preprocessing import Imputer

array = np.array(.......)   #consisting of 1.0-6.0 & np.nan
imp = Imputer(strategy="most_frequent")
fill = imp.fit_transform(array)

这非常好用。但是，它只使用一个轴（0 =列，1 =行）。默认值为0（列），因此它使用同一列的大多数周围值。例如：

Array
2 1 2 1 1
2 n 2 2 2
2 1 2 2 1
1 3 2 3 1

Filled Array
2 1 2 1 1
2 1 2 2 2
2 1 2 2 1
1 3 2 3 1

所以在这里你看，虽然大多数是2，但是大多数周围的列值都是1，因此它变为1而不是2。

因此，我需要使用python找到另一种方法。有什么建议或想法吗？

补充：

在我添加了Martin Valgur非常有帮助的改进后，你会看到结果。

将“0”视为海（蓝色），将其他值（> 0）视为陆地（红色）。

如果有一个被陆地包围的“小”海（海洋的大小可以再次为1-5 px），那么它将获得陆地，因为您可以在结果图像中成功查看。如果被包围的海域大于5px或在陆地之外，海洋将不会获得土地（这在图像中是不可见的，因为事实并非如此）。

如果有1px“nan”，其中大部分海洋比陆地多，它仍将成为陆地（在这个例子中它有50/50）。

下图显示了我的需求。在海（value = 0）和land（值> 0）之间的边界处，“nan”-pixel需要获得大部分地值的值。

这听起来很难，我希望我能够生动地解释它。

Answer 1

使用scipy.ndimage中的label()和binary_dilation()的可能解决方案：

import numpy as np
from scipy.ndimage import label, binary_dilation
from collections import Counter

def impute(arr):
    imputed_array = np.copy(arr)

    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        most_frequent = Counter(surrounding_values).most_common(1)[0][0]
        imputed_array[hole] = most_frequent

    return imputed_array

编辑：关于您的松散相关的后续问题，您可以扩展上述代码以实现您的目标：

import numpy as np
from scipy.ndimage import label, binary_dilation, binary_closing

def fill_land(arr):
    output = np.copy(arr)

    # Fill NaN-s
    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        output[hole] = any(surrounding_values)

    # Fill lakes
    land = output.astype(bool)
    lakes = binary_closing(land) & ~land
    labels, count = label(lakes)
    for idx in range(1, count + 1):
        lake = labels == idx
        output[lake] = lake.sum() < 6

    return output

Answer 2

我没有找到任何lib，所以我写了一个函数，如果在数组中间的所有None都可以使用这些

import numpy as np
from collections import Counter


def getModulusSurround(data):

    tempdata = list(filter(lambda x: x, data))
    c = Counter(tempdata)
    if c.most_common(1)[0][0]:
        return(c.most_common(1)[0][0])


def main():

    array = [[1, 2, 2, 4, 5],
             [2, 3, 4, 5, 6],
             [3, 4, None, 6, 7],
             [1, 4, 2, 3, 4],
             [4, 6, 2, 2, 4]]

    array = np.array(array)

    for i in range(5):
        for j in range(5):
            if array[i,j] == None:

                temparray = array[i-1:i+2,j-1:j+2]
                array[i,j] = getModulusSurround(temparray.flatten())

    print(array)

main()

Answer 3

在Martin Valgur的不可思议的帮助下，我得到了我需要的结果。

因此，我在Martins代码中添加了以下行：

from scipy.ndimage import label, binary_dilation
from scipy.stats import mode

def impute(arr):
    imputed_array = np.copy(arr)

    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        surrounding_values = arr[binary_dilation(hole) & ~hole]

        sv_list = np.ndarray.tolist(surrounding_values) #!
        for sv in sv_list:  #!
            if sv == 0:
                sv_list.remove(sv)
        surrounding_values = np.array(sv_list)

        imputed_array[hole] = mode(surrounding_values).mode[0]

    return imputed_array

用大多数周围值填充孔（Python）

3 个答案: