Python - How to replace None values in a numpy array with the median of adjacent elements

Date: 2016-03-12 10:45:44

Tags: python arrays numpy

I generated an array with missing data:

import numpy as np

A = np.zeros((80,80))
for i in range(80):
    if i%2 == 0:
        for j in range(80):
            A[i,j] = None
            if j%2 == 0:
                A[i,j] = 50+i+j
    else:
        for j in range(80):
            A[i,j] = None
            if j%2 != 0:
                A[i,j] = 50+i+j
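The same array can also be built without Python loops. This is a minimal sketch, assuming the intent is exactly what the loops do: keep 50+i+j where i and j have the same parity (both even or both odd) and NaN elsewhere. `make_checkerboard` is a hypothetical helper name.

```python
import numpy as np


def make_checkerboard(n=80):
    # Row and column index grids, each of shape (n, n)
    i, j = np.indices((n, n))
    values = 50.0 + i + j
    # The loops keep a value where i and j have the same parity
    # and leave NaN (assigned as None) everywhere else
    keep = (i % 2) == (j % 2)
    return np.where(keep, values, np.nan)


A = make_checkerboard()
```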

This gives me the following picture.


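What I want to do is replace every None value with the median of its adjacent elements that are not None themselves. Is there a simple way to do this without looping over every element?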

1 answer:

Answer 0 (score: 0)

I wrote something like this based on numba (it was originally written for masked arrays, but I quickly adapted it for NaN).
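Since the filter started out as a masked-array version, it may help to see how the two representations convert into each other. A small sketch using NumPy's `np.ma` module; the 2x3 array is made-up example data.

```python
import numpy as np

A = np.array([[50.0, np.nan, 52.0],
              [np.nan, 52.0, np.nan]])

# NaN array -> masked array: NaN (and inf) entries become masked
masked = np.ma.masked_invalid(A)

# np.ma.median only looks at the unmasked values (here 50, 52, 52)
med = np.ma.median(masked)

# Masked array -> NaN array: masked entries are filled with NaN again
back = masked.filled(np.nan)
```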

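As Ami Tavory said, you could also do this with some numpy tricks, but I find it clearer to write the loops myself and then optimize them (if there is no builtin). I chose numba because, even though it is limited in some respects, it is very good at speeding up this kind of for loop.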
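For comparison, the loop-free "numpy tricks" route might look roughly like the sketch below. It is not Ami Tavory's code (which isn't shown here); it assumes NumPy 1.20 or newer for `sliding_window_view` and a float input, and `fill_nan_with_neighbor_median` is a hypothetical name. It computes a median for every pixel, not only the missing ones, so it does more work than the loop version.

```python
import warnings

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def fill_nan_with_neighbor_median(A, size=3):
    """Replace each NaN in A by the median of the non-NaN values in the
    size x size window centred on it (size should be odd)."""
    A = np.asarray(A, dtype=float)
    half = size // 2
    # Pad with NaN so windows at the border just contain fewer valid values
    padded = np.pad(A, half, mode="constant", constant_values=np.nan)
    # windows[i, j] is a (size, size) view of the neighbourhood of A[i, j]
    windows = sliding_window_view(padded, (size, size))
    with warnings.catch_warnings():
        # Windows without any valid value give NaN plus an "All-NaN slice" warning
        warnings.simplefilter("ignore", category=RuntimeWarning)
        medians = np.nanmedian(windows, axis=(-2, -1))
    # Keep the existing values; only fill the missing ones
    return np.where(np.isnan(A), medians, A)
```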

import numba as nb
import numpy as np

@nb.njit
def median_filter_2d(array, filtersize):
    x = array.shape[0]
    y = array.shape[1]
    filter_half = filtersize // 2
    # Create an empty result
    res = np.zeros_like(array)
    # Loop through each pixel
    for i in range(x):
        for j in range(y):
            # If it's not NaN just let it stay
            if not np.isnan(array[i, j]):
                res[i, j] = array[i, j]
            else:
                # We don't want to go outside the image:
                start_x = max(0, i - filter_half)
                end_x = min(x, i + filter_half + 1)
                start_y = max(0, j - filter_half)
                end_y = min(y, j + filter_half + 1)

                # If you want to use nanmedian, uncomment this line and comment out everything after it
                #res[i, j] = np.nanmedian(array[start_x:end_x, start_y:end_y])

                # Create a temporary array.
                tmp = np.zeros(filtersize*filtersize)
                counter = 0 # Counter because we want to know how many not-NaNs are present.

                # Get all adjacent pixels that are not NaN and insert them
                for ii in range(start_x, end_x):
                    for jj in range(start_y, end_y):
                        if not np.isnan(array[ii, jj]):
                            tmp[counter] = array[ii, jj]
                            counter += 1

                # Either do it with np.median, but it will be slower
                #res[i, j] = np.median(tmp[0:counter])
                # or use some custom median function
                if counter == 0:
                    # No valid neighbour in the window: leave the value as NaN
                    res[i, j] = np.nan
                else:
                    res[i, j] = numba_median_insertionsortbased(tmp[0:counter])
    return res
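
Without numba, the same idea (gather the non-NaN neighbours and take np.median, as in the commented-out line) can be sketched in plain NumPy by looping only over the missing positions instead of every pixel. `fill_nan_median_loop` is a hypothetical name; the test uses a non-square array so that a wrong column bound would show up.

```python
import numpy as np


def fill_nan_median_loop(array, filtersize=3):
    half = filtersize // 2
    x, y = array.shape
    res = array.copy()
    # Visit only the missing positions
    for i, j in np.argwhere(np.isnan(array)):
        # Clip the window at the borders: rows use x, columns use y
        window = array[max(0, i - half):min(x, i + half + 1),
                       max(0, j - half):min(y, j + half + 1)]
        valid = window[~np.isnan(window)]
        # Leave the entry as NaN if no valid neighbour is available
        if valid.size:
            res[i, j] = np.median(valid)
    return res
```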

The helper median function is just an insertion sort, after which it returns the middle element or the mean of the two middle elements.

@nb.njit
def numba_median_insertionsortbased(items):
    # Insertion sort
    for i in range(1, len(items)):
        j = i
        while j > 0 and items[j] < items[j-1]:
            items[j], items[j-1] = items[j-1], items[j]
            j -= 1
    # Median is the middle element (odd length) or the mean of the two middle elements (even length)
    if items.size % 2 == 0:
        return 0.5 * (items[(items.size // 2)-1] + items[(items.size // 2)])
    else:
        return items[(items.size // 2)]

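If you don't want to or can't use numba, you can delete the `import numba as nb` line and the @nb.njit lines and run the code as pure Python. It will be much slower, but it still works.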
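Instead of deleting the decorators by hand, a common pattern is to fall back to a do-nothing decorator when numba is missing. A sketch; `njit` here is a local name defined by the snippet, and `count_nan` is a hypothetical example function.

```python
import numpy as np

try:
    import numba as nb
    njit = nb.njit
except ImportError:
    # numba is not installed: return the function unchanged so the code
    # runs as plain (slower) Python; supports both @njit and @njit(...)
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


@njit
def count_nan(array):
    # Works the same whether it is jitted or plain Python
    n = 0
    for v in array:
        if np.isnan(v):
            n += 1
    return n
```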

For the 80x80 A, I get these numba timings:

1000 loops, best of 3: 1.63 ms per loop (custom median)

100 loops, best of 3: 4.88 ms per loop (numpy median)

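For large filters, though, the numpy median should be somewhat faster, because it (hopefully) uses a more advanced method than insertion sort. For your array, however, the elements in the temporary array are already sorted, so the insertion-sort-based median is clearly the best fit. For real images it may well be slower than the numpy median.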
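To illustrate why a smarter method wins for larger windows: a median only needs the middle order statistic(s), not a fully sorted array. np.partition finds them by selection, in linear time on average; to my knowledge np.median itself is built on the same partition step. `partition_median` is a hypothetical helper for illustration.

```python
import numpy as np


def partition_median(values):
    a = np.asarray(values, dtype=float)
    n = a.size
    if n == 0:
        raise ValueError("median of empty input")
    k = n // 2
    if n % 2 == 1:
        # Only index k is guaranteed to hold its sorted-order value
        return np.partition(a, k)[k]
    # Even length: place both middle order statistics, then average them
    p = np.partition(a, [k - 1, k])
    return 0.5 * (p[k - 1] + p[k])
```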

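And in pure Python:

1 loop, best of 3: 406 ms per loop (custom median)

1 loop, best of 3: 707 ms per loop (numpy nanmedian, without the inner loop and temporary array)

1 loop, best of 3: 832 ms per loop (numpy median on the temporary array)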
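These numbers use the "best of 3" format printed by older versions of IPython's `%timeit`. Outside IPython, a rough equivalent with the standard-library timeit module might look like this; the data and loop count are placeholders, not the setup behind the measurements above.

```python
import timeit

import numpy as np

data = np.arange(9.0)

# timeit.repeat returns the total time of each repeat, so divide the
# fastest repeat by the number of calls to get a per-loop time
number = 1000
times = timeit.repeat(lambda: np.median(data), repeat=3, number=number)
best_per_loop = min(times) / number
print(f"{number} loops, best of 3: {best_per_loop * 1e6:.2f} µs per loop")
```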

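As an aside: I always find it surprising that, even without numba, the custom median beats the numpy median for small 1D inputs (well, they are already sorted, so this is the best case for insertion sort :-)):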

%timeit numba_median_insertionsortbased(np.arange(9)) # without the @nb.njit
10000 loops, best of 3: 21.7 µs per loop
%timeit np.median(np.arange(9))
10000 loops, best of 3: 123 µs per loop
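The "best case" remark can be checked by counting swaps in the same insertion sort: already-sorted input needs none, while reversed input needs n(n-1)/2. `insertion_sort_swaps` is a hypothetical helper that works on a list copy of its input.

```python
def insertion_sort_swaps(items):
    # Same swapping insertion sort as numba_median_insertionsortbased,
    # on a list copy, counting the swaps it performs
    items = list(items)
    swaps = 0
    for i in range(1, len(items)):
        j = i
        while j > 0 and items[j] < items[j - 1]:
            items[j], items[j - 1] = items[j - 1], items[j]
            j -= 1
            swaps += 1
    return items, swaps


print(insertion_sort_swaps(range(9))[1])          # 0: already sorted (best case)
print(insertion_sort_swaps(range(8, -1, -1))[1])  # 36: reversed, 9 * 8 / 2
```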

And these faster solutions can be accelerated further with numba:

%timeit numba_median_insertionsortbased(np.arange(9)) # WITH the @nb.njit
100000 loops, best of 3: 8.93 µs per loop