Python: get the average of neighbours in a matrix with NA values

Time: 2015-05-06 05:31:30

Tags: python numpy matrix

I have a very large matrix, so I don't want to compute the sums by looping over every row and column.

import numpy as np

a = [[1,2,3],[3,4,5],[5,6,7]]
def neighbors(i,j,a):
    return [a[i][j-1], a[i][(j+1)%len(a[0])], a[i-1][j], a[(i+1)%len(a)][j]]
[[np.mean(neighbors(i,j,a)) for j in range(len(a[0]))] for i in range(len(a))]

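This code works for a 3x3 or other small matrix, but it is not feasible for a large matrix such as 2k x 2k. It also does not work if any value in the matrix is missing or is something like NA. If any neighbour value is NA, that neighbour should be skipped when computing the average.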

3 answers:

Answer 0 (score: 5)

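Shot #1

This assumes that you want sliding-window averages over the input array with a 3 x 3 window, considering only the north, west, east and south neighbours of each element.

For such a case, signal.convolve2d with an appropriate kernel can be used. At the end, you need to divide the sums by the number of ones in the kernel, i.e. kernel.sum(), since only those positions contribute to the sums. Here's the implementation -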

import numpy as np
from scipy import signal

# Inputs
a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]

# Convert to numpy array
arr = np.asarray(a,float)    

# Define kernel for convolution                                         
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]]) 

# Perform 2D convolution with input data and kernel 
out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
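
The question also asks that NA neighbours be skipped, which the convolution above does not do on its own (a NaN in the input would spread into the surrounding outputs). One possible way to handle it, as a rough sketch rather than part of the original answer: convolve a zero-filled copy of the data and a validity mask separately, then divide the two. The input array and the names valid, filled, nb_sum, nb_cnt and nan_mean are made up for illustration.

```python
import numpy as np
from scipy import signal

# Hypothetical input with one missing value
arr = np.array([[1, 2, 3],
                [3, np.nan, 5],
                [5, 6, 7],
                [4, 8, 9]], dtype=float)

kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

valid = ~np.isnan(arr)               # True where a value is present
filled = np.where(valid, arr, 0.0)   # NaNs contribute 0 to the sums

# Per cell: sum of the valid neighbours and number of valid neighbours
nb_sum = signal.convolve2d(filled, kernel, boundary='wrap', mode='same')
nb_cnt = signal.convolve2d(valid.astype(float), kernel, boundary='wrap', mode='same')

# Mean over the valid neighbours; a cell with no valid neighbour stays NaN
nan_mean = np.full(arr.shape, np.nan)
np.divide(nb_sum, nb_cnt, out=nan_mean, where=nb_cnt > 0)
```

Here the divisor is the per-cell count of valid neighbours instead of the fixed kernel.sum(), so a cell next to a NaN is averaged over three neighbours rather than four.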

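Shot #2

This makes the same assumptions as Shot #1, except that we want the neighbourhood averages only at the zero elements, with the intention of replacing those zeros with the averages.

Approach #1: Here's one way to do it using a manual, selective convolution -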

import numpy as np

# Convert to numpy array
arr = np.asarray(a,float)    

# Pad around the input array to take care of boundary conditions
arr_pad = np.pad(arr, (1,1), 'wrap')

R,C = np.where(arr==0)   # Row, column indices for zero elements in input array
N = arr_pad.shape[1]     # Number of columns in padded array

offset = np.array([-N, -1, 1, N])
idx = np.ravel_multi_index((R+1,C+1),arr_pad.shape)[:,None] + offset

arr_out = arr.copy()
arr_out[R,C] = arr_pad.ravel()[idx].sum(1)/4

Sample input, output -

In [587]: arr
Out[587]: 
array([[ 4.,  0.,  3.,  3.,  3.,  1.,  3.],
       [ 2.,  4.,  0.,  0.,  4.,  2.,  1.],
       [ 0.,  1.,  1.,  0.,  1.,  4.,  3.],
       [ 0.,  3.,  0.,  2.,  3.,  0.,  1.]])

In [588]: arr_out
Out[588]: 
array([[ 4.  ,  3.5 ,  3.  ,  3.  ,  3.  ,  1.  ,  3.  ],
       [ 2.  ,  4.  ,  2.  ,  1.75,  4.  ,  2.  ,  1.  ],
       [ 1.5 ,  1.  ,  1.  ,  1.  ,  1.  ,  4.  ,  3.  ],
       [ 2.  ,  3.  ,  2.25,  2.  ,  3.  ,  2.25,  1.  ]])

There are other padding options for handling the boundary conditions. Look at numpy.pad for more info.
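
To make the padding choice a bit more concrete, here is a small sketch comparing three numpy.pad modes on a toy array. The array and the variable names are illustrative only and not from the answer.

```python
import numpy as np

arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])

# Periodic boundary: values from the opposite edge are copied in
wrapped = np.pad(arr, 1, mode='wrap')

# Replicate boundary: the nearest edge value is repeated
edged = np.pad(arr, 1, mode='edge')

# Zero boundary: a border of zeros is added
zeros = np.pad(arr, 1, mode='constant', constant_values=0)

print(wrapped)
print(edged)
print(zeros)
```

Note that with a zero border, dividing the neighbour sum by 4 would bias the averages along the edges towards zero, so the divisor would probably need adjusting there.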

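Approach #2: This is a modified version of the convolution-based approach listed earlier in Shot #1. It is the same as that approach, except that at the end we selectively replace the zero elements with the convolution output. Here's the code -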

import numpy as np
from scipy import signal

# Inputs
a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]

# Convert to numpy array
arr = np.asarray(a,float)

# Define kernel for convolution                                         
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]]) 

# Perform 2D convolution with input data and kernel 
conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()

# Initialize output array as a copy of input array
arr_out = arr.copy()

# Setup a mask of zero elements in input array and 
# replace those in output array with the convolution output
mask = arr==0
arr_out[mask] = conv_out[mask]

Remarks: Approach #1 is the preferred way when the input array has only a few zero elements; otherwise go with Approach #2.

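Answer 1 (score: 3)

This is an addendum to the comments under @Divakar's answer (rather than a standalone answer).

Out of curiosity I timed a few 'pseudo' convolutions against the scipy convolution. The fastest was the % (modulus) wrapping one, which surprised me: numpy evidently does something clever with its indexing, although not having to pad obviously saves time too.

fn3 -> 9.5ms, fn1 -> 21ms, fn2 -> 232ms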

import timeit

setup = """
import numpy as np
from scipy import signal
N = 1000
M = 750
P = 5 # i.e. small number -> bigger proportion of zeros
a = np.random.randint(0, P, M * N).reshape(M, N)
arr = np.asarray(a,float)"""

fn1 = """ 
arr_pad = np.pad(arr, (1,1), 'wrap')
R,C = np.where(arr==0)
N = arr_pad.shape[1]
offset = np.array([-N, -1, 1, N])
idx = np.ravel_multi_index((R+1,C+1),arr_pad.shape)[:,None] + offset
arr[R,C] = arr_pad.ravel()[idx].sum(1)/4"""

fn2 = """
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]]) 
conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
mask = arr == 0.0
arr[mask] = conv_out[mask]"""

fn3 = """ 
R,C = np.where(arr == 0.0)
arr[R, C] = (arr[(R-1)%M,C] + arr[R,(C-1)%N] + arr[R,(C+1)%N] + arr[(R+1)%M,C]) / 4.0
"""

print(timeit.timeit(fn1, setup, number = 100))
print(timeit.timeit(fn2, setup, number = 100))
print(timeit.timeit(fn3, setup, number = 100))
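
As a quick sanity check (my own sketch, not part of the timings above), the modulus-indexing version can be compared with the convolution on a small random array to confirm that both give the same averages at the zero positions. The array size, the seed and the names conv_out, mod_out and same are arbitrary choices for illustration.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
M, N = 6, 8
arr = rng.integers(0, 5, size=(M, N)).astype(float)

kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

# Reference: wrapped convolution, as in fn2
conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same') / kernel.sum()

# Modulus-wrapped fancy indexing at the zero positions, as in fn3
R, C = np.where(arr == 0.0)
mod_out = (arr[(R - 1) % M, C] + arr[R, (C - 1) % N]
           + arr[R, (C + 1) % N] + arr[(R + 1) % M, C]) / 4.0

same = np.allclose(mod_out, conv_out[R, C])
print(same)
```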

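Answer 2 (score: 1)

Using numpy and scipy.ndimage, you can apply a "footprint" that defines where to look for the neighbours of each element, and then apply a function to those neighbours: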

import numpy as np
import scipy.ndimage as ndimage

# Getting neighbours horizontally and vertically,
#   not diagonally
footprint = np.array([[0,1,0],
                      [1,0,1],
                      [0,1,0]])
a = [[1,2,3],[3,4,5],[5,6,7]]
# Need to make sure that dtype is float or the
#   mean won't be calculated correctly
a_array = np.array(a, dtype=float)

# Can specify that you want neighbour selection to
#   wrap around at the borders
ndimage.generic_filter(a_array, np.mean, 
                       footprint=footprint, mode='wrap')
Out[36]: 
array([[ 3.25,  3.5 ,  3.75],
       [ 3.75,  4.  ,  4.25],
       [ 4.25,  4.5 ,  4.75]])
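
For the NA part of the question, one option that seems to fit this approach is to pass np.nanmean instead of np.mean, so that NaN neighbours are ignored. This is a sketch under that assumption; the input a_nan is a made-up variant of the 3x3 example with its centre value missing.

```python
import numpy as np
import scipy.ndimage as ndimage

footprint = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

# Hypothetical input: the 3x3 example with the centre value missing
a_nan = np.array([[1, 2, 3],
                  [3, np.nan, 5],
                  [5, 6, 7]], dtype=float)

# np.nanmean skips NaN values among the selected neighbours
nan_means = ndimage.generic_filter(a_nan, np.nanmean,
                                   footprint=footprint, mode='wrap')
print(nan_means)
```

Since generic_filter calls the Python function once per element, this is likely to be much slower than a convolution on something like a 2k x 2k array. If all four neighbours of a cell are NaN, np.nanmean returns NaN for that cell and emits a warning.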