Question

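Hi, I have some arrays with many NaN values. I'm looking for a way to estimate the value at each NaN from a plane defined by the other, finite data. A single plane covering the whole array, which is more than 1000 * 1000, would misrepresent the data. So my idea is to loop over every NaN position with a 20 * 20 window, find the plane defined by the available data in that window, and estimate the value at the window's center. But this takes quite a long time to process, so I'm looking for suggestions on an efficient way to do it. I'd appreciate any help.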

import numpy as np
import scipy.linalg

# data: the 2-D float array containing NaNs
dim = np.shape(data)
row, col = np.where(np.isnan(data))
# keep only NaN positions far enough from the edges for a full window
a = row > 10
b = row < dim[0] - 10
c = col > 10
d = col < dim[1] - 10
row = row[a & b & c & d]
col = col[a & b & c & d]
interdata = np.zeros(np.shape(data))
interdata[np.isfinite(data)] = data[np.isfinite(data)]
for ii, jj in zip(row, col):
    # data in the 20 by 20 window; .copy() so the outlier masking below
    # does not write NaNs back into `data` (a slice is a view)
    block = data[ii - 10:ii + 10, jj - 10:jj + 10].copy()
    if not np.all(np.isnan(block)):
        # replace the outliers greater than twice the median by NaN
        block[block > 2 * np.median(block[np.isfinite(block)])] = np.nan
        pointvalue = block[np.isfinite(block)]
        loc = np.ones((pointvalue.shape[0], 3))
        loc[:, 0:2] = np.transpose(np.where(np.isfinite(block)))
        C, _, _, _ = scipy.linalg.lstsq(loc, pointvalue)  # plane fitting
        # the window center (ii, jj) sits at local index (10, 10)
        interdata[ii, jj] = C[0] * 10 + C[1] * 10 + C[2]

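It may look like a duplicate question, but I have looked at many similar questions that were asked before. Most of them deal with continuous data, so avoiding loops worked for them.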
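A possible way to remove the Python loop, not taken from this thread: the least-squares plane at each pixel depends only on nine window sums (count, Σdy, Σdx, Σdy², Σdx², Σdy·dx, Σz, Σdy·z, Σdx·z), and each of those can be computed for every pixel at once with separable 1-D correlations. This sketch assumes a centered odd window of side 2*half+1 (the question's 20x20 window is slightly off-center), treats pixels beyond the border as missing, and leaves out the median-based outlier rejection. fill_nans_local_plane is a hypothetical helper name.

```python
import numpy as np
from scipy import ndimage


def fill_nans_local_plane(data, half=10):
    """Replace each non-finite pixel with the center value of a least-squares
    plane z = a*dy + b*dx + c fitted to the finite pixels of the
    (2*half+1) x (2*half+1) window around it. No outlier rejection."""
    data = np.asarray(data, dtype=float)
    bad = ~np.isfinite(data)
    out = data.copy()
    rows, cols = np.nonzero(bad)
    if rows.size == 0:
        return out

    mask = (~bad).astype(float)        # 1 where a sample exists, 0 otherwise
    z = np.where(bad, 0.0, data)       # missing samples contribute nothing
    off = np.arange(-half, half + 1, dtype=float)
    one, lin, sq = np.ones_like(off), off, off ** 2

    def window_sum(img, ky, kx):
        # sum over the window of ky[dy] * kx[dx] * img[i+dy, j+dx], sampled at
        # the NaN pixels only; pixels beyond the border count as 0 (missing)
        tmp = ndimage.correlate1d(img, ky, axis=0, mode="constant", cval=0.0)
        full = ndimage.correlate1d(tmp, kx, axis=1, mode="constant", cval=0.0)
        return full[rows, cols]

    s1, sy, sx = window_sum(mask, one, one), window_sum(mask, lin, one), window_sum(mask, one, lin)
    syy, sxx, sxy = window_sum(mask, sq, one), window_sum(mask, one, sq), window_sum(mask, lin, lin)
    sz, syz, sxz = window_sum(z, one, one), window_sum(z, lin, one), window_sum(z, one, lin)

    # normal equations M @ [a, b, c] = r, one 3x3 system per NaN pixel
    M = np.moveaxis(np.array([[syy, sxy, sy], [sxy, sxx, sx], [sy, sx, s1]]), -1, 0)
    r = np.stack([syz, sxz, sz], axis=-1)

    # skip windows with fewer than 3 samples or collinear samples (singular M)
    sv = np.linalg.svd(M, compute_uv=False)
    ok = (s1 >= 3) & (sv[:, -1] > 1e-10 * sv[:, 0])
    if ok.any():
        coef = np.linalg.solve(M[ok], r[ok][:, :, None])[:, :, 0]
        out[rows[ok], cols[ok]] = coef[:, 2]   # c = plane value at the window center
    return out
```

Because each sum is a pair of 1-D correlations, the per-pixel cost grows with the window width rather than its area; whether this beats the loop in practice will depend on the array and the number of NaNs.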

Answer 1

Could you use something like a 'pseudo' convolution approach here?

Python get get average of neighbours in matrix with na value

Obviously 11x11 is more cumbersome than 3x3 (which is already a bit messy), but could you average over a slightly smaller sample area?
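The 'pseudo' convolution idea can be sketched as a NaN-aware moving average: correlate the NaN-zeroed data and the finite-value mask with the same box kernel, then divide the two. This is a minimal sketch assuming scipy.ndimage; fill_nans_local_mean is a hypothetical name, and size is the (odd) window side.

```python
import numpy as np
from scipy import ndimage


def fill_nans_local_mean(data, size=3):
    """Replace each non-finite pixel with the mean of the finite pixels in the
    size x size window centered on it."""
    data = np.asarray(data, dtype=float)
    valid = np.isfinite(data)
    kernel = np.ones((size, size))
    # window sums of the finite values and of the number of finite values;
    # pixels beyond the border are treated as missing (cval=0)
    total = ndimage.correlate(np.where(valid, data, 0.0), kernel, mode="constant", cval=0.0)
    count = ndimage.correlate(valid.astype(float), kernel, mode="constant", cval=0.0)
    out = data.copy()
    fill = ~valid & (count > 0)   # windows with no finite pixel stay NaN
    out[fill] = total[fill] / count[fill]
    return out
```

The window size only enters through the kernel, so trying 3x3 against larger windows is a one-argument change.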

EDIT: have you compared the difference between a least-squares plane-fit estimate and simply averaging the surrounding point values?
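A small synthetic check of that comparison, with made-up data: on a tilted plane, a plain mean matches the center value only when the surviving neighbors are symmetric around it, while a plane fit stays exact. The window, slope and gap below are assumptions chosen for illustration; the fit is the same idea as the question's lstsq step.

```python
import numpy as np

# hypothetical 5x5 window sampled from a tilted plane z = 3*dy + 2*dx + 7,
# so the true value at the center is 7
off = np.arange(-2, 3)
dy, dx = np.meshgrid(off, off, indexing="ij")
z = 3.0 * dy + 2.0 * dx + 7.0
z[2, 2] = np.nan        # the value to estimate
z[:, 3:] = np.nan       # assumed gap: the two right-hand columns are missing too

valid = np.isfinite(z)
mean_est = z[valid].mean()

# least-squares plane through the remaining points
A = np.column_stack([dy[valid], dx[valid], np.ones(valid.sum())])
coef, *_ = np.linalg.lstsq(A, z[valid], rcond=None)
plane_est = coef[2]     # plane evaluated at the center (dy = dx = 0)

print(f"mean: {mean_est:.3f}  plane: {plane_est:.3f}  true: 7")
```

Here the mean comes out at 68/14 ≈ 4.857 while the plane fit recovers 7, so on sloped data with lopsided gaps the two can differ noticeably; with noisy data the trade-off is less clear-cut.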

EDIT2: have you tried any of the scipy interpolation functions? They might be faster.
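As one example of the scipy interpolation route, scipy.interpolate.griddata can do piecewise-linear interpolation over the finite pixels. This sketch (hypothetical helper fill_nans_griddata, with an assumed margin parameter) only feeds it finite pixels close to a gap, since triangulating all of the roughly 10^6 pixels of a 1000x1000 array is likely to be slow. Gaps outside the convex hull of those samples come back as NaN.

```python
import numpy as np
from scipy import interpolate, ndimage


def fill_nans_griddata(data, margin=3):
    """Fill non-finite pixels by piecewise-linear interpolation over the finite
    pixels that lie within `margin` steps of a gap."""
    data = np.asarray(data, dtype=float)
    bad = ~np.isfinite(data)
    out = data.copy()
    if not bad.any():
        return out
    # restricting the sample points to a thin ring around the gaps keeps the
    # Delaunay triangulation small when most of the array is valid
    near = ndimage.binary_dilation(bad, iterations=margin) & ~bad
    out[bad] = interpolate.griddata(
        np.argwhere(near), data[near], np.argwhere(bad), method="linear"
    )
    return out
```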

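EDIT3: Continuing my thoughts on this, I think it's easy to test. For 1000x1000 it takes only a fraction of a second to do 20 annealing passes. Larger arrays take a long time to set up, but if the stride in the interpolation object is increased (e.g. to 100), it runs very fast.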

import numpy as np
from scipy import interpolate

# test data: a sloped surface with 20% multiplicative noise, then up to 100 random NaNs
data = np.array([[i + 0.1 * j for i in range(1000)] for j in range(1000)])
data = data * (1.0 + np.random.randn(*data.shape) * 0.2)
data[np.random.randint(100, 900, 100), np.random.randint(0, 999, 100)] = np.nan

row, col = np.where(np.isnan(data))
data[row, col] = 0.0  # first patch zeros in to stop NaN killing interpolate
# coarse grid (every 10th row/column); plain int because np.int no longer exists in recent NumPy
yind = np.arange(0, data.shape[0], 10, dtype=int)
xind = np.arange(0, data.shape[1], 10, dtype=int)
for i in range(20):  # repeat to 'anneal' to a steady state; could test dz each loop
    interp = interpolate.RectBivariateSpline(yind, xind, data[::10, ::10])
    # move each patched value halfway towards the spline's estimate
    data[row, col] = data[row, col] * 0.5 + interp.ev(row, col) * 0.5

EDIT4 - The smoothing parameter (s) in RectBivariateSpline() may be needed; it's worth experimenting with.
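To see what the smoothing factor s does, this sketch fits RectBivariateSpline to a hypothetical noisy plane on a coarse grid and reports the sum of squared residuals at the grid points. With s=0 the spline interpolates the grid values exactly; with s > 0, FITPACK instead looks for a smoother spline whose residual sum of squares is roughly at most s. A common rule of thumb is s ≈ (number of points) × (noise variance), which is 100 × 4 = 400 here; spline_rss is a hypothetical helper.

```python
import numpy as np
from scipy import interpolate


def spline_rss(s, seed=0):
    """Fit RectBivariateSpline with smoothing factor s to a noisy plane sampled
    on a 10x10 grid and return the sum of squared residuals at the grid."""
    rng = np.random.default_rng(seed)
    y = np.arange(0, 100, 10)
    x = np.arange(0, 100, 10)
    Y, X = np.meshgrid(y, x, indexing="ij")
    z = 0.1 * Y + X + rng.normal(scale=2.0, size=Y.shape)  # hypothetical data
    spl = interpolate.RectBivariateSpline(y, x, z, s=s)
    resid = spl.ev(Y.ravel(), X.ravel()) - z.ravel()
    return float(np.sum(resid ** 2))


for s in (0.0, 100.0, 400.0):
    print(s, spline_rss(s))
```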

Answer 2

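As far as I can tell, you start from a false premise: that the rows and columns are independent! You need the rows and columns where [a & b & c & d] holds. Otherwise you accept the rows of some points and then the columns of OTHER points. Then you loop over every element of the rows and, inside that, over every element of the columns. That's NxN operations for N NaN points! If your bad points are (250, 430) and (160, 470), you are "fixing" (250, 430), (250, 470), (160, 470) and (160, 430). I suggest: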

import numpy
whr = numpy.where( (col < ...) & (col > ...) & (row < ...) & (row > ...))
for rr, cc in zip(row[whr], col[whr]):
    ...  # fit the plane for the window centered on (rr, cc)
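The pairing problem described above can be reproduced with the two example points. This sketch uses a hypothetical 500x500 array; the bounds 10 and 490 stand in for the question's 10 and dim - 10.

```python
import numpy as np

data = np.zeros((500, 500))
data[250, 430] = np.nan          # the two example NaN points
data[160, 470] = np.nan
row, col = np.where(np.isnan(data))   # row-major order: row = [160, 250], col = [470, 430]

# nested loops pair every row with every column: N*N pairs
nested = [(r, c) for r in row for c in col]

# one combined mask applied to both arrays keeps each (row, col) pair together
keep = (row > 10) & (row < 490) & (col > 10) & (col < 490)
paired = list(zip(row[keep], col[keep]))

print(len(nested), sum(bool(np.isnan(data[r, c])) for r, c in nested))
print([(int(r), int(c)) for r, c in paired])
```

Only 2 of the 4 nested pairs are actual NaN points, while the zipped version visits exactly (160, 470) and (250, 430).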

Efficient way of looping over a numpy array
