Question

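I am currently trying to run a program that takes a 2D array and filters out NaN values and high-amplitude noise.

The program does this by first breaking the 2D array into row slices. Then, index by index within each row, it replaces NaN values, and values whose residual exceeds one standard deviation of the data residuals, with the corresponding values from a polynomial fit evaluated on a grid of the same size as the data. The code block below is the main part of the program, which is meant to do this with nested for loops: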

# I have to remove the first and last columns of data and filter them on their own,
# as per instruction. The next 24 lines of code do this.
data_initial = data[:,0]
data_final = data[:,-1]

# Create Polynomial fit to data, excluding NaNs. 
idx = np.isfinite(x) & np.isfinite(data_initial)
coeff_initial = p.polyfit(x[idx], data_initial[idx], 30, full=True)
pfit_initial = p.polyval(x, coeff_initial[0])

idy = np.isfinite(x) & np.isfinite(data_final)
coeff_final = p.polyfit(x[idy], data_final[idy], 30, full=True)
pfit_final = p.polyval(x, coeff_final[0])

# replace NaN values in first and last profiles with their corresponding 
# polynomial fit values
for i in range(0,len(data_initial)):
  if np.isnan(data_initial[i]) == True:
    data_initial[i] = pfit_initial[i]

for i in range(0,len(data_final)):
  if np.isnan(data_final[i]) == True:
    data_final[i] = pfit_final[i]

data_initial = data_initial.reshape(len(data_initial),1)
data_final = data_final.reshape(len(data_final),1)

data_smooth = np.array([])

for i in range(0,len(x)):
  data_set = data[i,1:-1] # "data" is the name of the original 2D array

  # The next 3 lines are responsible for fitting a polynomial line to each row of the data
  ids = np.isfinite(y[1:-1]) & np.isfinite(data_set)
  coeff = p.polyfit(y[1:-1][ids], data_set[ids], 20)
  pfit = p.polyval(y[1:-1], coeff)

  # The next nested for loop is designed to replace any Nan values with values
  # from the polynomial fit line "pfit"
  for k in range(0,len(data_set)):
    if np.isnan(data_set[k]) == True:
      data_set[k] = pfit[k]

  # The next 2 lines calculates the residual noise of the data and the 1st standard 
  # deviation of the data
  residuals = data_set - pfit
  standard_dev = 1*np.std(residuals)

  # The next nested for loop is designed to replace any values inside "residuals" 
  # with polynomial fit values if they exceed the 1st standard deviation
  for j in range(0,len(residuals)):
    if abs(residuals[j]) >= abs(standard_dev):
      data_set[j] = pfit[j]

  # The final four lines inside of the overarching loop reshape the data so that 
  # each row can be stacked to create a 2D array
  if len(data_smooth) == 0:
    data_smooth = data_set[None,:]
  else:
    data_smooth = np.vstack((data_smooth,data_set))

# These 2 lines add the previously sliced first and last columns back to the now 
# filtered 2D array of data.
data_smooth_1 = np.hstack((data_initial,data_smooth))
Data_filtered = np.hstack((data_smooth_1,data_final))

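However, when the program runs to completion (and it does so without errors), it appears to do nothing to the original data. The original data set and the "filtered" data set are identical, element by element. Am I doing something wrong when I reassign values inside the nested for loops? I am at a loss and have been struggling with this for days. Please help!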
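One way to check whether the reassignment is responsible: in NumPy, basic slicing such as `data[i, 1:-1]` returns a view on the same memory rather than a copy, so writing into the slice also writes into `data`. The sketch below is only an illustration on a small made-up array (not the data from the question); it uses `np.shares_memory` to confirm the link and then shows a write going through to the parent array.

```python
import numpy as np

# Small made-up array standing in for the real "data" (an assumption).
data = np.array([[1.0, np.nan, 3.0, 4.0],
                 [5.0, 6.0, np.nan, 8.0]])
original = data.copy()  # independent snapshot taken before any filtering

data_set = data[0, 1:-1]  # basic slicing: a view on data, not a copy
shares = np.shares_memory(data, data_set)

data_set[0] = 2.0  # this assignment also changes data[0, 1]

print("slice shares memory with data:", shares)
print("data[0, 1] after writing to the slice:", data[0, 1])
print("snapshot still holds NaN there:", np.isnan(original[0, 1]))
```

If the comparison between "original" and "filtered" is made against `data` itself rather than against a snapshot like `original`, the two would look identical even though the filtering did run.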

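Update: I have found that it is filtering the data as it should, but the problem is that it is also redefining every element of the original array. Why is this happening?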
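A likely explanation is that `data[:,0]`, `data[:,-1]` and `data[i,1:-1]` are views, so every in-place assignment lands in `data` as well. A minimal sketch of one possible fix follows, assuming `p` in the question is `numpy.polynomial.polynomial`; the helper name `filter_row`, the low polynomial degree, and the test data are all hypothetical. It works on an explicit `.copy()` of each row and uses boolean masks in place of the inner loops.

```python
import numpy as np
from numpy.polynomial import polynomial as p  # assumed meaning of "p" in the question

def filter_row(y, row, degree=3):
    """Hypothetical helper: return a filtered copy of row, leaving the input untouched."""
    row = row.copy()  # break the link to the parent array
    ok = np.isfinite(y) & np.isfinite(row)
    coeff = p.polyfit(y[ok], row[ok], degree)
    pfit = p.polyval(y, coeff)

    nan_mask = np.isnan(row)
    row[nan_mask] = pfit[nan_mask]  # fill NaNs from the fit

    residuals = row - pfit
    noisy = np.abs(residuals) >= np.std(residuals)
    row[noisy] = pfit[noisy]  # replace values at or beyond one standard deviation
    return row

# Made-up data: a line with one NaN and a parabola with one spike.
y = np.linspace(0.0, 1.0, 8)
data = np.vstack([2.0 * y + 1.0, y ** 2])
data[0, 3] = np.nan
data[1, 5] = 50.0

filtered = np.vstack([filter_row(y, data[i]) for i in range(data.shape[0])])

print("original still has its NaN:", np.isnan(data[0, 3]))
print("original still has its spike:", data[1, 5] == 50.0)
print("filtered is all finite:", np.all(np.isfinite(filtered)))
```

The same idea applies to the first and last columns: taking `data[:,0].copy()` and `data[:,-1].copy()` before filling NaNs should keep `data` intact.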

Python - Unintentionally replacing the original data

0 answers: