Question

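I asked a question earlier about creating an efficient loop to find outliers, and someone here provided a great answer. However, I am now writing this code to find outliers in a numpy array. For this program, assume the array is x_a and a data point is x. Mu is the mean of x_a without the data point, and sigma is the standard deviation of x_a without the data point. If the data point is an outlier, it is removed from x_a. In the code below, I am trying to do this. I created the datasets x_a and x_trimmed (which is supposed to be the dataset without the point x, as required). Below are the code and the output.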

    import numpy as np

    x_a = np.array([99.5697438, 94.47019021, 55.0, 106.86672855, 102.78730151,
                    131.85777845, 88.25376895, 96.94439838, 83.67782174,
                    115.57993209, 118.97651966, 94.40479467, 79.63342207, 77.88602065,
                    96.59145004, 99.50145353, 97.25980235, 87.72010069, 101.30597215,
                    87.3110369, 110.0687946, 104.71504012, 89.34719772, 160.0,
                    110.61519268, 112.94716398, 104.41867586])

    outliers = True
    while outliers:
        # Define mean and std of the dataset
        x_mu = np.mean(x_a, axis=0)
        x_std = np.std(x_a, axis=0)
        # Define the dataset WITHOUT the data point, and calculate the mean and std WITHOUT the data point
        x_trimmed = [x for x in x_a if (x < x_mu + (3 * x_std)) or (x > x_mu - (3 * x_std))]
        trim_mu = np.mean(x_trimmed, axis=0)
        trim_std = np.std(x_trimmed, axis=0)
        for cell in x_a:
            if cell > x_mu + (3 * x_std) or cell < x_mu - (3 * x_std):
                print("Removed the data point " + str(cell))
                index = np.where(x_a == cell)
                x_a = np.delete(x_a, obj=index)
        if np.array_equal(x_a, x_trimmed):
            print("No more outlier detected!")
            outliers = False

However, the output is as follows:

    Removed the data point 160.0
    No more outlier detected!

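I manually removed the data points 55.0 and 131.85777845 and found that the point 131.85777845 is indeed about 3.07 standard deviations away from the mean. The expected output is that 160, 55, and 131.85777845 should all be removed.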

What changes to the code are needed to produce the correct output?
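For reference, here is a minimal sketch of the leave-one-out rule described above, under a few assumptions. In the posted code the `x_trimmed` condition joins its two comparisons with `or`, so it is true for every value, and the `for` loop tests each point against the mean and standard deviation of the full array rather than of the array without that point. The sketch below instead computes, for each point, the mean and standard deviation of the remaining points, removes every point more than three of those standard deviations away, and repeats until nothing is flagged. The helper name `remove_outliers_leave_one_out` and the parameter `n_sigma` are hypothetical, and the population standard deviation (the `np.std` default, `ddof=0`) is assumed because that is what the posted code uses.

```python
import numpy as np


def remove_outliers_leave_one_out(values, n_sigma=3.0):
    """Iteratively drop points that lie more than n_sigma standard deviations
    from the mean, where both statistics exclude the point being tested.

    Hypothetical helper for illustration; not part of the original post.
    """
    data = np.asarray(values, dtype=float)
    removed = []
    # With two or fewer points the leave-one-out std is always zero, so stop there.
    while data.size > 2:
        n = data.size
        # Mean and population std (ddof=0) of the array with point i left out.
        loo_mean = np.array([np.delete(data, i).mean() for i in range(n)])
        loo_std = np.array([np.delete(data, i).std() for i in range(n)])
        is_outlier = np.abs(data - loo_mean) > n_sigma * loo_std
        if not is_outlier.any():
            break
        removed.extend(data[is_outlier].tolist())
        data = data[~is_outlier]
    return data, removed


x_a = np.array([99.5697438, 94.47019021, 55.0, 106.86672855, 102.78730151,
                131.85777845, 88.25376895, 96.94439838, 83.67782174,
                115.57993209, 118.97651966, 94.40479467, 79.63342207, 77.88602065,
                96.59145004, 99.50145353, 97.25980235, 87.72010069, 101.30597215,
                87.3110369, 110.0687946, 104.71504012, 89.34719772, 160.0,
                110.61519268, 112.94716398, 104.41867586])

cleaned, removed = remove_outliers_leave_one_out(x_a)
print("Removed:", removed)                # Removed: [160.0, 55.0, 131.85777845]
print("Remaining points:", cleaned.size)  # Remaining points: 24
```

Recomputing the statistics after every pass matters here: with this data, 55.0 only crosses the three-sigma threshold once 160.0 is gone, and 131.85777845 only once 55.0 is gone. Using a boolean mask instead of `np.where` plus `np.delete` inside the loop also avoids modifying the array while iterating over it.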

Removing outliers from a Python numpy array?

0 answers: