Question

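I used Isolation Forest to find outliers and assigned the predictions to the variable y_outliers. How do I now remove the rows with these values from my pandas DataFrame?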

from sklearn.ensemble import IsolationForest
clf = IsolationForest(max_samples=100, contamination = 0.1, random_state=42)
clf.fit(X)
y_outliers = clf.predict(X)

What should I do next? Given that y_outliers is an array containing 1 (inlier) and -1 (outlier), how do I drop the rows? I tried

for i in y_outliers:
    if i == -1:
        X.drop(X.index(i))

but got TypeError: 'RangeIndex' object is not callable

I also tried

for i in X:
    print(i)
    if y_outliers.loc[i] == -1:
        X.drop(i)

but got

'the label [f1] is not in the [index]'

Answer 1

There are two ways to do this. The first is to loop over the predictions and drop each row flagged as an outlier:

# Assumes X has the default RangeIndex, so position i is also the row label.
for i in range(len(y_outliers)):
    if y_outliers[i] == -1:  # -1 marks an outlier
        X.drop(i, inplace=True)

The other way is:

import pandas as pd
isolationdata = pd.DataFrame({'dropIndex':y_outliers})
result = pd.merge(X, isolationdata, left_index=True, right_index=True)
result = result[result.dropIndex == 1]

Please accept and upvote the solution if it works. I have tested both snippets and they both work. If you run into an error, leave a comment.
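
As a supplementary sketch (not part of the tested snippets above), a boolean mask can filter the rows without a loop or a merge, and it does not depend on the DataFrame having a default index. The toy X and y_outliers below are hypothetical stand-ins for the asker's data; y_outliers is assumed to be the NumPy array returned by clf.predict(X).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's X and y_outliers.
X = pd.DataFrame(
    {"f1": [0.1, 0.2, 9.0, 0.3], "f2": [1.0, 1.1, -7.5, 0.9]},
    index=["a", "b", "c", "d"],  # deliberately not a RangeIndex
)
y_outliers = np.array([1, 1, -1, 1])  # 1 = inlier, -1 = outlier

# Keep only the rows predicted as inliers.
X_clean = X[y_outliers == 1]

# Equivalent: look up the labels of the outlier rows and drop them.
X_dropped = X.drop(X.index[y_outliers == -1])

print(X_clean)
```

Both forms return a new DataFrame and leave X unchanged, so the original data stays available for comparison.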
