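So I studied the sklearn example for Local Outlier Detection and tried to apply it to a sample dataset I have. But somehow the result itself does not make any sense to me.
Here is what I have implemented: (excluding imports)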
import numpy as np
import matplotlib.pyplot as plt
import pandas
from sklearn.neighbors import LocalOutlierFactor
# import file
url = ".../Python/outliner.csv"
names = ['R1', 'P1', 'T1', 'P2', 'Flag']
dataset = pandas.read_csv(url, names=names)
array = dataset.values
X = array[:,0:2]
rng = np.random.RandomState(42)
# fit the model
clf = LocalOutlierFactor(n_neighbors=50, algorithm='auto', leaf_size=30)
y_pred = clf.fit_predict(X)
y_pred_outliers = y_pred[500:]
# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 200, 50))
Z = clf._decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
a = plt.scatter(X[:200, 0], X[:200, 1], c='white',
                edgecolor='k', s=20)
b = plt.scatter(X[200:, 0], X[200:, 1], c='red',
                edgecolor='k', s=20)
plt.axis('tight')
plt.xlim((0, 1000))
plt.ylim((0, 200))
plt.legend([a, b],
           ["normal observations",
            "abnormal observations"],
           loc="upper left")
plt.show()
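Can anyone tell me why the detection fails?
I tried playing with the parameters and the ranges, but that did not change much in the outlier detection itself.
It would be great if someone could point me in the right direction on this. Thanks.
Edit: Added the imports: File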
Answer 0 (score: 1)
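I assume you followed this example. That example compares the actual/observed data (the scatter plot) with the decision function learned from it (the contour plot). Because the data is known/made up (200 normal values + 20 outliers), it can simply select the outliers with X[200:] (from index 200 onward) and the normal values with X[:200] (indices 0 through 199). Unless your CSV happens to be ordered the same way, the same slices only color your points by row position, not by what the model predicted.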
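To make the point about positional slicing concrete, here is a minimal sketch on synthetic data. The arrays, the 200 + 20 layout, and the shuffle are assumptions made up for illustration; nothing here is read from outliner.csv. It shows that X[:200] and X[200:] match the true groups only while the row order is known, whereas a boolean mask keeps working after the rows are reordered.

```python
import numpy as np

rng = np.random.RandomState(42)

# Synthetic stand-in data (not outliner.csv): 200 inliers followed by
# 20 outliers, i.e. the layout the sklearn example builds by hand.
inliers = 0.3 * rng.randn(200, 2)
outliers = rng.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[inliers, outliers]
is_outlier = np.r_[np.zeros(200, dtype=bool), np.ones(20, dtype=bool)]

# With this known order, positional slices coincide with the true groups.
print(is_outlier[:200].sum(), is_outlier[200:].sum())

# Shuffle the rows to imitate a file with no guaranteed order.
perm = rng.permutation(len(X))
X_shuffled = X[perm]
is_outlier_shuffled = is_outlier[perm]

# The same slice now returns whichever 20 rows ended up last.
n_outliers_in_tail = int(is_outlier_shuffled[200:].sum())
print(n_outliers_in_tail)

# A boolean mask still selects the intended rows, whatever the order.
X_outliers_by_mask = X_shuffled[is_outlier_shuffled]
print(X_outliers_by_mask.shape)
```

The mask plays the same role that y_pred plays for predictions: it travels with the rows, so it does not depend on where a point sits in the file.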
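So, if you want to plot the predicted results (as the scatter plot) instead of the actual/observed data, you need something like the code below. Basically, you split X according to y_pred (1: normal, -1: outlier) and then use the two parts in the scatter plot: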
import numpy as np
import matplotlib.pyplot as plt
import pandas
from sklearn.neighbors import LocalOutlierFactor
# import file
url = ".../Python/outliner.csv"
names = ['R1', 'P1', 'T1', 'P2', 'Flag']
dataset = pandas.read_csv(url, names=names)
X = dataset.values[:, 0:2]
# fit the model
clf = LocalOutlierFactor(n_neighbors=50, algorithm='auto', leaf_size=30)
y_pred = clf.fit_predict(X)
# map results
X_normals = X[y_pred == 1]
X_outliers = X[y_pred == -1]
# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(0, 1000, 50), np.linspace(0, 200, 50))
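# NOTE: _decision_function is a private (underscore-prefixed) method and may not exist in every scikit-learn release
Z = clf._decision_function(np.c_[xx.ravel(), yy.ravel()])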
Z = Z.reshape(xx.shape)
plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
a = plt.scatter(X_normals[:, 0], X_normals[:, 1], c='white', edgecolor='k', s=20)
b = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red', edgecolor='k', s=20)
plt.axis('tight')
plt.xlim((0, 1000))
plt.ylim((0, 200))
plt.legend([a, b], ["normal predictions", "abnormal predictions"], loc="upper left")
plt.show()
As you can see, the scatter of the points predicted as normal now follows the contour plot.
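The listing above relies on clf._decision_function, which is a private method. As a hedged alternative, scikit-learn 0.20 and later expose a public decision_function when the estimator is created with novelty=True. In that mode the model is meant to score new points (here, the grid) rather than the training set, and fit_predict is not available. The sketch below uses made-up synthetic training data and grid ranges in place of outliner.csv, so treat the numbers as placeholders.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)

# Synthetic training data (an assumption, not outliner.csv): two tight clusters.
X_train = np.r_[0.3 * rng.randn(100, 2) + 2, 0.3 * rng.randn(100, 2) - 2]

# novelty=True enables the public predict / decision_function on unseen points.
clf = LocalOutlierFactor(n_neighbors=20, novelty=True)
clf.fit(X_train)

# Evaluate the model on a grid covering the region of interest.
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
grid = np.c_[xx.ravel(), yy.ravel()]

Z = clf.decision_function(grid).reshape(xx.shape)  # negative means outlier side
labels = clf.predict(grid).reshape(xx.shape)       # 1: inlier, -1: outlier

print(Z.shape, np.unique(labels))
```

Z can then be passed to plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r) exactly as in the listing above. Whether this mode suits your data depends on whether you have a clean set to fit on; if you only want labels for the training rows themselves, fit_predict with the default novelty=False remains the right call.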