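I am trying to implement an Isolation Forest for 9 input features, using the example from http://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py
My training and test sets have 9 features, so I created X_train and X_test with the same number of features: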
X.shape
(100, 9)
>> X_train.shape
(200, 9)
My code:
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
rng = np.random.RandomState(42)
# Generate train data
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))
# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([b1, b2, c],
["training observations",
"new regular observations", "new abnormal observations"],
loc="upper left")
plt.show()
But I get this error:
---------------------------------------------------------------------------
ValueError: Number of features of the model must match the input. Model n_features is 9 and input n_features is 2
In my case the error says that the model's n_features is 9 and the input's n_features is 2.
What am I missing here?
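Answer 0 (score: 2)
Even though you have fit the model with 9 features, the plotting part of the code still assumes only two dimensions, as in the example you are working from: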
# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
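Look at the shape of the np.c_ array you are passing to clf.decision_function():
np.c_[xx.ravel(), yy.ravel()].shape
(2500, 2)
You get the error because clf expects input with 9 features per row, but you are giving it an array with only 2 columns.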
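The mismatch can be reproduced in isolation. The following is a minimal sketch, assuming synthetic training data shaped like the question's; the `mismatch_detected` flag is a hypothetical name used only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Synthetic 9-feature training data, similar in shape to the question's X_train
X_train = 0.3 * rng.randn(200, 9)
clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# The plotting grid from the scikit-learn example only has two columns
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
grid = np.c_[xx.ravel(), yy.ravel()]

print("features seen during fit:", X_train.shape[1])
print("features in the grid:   ", grid.shape[1])

# Calling decision_function with the wrong number of columns raises ValueError
mismatch_detected = False  # hypothetical flag, for illustration only
try:
    clf.decision_function(grid)
except ValueError as exc:
    mismatch_detected = True
    print("ValueError:", exc)
```

Comparing `grid.shape[1]` with `X_train.shape[1]` before calling the model is a cheap way to catch this kind of error early.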
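The classifier itself is still usable without any problem. For example, you can still call its decision_function() and predict() methods, but you cannot plot all 9 dimensions with the code you borrowed; it is only meant for 2-D plots. Even calling np.meshgrid() with 9 dimensions would almost certainly run out of memory; see this discussion for more information.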
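If a 2-D picture is still wanted, one possible workaround (not something the answer proposes) is to plot a slice: vary two features over the grid and hold the other seven fixed. The sketch below assumes features 0 and 1 are the ones of interest and uses the first training row as an arbitrary reference point; both choices are assumptions. It also estimates why a full 9-D grid is impractical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))

# Assumption: hold features 2..8 at the values of one training sample
reference = X_train[0]
grid = np.tile(reference, (xx.size, 1))  # shape (2500, 9)
grid[:, 0] = xx.ravel()                  # vary feature 0 along x
grid[:, 1] = yy.ravel()                  # vary feature 1 along y

# Now the input has 9 columns, so the model accepts it
Z = clf.decision_function(grid).reshape(xx.shape)

# Rough size of a full 9-D grid with 50 points per axis:
# np.meshgrid would return 9 float64 arrays of 50**9 elements each
n_points = 50 ** 9
bytes_per_array = n_points * 8
print(f"{n_points:.3e} grid points, {bytes_per_array:.3e} bytes per array")
```

`Z` can then be passed to `plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)` as in the original example. The resulting picture only describes that one slice of the 9-D space, so it should be read with care.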
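In any case, trying to plot a 9-D space would not be very helpful here. Instead, you could focus on more standard visual representations of classifier performance, such as ROC curves or even a good old-fashioned confusion matrix.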
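As a rough illustration of that suggestion, the sketch below scores the question's synthetic data with scikit-learn's `confusion_matrix` and `roc_auc_score`. The ground-truth labels are an assumption made for this example: X_test rows are treated as inliers (1) and X_outliers rows as outliers (-1), matching the values that predict() returns.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Assumed ground truth: 1 for regular observations, -1 for abnormal ones
X_eval = np.r_[X_test, X_outliers]
y_true = np.r_[np.ones(len(X_test), dtype=int),
               -np.ones(len(X_outliers), dtype=int)]

y_pred = clf.predict(X_eval)             # hard labels: 1 or -1
scores = clf.decision_function(X_eval)   # higher means more "normal"

# Rows are true labels, columns are predicted labels, ordered [1, -1]
cm = confusion_matrix(y_true, y_pred, labels=[1, -1])
auc = roc_auc_score(y_true, scores)

print(cm)
print("ROC AUC:", auc)
```

For a full curve rather than a single number, `sklearn.metrics.roc_curve(y_true, scores)` returns the false positive rates, true positive rates, and thresholds to plot.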