ValueError: Number of features of the model must match the input. Model n_features > input n_features

Time: 2017-05-02 02:23:13

Tags: python numpy scikit-learn

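I am trying to implement an Isolation Forest for 9 input features, using the example from http://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py

My training and test sets have 9 features, so I created X_train and X_test with the same feature dimension: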

>>> X.shape
(100, 9)
>>> X_train.shape
(200, 9)

My code:

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Generate train data
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([b1, b2, c],
           ["training observations",
            "new regular observations", "new abnormal observations"],
           loc="upper left")
plt.show()

But I get this error:

---------------------------------------------------------------------------

ValueError: Number of features of the model must match the input. Model n_features is 9 and input n_features is 2

In my case the error says that the model n_features is 9 and the input n_features is 2.

What am I missing here?

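1 answer:

Answer 0: (score: 2)

Even though you have fit the model with 9 features, the plotting part of the code still assumes only two dimensions, as in the example you are working from: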

# plot the line, the samples, and the nearest vectors to the plane
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

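Look at the shape of the array you are passing to clf.decision_function():

np.c_[xx.ravel(), yy.ravel()].shape
(2500, 2)

You get the error because clf.decision_function() expects input with 9 features (columns), but you are giving it an array with only 2.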
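If you still want a 2-D contour plot from the 9-feature model, one possible workaround is to vary only the two plotted features over the grid and hold the other seven at fixed values. The sketch below does this under an assumption that is not in the original example: the fixed values are taken from the first training row (the names reference and grid are illustrative). The result is only a 2-D slice through the 9-D decision function, so it should be read with that in mind.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]

clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)

xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))

# Assumption: hold features 2..8 at the values of one training point.
reference = X_train[0]

# One row per grid point, all 9 columns, then overwrite the two plotted features.
grid = np.tile(reference, (xx.size, 1))
grid[:, 0] = xx.ravel()
grid[:, 1] = yy.ravel()

# The input now has 9 columns, so the feature count matches the model.
Z = clf.decision_function(grid).reshape(xx.shape)
print(grid.shape, Z.shape)
```

Z can then be passed to plt.contourf(xx, yy, Z, ...) exactly as in the plotting code from the question.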

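The classifier itself is still perfectly usable. For example, you can still call clf's predict() and decision_function() methods, but you will not be able to plot all 9 dimensions with the code you are borrowing, because it is only meant for plotting in 2-D. Even running np.meshgrid() with 9 dimensions will almost certainly run out of memory (50 points per axis in 9 dimensions is 50**9, roughly 2e15, grid points); see this discussion for more information.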
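As a rough illustration of both points, the sketch below calls predict() and decision_function() on 9-feature data, then computes how large a full 9-D grid would be without actually building it. The variable names and the 8-bytes-per-value figure (float64) are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Both methods accept any array with 9 columns.
labels = clf.predict(X_outliers)            # +1 = inlier, -1 = outlier
scores = clf.decision_function(X_outliers)  # lower = more abnormal
print(labels.shape, scores.shape)

# Size of a full 9-D grid with 50 points per axis, computed without building it.
points_per_axis = 50
n_grid_points = points_per_axis ** 9
bytes_per_coordinate_array = n_grid_points * 8  # assuming float64 values
print(n_grid_points, bytes_per_coordinate_array)
```

np.meshgrid() would need nine such coordinate arrays, which is far beyond ordinary RAM.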

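In any case, trying to plot a 9-D space would not help much here. You could instead focus on more standard visual representations of classifier strength, such as ROC curves or even a good old-fashioned confusion matrix.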
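A minimal sketch of that kind of evaluation, reusing the synthetic data from the question, could look like the following. It assumes that X_test rows are labelled as inliers (+1) and X_outliers rows as outliers (-1), matching the convention of predict(); the names X_eval, y_true, cm and auc are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 9)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 9)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 9))

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Assumed ground truth: regular test points are +1, generated outliers are -1.
X_eval = np.r_[X_test, X_outliers]
y_true = np.r_[np.ones(len(X_test), dtype=int),
               -np.ones(len(X_outliers), dtype=int)]

# Confusion matrix from hard predictions (rows: true class, columns: predicted).
y_pred = clf.predict(X_eval)
cm = confusion_matrix(y_true, y_pred, labels=[1, -1])

# ROC AUC from the continuous scores (higher score = more normal).
auc = roc_auc_score(y_true, clf.decision_function(X_eval))

print(cm)
print(auc)
```

To draw the curve itself rather than just summarize it, sklearn.metrics.roc_curve returns the false and true positive rates that can be passed to plt.plot().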