sklearn LinearSVC - X has 1 features per sample; expecting 5

Time: 2015-08-19 21:37:29

Tags: python machine-learning scikit-learn

I am trying to predict the class of a test array, but I get the following error, along with its stack trace:

Traceback (most recent call last):
  File "/home/radu/PycharmProjects/Recommender/Temporary/classify_dict_test.py", line 24, in <module>
    print classifier.predict(test)
  File "/home/radu/.local/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 215, in predict
    scores = self.decision_function(X)
  File "/home/radu/.local/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 196, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 1 features per sample; expecting 5

The code that produces this error is:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

corpus = [
    "I am super good with Java and JEE",
    "I am super good with .NET and C#",
    "I am really good with Python and R",
    "I am really good with C++ and pointers"
    ]

classes = ["java developer", ".net developer", "data scientist", "C++ developer"]

test = ["I think I'm a good developer with really good understanding of .NET"]

tvect = TfidfVectorizer(min_df=1, max_df=1)

X = tvect.fit_transform(corpus)

classifier = LinearSVC()
classifier.fit(X, classes)

print classifier.predict(test)

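I have looked through the LinearSVC documentation for guidelines or hints about what might raise this error, but I could not figure it out.

Any help is much appreciated!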

1 answer:

Answer 0 (score: 14)

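The variable test holds raw text (a list containing one string), but LinearSVC expects a feature matrix with the same number of columns as X. Before passing it to the classifier, you must convert the test text into feature vectors using the same fitted vectorizer instance: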

# Call transform (not fit_transform) so the vocabulary learned from corpus is reused
X_test = tvect.transform(test)
classifier.predict(X_test)
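
For completeness, here is a minimal end-to-end sketch of the fix applied to the script from the question. It is written with Python 3 print syntax, which is an assumption on my part since the traceback shows Python 2.7. The second half uses make_pipeline from sklearn.pipeline, which the answer above does not mention. It is shown only as one possible way to keep the vectorizer and the classifier together, so raw strings can be passed to predict directly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

corpus = [
    "I am super good with Java and JEE",
    "I am super good with .NET and C#",
    "I am really good with Python and R",
    "I am really good with C++ and pointers",
]
classes = ["java developer", ".net developer", "data scientist", "C++ developer"]
test = ["I think I'm a good developer with really good understanding of .NET"]

# Learn the vocabulary from the training text only. max_df=1 (an integer)
# keeps only terms that occur in at most one document.
tvect = TfidfVectorizer(min_df=1, max_df=1)
X = tvect.fit_transform(corpus)

classifier = LinearSVC()
classifier.fit(X, classes)

# Reuse the fitted vectorizer so the test matrix has the same columns as X.
X_test = tvect.transform(test)
print(X.shape, X_test.shape)

prediction = classifier.predict(X_test)
print(prediction)

# Alternative (not from the original answer): bundle both steps in a pipeline
# so that fit and predict accept raw strings.
model = make_pipeline(TfidfVectorizer(min_df=1, max_df=1), LinearSVC())
model.fit(corpus, classes)
pipeline_prediction = model.predict(test)
print(pipeline_prediction)
```

The printed shapes should show the same number of columns for X and X_test, which is exactly the condition the ValueError was complaining about.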