ValueError: X.shape[1] = 8 should be equal to 1500, the number of features at training time

Asked: 2019-04-19 09:42:30

Tags: python scikit-learn

I am training a machine learning model with sklearn to do sentiment analysis on Persian text. Here is my code:

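import _pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# data, data_labels and stop_words are loaded elsewhere (not shown).
vectorizer = TfidfVectorizer(max_features=1500,
                             sublinear_tf=True,
                             use_idf=True,
                             stop_words=stop_words)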

X = vectorizer.fit_transform(data).toarray()

le = LabelEncoder()
le.fit(["pos", "neu", "neg"])
y = le.transform(data_labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifier_rbf = SVC(kernel='rbf', gamma=1, C=1)
classifier_rbf.fit(X_train, y_train)
y_pred = classifier_rbf.predict(X_test)

# Persist the fitted classifier and the fitted vectorizer for later use.
with open('svm_rbf_classifier.pkl', 'wb') as fid:
    _pickle.dump(classifier_rbf, fid)

with open('tfidf_vectorizer.pkl', 'wb') as fid:
    _pickle.dump(vectorizer, fid)

print(classification_report(y_test, y_pred))
print()
print(accuracy_score(y_test, y_pred))

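After the training and testing phase, I just want to load my vectorizer and classifier and predict the sentiment of Persian comments one at a time. I wrote this code to do that: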

with open('tfidf_vectorizer.pkl', 'rb') as fid:
    vectorizer = _pickle.load(fid)

with open('svm_rbf_classifier.pkl', 'rb') as fid:
    classifier_rbf = _pickle.load(fid)

comment = 'من نسبت به نتایجی که تیم این روزا کسب میکنه نگرانم'
X = vectorizer.fit_transform([comment]).toarray()

predicted = classifier_rbf.predict(X)
print(predicted)

But when I run it, I get the following error:

Traceback (most recent call last):
  File "C:/Projects/Sentiment/test.py", line 18, in <module>
    predicted = classifier_rbf.predict(X)
  File "C:\Python\Python36\lib\site-packages\sklearn\svm\base.py", line 576, in predict
    y = super(BaseSVC, self).predict(X)
  File "C:\Python\Python36\lib\site-packages\sklearn\svm\base.py", line 325, in predict
    X = self._validate_for_predict(X)
  File "C:\Python\Python36\lib\site-packages\sklearn\svm\base.py", line 478, in _validate_for_predict
    (n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 8 should be equal to 1500, the number of features at training time

I don't understand this, because I am using the same vectorizer as in training and testing. What am I doing wrong?

1 Answer:

Answer 0: (score: 1)

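You should not call fit_transform on your comment data; you should only transform it. fit_transform refits the vectorizer on the single comment, so its vocabulary shrinks to the 8 terms found in that comment instead of the 1500 features the classifier was trained on. Change

X = vectorizer.fit_transform([comment]).toarray()

to

X = vectorizer.transform([comment]).toarray()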
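
To make the difference concrete, here is a minimal sketch of the whole round trip. It is not your actual setup: the English toy corpus, the labels, and the variable names are made up for illustration, and the objects are pickled to in-memory bytes with the standard pickle module rather than to your .pkl files. It fits a vectorizer and an SVC, reloads both, and compares the feature count produced by transform with the one produced by fit_transform on a single comment.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the real Persian training data.
train_texts = [
    "great team great result",
    "terrible match awful result",
    "the team played an average game",
    "wonderful win for the team",
    "awful loss and terrible defence",
    "an ordinary draw today",
]
train_labels = ["pos", "neg", "neu", "pos", "neg", "neu"]

# Training time: fit_transform learns the vocabulary and the IDF weights.
vectorizer = TfidfVectorizer(sublinear_tf=True, use_idf=True)
X_train = vectorizer.fit_transform(train_texts)
n_train_features = X_train.shape[1]

classifier = SVC(kernel="rbf", gamma=1, C=1)
classifier.fit(X_train, train_labels)

# Persist the FITTED objects (in-memory bytes here; files work the same way).
vectorizer_blob = pickle.dumps(vectorizer)
classifier_blob = pickle.dumps(classifier)

# Prediction time: load both objects back.
loaded_vectorizer = pickle.loads(vectorizer_blob)
loaded_classifier = pickle.loads(classifier_blob)

comment = "worried about the team result"

# Correct: transform reuses the training vocabulary, so the width matches.
X_ok = loaded_vectorizer.transform([comment])
prediction = loaded_classifier.predict(X_ok)
print("transform width:", X_ok.shape[1], "training width:", n_train_features)
print("prediction:", prediction[0])

# Incorrect: fit_transform relearns the vocabulary from this one comment.
refit_vectorizer = pickle.loads(vectorizer_blob)
X_bad = refit_vectorizer.fit_transform([comment])
n_bad_features = X_bad.shape[1]
print("fit_transform width:", n_bad_features)

# The classifier rejects the mismatched width with a ValueError.
try:
    loaded_classifier.predict(X_bad)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print("error raised:", type(mismatch_error).__name__)
```

The exact wording of the ValueError depends on the scikit-learn version, but the cause is the same as in your traceback: the column count after fit_transform reflects the single comment, not the training data.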