Question

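I am trying to vectorize arrays of words from a corpus with CountVectorizer. One set is for training and the other is for testing. The training set works fine: I call fit_transform(data) on it. I must not call fit() on the test set, but I still need to vectorize it so that I can check prediction accuracy later, so I call only transform(). When I call transform(), however, I get an error telling me that the vocabulary wasn't fitted.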
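For reference, the workflow described above can be sketched as follows. This is a minimal illustration, not the code from the question: `train_docs` and `test_docs` are made-up toy data, and the default tokenizer is used instead of `tokenize_my`. The point it shows is that transform() has to be called on the same CountVectorizer instance that fit_transform() was called on.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the real corpus.
train_docs = ["the cat sat", "the dog barked"]
test_docs = ["the cat barked", "a bird sang"]

vectorizer = CountVectorizer()

# Learn the vocabulary from the training set and vectorize it in one step.
X_train = vectorizer.fit_transform(train_docs)

# Reuse the SAME fitted instance for the test set; no fit() here.
# Words that were not seen during training are simply ignored.
X_test = vectorizer.transform(test_docs)

print(X_train.shape, X_test.shape)
```

Both matrices end up with the same number of columns, because the test set is mapped onto the vocabulary learned from the training set.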

I tried calling _validate_vocabulary(), but that does not solve the problem.

in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: CountVectorizer - Vocabulary wasn't fitted      
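The same exception can be reproduced in isolation. The sketch below assumes nothing about the class in the question; it only shows that calling transform() on a CountVectorizer instance that was never fitted (and was not given a vocabulary) raises NotFittedError. The exact message text may differ between scikit-learn versions.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer

# A fresh instance: no fit() or fit_transform() has been called on it.
unfitted = CountVectorizer()

try:
    unfitted.transform(["some example text"])
    raised = False
except NotFittedError as exc:
    raised = True
    print("NotFittedError:", exc)
```

If this error appears in a larger program, one thing worth checking is whether the instance being used for transform() is really the one that was fitted earlier, for example whether it was recreated in between.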


    self.count_vectorizer = CountVectorizer(lowercase=False,
                                            tokenizer=tokenize_my,
                                            max_features=5000)



    temp_vector = self.count_vectorizer
    temp_vector._validate_vocabulary()
    vec = temp_vector.transform(data)
    vec_tfidf = self.tfidf.transform(data)

    return vec, vec_tfidf
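As far as I can tell, `_validate_vocabulary()` is a private helper that only sets up `vocabulary_` when a `vocabulary` argument was passed to the constructor, which would explain why calling it on a vectorizer created without one changes nothing. The sketch below illustrates that case with a hypothetical two-word vocabulary; it is not the setup from the question, where the vocabulary is meant to be learned from the training data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical fixed vocabulary, supplied up front instead of learned by fit().
fixed = CountVectorizer(vocabulary=["cat", "dog"])

# With a vocabulary given to the constructor, transform() works without fit().
X = fixed.transform(["cat dog cat"])

print(X.toarray())  # counts for "cat" and "dog", in that column order
```

When the vocabulary has to come from the training data, fitting once and reusing that fitted instance is the usual route.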

I have never programmed in Python before, so I may be missing something basic here, but I cannot figure out what I am doing wrong.

Trying to use CountVectorizer's transform() function without using fit()

0 answers: