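I have been using sklearn for sentiment analysis. I have a CSV file with more than 3,000 reviews, and I train my classifier on 60% of them. When I try to pass a custom review through CountVectorizer.transform() so that the classifier can predict its label, it throws the following error: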
Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 864, in transform
raise ValueError("Vocabulary wasn't fitted or is empty!")
ValueError: Vocabulary wasn't fitted or is empty!
Please help. This is the code that fits the training set:
def preprocess():
    data,target = load_file()
    count_vectorizer = CountVectorizer(binary='true',min_df=1)
    data = count_vectorizer.fit_transform(data)
    tfidf_data = TfidfTransformer(use_idf=False).fit_transform(data)
    return tfidf_data
This is the code used to predict the sentiment of a custom review:
def customQuestionScorer(question, clf):
    X_new_tfidf = vectorizer.transform([question]).toarray()
    print(clf.predict(X_new_tfidf))

q = "I really like this movie"
customQuestionScorer(q,classifier)
Answer 0 (score: 1)
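Judging only from the code in the question, one likely cause is that the `count_vectorizer` fitted inside `preprocess()` is a local variable that is never returned, so the `vectorizer` used in `customQuestionScorer` is a different, unfitted instance with an empty vocabulary. A minimal sketch of one way around this is to return the fitted objects and reuse them at prediction time. The toy `reviews` and `labels`, the choice of `MultinomialNB`, and the changed function signatures are assumptions made for illustration, not part of the original code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for load_file(); the real data comes from the CSV.
reviews = [
    "I really like this movie",
    "great film, loved it",
    "terrible plot and bad acting",
    "I hate this boring movie",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative (assumed encoding)


def preprocess(data):
    # Fit both steps on the training text and return them with the features,
    # so the same fitted objects can be reused for new reviews.
    count_vectorizer = CountVectorizer(binary=True, min_df=1)
    counts = count_vectorizer.fit_transform(data)
    tfidf = TfidfTransformer(use_idf=False)
    tfidf_data = tfidf.fit_transform(counts)
    return tfidf_data, count_vectorizer, tfidf


def custom_question_scorer(question, clf, count_vectorizer, tfidf):
    # Only transform() here: the vocabulary learned during training is reused.
    counts = count_vectorizer.transform([question])
    return clf.predict(tfidf.transform(counts))


X_train, fitted_vectorizer, fitted_tfidf = preprocess(reviews)
classifier = MultinomialNB().fit(X_train, labels)  # assumed classifier

prediction = custom_question_scorer(
    "I really like this movie", classifier, fitted_vectorizer, fitted_tfidf
)
print(prediction)
```

The sketch also applies the fitted `TfidfTransformer` to the new review, so that the features passed to `predict` are built the same way as the training features. Wrapping both steps and the classifier in an sklearn `Pipeline` would be another way to keep them together.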