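I am following the Stack Overflow post here on how to save a classifier, and I tried the approach described in the second answer. But I keep getting
ValueError: Vocabulary wasn't fitted or is empty!
My training code is as follows: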
train = load_files(learning_data_train)
count_vect = CountVectorizer(tokenizer=tokenize,stop_words='english')
X_train_counts = count_vect.fit_transform(train.data)
clf = SGDClassifier(loss='hinge', penalty='l1',alpha=1e-3, n_iter=5).fit(X_train_counts, train.target)
filename = "SGD.pk1"
joblib.dump(clf, filename)
My testing code is as follows:
count_vect = CountVectorizer(tokenizer=tokenize,stop_words='english')
filename = "SGD.pk1"
clf = joblib.load(filename)
print clf
file= "testfolder/"
docs_new = []
for i in os.listdir(file):
    docs_new.append(open(file+i,"r").read())
X_new_counts = count_vect.transform(docs_new)
predicted = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predicted):
    print(' => %s' % ( train.target_names[category]))
The error is thrown when executing
X_new_counts = count_vect.transform(docs_new)
Am I doing something wrong here?
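The failure can be reproduced in isolation, which helps narrow down the cause: the testing script constructs a brand-new CountVectorizer and calls transform on it without ever fitting it. The following is a minimal sketch, assuming a reasonably recent scikit-learn; the sample document and variable names are hypothetical, and the exact error message differs between versions.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the files read from "testfolder/".
docs_new = ["some new document to classify"]

# A freshly constructed vectorizer has not learned any vocabulary yet.
unfitted_vect = CountVectorizer(stop_words="english")

try:
    unfitted_vect.transform(docs_new)
    raised = False
except ValueError as err:
    # scikit-learn's NotFittedError is a subclass of ValueError.
    raised = True
    print("transform failed:", err)
```

Only the classifier was saved by the training script, so the vocabulary learned by fit_transform during training is not available to the testing script.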
Answer 0: (score: 0)
You are using CountVectorizer; try calling fit_transform instead:
X_new_counts = count_vect.fit_transform(docs_new)
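One caveat: calling fit_transform on the new documents learns a fresh vocabulary from them, so the resulting columns will generally not line up with the features the saved classifier was trained on. A common alternative is to save the fitted vectorizer next to the classifier and call only transform at prediction time. The sketch below illustrates this under stated assumptions: the toy corpus, labels, file names, and temporary directory are hypothetical stand-ins for the question's data, the custom tokenize function is left out, and n_iter is omitted because newer scikit-learn versions use max_iter instead.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy corpus standing in for load_files(learning_data_train).
train_docs = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stocks fell sharply on monday",
    "the market rallied after earnings",
]
train_labels = [0, 0, 1, 1]

# Training side: fit the vectorizer and the classifier, then save both.
count_vect = CountVectorizer(stop_words="english")
X_train_counts = count_vect.fit_transform(train_docs)
clf = SGDClassifier(loss="hinge", penalty="l1", alpha=1e-3, random_state=0)
clf.fit(X_train_counts, train_labels)

model_dir = tempfile.mkdtemp()  # hypothetical location for the saved files
vect_path = os.path.join(model_dir, "count_vect.pkl")
clf_path = os.path.join(model_dir, "SGD.pkl")
joblib.dump(count_vect, vect_path)
joblib.dump(clf, clf_path)

# Prediction side: load both objects and call transform only (no refitting).
loaded_vect = joblib.load(vect_path)
loaded_clf = joblib.load(clf_path)

docs_new = ["a cat and a dog", "earnings moved the market"]
X_new_counts = loaded_vect.transform(docs_new)
predicted = loaded_clf.predict(X_new_counts)

for doc, category in zip(docs_new, predicted):
    print("%r => %s" % (doc, category))
```

If the vectorizer is built with a custom tokenizer, as in the question, the function has to be importable under the same name when the pickle is loaded. Wrapping the vectorizer and classifier in a single sklearn.pipeline.Pipeline and dumping that one object is another option.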
See: