python scikit - ValueError

Time: 2014-12-11 06:24:35

Tags: python numpy scipy scikit-learn

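I am following a Stack Overflow post on how to save a classifier, and I tried the approach mentioned in the second post there. But I keep getting:

ValueError: Vocabulary wasn't fitted or is empty!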

My training code is as follows:

train = load_files(learning_data_train)
count_vect = CountVectorizer(tokenizer=tokenize,stop_words='english')
X_train_counts = count_vect.fit_transform(train.data)
clf = SGDClassifier(loss='hinge', penalty='l1',alpha=1e-3, n_iter=5).fit(X_train_counts, train.target)
filename = "SGD.pk1"
joblib.dump(clf, filename)

My test code is as follows:

count_vect = CountVectorizer(tokenizer=tokenize,stop_words='english')
filename = "SGD.pk1"
clf = joblib.load(filename)
print clf 
file= "testfolder/"
docs_new = []
for i in os.listdir(file):
    docs_new.append(open(file+i,"r").read())
X_new_counts = count_vect.transform(docs_new)
predicted = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predicted):
    print(' => %s' % ( train.target_names[category]))

The error is raised when executing this line:
X_new_counts = count_vect.transform(docs_new)

Am I doing something wrong here?
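
A likely cause, judging from the code above: the test script creates a brand-new CountVectorizer and calls transform on it without ever fitting it, so it has no vocabulary. Only the classifier was saved, not the vectorizer that was fitted during training. Below is a minimal sketch of one way around this, saving the fitted vectorizer together with the classifier and loading both back. The toy documents, labels, the "ham"/"spam" names, and the dictionary keys are made up for illustration. The custom tokenize function is left out because it is not shown in the question, and max_iter is used in place of n_iter on the assumption of a recent scikit-learn release where n_iter no longer exists.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data standing in for load_files(learning_data_train).
train_docs = [
    "cheap pills buy now",
    "meeting agenda for monday",
    "buy cheap watches",
    "project meeting notes",
]
train_target = [1, 0, 1, 0]
target_names = ["ham", "spam"]  # assumed label names, index = class id

# --- Training side ---
# Fit the vectorizer; this is what builds the vocabulary.
count_vect = CountVectorizer(stop_words="english")
X_train_counts = count_vect.fit_transform(train_docs)

clf = SGDClassifier(loss="hinge", penalty="l1", alpha=1e-3,
                    max_iter=5, tol=None, random_state=0)
clf.fit(X_train_counts, train_target)

# Persist the fitted vectorizer, the classifier and the label names together.
model_path = os.path.join(tempfile.mkdtemp(), "SGD.pkl")
joblib.dump(
    {"vectorizer": count_vect, "classifier": clf, "target_names": target_names},
    model_path,
)

# --- Test side ---
# Load the *fitted* vectorizer instead of constructing a new, empty one.
saved = joblib.load(model_path)
loaded_vect = saved["vectorizer"]
loaded_clf = saved["classifier"]
loaded_names = saved["target_names"]

docs_new = ["buy pills", "monday meeting"]
X_new_counts = loaded_vect.transform(docs_new)  # uses the training vocabulary
predicted = loaded_clf.predict(X_new_counts)

for doc, category in zip(docs_new, predicted):
    print("%r => %s" % (doc, loaded_names[category]))
```

Note that the test code in the question also refers to train.target_names, which would not be defined in a separate test script; saving the label names alongside the model, as sketched here, is one way to handle that. If a custom tokenizer is passed to CountVectorizer, the function has to be importable in the test script as well for the saved vectorizer to load.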