I am trying to compute the accuracy of my tagger, but every time I run the program I get a different accuracy, even though I use the same training and development data. What is the reason behind this? Thanks in advance.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

with open('train.txt') as f:
    training_sentences = list(splitter(f))
with open('develop.txt') as f:
    test_sentences = list(splitter(f))
# ...
# SOME FEATURES AS A LIST OF DICTS
# ...
def transform_to_dataset(training_sentences):
    X, y = [], []
    for tagged in training_sentences:
        for index in range(len(tagged)):
            X.append(features(untag(tagged), index))
            y.append(tagged[index][1])
    return X, y
X, y = transform_to_dataset(training_sentences)

clf = Pipeline([
    ('vectorizer', DictVectorizer(sparse=False)),
    ('classifier', DecisionTreeClassifier(criterion='entropy'))
])

clf.fit(X, y)

X_test, y_test = transform_to_dataset(test_sentences)

print "Accuracy:", clf.score(X_test, y_test)
Answer 0 (score: 0)
sklearn's DecisionTreeClassifier uses a random number generator when choosing its splits. If you want the results to be identical on every run, set the classifier's random_state parameter.
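For example, a minimal sketch reusing the pipeline from the question (the seed value 0 is arbitrary; any fixed integer works):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

clf = Pipeline([
    ('vectorizer', DictVectorizer(sparse=False)),
    # Fixing random_state makes the tree-building deterministic,
    # so repeated runs on identical data give identical accuracy.
    ('classifier', DecisionTreeClassifier(criterion='entropy', random_state=0))
])

With the seed fixed, clf.fit(X, y) followed by clf.score(X_test, y_test) should return the same accuracy on every run, as long as the training and development data do not change.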