What causes different accuracy from the same program?

Asked: 2018-04-18 03:02:27

Tags: python machine-learning scikit-learn logistic-regression

I am trying to compute the accuracy of my tagger. But every time I run the program I get a different accuracy, even though I use the same training and development data. What is the reason for this? Thanks in advance.

with open('train.txt') as f:
training_sentences = list(splitter(f))

with open('develop.txt') as f:
test_sentences = list(splitter(f))

.
.
.
SOME FEATURES AS A LIST OF DICTS
.
.
.

def transform_to_dataset(training_sentences):
    X, y = [], []
    for tagged in training_sentences:
        for index in range(len(tagged)):
            X.append(features(untag(tagged), index))
            y.append(tagged[index][1])
    return X, y

X, y = transform_to_dataset(training_sentences)


clf = Pipeline([
    ('vectorizer', DictVectorizer(sparse=False)),
    ('classifier', DecisionTreeClassifier(criterion='entropy'))
])

clf.fit(X, y)  

X_test, y_test = transform_to_dataset(test_sentences)

print("Accuracy:", clf.score(X_test, y_test))

1 answer:

Answer 0 (score: 0)

sklearn's DecisionTreeClassifier uses a random number generator when determining its splits (for example, to break ties between equally good split candidates). If you want to guarantee the same result on every run, set the classifier's random_state parameter.
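A minimal sketch of the fix, using a small hypothetical feature list in place of the asker's `features`/`untag` helpers: building the same pipeline twice with a fixed `random_state` yields identical scores across runs.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the asker's list-of-dicts features and tags.
X = [{'word': 'the'}, {'word': 'dog'}, {'word': 'runs'}, {'word': 'fast'}] * 5
y = ['DET', 'NOUN', 'VERB', 'ADV'] * 5

def build_clf():
    # Fixing random_state removes run-to-run variation in the tree's splits.
    return Pipeline([
        ('vectorizer', DictVectorizer(sparse=False)),
        ('classifier', DecisionTreeClassifier(criterion='entropy',
                                              random_state=0)),
    ])

# Two independent fits now score identically.
score1 = build_clf().fit(X, y).score(X, y)
score2 = build_clf().fit(X, y).score(X, y)
assert score1 == score2
```

Note that a fixed seed only makes the variance reproducible; if the scores differ a lot between seeds, that usually signals high model variance rather than a bug.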