scikit-learn prediction differs between runs?

Posted: 2016-05-09 14:16:08

Tags: python scikit-learn text-classification naivebayes

I'm trying to learn scikit-learn with the following code:

import numpy as np

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

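dataset = {
    # NOTE: the braces below create a set literal, not a list, so the order
    # of these strings is not guaranteed to line up with "target".
    "data": {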
        "chinese beijing chinese",
        "chinese chinese shanghai",
        "chinese macao",
        "tokyo japan chinese",
    },
    "target": np.array([0,0,0,1]),
    "target_names": ["c","c","c","j"]
}

vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english')
X_train = vectorizer.fit_transform(dataset["data"])
y_train = dataset["target"]

clf = MultinomialNB(alpha=.01)
clf.fit(X_train, y_train)

X_test = vectorizer.transform(["chinese chinese chinese tokyo japan"])
pred = clf.predict(X_test)

print(pred)

I know I'm testing with very little data, but the output differs from run to run:

$ python3 teste.py 
[0]
$ python3 teste.py 
[1]
$ python3 teste.py 
[0]
$ python3 teste.py 
[1]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[1]
$ python3 teste.py 
[1]
$ python3 teste.py 
[0]

I expected the output to always be "[1]".

Is the varying output a consequence of the small input dataset? Or should the result be deterministic, as I expect?
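One likely cause, which does not depend on the dataset size: `"data"` is written with braces, so it is a set, and since Python 3.3 the hash of a `str` is randomized per process. The order in which the four documents reach `fit_transform` can therefore change between runs, while `target` stays `[0, 0, 0, 1]`, so a different document may end up labelled 1. The sketch below (assuming Python 3.7+ for `capture_output`/`text`) runs the same literal in fresh interpreters with different `PYTHONHASHSEED` values and counts the distinct orders it sees; the helper name `observed_orders` is made up for this illustration.

```python
import os
import subprocess
import sys

# The four documents from the question, once as a set literal (as in the
# original code) and once as a list literal.
SET_SNIPPET = ('print(list({"chinese beijing chinese", "chinese chinese shanghai", '
               '"chinese macao", "tokyo japan chinese"}))')
LIST_SNIPPET = ('print(list(["chinese beijing chinese", "chinese chinese shanghai", '
                '"chinese macao", "tokyo japan chinese"]))')


def observed_orders(snippet, seeds=range(10)):
    """Hypothetical helper: run `snippet` in fresh interpreters, one per hash
    seed, and collect every distinct line of output it prints."""
    orders = set()
    for seed in seeds:
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            env=env, capture_output=True, text=True, check=True,
        )
        orders.add(result.stdout.strip())
    return orders


set_orders = observed_orders(SET_SNIPPET)
list_orders = observed_orders(LIST_SNIPPET)
print("distinct orders from the set literal: ", len(set_orders))
print("distinct orders from the list literal:", len(list_orders))
```

The list literal always yields exactly one order; the set literal may yield several, which would match the alternating `[0]`/`[1]` output above.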

I'm using Python 3.4.3, scikit-learn 0.17.1 and numpy 1.11.0.
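A minimal sketch of the same script with the one change that keeps documents and labels paired: `"data"` becomes a list, so row `i` of `X_train` always corresponds to `target[i]`. Everything else is kept from the question's code. Note that `max_df=0.5` removes "chinese" from the vocabulary because it occurs in every training document, so the test document is classified using "tokyo" and "japan" only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Same data as in the question, but "data" is a list, so document i is
# always paired with target[i] regardless of string hashing.
dataset = {
    "data": [
        "chinese beijing chinese",
        "chinese chinese shanghai",
        "chinese macao",
        "tokyo japan chinese",
    ],
    "target": np.array([0, 0, 0, 1]),
    "target_names": ["c", "c", "c", "j"],
}

# max_df=0.5 drops "chinese" (document frequency 4/4), leaving
# beijing, japan, macao, shanghai and tokyo as features.
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words="english")
X_train = vectorizer.fit_transform(dataset["data"])
y_train = dataset["target"]

clf = MultinomialNB(alpha=0.01)
clf.fit(X_train, y_train)

X_test = vectorizer.transform(["chinese chinese chinese tokyo japan"])
pred = clf.predict(X_test)
print(pred)  # expected: [1] on every run
```

With the list, every run fits the identical training matrix, so repeated runs should print the same prediction.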
