Question

I am trying to learn scikit-learn using the following code:

import numpy as np

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

dataset = {
    "data": {
        "chinese beijing chinese",
        "chinese chinese shanghai",
        "chinese macao",
        "tokyo japan chinese",
    },
    "target": np.array([0,0,0,1]),
    "target_names": ["c","c","c","j"]
}

vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english')
X_train = vectorizer.fit_transform(dataset["data"])
y_train = dataset["target"]

clf = MultinomialNB(alpha=.01)
clf.fit(X_train, y_train)

X_test = vectorizer.transform(["chinese chinese chinese tokyo japan"])
pred = clf.predict(X_test)

print(pred)

I know I am testing with far too little data, but the output differs from run to run:

$ python3 teste.py 
[0]
$ python3 teste.py 
[1]
$ python3 teste.py 
[0]
$ python3 teste.py 
[1]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[0]
$ python3 teste.py 
[1]
$ python3 teste.py 
[1]
$ python3 teste.py 
[0]

I expected the output to always be "[1]".

Is the varying output a consequence of the small input dataset? Or should the result be deterministic, as I expected?

I am using Python 3.4.3, scikit-learn 0.17.1, and numpy 1.11.0.
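
One thing that may be worth checking (an assumption on my part, not something confirmed by the runs above): `dataset["data"]` is written with curly braces, so it is a Python set rather than a list. String hashes are randomized per process in Python 3 by default, so the iteration order of that set can change between runs, while `dataset["target"]` stays `[0, 0, 0, 1]`. If that is what happens, the label 1 would sometimes be paired with a document other than "tokyo japan chinese". The sketch below only illustrates the ordering effect and does not use scikit-learn; the helper name `set_order` and the seed range are made up for the illustration.

```python
import json
import os
import subprocess
import sys

# Same four documents as in the question, kept here as an ordered list.
DOCS = [
    "chinese beijing chinese",
    "chinese chinese shanghai",
    "chinese macao",
    "tokyo japan chinese",
]

def set_order(seed):
    """Return the iteration order of set(DOCS) in a fresh interpreter
    started with the given PYTHONHASHSEED (hypothetical helper)."""
    code = (
        "import json, sys; "
        "print(json.dumps(list(set(json.loads(sys.argv[1])))))"
    )
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", code, json.dumps(DOCS)], env=env
    )
    return tuple(json.loads(out.decode()))

# Collect the distinct orderings observed across a few hash seeds.
orders = {set_order(seed) for seed in range(10)}
print(len(orders), "distinct ordering(s) seen")
for order in sorted(orders):
    print(order)
```

If more than one ordering shows up, writing `"data"` with square brackets (a list) should keep each document aligned with its entry in `"target"`.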

Does scikit-learn's predict behave differently between runs?

0 answers: