I'm trying to learn scikit-learn with the following code:
```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

dataset = {
    "data": {
        "chinese beijing chinese",
        "chinese chinese shanghai",
        "chinese macao",
        "tokyo japan chinese",
    },
    "target": np.array([0,0,0,1]),
    "target_names": ["c","c","c","j"]
}

vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english')
X_train = vectorizer.fit_transform(dataset["data"])
y_train = dataset["target"]
clf = MultinomialNB(alpha=.01)
clf.fit(X_train, y_train)
X_test = vectorizer.transform(["chinese chinese chinese tokyo japan"])
pred = clf.predict(X_test)
print(pred)
```
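A minimal diagnostic sketch (not part of the original script) that reuses the same documents and vectorizer settings and prints two things: the vocabulary the fitted vectorizer keeps (`vocabulary_`, its term-to-column mapping), and which label each training document ends up paired with in the current run. Note that `"data"` is written with braces, which in Python creates a set rather than a list, so rows are built in whatever order that set iterates.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Same literal as in the question: the braces make this a set of strings.
docs = {
    "chinese beijing chinese",
    "chinese chinese shanghai",
    "chinese macao",
    "tokyo japan chinese",
}
target = np.array([0, 0, 0, 1])

vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english')
X = vectorizer.fit_transform(docs)

# Terms kept after stop-word removal and max_df filtering, sorted by name.
vocab = sorted(vectorizer.vocabulary_)
print(vocab)

# Row i of X comes from the i-th document in the set's iteration order
# and is trained with target[i]; print that pairing for this run.
pairs = list(zip(docs, target))
for doc, label in pairs:
    print(label, doc)
```

Because "chinese" occurs in all four documents, `max_df=0.5` drops it from the vocabulary, so only "tokyo" and "japan" in the test sentence contribute features.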
I know I'm testing with very little data, but the output differs from run to run:
```
$ python3 teste.py
[0]
$ python3 teste.py
[1]
$ python3 teste.py
[0]
$ python3 teste.py
[1]
$ python3 teste.py
[0]
$ python3 teste.py
[0]
$ python3 teste.py
[0]
$ python3 teste.py
[0]
$ python3 teste.py
[0]
$ python3 teste.py
[1]
$ python3 teste.py
[1]
$ python3 teste.py
[0]
```
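One way to test whether run-to-run differences like these come from Python itself rather than from scikit-learn: since Python 3.3, string hashing is randomized per interpreter process unless the `PYTHONHASHSEED` environment variable fixes it, and the iteration order of a set of strings depends on those hashes. The sketch below starts fresh interpreters with different seeds and records the order in which the question's set literal is iterated; `set_order` is a hypothetical helper written only for this check.

```python
import os
import subprocess
import sys

# Child program: build the question's set literal and print its iteration order.
SNIPPET = (
    'docs = {"chinese beijing chinese", "chinese chinese shanghai", '
    '"chinese macao", "tokyo japan chinese"}\n'
    'print("|".join(docs))'
)

def set_order(seed):
    """Run SNIPPET in a new interpreter with a fixed hash seed; return the order."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", SNIPPET], env=env, universal_newlines=True
    )
    return tuple(out.strip().split("|"))

orders = {seed: set_order(seed) for seed in range(20)}
distinct = set(orders.values())
print(len(distinct), "distinct iteration orders over 20 seeds")

# With the same seed, the order is reproducible from run to run.
print(set_order(0) == set_order(0))
```

If the order changes with the seed, running the original script as `PYTHONHASHSEED=0 python3 teste.py` should print the same result on every run, although which label it prints depends on the chosen seed.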
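I expected the output to always be "[1]".
Is the varying output a consequence of the small input dataset? Or should the result be deterministic, as I expected?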
I'm using Python 3.4.3, scikit-learn 0.17.1 and numpy 1.11.0.
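On the determinism question, here is a sketch under one assumption that is not in the original script: the documents are passed as a list, so row i of `X_train` is always paired with `target[i]`. Neither `TfidfVectorizer` nor `MultinomialNB` takes a `random_state`, so repeated fits on the same ordered input are expected to give the same prediction. `train_and_predict` is a hypothetical helper that wraps the question's own steps.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Assumption: a list (fixed order) instead of the question's set literal.
docs = [
    "chinese beijing chinese",
    "chinese chinese shanghai",
    "chinese macao",
    "tokyo japan chinese",
]
target = np.array([0, 0, 0, 1])

def train_and_predict(train_docs, y, query):
    """Same vectorizer/classifier settings as the question; returns predict()."""
    vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english')
    X_train = vectorizer.fit_transform(train_docs)
    clf = MultinomialNB(alpha=.01)
    clf.fit(X_train, y)
    return clf.predict(vectorizer.transform([query]))

query = "chinese chinese chinese tokyo japan"
# Refit from scratch several times on the same ordered input.
preds = [train_and_predict(docs, target, query).tolist() for _ in range(5)]
print(preds[0])  # → [1]
```

With this fixed ordering, "tokyo japan chinese" is always trained as class 1, and the query's only surviving features ("tokyo", "japan") come from that document.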