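I am trying to use BernoulliNB to predict on the iris dataset, but the trained model returns what look like arbitrary values as the predictions for the entire test set. I tried the same dataset with a decision tree, and that trained model predicted the test set with good accuracy.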
import pandas as pn
from sklearn.model_selection import train_test_split as lk
from sklearn.naive_bayes import BernoulliNB

def labelmod(x):
    # Map each species name to an integer class label.
    if x == 'Iris-versicolor':
        return 0
    elif x == 'Iris-setosa':
        return 1
    elif x == 'Iris-virginica':
        return 3
    else:
        return None

def celldif(x):
    # Apply the label mapping to every cell of a column.
    return x.apply(labelmod)

ok = pn.read_csv(r"C:\Users\s420105\Desktop\iris.csv", header=None)
data = ok.dropna()
# .ix was removed in pandas 1.0; .iloc selects by position (end exclusive).
labels = data.iloc[:, 4:]
labels = labels.apply(celldif)
data = data.iloc[:, 0:4]
train_data, test_data, train_label, test_label = lk(data, labels, test_size=0.3)

classifier = BernoulliNB().fit(train_data, train_label.values.ravel())
result = classifier.predict(test_data)
result
The output is an array in which every prediction is 0 (dtype=int64).
The test and training labels are both fine. For the decision tree, I passed train_label without values.ravel().
Answer 0 (score: 1)
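All you need to do is scale the data before passing it to BernoulliNB. I used scikit-learn's built-in iris dataset because I don't have your CSV. The problem is not ravel(); it is a data scaling problem. BernoulliNB binarizes every feature at a threshold of 0.0 by default (its binarize parameter), and the raw iris measurements are all positive, so every feature becomes 1 and the model can only fall back on the class priors. Standardizing centers each feature on zero, so binarization then separates below-average from above-average values.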
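To see why scaling matters here, the following is a minimal sketch that mimics the thresholding BernoulliNB applies with its default binarize=0.0. It assumes scikit-learn's bundled iris data rather than the asker's CSV, and the names raw_bits and scaled_bits are only illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data

# With binarize=0.0, BernoulliNB treats each feature as (value > 0).
raw_bits = (X > 0.0).astype(int)

# After standardization each feature is centered on zero, so the same
# threshold splits samples into below-mean (0) and above-mean (1).
scaled_bits = (StandardScaler().fit_transform(X) > 0.0).astype(int)

print(np.unique(raw_bits))        # distinct values in the unscaled case
print(scaled_bits.mean(axis=0))   # fraction of samples above each feature mean
```

Because every raw measurement is positive, the unscaled bits carry no information about the class, which is consistent with the model predicting a single label for every test sample.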
from sklearn import datasets
from sklearn.model_selection import train_test_split as lk
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import StandardScaler

# Load the bundled iris data once and reuse it for features and labels.
iris = datasets.load_iris()
data = iris.data
labels = iris.target

# Standardize so that BernoulliNB's default threshold of 0 is meaningful.
data = StandardScaler().fit_transform(data)
train_data, test_data, train_label, test_label = lk(data, labels, test_size=0.3)

classifier = BernoulliNB().fit(train_data, train_label)
result = classifier.predict(test_data)
print(result)
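The snippet above scales the full dataset before splitting it. A possible variant, sketched here under the assumption that you would rather fit the scaler on the training split only, wraps both steps in a scikit-learn pipeline. The random_state and stratify arguments are my additions for reproducibility and are not part of the original answer.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

# random_state and stratify are illustrative choices, not from the answer.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The pipeline fits the scaler on the training data only, then reuses
# those statistics when transforming the test data at predict time.
model = make_pipeline(StandardScaler(), BernoulliNB())
model.fit(X_train, y_train)

pred = model.predict(X_test)
print(pred)
print(accuracy_score(y_test, pred))
```

Since BernoulliNB reduces each feature to a single bit, a model designed for continuous features may fit this data better; the pipeline makes it easy to swap the final estimator and compare.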