Why does the BernoulliNB classifier return the same value for every test sample?

Date: 2017-05-23 10:26:34

Tags: python pandas machine-learning scikit-learn ipython

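I am trying to use BernoulliNB to classify the iris dataset, but the trained model returns one seemingly arbitrary value as the prediction for every sample in the test set. With a decision tree on the same dataset, the trained model predicts the test set with good accuracy.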

import pandas as pn
import sklearn as sk
from sklearn.model_selection import train_test_split as lk

def labelmod(x):
    if(x =='Iris-versicolor'):
        return 0
    elif(x =='Iris-setosa'):
        return 1
    elif(x =='Iris-virginica'):
        return 3
    else:
        return

def celldif(x):
    return x.apply(labelmod)

ok = pn.read_csv(r"C:\Users\s420105\Desktop\iris.csv",header = None)
data=ok.dropna()
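# Column 4 holds the species name; map it to an integer class label
labels = data.iloc[:, 4:]
labels = labels.apply(celldif)
# Columns 0-3 hold the four numeric measurements
data = data.iloc[:, 0:4]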
train_data,test_data,train_label,test_label=lk(data,labels,test_size=0.3)

from sklearn.naive_bayes import BernoulliNB 
classifier = BernoulliNB().fit(train_data,train_label.values.ravel())
result= classifier.predict(test_data)
result

The output is an array in which every prediction is 0: array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

Both the training and the test labels look correct. For the decision tree, I passed train_label directly, without calling .values.ravel().
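The .values.ravel() call only changes the shape of the labels. Because the label column is selected with a slice (4:), it stays a one-column DataFrame, i.e. a 2-D column vector, and ravel() flattens it to a 1-D array without touching the values. A minimal sketch, using a small hypothetical label frame in place of the real train_label:

```python
import pandas as pd

# Hypothetical one-column label frame standing in for train_label
# (selecting columns with a slice such as 4: keeps a DataFrame).
train_label = pd.DataFrame({4: [0, 1, 3, 0, 1]})

column_vector = train_label.values   # 2-D column vector, shape (5, 1)
flat = train_label.values.ravel()    # 1-D array, shape (5,)

print(column_vector.shape, flat.shape)
print(flat.tolist())
```

Since the label values are identical in both forms, passing them with or without ravel() cannot by itself explain the constant predictions.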

1 Answer

Answer 0 (score: 1)

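All you need to do is scale the data before passing it to BernoulliNB. I used scikit-learn's built-in iris dataset because I don't have your CSV. The problem is not ravel(); it is the scale of the data. By default BernoulliNB binarizes every feature at a threshold of 0.0 (its binarize parameter), and because every iris measurement is positive, every sample turns into the same all-ones vector, so the classifier has no way to tell the samples apart and gives them all the same label. Standardizing centers each feature at zero, so the same threshold then splits each feature at its mean.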
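The effect of the default threshold can be checked directly. This sketch uses the copy of iris bundled with scikit-learn and Binarizer(threshold=0.0) from sklearn.preprocessing, which applies the same "greater than 0 becomes 1" rule as BernoulliNB's default binarize=0.0. For brevity it fits and predicts on the full dataset rather than a train/test split; that simplification is an assumption of the sketch, not part of the answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import Binarizer, StandardScaler

X, y = load_iris(return_X_y=True)

# Raw features: every measurement is > 0, so every row binarizes to 1s.
raw_binary = Binarizer(threshold=0.0).fit_transform(X)
print("distinct binary rows (raw):", np.unique(raw_binary, axis=0))

# Identical binarized inputs can only receive identical predictions.
raw_pred = BernoulliNB().fit(X, y).predict(X)
print("distinct predictions (raw):", np.unique(raw_pred))

# Standardized features are centred at 0, so the same threshold now
# splits each feature at its mean and rows become distinguishable.
X_scaled = StandardScaler().fit_transform(X)
scaled_binary = Binarizer(threshold=0.0).fit_transform(X_scaled)
print("distinct binary rows (scaled):", len(np.unique(scaled_binary, axis=0)))

scaled_pred = BernoulliNB().fit(X_scaled, y).predict(X_scaled)
print("distinct predictions (scaled):", np.unique(scaled_pred))
```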

from sklearn import datasets
from sklearn.model_selection import train_test_split as lk
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import StandardScaler

# Load the bundled iris data once and reuse it
iris = datasets.load_iris()
data = iris.data
labels = iris.target

# Centre each feature at 0 with unit variance, so BernoulliNB's default
# binarize=0.0 threshold splits each feature at its mean instead of
# mapping every (positive) measurement to 1
data = StandardScaler().fit_transform(data)

train_data, test_data, train_label, test_label = lk(data, labels, test_size=0.3)
classifier = BernoulliNB().fit(train_data, train_label)
result = classifier.predict(test_data)
print(result)
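One refinement: the code above fits StandardScaler on the whole dataset before splitting, so the test samples influence the scaling statistics. A minimal sketch that fits the scaler on the training portion only, using scikit-learn's make_pipeline; random_state=0 and stratify=y are assumptions added here for a reproducible, class-balanced split, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Illustrative split settings: fixed seed and stratification by class
train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The pipeline learns the mean/std from the training data only and
# applies the same transform to the test data before BernoulliNB
# binarizes the features at 0.0
classifier = make_pipeline(StandardScaler(), BernoulliNB())
classifier.fit(train_data, train_label)

result = classifier.predict(test_data)
accuracy = classifier.score(test_data, test_label)
print(result)
print("accuracy:", accuracy)
```

Bundling the scaler and classifier this way also means any later call to classifier.predict applies the same scaling automatically.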