Variables with inconsistent numbers of samples, Naive Bayes

Time: 2018-04-23 19:15:05

Tags: python naivebayes

I have the following values:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()

d = {'Pos': [1,2,3,4,5,6,7,8,9,10], 'Neg': [10,9,8,7,6,5,4,3,2,1], 'Res': ['win','win','win','win','draw','loss','loss','loss','loss','loss']}
df = pd.DataFrame(d)

Then I try to implement the following simple Naive Bayes classification:

train, test = train_test_split(df,test_size=0.2) 
train_data = (train.Pos.values, train.Neg.values) 
train_target = train.Res.values 
model.fit(train_data, train_target)

But I keep getting the following error:

Found input variables with inconsistent numbers of samples: [2, 8]

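I have experimented, and it seems that instead of reading the values inside the two arrays, it is reading how many arrays there are in (train.Pos.values, train.Neg.values); that might be what causes the problem.

Why does this happen, and how can I change my code to fix it?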

2 Answers:

Answer 0: (score: 2)

Use a two-column DataFrame for the features:

train, test = train_test_split(df,test_size=0.2)
train_data = train[['Pos', 'Neg']]
train_target = train['Res']
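As a minimal end-to-end sketch of this approach, the snippet below rebuilds the DataFrame from the question, selects the two feature columns, then fits and predicts. The `random_state=0` argument and the `predictions` variable are assumptions added here for reproducibility and illustration; they are not part of the original answer.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

d = {'Pos': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'Neg': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
     'Res': ['win', 'win', 'win', 'win', 'draw',
             'loss', 'loss', 'loss', 'loss', 'loss']}
df = pd.DataFrame(d)

# random_state is an assumption, added only so the split is repeatable.
train, test = train_test_split(df, test_size=0.2, random_state=0)

# Selecting with a list of column names keeps a 2D structure:
# one row per sample, one column per feature.
train_data = train[['Pos', 'Neg']]
train_target = train['Res']

model = GaussianNB()
model.fit(train_data, train_target)

# Predict on the held-out rows, using the same two feature columns.
predictions = model.predict(test[['Pos', 'Neg']])
print(train_data.shape, len(predictions))
```

The number of rows in `train_data` now matches the length of `train_target`, which is the consistency that `fit` checks for.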

Answer 1: (score: 0)

You are creating a tuple of NumPy arrays from the DataFrame. What you need is a single 2D array built from the two columns.

train, test = train_test_split(df, test_size=0.2)
train_data = train.values[:, :2] 
train_target = train.Res.values
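To illustrate why the tuple produces the `[2, 8]` in the error message, here is a small sketch using NumPy only. The arrays `pos` and `neg` are hypothetical stand-ins for an 8-row training split; `np.column_stack` is shown as one possible way to build the 2D array and is not the method used in the answer above.

```python
import numpy as np

# Hypothetical stand-ins for train.Pos.values and train.Neg.values (8 rows).
pos = np.array([1, 2, 3, 4, 5, 6, 7, 8])
neg = np.array([10, 9, 8, 7, 6, 5, 4, 3])

# A tuple of two 1D arrays converts to 2 rows of 8 values,
# so it looks like 2 samples with 8 features each.
as_tuple = np.asarray((pos, neg))
tuple_shape = as_tuple.shape

# Stacking the arrays as columns gives one row per sample:
# 8 samples with 2 features each.
as_columns = np.column_stack((pos, neg))
column_shape = as_columns.shape

print(tuple_shape, column_shape)
```

With the tuple, the features appear to hold 2 samples while the target holds 8, which is the mismatch the error reports.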