How do I match the test columns to the training data?

Time: 2018-06-11 07:53:06

Tags: machine-learning scikit-learn naivebayes

I get an error when trying to use Naive Bayes.

from sklearn.naive_bayes import GaussianNB
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/sjwhitworth/golearn/master/examples/datasets/tennis.csv')

X_train = pd.get_dummies(df[['outlook', 'temp', 'humidity', 'windy']])
y_train = df['play']

gNB = GaussianNB()
gNB.fit(X_train, y_train)

ndf=pd.DataFrame({'outlook':['sunny'], 'temp':['hot'], 'humidity':['normal'], 'windy':[False]})
X_test=pd.get_dummies(ndf[['outlook', 'temp', 'humidity', 'windy']])

gNB.predict(X_test)
  

ValueError: operands could not be broadcast together with shapes (1,4) (9,)

Is using get_dummies a good idea in this case?
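
The shape mismatch in the error can be reproduced without the model. The sketch below is only an illustration: the `train` frame is a small hand-written stand-in that reuses the column names from the question (its rows are assumptions, not the contents of the real CSV). It shows that get_dummies creates one column per category it actually sees, so a single test row yields far fewer columns than the training data.

```python
import pandas as pd

# Hypothetical stand-in for the tennis data: same column names, illustrative rows.
train = pd.DataFrame({
    'outlook':  ['sunny', 'overcast', 'rainy', 'rainy', 'sunny', 'overcast'],
    'temp':     ['hot', 'hot', 'mild', 'cool', 'mild', 'cool'],
    'humidity': ['high', 'high', 'high', 'normal', 'normal', 'normal'],
    'windy':    [False, True, False, True, False, True],
})
test = pd.DataFrame({'outlook': ['sunny'], 'temp': ['hot'],
                     'humidity': ['normal'], 'windy': [False]})

# Each frame is encoded on its own, so each only gets columns for the
# categories that appear in it.
X_train = pd.get_dummies(train)
X_test = pd.get_dummies(test)

# 9 training columns versus 4 test columns for this stand-in data.
print(X_train.shape[1], X_test.shape[1])

# Training columns that the single test row never produced.
missing = [c for c in X_train.columns if c not in X_test.columns]
print(missing)
```

The model was fitted on the wider layout, so the narrower test frame cannot be broadcast against its per-feature statistics.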

1 Answer:

Answer 0: (score: 1)

As Vivek pointed out, this is clearly not good practice, but if you want to do it anyway, here is the code:

from sklearn.naive_bayes import GaussianNB
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/sjwhitworth/golearn/master/examples/datasets/tennis.csv')

X_train = pd.get_dummies(df[['outlook', 'temp', 'humidity', 'windy']])
y_train = df['play']

gNB = GaussianNB()
gNB.fit(X_train, y_train)

ndf=pd.DataFrame({'outlook':['sunny'], 'temp':['hot'], 'humidity':['normal'], 'windy':[False]})
X_test=pd.get_dummies(ndf[['outlook', 'temp', 'humidity', 'windy']])

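# Build a one-row frame with exactly the training columns, in the training order.
dict1 = {}
for i in X_train.columns:
  if i in X_test.columns:
    # Copy the real value: 'windy' is False here and must stay 0, not become 1.
    dict1.update({i: [int(X_test[i].iloc[0])]})
  else:
    # A dummy column that the single test row did not produce is 0.
    dict1.update({i: [0]})
X_test_new = pd.DataFrame(data = dict1)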


gNB.predict(X_test_new)
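
The column-matching loop above can probably be written more compactly with pandas' `DataFrame.reindex`, which keeps the values of columns that exist and fills the missing ones. This is a minimal sketch, not the answer's own method: the `train` rows are an illustrative stand-in for the CSV (an assumption), and the `.astype(float)` casts are added only so that the boolean and integer columns share one dtype.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

features = ['outlook', 'temp', 'humidity', 'windy']

# Hypothetical stand-in for the tennis data: same column names, illustrative rows.
train = pd.DataFrame({
    'outlook':  ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny'],
    'temp':     ['hot', 'hot', 'hot', 'mild', 'cool', 'cool', 'cool', 'mild'],
    'humidity': ['high', 'high', 'high', 'high', 'normal', 'normal', 'normal', 'high'],
    'windy':    [False, True, False, False, False, True, True, False],
    'play':     ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes', 'no'],
})

X_train = pd.get_dummies(train[features]).astype(float)
y_train = train['play']
model = GaussianNB().fit(X_train, y_train)

new_row = pd.DataFrame({'outlook': ['sunny'], 'temp': ['hot'],
                        'humidity': ['normal'], 'windy': [False]})
X_test = pd.get_dummies(new_row[features])

# Align to the training layout: existing columns keep their values,
# columns the test row did not produce are filled with 0.
X_test_aligned = X_test.reindex(columns=X_train.columns, fill_value=0).astype(float)

prediction = model.predict(X_test_aligned)
print(prediction)
```

Note that `reindex` would also silently drop any test column for a category that never occurred in training. If that matters, another option worth considering is scikit-learn's `OneHotEncoder` with `handle_unknown='ignore'`, fitted once on the training data and reused for new rows.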