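I don't know why, but I'm getting the error below. For some reason, get_dummies is dropping a column. I want both the 'train' and 'test' data to have the same number of columns.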
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cat_cols = ['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'poutcome']
data = pd.read_csv('data/trainData.csv')
train, test = train_test_split(data, test_size=0.20)
# One-hot encode train and test separately
train = pd.get_dummies(train, columns=cat_cols, drop_first=True)
c = DecisionTreeClassifier(min_samples_split=550)
test = pd.get_dummies(test, columns=cat_cols, drop_first=True)
# Keep every column except the one at position 9 (presumably the "Class" target)
train1 = train.iloc[:, 0:9]
train2 = train.iloc[:, 10:]
X_train = pd.concat([train1, train2], axis=1)
test1 = test.iloc[:, 0:9]
test2 = test.iloc[:, 10:]
X_test = pd.concat([test1, test2], axis=1)
y_train = train["Class"]
dt = c.fit(X_train, y_train)
y_true = test["Class"]
y_true = y_true.values
y_scores = c.predict(X_test)
The error I get is below:
`ValueError Traceback (most recent call last)
<ipython-input-20-9cc441bd0222> in <module>()
13 y_true = test["Class"]
14 y_true = y_true.values
---> 15 y_scores = c.predict(X_test)
/home/ram98/anaconda3/lib/python3.6/site-packages/sklearn/tree/tree.py in predict(self, X, check_input)
410 """
411 check_is_fitted(self, 'tree_')
--> 412 X = self._validate_X_predict(X, check_input)
413 proba = self.tree_.predict(X)
414 n_samples = X.shape[0]
/home/ram98/anaconda3/lib/python3.6/site-packages/sklearn/tree/tree.py in _validate_X_predict(self, X, check_input)
382 "match the input. Model n_features is %s and "
383 "input n_features is %s "
--> 384 % (self.n_features_, n_features))
385
386 return X
ValueError: Number of features of the model must match the input. Model n_features is 52 and input n_features is 51 `
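
A 52 versus 51 mismatch like this is what one would expect if a category of one of the encoded columns ended up only in the train split, since get_dummies creates columns only for the values it actually sees. The sketch below reproduces that on toy data and shows one possible way to align the two frames. The column names and values are made up for illustration and are not taken from trainData.csv. It encodes without drop_first before aligning, because with drop_first=True each split may drop a different baseline category, which would make a plain reindex silently mislabel rows.

```python
import pandas as pd

# Hypothetical toy data: "dec" appears in train but not in test.
train = pd.DataFrame({"age": [30, 41, 52], "month": ["may", "jun", "dec"]})
test = pd.DataFrame({"age": [25, 38], "month": ["may", "jun"]})

# Encoding each split on its own, as in the question, gives different columns.
train_enc = pd.get_dummies(train, columns=["month"], drop_first=True)
test_enc = pd.get_dummies(test, columns=["month"], drop_first=True)
print(train_enc.columns.tolist())
print(test_enc.columns.tolist())

# One possible fix: encode without drop_first, then force the test frame
# onto the train columns. Columns missing from test are filled with 0,
# and categories seen only in test would be discarded.
train_full = pd.get_dummies(train, columns=["month"])
test_full = pd.get_dummies(test, columns=["month"])
test_aligned = test_full.reindex(columns=train_full.columns, fill_value=0)
print(test_aligned.columns.tolist())
```

An alternative that avoids the alignment step is to call get_dummies on the full data before train_test_split, so both splits inherit the same columns.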