get_dummies not working correctly in Python

Asked: 2017-11-01 18:50:19

Tags: python pandas numpy scikit-learn sklearn-pandas

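I don't know why, but I am getting the error shown further down. For some reason, `pd.get_dummies` seems to be dropping a column from one of my data sets. I want the 'train' and 'test' data to have the same number of columns.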

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    cat_cols = ['job', 'marital', 'education', 'default', 'housing', 'loan',
                'contact', 'month', 'day_of_week', 'poutcome']

    data = pd.read_csv('data/trainData.csv')
    train, test = train_test_split(data, test_size=0.20)

    # Each split is one-hot encoded on its own
    train = pd.get_dummies(train, columns=cat_cols, drop_first=True)
    test = pd.get_dummies(test, columns=cat_cols, drop_first=True)

    # Features: every column except the one at position 9
    X_train = pd.concat([train.iloc[:, 0:9], train.iloc[:, 10:]], axis=1)
    X_test = pd.concat([test.iloc[:, 0:9], test.iloc[:, 10:]], axis=1)
    y_train = train["Class"]

    c = DecisionTreeClassifier(min_samples_split=550)
    dt = c.fit(X_train, y_train)
    y_true = test["Class"]
    y_true = y_true.values
    y_scores = c.predict(X_test)  # this call raises the ValueError

The error I get is below:

    ValueError                                Traceback (most recent call last)
    <ipython-input-20-9cc441bd0222> in <module>()
         13 y_true = test["Class"]
         14 y_true = y_true.values
    ---> 15 y_scores = c.predict(X_test)

    /home/ram98/anaconda3/lib/python3.6/site-packages/sklearn/tree/tree.py in predict(self, X, check_input)
        410         """
        411         check_is_fitted(self, 'tree_')
    --> 412         X = self._validate_X_predict(X, check_input)
        413         proba = self.tree_.predict(X)
        414         n_samples = X.shape[0]

    /home/ram98/anaconda3/lib/python3.6/site-packages/sklearn/tree/tree.py in _validate_X_predict(self, X, check_input)
        382                              "match the input. Model n_features is %s and "
        383                              "input n_features is %s "
    --> 384                              % (self.n_features_, n_features))
        385
        386         return X

    ValueError: Number of features of the model must match the input. Model n_features is 52 and input n_features is 51
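A 52 versus 51 feature mismatch like this can happen when a category value ends up in only one of the two random splits, because `pd.get_dummies` only creates columns for values it actually sees. That is an assumption about this data, since the CSV is not shown. The sketch below reproduces the effect on a small made-up frame (the `age` column and all values are hypothetical) and shows one possible way to align the test columns with the training columns using `DataFrame.reindex`.

```python
import pandas as pd

# Toy stand-ins for the train/test splits (made-up data, not the asker's CSV).
train = pd.DataFrame({
    "age": [30, 41, 25, 52],
    "job": ["admin", "technician", "student", "admin"],
    "Class": [0, 1, 0, 1],
})
test = pd.DataFrame({
    "age": [37, 29],
    "job": ["admin", "technician"],  # "student" never appears in this split
    "Class": [1, 0],
})

# Encoding each split separately yields different dummy columns.
train_enc = pd.get_dummies(train, columns=["job"], drop_first=True)
test_enc = pd.get_dummies(test, columns=["job"], drop_first=True)

# Columns the model would be trained on but that the test frame lacks.
missing_in_test = sorted(set(train_enc.columns) - set(test_enc.columns))
print(missing_in_test)

# Give the test frame exactly the training columns, in the same order;
# dummy columns that were absent from the test split are filled with 0.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

X_train = train_enc.drop(columns="Class")
X_test = test_aligned.drop(columns="Class")
print(X_train.shape[1], X_test.shape[1])
```

Another option, under the same assumption, is to call `pd.get_dummies` once on the full `data` frame before `train_test_split`, so both splits inherit an identical set of columns.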

0 answers:

No answers yet