Train test split: error

Time: 2018-07-21 06:06:33

Tags: python pandas dataframe scikit-learn train-test-split

This is how I split my df:

X=Final_df.drop('survived',axis=1)
Y=Final_df['survived']


X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=123    )
logreg=LogisticRegression()
logreg.fit(X_train,Y_train)
train,test = train_test_split(Final_df, test_size=0.2)
Y_pred=logreg.predict(Y_test)

I'm getting an error like the following:

ValueError                                Traceback (most recent call last)
<ipython-input-38-f81a6db0e9ae> in <module>()
----> 1 Y_pred=logreg.predict(Y_test)

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
    322             Predicted class label per sample.
    323         """
--> 324         scores = self.decision_function(X)
    325         if len(scores.shape) == 1:
    326             indices = (scores > 0).astype(np.int)

~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in decision_function(self, X)
    298                                  "yet" % {'name': type(self).__name__})
    299 
--> 300         X = check_array(X, accept_sparse='csr')
    301 
    302         n_features = self.coef_.shape[1]

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    439                     "Reshape your data either using array.reshape(-1, 1) if "
    440                     "your data has a single feature or array.reshape(1, -1) "
--> 441                     "if it contains a single sample.".format(array))
    442             array = np.atleast_2d(array)
    443             # To ensure that array flags are maintained

ValueError: Expected 2D array, got 1D array instead:
array=[0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1
 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0
 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1
 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1
 1 1 1 1 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 1
 0 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0
 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0
 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1
 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 1 0 0
 1 0 1 0 1 1 0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

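2 answers:

Answer 0 (score: 3)

You need to predict with X_test, not Y_test. X holds the independent variables (the ones you use to make predictions), and Y holds the dependent variable (the one you want to predict). predict() expects a 2D table of features, which is why passing the 1D Y_test raises "Expected 2D array, got 1D array instead".

So your last line should be: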

Y_pred=logreg.predict(X_test)
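
To see the difference without Final_df, here is a minimal sketch on synthetic data. The NumPy arrays, their sizes, and the rule used to generate Y are assumptions made up for illustration; only the split and the model calls mirror the question. It prints the shapes of X_test and Y_test, predicts from X_test, and then shows that passing Y_test raises the same kind of ValueError.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Final_df (hypothetical): 200 rows, 3 features, binary target.
rng = np.random.RandomState(123)
X = rng.normal(size=(200, 3))
Y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=123
)

logreg = LogisticRegression()
logreg.fit(X_train, Y_train)

# Features are 2D (rows x columns); labels are 1D (one value per row).
print(X_test.shape, Y_test.shape)

# Correct: predict from the features.
Y_pred = logreg.predict(X_test)

# Incorrect: passing the 1D labels is rejected, as in the question.
try:
    logreg.predict(Y_test)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", str(exc).splitlines()[0])
```

The predictions come back as one label per test row, so Y_pred has the same shape as Y_test, which is what lets you compare them afterwards.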

Answer 1 (score: 0)

The model should predict on X_test, not Y_test.

Use this:

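from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features: every column except the target; target: the 'survived' column
X = Final_df.drop('survived', axis=1)
Y = Final_df['survived']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=123)
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)
# Separate split of the whole frame; not used by the prediction below
train, test = train_test_split(Final_df, test_size=0.2)

# Here is the change: predict from the test features, not the test labels
Y_pred = logreg.predict(X_test)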
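
Y_test is still needed, just not inside predict(): it is what you compare the predictions against. A rough sketch of that step follows, again on made-up synthetic data rather than Final_df (the array sizes and the rule generating Y are assumptions for illustration). accuracy_score and LogisticRegression.score are standard scikit-learn calls.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Final_df (hypothetical data, not from the question).
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 4))
Y = (X[:, 0] - X[:, 2] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=123
)

logreg = LogisticRegression()
logreg.fit(X_train, Y_train)

# X_test goes into predict(); Y_test is only used to score the result.
Y_pred = logreg.predict(X_test)
accuracy = accuracy_score(Y_test, Y_pred)

# score() predicts and compares in one call, giving the same mean accuracy.
same_as_score = np.isclose(accuracy, logreg.score(X_test, Y_test))
print(f"test accuracy: {accuracy:.3f}")
```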