我如何分割我的df:
X=Final_df.drop('survived',axis=1)
Y=Final_df['survived']
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=123 )
logreg=LogisticRegression()
logreg.fit(X_train,Y_train)
train,test = train_test_split(Final_df, test_size=0.2)
Y_pred=logreg.predict(Y_test)
IM出现类似以下错误:
ValueError Traceback (most recent call last)
<ipython-input-38-f81a6db0e9ae> in <module>()
----> 1 Y_pred=logreg.predict(Y_test)
~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
322 Predicted class label per sample.
323 """
--> 324 scores = self.decision_function(X)
325 if len(scores.shape) == 1:
326 indices = (scores > 0).astype(np.int)
~\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in decision_function(self, X)
298 "yet" % {'name': type(self).__name__})
299
--> 300 X = check_array(X, accept_sparse='csr')
301
302 n_features = self.coef_.shape[1]
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
439 "Reshape your data either using array.reshape(-1, 1) if "
440 "your data has a single feature or array.reshape(1, -1) "
--> 441 "if it contains a single sample.".format(array))
442 array = np.atleast_2d(array)
443 # To ensure that array flags are maintained
ValueError: Expected 2D array, got 1D array instead:
array=[0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1
0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0
0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1
1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1
1 1 1 1 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 1
0 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0
1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0
1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1
1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 1 0 0
1 0 1 0 1 1 0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
答案 0 :(得分:3)
您需要使用X_test
进行预测,{strong>不使用Y_test
。
X存储自变量(您用于预测的变量),Y存储因变量(您需要预测的变量)。
因此,您的最后一行应该是:
Y_pred=logreg.predict(X_test)
答案 1 :(得分:0)
模型应该预测X_test
而不是Y_test
。
使用此功能:
X=Final_df.drop('survived',axis=1)
Y=Final_df['survived']
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=123 )
logreg=LogisticRegression()
logreg.fit(X_train,Y_train)
train,test = train_test_split(Final_df, test_size=0.2)
# Here is the change
Y_pred=logreg.predict(X_test)