sklearn LogisticRegression: "bad input shape" error

Time: 2017-10-25 18:26:21

Tags: python scikit-learn logistic-regression

How do I get rid of this error?

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd

df =  pd.read_csv("train.csv")

clean = {"Sex": {"male":1, "female":0}} 
df.replace(clean, inplace = True)
df["label"] = df['Survived']
df =  df.drop(["Name","Ticket","Cabin","Embarked","Fare","Parch","Survived"],  axis = 1)
df = df.dropna(axis = 0, how="any")

X = df.drop(["label"],axis = 1).values
y = df["label"].values

X_train , y_train, X_test, y_test = train_test_split(X, y, test_size = 0.7)

log_reg  = LogisticRegression()
log_reg.fit(X_train, y_train)
print("Accuracy on test subset: (:.3f)".format(log_reg.score(X_train, y_train)))

ERROR
Traceback (most recent call last):
  File "C:\Users\user\Documents\17\kaggle'\logistic.py", line 20, in <module>
    log_reg.fit(X_train, y_train)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\linear_model\logistic.py", line 1216, in fit
    order="C")
  File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 547, in check_X_y
    y = column_or_1d(y, warn=True)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 583, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (500, 5)

1 answer:

Answer 0 (score: 1)

The error is caused by this line:

X_train , y_train, X_test, y_test = train_test_split(X, y, test_size = 0.7)

That is not the order in which train_test_split returns its outputs.

The correct usage is:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.7)

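train_test_split returns the split arrays in the same order as the inputs you pass in. X is split into X_train, X_test and returned first, then y is returned as y_train, y_test. With the original unpacking, the name y_train was actually bound to the test portion of X, a 2-D array of shape (500, 5), which is why fit raised "bad input shape (500, 5)" when it expected a 1-D label array. Hope this helps.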
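To make the return order concrete, here is a minimal sketch on synthetic data rather than the Titanic train.csv. The 714 rows are an assumption, picked so that a 0.7 test fraction rounds up to the 500 rows seen in the traceback. The random features, the made-up labels, and random_state=0 are also illustrative only. The sketch scores on the held-out split and uses {:.3f} in the format string, which is probably what the print line in the question intended.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (assumption: 714 rows, 5 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(714, 5))
y = (X[:, 0] > 0).astype(int)  # hypothetical labels, for illustration only

# Correct order: both pieces of X come first, then both pieces of y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.7, random_state=0
)
print(X_train.shape, X_test.shape)  # 2-D feature arrays
print(y_train.shape, y_test.shape)  # 1-D label arrays

# The unpacking from the question binds the second return value,
# which is really the test part of X, to the name y_train.
_, y_train_wrong, _, _ = train_test_split(X, y, test_size=0.7, random_state=0)
print(y_train_wrong.shape)  # 2-D, so fit() rejects it as a label array

# With the correct order, fitting and scoring on held-out data works.
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
print("Accuracy on test subset: {:.3f}".format(log_reg.score(X_test, y_test)))
```

A quick way to catch this kind of mix-up is to print the .shape of each array right after the split: feature arrays should have two dimensions, label arrays one.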