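I have run into a problem with the LinearRegression algorithm in scikit-learn. I have searched the forums a lot, but for some reason I have not managed to get past this error. I am using Python 3.5.
Below is what I have tried, but I keep getting a ValueError: "Found input variables with inconsistent numbers of samples: [403, 174]"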
X = df[["Impressions", "Clicks", "Eligible_Impressions", "Measureable_Impressions", "Viewable_Impressions"]].values
y = df["Total_Conversions"].values.reshape(-1,1)
print ("The shape of X is {}".format(X.shape))
print ("The shape of y is {}".format(y.shape))
The shape of X is (577, 5)
The shape of y is (577, 1)
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print (y_pred)
print ("The shape of X_train is {}".format(X_train.shape))
print ("The shape of y_train is {}".format(y_train.shape))
print ("The shape of X_test is {}".format(X_test.shape))
print ("The shape of y_test is {}".format(y_test.shape))
The shape of X_train is (403, 5)
The shape of y_train is (174, 5)
The shape of X_test is (403, 1)
The shape of y_test is (174, 1)
Am I missing something obvious?
Any help is much appreciated.
Kind regards, Adrian
Answer 0 (score: 2)
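It looks like your train and test sets contain different numbers of rows for X and y. That is because you stored the return values of train_test_split() in the wrong order: the function returns both splits of the first array before the splits of the second, i.e. X_train, X_test, y_train, y_test.
Change this: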
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
to this:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
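To check the fix end to end, here is a minimal sketch that assumes random stand-in data with the same shapes as in the question (the original DataFrame is not available, so X and y below are made up) and that train_test_split is imported from sklearn.model_selection. The two assert lines are an optional guard, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): random values shaped like the question's arrays.
rng = np.random.RandomState(42)
X = rng.rand(577, 5)
y = rng.rand(577, 1)

# train_test_split returns the two splits of each input array in turn:
# X_train, X_test, then y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Optional guard: fail early if features and targets are misaligned.
assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]

linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)

print("The shape of X_train is {}".format(X_train.shape))
print("The shape of y_train is {}".format(y_train.shape))
print("The shape of X_test is {}".format(X_test.shape))
print("The shape of y_test is {}".format(y_test.shape))
```

With the corrected order, X_train and y_train should both have 403 rows and X_test and y_test both 174 rows, so fit() no longer sees mismatched sample counts.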