Question

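My linear regression works perfectly with a single feature. Since I started using two features, I get the following error: ValueError: Found input variables with inconsistent numbers of samples: [2, 1]

The first print statement outputs: (2, 6497) (1, 6497)

The code then crashes at the train_test_split step.

Any ideas?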

feat_scores = {}
X = df[['alcohol','density']].values.reshape(2,-1)   
y = df['quality'].values.reshape(1,-1)

print (X.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print (X_train.shape, y_train.shape)
print (X_test.shape, y_test.shape)

reg = LinearRegression()
reg.fit(X_train, y_train)

reg.predict(y_train)

Answer 1

The problem is in these lines:

X = df[['alcohol','density']].values.reshape(2,-1)   
y = df['quality'].values.reshape(1,-1)

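Do not reshape the data to (2, 6497) and (1, 6497). scikit-learn expects X with shape (n_samples, n_features) and y with shape (n_samples,), so here the shapes must be (6497, 2) and (6497,). With the reshaped arrays, train_test_split sees 2 samples in X and 1 sample in y, which is where the [2, 1] in the error message comes from.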
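As a small illustration of why the reshape goes wrong, the sketch below uses a made-up three-row DataFrame (the values are placeholders, not the asker's wine data). Note that reshape(2, -1) does not transpose the array. It re-cuts the row-major values, so alcohol and density end up mixed in the same row.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine DataFrame.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.0],
    "density": [0.9978, 0.9968, 0.9970],
    "quality": [5, 5, 6],
})

values = df[["alcohol", "density"]].values  # shape (3, 2): one row per sample
wrong = values.reshape(2, -1)               # shape (2, 3): looks like 2 samples
y = df["quality"].values                    # shape (3,): one target per sample

print(values.shape, wrong.shape, y.shape)
print(wrong[0])  # first "sample" now mixes alcohol and density values
```

The first axis is what scikit-learn counts as the number of samples, so X and y have to agree on it.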

scikit-learn accepts pandas DataFrames and Series directly, so you can simply pass:

X = df[['alcohol','density']]
y = df['quality']

Also, predict takes feature values (X), not targets, so use

reg.predict(X_train)

or

reg.predict(X_test)
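Putting both fixes together, a minimal end-to-end sketch might look like the following. The DataFrame here is synthetic (100 random rows with a made-up relationship between alcohol and quality), used only as an assumption so the example runs on its own. With the real data you would keep your own df.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data (assumption: 100 rows, invented values).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 14.0, size=100),
    "density": rng.uniform(0.99, 1.00, size=100),
})
df["quality"] = 0.5 * df["alcohol"] + rng.normal(0.0, 0.1, size=100)

X = df[["alcohol", "density"]]  # shape (n_samples, 2), no reshape needed
y = df["quality"]               # shape (n_samples,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

reg = LinearRegression()
reg.fit(X_train, y_train)

y_pred = reg.predict(X_test)    # predict from features, never from y
print(X_train.shape, X_test.shape, y_pred.shape)
```

The predictions in y_pred can then be compared against y_test to judge the fit.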
