Question

While practicing with a simple linear regression model, I got this error:

ValueError: Expected 2D array, got scalar array instead:
array=60.
Reshape your data either using array.reshape(-1, 1) if your data has a single 
feature or array.reshape(1, -1) if it contains a single sample.

Here is my code (Python 3.7):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
data = pd.read_csv("hw_25000.csv")


hgt = data.Height.values.reshape(-1,1)
wgt = data.Weight.values.reshape(-1,1)

regression = LinearRegression()
regression.fit(hgt,wgt)

print(regression.predict(60))


print(data.columns)

plt.scatter(data.Height,data.Weight)
x = np.arange(min(data.Height),max(data.Height)).reshape(-1,1)
plt.plot(x,regression.predict(x),color="red")
plt.xlabel("Height")
plt.ylabel("Weight")
plt.title("Simple Linear Regression Model")
plt.show()

print(r2_score(wgt,regression.predict(hgt)))

I really don't know what is wrong with my code, so I would appreciate any help. Thanks.

Answer 1

Short answer:

regression.predict([[60]])

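Long answer: regression.predict takes a 2D array of the values you want to predict on. Each item in the array is one "point" you want the model to predict for. Suppose we want predictions for the points 60, 52, and 31. Then we would call regression.predict([[60], [52], [31]]).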
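As a small illustration of the shapes involved, here is a sketch using only NumPy. The variable names are made up for this example; the point is that a single value needs shape (1, 1), and several values of one feature need shape (n, 1).

```python
import numpy as np

# Three equivalent ways to turn the scalar 60 into a 2D array
# holding one sample with one feature.
a = np.array([[60]])
b = np.array(60).reshape(1, -1)
c = np.atleast_2d(60)

# Several samples of a single feature: one row per point.
points = np.array([60, 52, 31]).reshape(-1, 1)

print(a.shape, b.shape, c.shape)  # (1, 1) (1, 1) (1, 1)
print(points.shape)               # (3, 1)
```

Any of these arrays can be passed to predict on a model that was fitted with a single feature.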

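The reason a 2D array is required is that linear regression also works in spaces with more than two dimensions. For example, we can do linear regression in 3D space. Suppose we want to predict "z" for a given data point (x, y). Then we would call regression.predict([[x, y]]).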

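Taking this example further, we can predict "z" for a whole set of (x, y) points. For example, to predict the "z" value for each of the points (0, 2), (3, 7), and (10, 8), we would call regression.predict([[0, 2], [3, 7], [10, 8]]). This shows why regression.predict needs to take a 2D array: each row is one point, and each column is one feature.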
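The two-feature case can be sketched as follows. The training data here is invented for illustration (it follows z = 2*x + 3*y + 1 exactly) and is not from the question; only LinearRegression, fit, and predict are the real scikit-learn API.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: each row is one (x, y) point,
# and z is generated as z = 2*x + 3*y + 1.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]])
z = 2 * X[:, 0] + 3 * X[:, 1] + 1

model = LinearRegression().fit(X, z)

# One point still needs the outer brackets: shape (1, 2).
single = model.predict([[0, 2]])

# Several points at once: shape (3, 2), one prediction per row.
batch = model.predict([[0, 2], [3, 7], [10, 8]])

print(single.shape, batch.shape)  # (1,) (3,)
print(batch)
```

Because the made-up data is exactly linear, the predictions should come out very close to 7, 28, and 45.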

Answer 2

The ValueError is quite clear: predict expects a 2D array, but you passed a scalar.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Random stand-in data: 10 heights and 10 weights, each as a column vector
hgt = np.random.randint(50, 70, 10).reshape(-1, 1)
wgt = np.random.randint(90, 120, 10).reshape(-1, 1)

regression = LinearRegression()
regression.fit(hgt,wgt)

regression.predict([[60]])

This gives something like the following (the exact value depends on the random data):

array([[105.10013717]])
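To see the failing call and the fix side by side, here is a minimal sketch. It assumes a seeded random generator so the run is repeatable, and the names rng, model, raised, and pred are only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Seeded stand-in data so the example is repeatable.
rng = np.random.default_rng(0)
hgt = rng.integers(50, 70, size=10).reshape(-1, 1)
wgt = rng.integers(90, 120, size=10).reshape(-1, 1)

model = LinearRegression().fit(hgt, wgt)

# Passing a bare scalar is rejected by scikit-learn's input validation.
try:
    model.predict(60)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", str(err).splitlines()[0])

# Reshaping the scalar to one sample with one feature works.
pred = model.predict(np.array(60).reshape(1, -1))
print(pred.shape)  # (1, 1)
```

The prediction has shape (1, 1) here because wgt was passed to fit as a column vector rather than a flat array.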

