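I am trying to improve my understanding of linear regression and multiple linear regression. I watched this video on YouTube, in which the presenter uses the Regression tool in Excel to run a linear regression on a set of data.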
https://www.youtube.com/watch?v=HgfHefwK7VQ&list=PLo8L7S3J29iOX0pvRqAgLDDdwobNWqG9C&index=21&t=0s
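His final predicted value, using A, B and C as the independent (predictor) variables, is 45149.21.
Cost is the dependent variable.
This is the approach I have been using to try to replicate his result: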
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# create linear regression object
lm = LinearRegression()
# develop a model using these variables as predictor variables
X = df[['A Made', 'B Made', 'C Made']]
Y = df['Cost']
# Fit the linear model using the three above-mentioned variables.
lm.fit(X , Y)
# value of the intercept
intercept = lm.intercept_
# values of the coefficients
coef = lm.coef_.tolist()
# final estimated linear model
Z = intercept + (coef[0] * 1200) + (coef[1] * 800) + (coef[2] * 1000)
The values it spits out are:
Z = 10606.098714826765
intercept = 35108.59711204488
coefficient (list) = [2.072061216849437, 4.153422708041111, 4.796887088174573]
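As a quick arithmetic check on the figures above, the sketch below plugs the printed intercept and coefficients back into the formula for Z. The names `units`, `slope_part` and `full_prediction` are illustrative and not from the original code. It suggests that the reported Z equals the coefficient terms alone, with no intercept added.

```python
# Values copied from the printed output above.
intercept = 35108.59711204488
coef = [2.072061216849437, 4.153422708041111, 4.796887088174573]
units = [1200, 800, 1000]  # quantities of A, B and C used in the question

# Sum of coefficient * quantity, without the intercept.
slope_part = sum(c * u for c, u in zip(coef, units))

# The full linear model: intercept plus the coefficient terms.
full_prediction = intercept + slope_part

print(slope_part)       # about 10606.10, the Z reported in the question
print(full_prediction)  # about 45714.70
```

If the intercept actually added in the original run was not the value printed here, that would account for the low result.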
The actual data in question:
data = {
'Month':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],
'Cost':[44439,43936,44464,41533,46343,44922,43203,43000,40967,48582,45003,44303,42070,44353,45968,47781,43202,44074,44610],
'A Made':[515,929,800,979,1165,651,847,942,630,1113,1086,843,500,813,1190,1200,731,1089,786],
'B Made':[541,692,710,685,1147,939,755,908,738,1175,1075,640,752,989,823,1108,590,607,513],
'C Made':[928,711,824,758,635,901,580,589,682,1050,984,828,708,804,904,1120,1065,1132,839]
}
df = pd.DataFrame(data)
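One way to cross-check the fit independently of scikit-learn is an ordinary least-squares solve with NumPy on the same data. This is only a sketch: the design matrix with a leading column of ones and the variable names are my own choices, not part of the original code.

```python
import numpy as np

cost = np.array([44439, 43936, 44464, 41533, 46343, 44922, 43203, 43000, 40967, 48582,
                 45003, 44303, 42070, 44353, 45968, 47781, 43202, 44074, 44610], dtype=float)
a_made = np.array([515, 929, 800, 979, 1165, 651, 847, 942, 630, 1113,
                   1086, 843, 500, 813, 1190, 1200, 731, 1089, 786], dtype=float)
b_made = np.array([541, 692, 710, 685, 1147, 939, 755, 908, 738, 1175,
                   1075, 640, 752, 989, 823, 1108, 590, 607, 513], dtype=float)
c_made = np.array([928, 711, 824, 758, 635, 901, 580, 589, 682, 1050,
                   984, 828, 708, 804, 904, 1120, 1065, 1132, 839], dtype=float)

# Design matrix: a column of ones for the intercept, then the three predictors.
design = np.column_stack([np.ones(len(cost)), a_made, b_made, c_made])

# Least-squares solution; beta[0] is the intercept, beta[1:] are the slopes.
beta, *_ = np.linalg.lstsq(design, cost, rcond=None)
intercept, coef = beta[0], beta[1:]

# Prediction for 1200 units of A, 800 of B and 1000 of C.
prediction = intercept + coef @ np.array([1200.0, 800.0, 1000.0])
print(intercept, coef, prediction)
```

The intercept and slopes should agree with the values printed by `LinearRegression` up to floating-point noise.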
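I expected the prediction to be close to the 44000 mark. What am I doing wrong?
Edit: Found the correct process on Slack. When I checked again, the intercept was printing a value of -2. I made some adjustments where the intercept value is assigned, and the result came back to where it should be.
Thanks to everyone who answered. Much appreciated!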
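Answer 0 (score: 1)
I just tried your code and got 45714.69582687167 for Z. The only thing I changed was the import, from
from sklearn.linear_model import LinearRegression()
to
from sklearn.linear_model import LinearRegression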
Answer 1 (score: 1)
Run it again; your process is correct. You also do not need to extract the coefficients and the intercept by hand.
x_test = [[1200, 800, 1000]]
y_predict = lm.predict(x_test)
Output:
array([[45714.69582687]])
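For reference, here is a self-contained sketch of the same idea with the DataFrame built before it is used. It compares `lm.predict` against the manual intercept-plus-coefficients formula. Passing the new point as a DataFrame with the same column names is my own choice; the names `x_test` and `manual` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

data = {
    'Cost': [44439, 43936, 44464, 41533, 46343, 44922, 43203, 43000, 40967, 48582,
             45003, 44303, 42070, 44353, 45968, 47781, 43202, 44074, 44610],
    'A Made': [515, 929, 800, 979, 1165, 651, 847, 942, 630, 1113,
               1086, 843, 500, 813, 1190, 1200, 731, 1089, 786],
    'B Made': [541, 692, 710, 685, 1147, 939, 755, 908, 738, 1175,
               1075, 640, 752, 989, 823, 1108, 590, 607, 513],
    'C Made': [928, 711, 824, 758, 635, 901, 580, 589, 682, 1050,
               984, 828, 708, 804, 904, 1120, 1065, 1132, 839],
}
df = pd.DataFrame(data)  # build the DataFrame before referencing it

features = ['A Made', 'B Made', 'C Made']
lm = LinearRegression()
lm.fit(df[features], df['Cost'])

# New observation with the same column names used for fitting.
x_test = pd.DataFrame([[1200, 800, 1000]], columns=features)
y_predict = lm.predict(x_test)  # 1-D array here because the target is a Series

# Manual formula for comparison: intercept + sum(coef * value).
manual = lm.intercept_ + float(np.dot(lm.coef_, [1200, 800, 1000]))

print(y_predict[0], manual)
```

Both numbers should agree, which is why the manual extraction step is optional.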
By the way, fix the import: from sklearn.linear_model import LinearRegression
Answer 2 (score: 1)
Z = 45714.69582687167
That is what I got by running your code, and it is close to 44000,
after changing the import to
from sklearn.linear_model import LinearRegression