I'm trying to run a simple linear regression on a dataset and retrieve the coefficients. The data from the .csv file looks like this:
"","time","LakeHuron"
"1",1875,580.38
"2",1876,581.86
"3",1877,580.97
"4",1878,580.8
...
import pandas as pd
import numpy as np
from sklearn import datasets, linear_model

def Main():
    location = r"~/Documents/Time Series/LakeHuron.csv"
    ts = pd.read_csv(location, sep=",", parse_dates=[0], header=None)
    ts.drop(ts.columns[[0]], axis=1, inplace=True)
    length = len(ts)
    x = ts[1].values
    y = ts[2].values
    x = x.reshape(length, 1)
    y = y.reshape(length, 1)
    regr = linear_model.LinearRegression()
    regr.fit(x, y)
    print(regr.coef_)

if __name__ == "__main__":
    Main()
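Since this is a simple linear model, $Y_t = a_0 + a_1 t$, which in this case should be $Y_t = 580.202 - 0.0242t$. However, all that gets printed when I run the code above is [[-0.02420111]]. Is there any way to get the second coefficient, 580.202?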
I looked at the documentation at http://scikit-learn.org/stable/modules/linear_model.html, where the output is an array containing two values.
Answer 0 (score: 1)
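It looks like you have only one X and one Y, so the output is correct: coef_ holds one coefficient per feature, and the intercept is stored separately in intercept_. Try this: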
# coef_ : array, shape (n_features,) or (n_targets, n_features)
print(regr.coef_)
# intercept_ : array; the independent term (intercept) in the linear model
print(regr.intercept_)