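I wrote some code that uses WLS (weighted least squares) to run a multiple regression and get the equation of the best-fit line. Here is the code I'm using: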
import numpy as np
import csv
import pandas as pd
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std
#read data from csv
readCSV=pd.read_csv('test2.csv')
#store data into corresponding month lists
mJan=readCSV['month__Jan']
mFeb=readCSV['month__Feb']
mMar=readCSV['month__Mar']
mApr=readCSV['month__Apr']
mMay=readCSV['month__May']
mJun=readCSV['month__Jun']
mJul=readCSV['month__Jul']
mAug=readCSV['month__Aug']
mSep=readCSV['month__Sep']
mOct=readCSV['month__Oct']
mNov=readCSV['month__Nov']
mDec=readCSV['month__Dec']
#store data into corresponding weekdays lists
wMon=readCSV['weekday__Mon']
wTue=readCSV['weekday__Tue']
wWed=readCSV['weekday__Wed']
wThu=readCSV['weekday__Thu']
wFri=readCSV['weekday__Fri']
#convert month pandas Series to numpy arrays (.as_matrix() was removed in pandas 1.0; .to_numpy() replaces it)
mJan=mJan.to_numpy()
mFeb=mFeb.to_numpy()
mMar=mMar.to_numpy()
mApr=mApr.to_numpy()
mMay=mMay.to_numpy()
mJun=mJun.to_numpy()
mJul=mJul.to_numpy()
mAug=mAug.to_numpy()
mSep=mSep.to_numpy()
mOct=mOct.to_numpy()
mNov=mNov.to_numpy()
mDec=mDec.to_numpy()
#convert weekday pandas Series to numpy arrays
wMon=wMon.to_numpy()
wTue=wTue.to_numpy()
wWed=wWed.to_numpy()
wThu=wThu.to_numpy()
wFri=wFri.to_numpy()
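prices=[]
dayNum=[]
#Python 3: the csv module needs a text-mode file opened with newline='' ('rb' only worked in Python 2)
with open('aapl.csv',newline='') as csvfile:
    csvFileReader=csv.reader(csvfile)
    next(csvFileReader)  #skip the header row
    for row in csvFileReader:
        #assumes the first column is a date like "Mar 16, 2017"; its second token is the day of the month
        l=row[0].split()
        s=l[1].strip(',')
        dayNum.append(int(s))
        prices.append(float(row[1]))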
#final input data set
X=np.vstack((dayNum,mJan,mFeb,mMar,mApr,mMay,mJun,mJul,mAug,mSep,mOct,mNov,mDec,wMon,wTue,wWed,wThu,wFri)).T
#fit the model with prices as the target (no weights are passed, so this WLS fit is equivalent to OLS)
mod_wls = sm.WLS(prices, X)
res_wls = mod_wls.fit()
print(res_wls.params)
print(res_wls.summary())
The code works fine, and the output I get looks like this:
> [ 6.74096438e-03 4.18506571e+01 5.60244116e+01 3.54806772e+01
> 2.83052274e+01 1.74134713e+01 1.86328033e+01 2.11088939e+01
> 2.97457264e+01 3.31973702e+01 3.80025208e+01 3.21529043e+01
> 3.64722064e+01 7.74843964e+01 7.77925151e+01 7.79671536e+01
> 7.76580605e+01 7.74847444e+01]
>
> WLS Regression Results
>
> | | | | |
> |:--|:--|:--|:--|
> | Dep. Variable: | y | R-squared: | 0.999 |
> | Model: | WLS | Adj. R-squared: | 0.999 |
> | Method: | Least Squares | F-statistic: | 1.041e+04 |
> | Date: | Thu, 16 Mar 2017 | Prob (F-statistic): | 9.43e-313 |
> | Time: | 13:51:47 | Log-Likelihood: | -668.01 |
> | No. Observations: | 240 | AIC: | 1370. |
> | Df Residuals: | 223 | BIC: | 1429. |
> | Df Model: | 17 | | |
>
> | | coef | std err | t | P>\|t\| | 95% CI lower | 95% CI upper |
> |:--|--:|--:|--:|--:|--:|--:|
> | x1 | 0.0067 | 0.030 | 0.227 | 0.821 | -0.052 | 0.065 |
> | x2 | 41.8507 | 0.888 | 47.124 | 0.000 | 40.101 | 43.601 |
> | x3 | 56.0244 | 0.899 | 62.321 | 0.000 | 54.253 | 57.796 |
> | x4 | 35.4807 | 1.257 | 28.221 | 0.000 | 33.003 | 37.958 |
> | x5 | 28.3052 | 0.864 | 32.746 | 0.000 | 26.602 | 30.009 |
> | x6 | 17.4135 | 0.862 | 20.207 | 0.000 | 15.715 | 19.112 |
> | x7 | 18.6328 | 0.845 | 22.043 | 0.000 | 16.967 | 20.299 |
> | x8 | 21.1089 | 0.887 | 23.792 | 0.000 | 19.360 | 22.857 |
> | x9 | 29.7457 | 0.828 | 35.922 | 0.000 | 28.114 | 31.378 |
> | x10 | 33.1974 | 0.870 | 38.178 | 0.000 | 31.484 | 34.911 |
> | x11 | 38.0025 | 0.866 | 43.872 | 0.000 | 36.296 | 39.710 |
> | x12 | 32.1529 | 0.861 | 37.338 | 0.000 | 30.456 | 33.850 |
> | x13 | 36.4722 | 0.865 | 42.188 | 0.000 | 34.769 | 38.176 |
> | x14 | 77.4844 | 0.699 | 110.787 | 0.000 | 76.106 | 78.863 |
> | x15 | 77.7925 | 0.642 | 121.266 | 0.000 | 76.528 | 79.057 |
> | x16 | 77.9672 | 0.632 | 123.309 | 0.000 | 76.721 | 79.213 |
> | x17 | 77.6581 | 0.635 | 122.371 | 0.000 | 76.407 | 78.909 |
> | x18 | 77.4847 | 0.662 | 117.132 | 0.000 | 76.181 | 78.788 |
>
> | | | | |
> |:--|:--|:--|:--|
> | Omnibus: | 142.534 | Durbin-Watson: | 0.414 |
> | Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2512.436 |
> | Skew: | 1.927 | Prob(JB): | 0.00 |
> | Kurtosis: | 18.375 | Cond. No. | nan |
>
> Warnings: [1] The smallest eigenvalue is -7.44e-15. This might
> indicate that there are strong multicollinearity problems or that the
> design matrix is singular.
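The coefficients for all 18 parameters are computed and displayed. How do I get the value of the constant (intercept)? The equation should have the form y = m1*x1 + m2*x2 + ... + m18*x18 + K. How do I compute K? Also, could you explain what the eigenvalue warning I'm getting means? Thanks!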
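The eigenvalue warning most likely comes from the dummy columns. Assuming `test2.csv` is one-hot encoded (each row has exactly one `month__*` column set to 1 and exactly one `weekday__*` column set to 1; the `__` naming suggests this, but the post does not confirm it), the 12 month columns add up to a column of ones, and so do the 5 weekday columns. Their difference is then identically zero, so the 18 columns of `X` are linearly dependent and `X'X` is singular: its smallest eigenvalue is exactly 0 in theory, and -7.44e-15 is floating-point noise around that 0. This is consistent with the summary reporting `Df Model: 17` rather than 18. The `Cond. No. nan` line is probably a side effect of the same problem, since statsmodels derives the condition number from the ratio of the largest to the smallest eigenvalue, and the square root of a negative ratio is undefined. A practical consequence is that the month and weekday coefficients are not uniquely determined: statsmodels' default `fit()` uses a pseudoinverse and returns one of infinitely many equally good solutions. The sketch below reproduces the effect with synthetic dummies (the data is hypothetical, built only to mimic the structure described above):

```python
import numpy as np

# Hypothetical stand-in for the asker's data: 240 trading days, each tagged
# with exactly one month (12 dummy columns) and one weekday (5 dummy columns).
rng = np.random.default_rng(0)
n = 240
day_num = rng.integers(1, 29, size=n).astype(float)
months = np.eye(12)[np.arange(n) % 12]   # one-hot rows, each sums to 1
weekdays = np.eye(5)[np.arange(n) % 5]   # one-hot rows, each sums to 1

X = np.column_stack([day_num, months, weekdays])  # 1 + 12 + 5 = 18 columns

# Both dummy blocks sum to a column of ones, so
# (sum of month columns) - (sum of weekday columns) = 0: an exact dependency.
print(np.allclose(months.sum(axis=1), weekdays.sum(axis=1)))  # True
print(X.shape[1], np.linalg.matrix_rank(X))                    # 18 17

# X'X is therefore singular: its smallest eigenvalue is zero up to rounding,
# which is the situation the statsmodels warning reports.
eigvals = np.linalg.eigvalsh(X.T @ X)
print(abs(eigvals.min()) / eigvals.max() < 1e-10)              # True
```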
Answer (score: 0)
Figured it out! statsmodels can add the intercept column for us with `sm.add_constant`:
#final input data set
X=np.vstack((dayNum,mJan,mFeb,mMar,mApr,mMay,mJun,mJul,mAug,mSep,mOct,mNov,mDec,wMon,wTue,wWed,wThu,wFri)).T
X=sm.add_constant(X)  #prepends a column of ones, so the intercept is estimated as params[0]
#fit the model with prices as the target
mod_wls = sm.WLS(prices, X)
res_wls = mod_wls.fit()
print(res_wls.params)
print(res_wls.summary())
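The constant's value is now reported along with the other parameters. Because `sm.add_constant` prepends the column of ones by default, K is the first entry of `res_wls.params` and appears as `const` in the summary.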