Constant value in multiple regression

Date: 2017-03-16 10:53:21

Tags: python-2.7 regression statsmodels

I wrote some code that uses WLS to run a multiple regression and form the equation of the best-fit line. Here is the code I'm using:

import numpy as np
import csv
import pandas as pd
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std

#read data from csv
readCSV=pd.read_csv('test2.csv')

#store data into corresponding month lists
mJan=readCSV['month__Jan']
mFeb=readCSV['month__Feb']
mMar=readCSV['month__Mar']
mApr=readCSV['month__Apr']
mMay=readCSV['month__May']
mJun=readCSV['month__Jun']
mJul=readCSV['month__Jul']
mAug=readCSV['month__Aug']
mSep=readCSV['month__Sep']
mOct=readCSV['month__Oct']
mNov=readCSV['month__Nov']
mDec=readCSV['month__Dec']

#store data into corresponding weekdays lists
wMon=readCSV['weekday__Mon']
wTue=readCSV['weekday__Tue']
wWed=readCSV['weekday__Wed']
wThu=readCSV['weekday__Thu']
wFri=readCSV['weekday__Fri']

#convert month pandas data to numpy arrays
mJan=mJan.as_matrix()
mFeb=mFeb.as_matrix()
mMar=mMar.as_matrix()
mApr=mApr.as_matrix()
mMay=mMay.as_matrix()
mJun=mJun.as_matrix()
mJul=mJul.as_matrix()
mAug=mAug.as_matrix()
mSep=mSep.as_matrix()
mOct=mOct.as_matrix()
mNov=mNov.as_matrix()
mDec=mDec.as_matrix()

#convert weekday pandas data frame to numpy arrays
wMon=wMon.as_matrix()
wTue=wTue.as_matrix()
wWed=wWed.as_matrix()
wThu=wThu.as_matrix()
wFri=wFri.as_matrix()

#read day-of-month and price from aapl.csv
prices=[]
dayNum=[]
with open('aapl.csv','rb') as csvfile:
    csvFileReader=csv.reader(csvfile)
    next(csvFileReader)                 #skip the header row
    for row in csvFileReader:
        fullDate=row[0]
        #print(fullDate[7:])
        l=row[0].split()                #split the date string on whitespace
        s=l[1].strip(',')               #second token is the day of month; drop the trailing comma
        dayNum.append(int(s))
        prices.append(float(row[1]))

#final input data set
X=np.vstack((dayNum,mJan,mFeb,mMar,mApr,mMay,mJun,mJul,mAug,mSep,mOct,mNov,mDec,wMon,wTue,wWed,wThu,wFri)).T

#fit a WLS model with prices as the target
mod_wls = sm.WLS(prices, X)
res_wls = mod_wls.fit()
print res_wls.params
print res_wls.summary()

The code works fine, and the output I get looks like this:

> [  6.74096438e-03   4.18506571e+01   5.60244116e+01   3.54806772e+01
>    2.83052274e+01   1.74134713e+01   1.86328033e+01   2.11088939e+01
>    2.97457264e+01   3.31973702e+01   3.80025208e+01   3.21529043e+01
>    3.64722064e+01   7.74843964e+01   7.77925151e+01   7.79671536e+01
>    7.76580605e+01   7.74847444e+01]
>                             WLS Regression Results
> ==============================================================================
> Dep. Variable:                      y   R-squared:                       0.999
> Model:                            WLS   Adj. R-squared:                  0.999
> Method:                 Least Squares   F-statistic:                 1.041e+04
> Date:                Thu, 16 Mar 2017   Prob (F-statistic):          9.43e-313
> Time:                        13:51:47   Log-Likelihood:                -668.01
> No. Observations:                 240   AIC:                             1370.
> Df Residuals:                     223   BIC:                             1429.
> Df Model:                          17
> ==============================================================================
>                  coef    std err          t      P>|t|      [95.0% Conf. Int.]
> ------------------------------------------------------------------------------
> x1             0.0067      0.030      0.227      0.821        -0.052     0.065
> x2            41.8507      0.888     47.124      0.000        40.101    43.601
> x3            56.0244      0.899     62.321      0.000        54.253    57.796
> x4            35.4807      1.257     28.221      0.000        33.003    37.958
> x5            28.3052      0.864     32.746      0.000        26.602    30.009
> x6            17.4135      0.862     20.207      0.000        15.715    19.112
> x7            18.6328      0.845     22.043      0.000        16.967    20.299
> x8            21.1089      0.887     23.792      0.000        19.360    22.857
> x9            29.7457      0.828     35.922      0.000        28.114    31.378
> x10           33.1974      0.870     38.178      0.000        31.484    34.911
> x11           38.0025      0.866     43.872      0.000        36.296    39.710
> x12           32.1529      0.861     37.338      0.000        30.456    33.850
> x13           36.4722      0.865     42.188      0.000        34.769    38.176
> x14           77.4844      0.699    110.787      0.000        76.106    78.863
> x15           77.7925      0.642    121.266      0.000        76.528    79.057
> x16           77.9672      0.632    123.309      0.000        76.721    79.213
> x17           77.6581      0.635    122.371      0.000        76.407    78.909
> x18           77.4847      0.662    117.132      0.000        76.181    78.788
> ==============================================================================
> Omnibus:                      142.534   Durbin-Watson:                   0.414
> Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2512.436
> Skew:                           1.927   Prob(JB):                         0.00
> Kurtosis:                      18.375   Cond. No.                          nan
> ==============================================================================
> 
> Warnings:
> [1] The smallest eigenvalue is -7.44e-15. This might indicate that there are
> strong multicollinearity problems or that the design matrix is singular.

The coefficients of all 18 parameters are calculated and displayed. How do I find the value of the constant, i.e. the intercept? The equation should be of the form y = m1x1 + m2x2 + ... + m18x18 + K. How do I compute this value? Also, could you explain what the warning about the smallest eigenvalue means? Thanks!

1 answer:

Answer 0 (score: 0):

Figured it out! We can use statsmodels' add_constant:

#final input data set
X=np.vstack((dayNum,mJan,mFeb,mMar,mApr,mMay,mJun,mJul,mAug,mSep,mOct,mNov,mDec,wMon,wTue,wWed,wThu,wFri)).T
X=sm.add_constant(X)        #adds constant to the input array
#fit a WLS model with prices as the target
mod_wls = sm.WLS(prices, X)
res_wls = mod_wls.fit()
print res_wls.params
print res_wls.summary()

The constant value is shown in the output along with the other parameters.
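
For reference, a minimal sketch of how the intercept can be read back out once sm.add_constant has been used. The data below are made up (not the asker's CSV files); with the default prepend=True, the column of ones is the first column of X, so the first entry of res.params is the intercept K and the remaining entries are the slopes m1, m2, ...

import numpy as np
import statsmodels.api as sm

#made-up example data: 100 observations, two predictors
np.random.seed(0)
x = np.random.rand(100, 2)
y = 3.0 + 2.0*x[:, 0] - 1.5*x[:, 1] + 0.1*np.random.randn(100)

X = sm.add_constant(x)       #prepends a column of ones (default prepend=True)
res = sm.WLS(y, X).fit()     #unit weights, so this behaves like OLS

print res.params[0]          #intercept K (coefficient of the constant column)
print res.params[1:]         #slopes m1, m2

As for the eigenvalue warning: assuming every row has exactly one month dummy and one weekday dummy set to 1, the month columns sum to the same all-ones vector as the weekday columns, so the design matrix is rank deficient, and adding a constant introduces yet another exact dependency. Dropping one dummy from each group when a constant is included is the usual way to get a full-rank design matrix and make the warning go away.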