Regression in pandas

Time: 2014-06-27 13:41:31

Tags: python database pandas statistics regression

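I have two separate databases: a temperature database with hourly data and a database with minute-by-minute HVAC usage data. I'm trying to plot the HVAC data against temperature as a series over a week, a month, and a year, but I'm running into trouble because the increments don't match those of the temperature db. I've tried a least-squares fit, but a) I can't figure out how to do one in pandas and b) it becomes very inaccurate after a day or two. Any suggestions?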

1 answer:

Answer 0 (score: 3)

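pandas timeseries is a good fit for this application. You can merge series with different sampling frequencies, and pandas will align them on their timestamps. You can then downsample the data and run a regression, for example with statsmodels. A mock-up example: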

In [288]:

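import numpy as np
import pandas as pd

# Mock data: 10 daily values and 500 hourly values
idx1=pd.date_range('2001/01/01', periods=10, freq='D')
idx2=pd.date_range('2001/01/01', periods=500, freq='h')  # 'h' = hourly; the older 'H' alias is deprecated in recent pandas
df1 =pd.DataFrame(np.random.random(10), columns=['val1'])
df2 =pd.DataFrame(np.random.random(500), columns=['val2'])
df1.index=idx1
df2.index=idx2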
In [291]:

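# Keep only the timestamps present in both frames, then take daily means
df3=pd.merge(df1, df2, left_index=True, right_index=True, how='inner')
df4=df3.resample(rule='D').mean()  # current pandas needs an explicit aggregation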
In [292]:

print(df4)
                val1      val2
2001-01-01  0.399901  0.244800
2001-01-02  0.014448  0.423780
2001-01-03  0.811747  0.070047
2001-01-04  0.595556  0.679096
2001-01-05  0.218412  0.116764
2001-01-06  0.961310  0.040317
2001-01-07  0.058964  0.606843
2001-01-08  0.075129  0.407842
2001-01-09  0.833003  0.751287
2001-01-10  0.070072  0.559986

[10 rows x 2 columns]
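
One thing worth noting about the merge above: with how='inner', only timestamps present in both indexes survive, so each daily val2 is the single midnight reading, not an average over that day. If a daily average is what you are after, a minimal sketch is to resample the hourly frame first and join afterwards. The frames below are rebuilt with a fixed seed (an assumption, only so the block stands on its own), and the names merged, daily_val2 and daily are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, chosen here only for reproducibility
idx1 = pd.date_range('2001-01-01', periods=10, freq='D')
idx2 = pd.date_range('2001-01-01', periods=500, freq='h')
df1 = pd.DataFrame({'val1': rng.random_sample(10)}, index=idx1)
df2 = pd.DataFrame({'val2': rng.random_sample(500)}, index=idx2)

# Inner merge keeps only timestamps found in both indexes: the 10 midnights.
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')

# Average the hourly values within each day first, then align on the daily index.
daily_val2 = df2.resample('D').mean()
daily = df1.join(daily_val2, how='inner')
```

Both results have ten rows, but merged['val2'] holds one hourly sample per day while daily['val2'] holds the mean of that day's 24 samples. Which one is appropriate depends on what the regression is meant to capture.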
In [294]:

import statsmodels.formula.api as smf
# Ordinary least squares: regress val1 on val2
mod = smf.ols(formula='val1 ~ val2', data=df4)
res = mod.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   val1   R-squared:                       0.061
Model:                            OLS   Adj. R-squared:                 -0.056
Method:                 Least Squares   F-statistic:                    0.5231
Date:                Fri, 27 Jun 2014   Prob (F-statistic):              0.490
Time:                        10:46:34   Log-Likelihood:                -3.3643
No. Observations:                  10   AIC:                             10.73
Df Residuals:                       8   BIC:                             11.33
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      0.5405      0.224      2.417      0.042         0.025     1.056
val2          -0.3502      0.484     -0.723      0.490        -1.467     0.766
==============================================================================
Omnibus:                        3.509   Durbin-Watson:                   2.927
Prob(Omnibus):                  0.173   Jarque-Bera (JB):                1.232
Skew:                           0.399   Prob(JB):                        0.540
Kurtosis:                       1.477   Cond. No.                         4.69
==============================================================================
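
Applied to the setup in the question (hourly temperature, per-minute HVAC usage), the same idea might look like the sketch below. Everything specific here is an assumption made for illustration: the names temp and hvac, the synthetic data with a built-in slope of 2, and the choice to average the minute data within each hour. np.polyfit is used as a lightweight stand-in for the statsmodels fit shown above, not as a replacement for it.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)

# Hypothetical data: one week of hourly temperature and per-minute HVAC usage.
hours = pd.date_range('2014-06-01', periods=7 * 24, freq='h')
minutes = pd.date_range('2014-06-01', periods=7 * 24 * 60, freq='min')
temp = pd.Series(20 + 10 * rng.random_sample(len(hours)), index=hours, name='temp')

# Synthetic usage: 2 units per degree plus noise (made up so the fit can be checked).
temp_per_minute = temp.reindex(minutes).ffill()
hvac = pd.Series(2.0 * temp_per_minute.values + rng.normal(0, 0.5, len(minutes)),
                 index=minutes, name='hvac')

# Bring the minute data down to the temperature's hourly grid, then align.
hvac_hourly = hvac.resample('h').mean()
both = pd.concat([temp, hvac_hourly], axis=1).dropna()

# Degree-1 least-squares fit of hourly usage against temperature.
slope, intercept = np.polyfit(both['temp'].values, both['hvac'].values, 1)
```

Downsampling the finer series to the coarser grid avoids inventing temperature readings between the hourly samples. For weekly or monthly views, the same resample call with a coarser rule can be applied to both series before fitting.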