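I have two separate databases: a temperature database with hourly data and a database with minute-by-minute HVAC usage data. I'm trying to plot the HVAC data as a time series against temperature over a week, a month, and a year, but because its increments don't match those of the temperature db, I'm having trouble. I've tried a least-squares fit, but (a) I can't figure out how to do one in pandas, and (b) it gets very inaccurate after a day or two. Any suggestions?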
Answer 0 (score: 3)
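pandas timeseries is a very good fit for this application. You can merge series with different sampling frequencies, and pandas will align them on their timestamps. You can then downsample the data and run a regression, e.g. with statsmodels. A mock example: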
In [288]:
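import numpy as np
import pandas as pd
# Daily and hourly indexes starting at the same timestamp
# ('H' is spelled 'h' in recent pandas releases)
idx1=pd.date_range('2001/01/01', periods=10, freq='D')
idx2=pd.date_range('2001/01/01', periods=500, freq='H')
df1 =pd.DataFrame(np.random.random(10), columns=['val1'])
df2 =pd.DataFrame(np.random.random(500), columns=['val2'])
df1.index=idx1
df2.index=idx2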
In [291]:
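# Inner merge on the index keeps only timestamps present in both frames
df3=pd.merge(df1, df2, left_index=True, right_index=True, how='inner')
# Downsample to daily means (recent pandas needs the explicit .mean())
df4=df3.resample(rule='D').mean()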
In [292]:
print(df4)
                val1      val2
2001-01-01  0.399901  0.244800
2001-01-02  0.014448  0.423780
2001-01-03  0.811747  0.070047
2001-01-04  0.595556  0.679096
2001-01-05  0.218412  0.116764
2001-01-06  0.961310  0.040317
2001-01-07  0.058964  0.606843
2001-01-08  0.075129  0.407842
2001-01-09  0.833003  0.751287
2001-01-10  0.070072  0.559986

[10 rows x 2 columns]
In [294]:
import statsmodels.formula.api as smf
mod = smf.ols(formula='val1 ~ val2', data=df4)
res = mod.fit()
print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                   val1   R-squared:                       0.061
Model:                            OLS   Adj. R-squared:                 -0.056
Method:                 Least Squares   F-statistic:                    0.5231
Date:                Fri, 27 Jun 2014   Prob (F-statistic):              0.490
Time:                        10:46:34   Log-Likelihood:                -3.3643
No. Observations:                  10   AIC:                             10.73
Df Residuals:                       8   BIC:                             11.33
Df Model:                           1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      0.5405      0.224      2.417      0.042         0.025     1.056
val2          -0.3502      0.484     -0.723      0.490        -1.467     0.766
==============================================================================
Omnibus:                        3.509   Durbin-Watson:                   2.927
Prob(Omnibus):                  0.173   Jarque-Bera (JB):                1.232
Skew:                           0.399   Prob(JB):                        0.540
Kurtosis:                       1.477   Cond. No.                         4.69
==============================================================================
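For the setup in the question (hourly temperature, minute-level HVAC), one possible approach is to downsample the minute data to hourly means first and then join the two frames on their timestamps. The following is only a rough sketch on synthetic data: the column names `temp` and `hvac`, the date range, and the choice of the mean as the aggregate are all assumptions, not something taken from the original databases.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two databases (all names are hypothetical).
rng = np.random.default_rng(0)
hours = pd.date_range("2001-01-01", periods=48, freq="60min")
minutes = pd.date_range("2001-01-01", periods=48 * 60, freq="1min")
temp = pd.DataFrame({"temp": rng.random(len(hours))}, index=hours)
hvac = pd.DataFrame({"hvac": rng.random(len(minutes))}, index=minutes)

# Downsample the minute-level readings to hourly means so that both
# frames share the same sampling interval.
hvac_hourly = hvac.resample("60min").mean()

# Align on the shared hourly timestamps; the inner join keeps only
# hours that are present in both frames.
hourly = temp.join(hvac_hourly, how="inner")

# Downsample further for longer plots (daily here; "W" would give weekly).
daily = hourly.resample("D").mean()
```

`hourly` can then be plotted or passed to `smf.ols` as above. If total usage per hour is more meaningful than the average, `.sum()` could be used in place of `.mean()` for the HVAC column.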