Question

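I would like to run a 3-window rolling OLS regression estimation on the dataset found at this link (https://drive.google.com/drive/folders/0B2Iv8dfU4fTUMVFyYTEtWXlzYkk), which has the format shown below. The third column (Y) in my dataset is the true value, which is what I want to predict (estimate).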

 time     X   Y
0.000543  0  10
0.000575  0  10
0.041324  1  10
0.041331  2  10
0.041336  3  10
0.04134   4  10
  ...
9.987735  55 239
9.987739  56 239
9.987744  57 239
9.987749  58 239
9.987938  59 239

Using a simple OLS regression estimation, I have tried the following script.

#!/usr/bin/python -tt

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('estimated_pred.csv')

model = pd.stats.ols.MovingOLS(y=df.Y, x=df[['X']], 
                               window_type='rolling', window=3, intercept=True)
df['Y_hat'] = model.y_predict

print(df['Y_hat'])
print (model.summary)
df.plot.scatter(x='X', y='Y', s=0.1)

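However, statsmodels or scikit-learn seem like a good choice for anything beyond a simple regression. I tried to get the following script to work, but on larger subsets of the dataset (for example, more than 100 rows) it raises IndexError: index out of bounds.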

#!/usr/bin/python -tt
import pandas as pd
import numpy as np
import statsmodels.api as sm


df=pd.read_csv('estimated_pred.csv')    
df=df.dropna() # to drop nans in case there are any
window = 3
#print(df.index) # to print index
df['a']=None #constant
df['b1']=None #beta1
df['b2']=None #beta2
for i in range(window,len(df)):
    temp=df.iloc[i-window:i,:]
    RollOLS=sm.OLS(temp.loc[:,'Y'],sm.add_constant(temp.loc[:,['time','X']])).fit()
    df.iloc[i,df.columns.get_loc('a')]=RollOLS.params[0]
    df.iloc[i,df.columns.get_loc('b1')]=RollOLS.params[1]
    df.iloc[i,df.columns.get_loc('b2')]=RollOLS.params[2]

#The following line gives us predicted values in a row, given the PRIOR row's estimated parameters
df['predicted']=df['a'].shift(1)+df['b1'].shift(1)*df['time']+df['b2'].shift(1)*df['X']

print(df['predicted'])
#print(df['b2'])

#print(RollOLS.predict(sm.add_constant(predict_x)))

print(temp)

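I want to predict Y, i.e. predict the current value of Y from the previous 3 rolling values of X. Finally, I want to include the mean squared error (MSE) over all predictions (a summary of the regression analysis). For example, if we look at row 5, the value of X is 2 and the value of Y is 10. Say the predicted value of Y for the current row is 6; the squared error for that row is then (10-6)^2. How can we do this using statsmodels or scikit-learn, given that pd.stats.ols.MovingOLS was removed in pandas version 0.20.0 and I cannot find any reference for it?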
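
To make the requested computation concrete, here is a minimal NumPy-only sketch, not a statsmodels or scikit-learn recipe. It fits a line on the previous `window` rows, predicts Y for the current row from its X, and averages the squared errors. The function name rolling_ols_predict and the toy data are hypothetical and are not taken from the dataset linked above.

```python
import numpy as np

def rolling_ols_predict(x, y, window=3):
    """Predict y[i] from x[i] using an OLS line fitted on the previous `window` rows."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.full(len(y), np.nan)  # the first `window` rows have no prediction
    for i in range(window, len(y)):
        # Design matrix [1, x] built from rows i-window .. i-1 only.
        A = np.column_stack([np.ones(window), x[i - window:i]])
        # lstsq still returns a solution if x is constant within the window.
        coef, *_ = np.linalg.lstsq(A, y[i - window:i], rcond=None)
        preds[i] = coef[0] + coef[1] * x[i]
    return preds

# Toy data, made up for illustration only.
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 3, 5, 7, 10, 11], dtype=float)

y_hat = rolling_ols_predict(x, y, window=3)
valid = ~np.isnan(y_hat)
# MSE: mean of (true - predicted)^2 over the rows that have a prediction,
# the same quantity as (10-6)^2 in the single-row example, averaged over rows.
mse = np.mean((y[valid] - y_hat[valid]) ** 2)
print(y_hat)
print(mse)
```

Newer statsmodels releases also provide a RollingOLS class in statsmodels.regression.rolling. It estimates per-window parameters, so the one-step-ahead prediction and the MSE would still have to be computed from the shifted parameters as above.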

Python - OLS regression estimation using statsmodels or scikit-learn

0 answers: