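I'm looking to build a function in Python that creates a simple OLS regression using the following equation:
Y_i - Y_i-1 = A + B(X_i - X_i-1) + E
In other words, Y_Lag = alpha + beta(X_Lag) + error term
Currently, I have the following dataset (this is a shortened version).
Note: Y = the historic rate (the Historic_Rate column)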
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)), columns=['Historic_Rate', 'Overnight', '1M', '3M', '6M'])
So, what I want to build takes each X variable in turn and puts it into a simple linear regression. The code I have built so far is as follows:
#Start the iteration process for the regression to in turn fit 1 parameter
#Import required packages
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm
#Import dataset
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)), columns=['Historic_Rate', 'Overnight', '1M', '3M', '6M'])
#Y_Lag is always 1 time period only
df['Y_Lag'] = df['Historic_Rate'].shift(1)
#Begin the process with 1 lag, taking one x variable in turn
array = df[0:0]
array.drop(array.columns[[0,5]], axis=1, inplace=True)
for X in array:
    df['X_Lag'] = df['X'].shift(1)
    Model = df[df.columns[4:5]]
    Y = Model['Y_Lag']
    X = Model['X_Lag']
    Reg_model = sm.OLS(Y,X).fit()
    predictions = model.predict(X)
    # make the predictions by the model
    # Print out the statistics
    model.summary()
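So, in essence, I'm looking to create a list of column headers that will in turn systematically go through my loop: each variable is lagged and then regressed against the lagged Y variable.
I would also appreciate knowing how to output model.X, where X is the Xth iteration of the array, so that the variables are named dynamically.
Answer 0 (score: 2)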
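You're close. I think you just confused the variable X with the string 'X' in your loop. I also think you aren't computing Y_i - Y_i-1, but are instead regressing Y_i-1 against X_i-1.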
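To make that distinction concrete, here is a small sketch comparing `shift(1)` (the previous value, Y_i-1) with the first difference (Y_i - Y_i-1). The series values are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical series, for illustration only
rate = pd.Series([3, 5, 4, 8], name="Historic_Rate")

lagged = rate.shift(1)          # Y_{i-1}: the previous value, NaN in the first row
change = rate - rate.shift(1)   # Y_i - Y_{i-1}: the first difference
same_change = rate.diff()       # pandas shorthand for the line above

print(pd.DataFrame({"Y": rate, "Y_lag": lagged, "Y_diff": change}))
```

Regressing `lagged` on a lagged X fits a different model from the one in the equation, which is stated in terms of `change`.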
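Here is how you can loop through the regressions. We'll also use a dictionary to store the regression results, with the column names as keys.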
import pandas as pd
import numpy as np
import statsmodels.api as sm
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                  columns=['Historic_Rate', 'Overnight', '1M', '3M', '6M'])
fit_d = {}  # This will hold all of the fit results and summaries
for col in [x for x in df.columns if x != 'Historic_Rate']:
    Y = df['Historic_Rate'] - df['Historic_Rate'].shift(1)
    # Need to remove the NaN for fit
    Y = Y[Y.notnull()]
    X = df[col] - df[col].shift(1)
    X = X[X.notnull()]
    X = sm.add_constant(X)  # Add a constant to the fit
    fit_d[col] = sm.OLS(Y,X).fit()
Now, if you want to make some predictions, for example with your last model, you can do the following:
fit_d['6M'].predict(sm.add_constant(df['6M']-df['6M'].shift(1)))
#0 NaN
#1 0.5
#2 -2.0
#3 -1.0
#4 -0.5
#dtype: float64
You can get the summary with fit_d['6M'].summary():
                            OLS Regression Results
==============================================================================
Dep. Variable:          Historic_Rate   R-squared:                       0.101
Model:                            OLS   Adj. R-squared:                 -0.348
Method:                 Least Squares   F-statistic:                    0.2254
Date:                Thu, 27 Sep 2018   Prob (F-statistic):              0.682
Time:                        11:27:33   Log-Likelihood:                -9.6826
No. Observations:                   4   AIC:                             23.37
Df Residuals:                       2   BIC:                             22.14
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.4332      1.931     -0.224      0.843      -8.740       7.873
6M            -0.2674      0.563     -0.475      0.682      -2.691       2.156
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.301
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.254
Skew:                          -0.099   Prob(JB):                        0.881
Kurtosis:                       1.781   Cond. No.                         3.44
==============================================================================