In R, a multiple linear regression can be run like this:
temp = lm(log(volume_1[11:62])~log(price_1[11:62])+log(volume_1[10:61]))
In Python, statsmodels accepts R-style formulas, so I thought the following code should work as well:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
rando = lambda x: np.random.randint(low=1, high=100, size=x)
df = pd.DataFrame(data={'volume_1': rando(62), 'price_1': rando(62)})
temp = smf.ols(formula='np.log(volume_1)[11:62] ~ np.log(price_1)[11:62] + np.log(volume_1)[10:61]',
               data=df)
# np.log(volume_1)[10:61] represents the lagged volume
But I get this error:
PatsyError: Number of rows mismatch between data argument and volume_1[11:62] (62 versus 51)
volume_1[11:62] ~ price_1[11:62] + volume_1[10:61]
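I suppose it isn't possible to regress on only some of the rows of a column, because data=df has 62 rows while the other variables have 51.
Is there a way to run this regression as conveniently as in R?
df is a pandas DataFrame with the columns volume_1 and price_1.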
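The row counts in the error can be reproduced outside the formula. Patsy evaluates each term as its own array, so a slice inside a term yields 51 values while `data=df` still supplies 62 rows. A minimal sketch with the same random stand-in data; note also that R ranges are 1-based and inclusive, so R's `11:62` has 52 elements and the Python slices in the question are each one element shorter than their R counterparts:

```python
import numpy as np
import pandas as pd

rando = lambda x: np.random.randint(low=1, high=100, size=x)
df = pd.DataFrame(data={'volume_1': rando(62), 'price_1': rando(62)})

# Each formula term is evaluated on its own. Slicing inside a term
# shortens that array, but the data argument still has every row of df.
log_volume = np.log(df['volume_1'].to_numpy())
response = log_volume[11:62]   # 51 values
lagged = log_volume[10:61]     # 51 values
print(len(df), len(response), len(lagged))  # 62 51 51

# R's 1-based, inclusive 11:62 (52 elements) is the 0-based,
# end-exclusive Python slice [10:62].
r_equivalent = log_volume[10:62]
print(len(r_equivalent))  # 52
```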
Answer 0 (score: 0)
Following the example from a GitHub issue in the patsy repository, this is how you can get a lagged column working:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
rando = lambda x: np.random.randint(low=1, high=100, size=x)
df = pd.DataFrame(data={'volume_1': rando(62), 'price_1': rando(62)})
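def lag(x, n):
    """Shift x forward by n periods; the first n values become NaN."""
    if n == 0:
        return x
    if isinstance(x, pd.Series):
        return x.shift(n)
    # Plain NumPy array: convert to float (a copy) so NaN can be stored
    x = x.astype('float')
    x[n:] = x[0:-n]
    x[:n] = np.nan
    return x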
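# The lag is computed on the sliced frame, so its first row is NaN and
# statsmodels drops that row (default missing='drop'). iloc[9:62] therefore
# fits Python rows 10..61 (R's 11:62) with lagged volume from rows 9..60 (R's 10:61).
temp = smf.ols(formula='np.log(volume_1) ~ np.log(price_1) + np.log(lag(volume_1, 1))',
               data=df.iloc[9:62]).fit()
print(temp.summary())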
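Another option, sketched here as an alternative rather than the answer's method, is to build the lagged column in the DataFrame beforehand with pandas `Series.shift` and then select the rows that correspond to R's `11:62`. The column name `volume_1_lag1` is hypothetical, and the random data is the same stand-in used in the question:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rando = lambda x: np.random.randint(low=1, high=100, size=x)
df = pd.DataFrame(data={'volume_1': rando(62), 'price_1': rando(62)})

# Hypothetical helper column: row k holds the volume from row k-1 (row 0 is NaN)
df['volume_1_lag1'] = df['volume_1'].shift(1)

# R's 1-based rows 11:62 are Python's 0-based rows 10..61
sub = df.iloc[10:62]
res = smf.ols('np.log(volume_1) ~ np.log(price_1) + np.log(volume_1_lag1)',
              data=sub).fit()
print(res.nobs)
print(res.params)
```

Because the lag is computed on the full frame before slicing, no row has to be dropped, and `res.nobs` should be 52, the same number of observations as the R call.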