import pandas as pd
import statsmodels.formula.api as smf
df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.drop(38)
在R
我会这样做:
lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)
在R
中给了我:
Random effects:
Groups Name Variance Std.Dev.
scenario (Intercept) 219 14.80
subject (Intercept) 4015 63.36
Residual 646 25.42
Fixed effects:
Estimate Std. Error t value
(Intercept) 202.588 26.754 7.572
attitudepol -19.695 5.585 -3.527
我尝试对statsmodels
执行相同操作:
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df[["subject","scenario"]]).fit()
但是model.summary()
给了我不同的输出:
Mixed Linear Model Regression Results
=======================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 2 Scale: 0.0000
Min. group size: 1 Likelihood: inf
Max. group size: 1 Converged: Yes
Mean group size: 1.0
-------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept 204.500
attitude[T.pol] 8.800
groups RE 0.000
=======================================================
答案 0 :(得分:3)
下面的代码复制R结果。由于这是一个没有独立组的交叉模型,因此您需要将每个人都放在同一组中,并使用方差分量指定随机效应。
import pandas as pd
import statsmodels.api as sm
df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.dropna()
df["group"] = 1
vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}
model = sm.MixedLM.from_formula("frequency ~ attitude", groups="group",
vc_formula=vcf, re_formula="0", data=df)
result = model.fit()
以下是结果:
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 1 Scale: 646.0163
Min. group size: 83 Likelihood: -396.7268
Max. group size: 83 Converged: Yes
Mean group size: 83.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 26.754 7.572 0.000 150.152 255.025
attitude[T.pol] -19.695 5.585 -3.526 0.000 -30.641 -8.748
scenario Var 218.991 6.476
subject Var 4014.616 104.614
==============================================================
答案 1 :(得分:2)
我能想到的唯一方法就是将你的群体连接起来。
df["grp"] = df["subject"].astype(str) + df["scenario"].astype(str)
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df["grp"]).fit()
model.summary()
Out[87]:
<class 'statsmodels.iolib.summary2.Summary'>
"""
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 42 Scale: 615.6961
Min. group size: 1 Likelihood: -430.8261
Max. group size: 2 Converged: Yes
Mean group size: 2.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 10.078 20.102 0.000 182.836 222.340
attitude[T.pol] -19.618 5.476 -3.582 0.000 -30.350 -8.885
groups RE 3650.021 50.224
==============================================================
"""
答案 2 :(得分:0)
lmer
模型的等效smf.mixedlm
将是这样的:
lmer(frequency ~ attitude + (1 + attitude|subject) + (1 + attitude|scenario), data = df)
条款说明:
frequency ~ 0 + attitude + ...
attitude
。subject
(即,subject
的每个级别都会偏离全局拦截),以及每个级别内attitude
的固定效应斜率的偏差subject
,允许随机截距和斜率之间的相关性。scenario
的等效随机截距和斜率项。注意,如果您想允许随机截距和斜率自由变化(即在截距和斜率之间强制执行零相关),则必须将(1 + attitude|subject)
替换为(1|subject) + (0 + attitude|subject)
,类似地{ {1}}。