具有两种随机效应的混合模型 - statsmodels

时间:2018-04-26 21:57:26

标签: python r statsmodels mixed-models

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.drop(38)

R我会这样做:

lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)

R中给了我:

Random effects:
 Groups   Name        Variance Std.Dev.
 scenario (Intercept)  219     14.80   
 subject  (Intercept) 4015     63.36   
 Residual              646     25.42   
Fixed effects:
            Estimate Std. Error t value
(Intercept)  202.588     26.754   7.572
attitudepol  -19.695      5.585  -3.527

我尝试对statsmodels执行相同操作:

model = smf.mixedlm("frequency ~ attitude", data=df, groups=df[["subject","scenario"]]).fit()

但是model.summary()给了我不同的输出:

      Mixed Linear Model Regression Results
=======================================================
Model:            MixedLM Dependent Variable: frequency
No. Observations: 83      Method:             REML     
No. Groups:       2       Scale:              0.0000   
Min. group size:  1       Likelihood:         inf      
Max. group size:  1       Converged:          Yes      
Mean group size:  1.0                                  
-------------------------------------------------------
                  Coef.  Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept        204.500                               
attitude[T.pol]    8.800                               
groups RE          0.000                               
=======================================================

3 个答案:

答案 0 :(得分:3)

下面的代码复制R结果。由于这是一个没有独立组的交叉模型,因此您需要将每个人都放在同一组中,并使用方差分量指定随机效应。

import pandas as pd                                                                                                        
import statsmodels.api as sm                                                                                               

df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')                                                 
df = df.dropna()                                                                                                           
df["group"] = 1                                                                                                            

vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}                                                         
model = sm.MixedLM.from_formula("frequency ~ attitude", groups="group",                                                    
                                vc_formula=vcf, re_formula="0", data=df)                                                   
result = model.fit()  

以下是结果:

            Mixed Linear Model Regression Results
==============================================================
Model:               MixedLM   Dependent Variable:   frequency
No. Observations:    83        Method:               REML     
No. Groups:          1         Scale:                646.0163 
Min. group size:     83        Likelihood:           -396.7268
Max. group size:     83        Converged:            Yes      
Mean group size:     83.0                                     
--------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025  0.975]
--------------------------------------------------------------
Intercept        202.588   26.754  7.572 0.000 150.152 255.025
attitude[T.pol]  -19.695    5.585 -3.526 0.000 -30.641  -8.748
scenario Var     218.991    6.476                             
subject Var     4014.616  104.614                             
==============================================================

答案 1 :(得分:2)

我能想到的唯一方法就是将你的群体连接起来。

df["grp"] = df["subject"].astype(str) + df["scenario"].astype(str)
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df["grp"]).fit()

model.summary()
Out[87]: 
<class 'statsmodels.iolib.summary2.Summary'>
"""
            Mixed Linear Model Regression Results
==============================================================
Model:               MixedLM   Dependent Variable:   frequency
No. Observations:    83        Method:               REML     
No. Groups:          42        Scale:                615.6961 
Min. group size:     1         Likelihood:           -430.8261
Max. group size:     2         Converged:            Yes      
Mean group size:     2.0                                      
--------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025  0.975]
--------------------------------------------------------------
Intercept        202.588   10.078 20.102 0.000 182.836 222.340
attitude[T.pol]  -19.618    5.476 -3.582 0.000 -30.350  -8.885
groups RE       3650.021   50.224                             
==============================================================

"""

答案 2 :(得分:0)

lmer模型的等效smf.mixedlm将是这样的:

lmer(frequency ~ attitude + (1 + attitude|subject) + (1 + attitude|scenario), data = df)

条款说明:

  1. 全局拦截(您可以使用frequency ~ 0 + attitude + ...
  2. 禁用全局拦截)
  3. 固定效果的全局斜率attitude
  4. 随机拦截vor subject(即,subject的每个级别都会偏离全局拦截),以及每个级别内attitude的固定效应斜率的偏差subject,允许随机截距和斜率之间的相关性。
  5. scenario的等效随机截距和斜率项。
  6. 注意,如果您想允许随机截距和斜率自由变化(即在截距和斜率之间强制执行零相关),则必须将(1 + attitude|subject)替换为(1|subject) + (0 + attitude|subject),类似地{ {1}}。