Converting repeated-measures code in SAS to equivalent code in R

Time: 2015-03-22 02:54:04

Tags: r anova

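I am trying to convert existing SAS code to R for a research project. Unfortunately, I find myself completely at a loss as to how to handle repeated-measures ANOVA, despite several hours spent looking at other people's questions on StackExchange and across the web. I suspect this is at least partly because I don't know the right questions to ask and have a limited statistics background.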

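First, I'll provide some sample data (tab-separated, which I'm not sure SE will preserve), then explain what I am trying to do, and then show the code I have written so far.

Sample data: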

Full data frame at: http://grandprairiefriends.org/document/data.df

Obs SbjctID Sex Treatment   Measured    BirthDate   DateStarted DateAssayed SubjectAge_Start_days   SubjectAgeAssay.d.  PreMass_mg  PostMass_mg DiffMass_mg PerCentMassDiff Length_mm   Width_mm    PO1_abs_min PO1_r2  PO2_abs_min PO2_r2  ProteinConc_ul  Protein1_net_abs    Protein1_mg_ml  Protein1_adjusted_mg_ml Protein2_net_abs    Protein2_mg_ml  Protein2_adjusted_mg_ml zPO_avg_abs_min z_Protein_avg_adjusted_mg_ml    POPer_ug_Protein    POPer_ug_Protein_x1000  ImgDarkness1    ImgDarkness2    ImgDarkness3    ImgDarkness4    DarknessAvg AGV_1_1 AGV_1_2 AGV_2_1 AGV_2_2 AGV_12_1    AGV_12_2    z_AGV   predicted_premass   resid_premass   predicted_premass_calculated    resid_premass_calculated    predicted_postmass_calculated   resid_postmass_calculated   predicted_postmass  resid_postmass  ln_premass_mg   ln_postmass_mg  ln_length   ln_melanization ln_po   sqrt_p
1   aF001   Female  a   PO_P    08/05/09    09/06/09    09/13/09    32  39  282.7   309.4   26.66   9.43    10.1    5.3 0.0175  0.996   0.0201  0.996   40  0.227   0.960   0.960   0.234   1.030   1.030   0.0188  0.995   0.00031 0.31491 33.7045 35.9165 28.8383 30.3763 32.2089 NA  NA  NA  NA  NA  NA  NA  5.660963    -0.016576413    4.077123    1.567263    4.077123    1.657382    5.660963    0.0735429694    8.143128    8.273329    3.336283    NA  -5.733124   -0.007231569
2   aF002   Female  a   PO_P    08/02/09    09/06/09    09/13/09    35  42  298.9   313.1   14.23   4.76    10.0    5.9 0.0123  0.999   0.0134  0.996   40  0.213   0.840   0.840   0.219   0.860   0.860   0.0129  0.850   0.00025 0.25196 31.8700 31.8800 32.4680 32.3020 32.1300 NA  NA  NA  NA  NA  NA  NA  5.640012    0.059996453 4.056173    1.643836    4.056173    1.690350    5.640012    0.1065103847    8.223519    8.290480    3.321928    NA  -6.276485   -0.234465254
3   aF003   Female  a   PO_P    08/03/09    09/06/09    09/13/09    34  41  237.1   270.6   33.53   14.14   9.4 5.3 0.0227  0.992   0.0248  0.994   40  0.245   1.120   1.120   0.235   1.030   1.030   0.0238  1.075   0.00037 0.36822 36.0565 41.9355 41.6260 40.0180 39.9090 NA  NA  NA  NA  NA  NA  NA  5.509734    -0.041209334    3.925894    1.542630    3.925894    1.674895    5.509734    0.0910560222    7.889352    8.080018    3.232661    NA  -5.392895   0.104336660
82  bM001   Male    b   PO_P    08/02/09    08/31/09    09/07/09    29  36  468.1   371.7   -96.38  -20.59  10.7    6.8 0.0049  0.999   0.0056  1.000   40  0.228   0.350   0.350   0.222   0.330   0.330   0.0053  0.340   0.00026 0.25735 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  5.782468    0.366214334 4.198628    1.950054    4.198628    1.719513    5.640012    -0.0844204671   8.870673    8.537995    3.419539    NA  -7.559792   -1.556393349
157 cM022   Male    c   PO_P    08/03/09    10/31/09    11/07/09    89  96  451.1   402.4   -48.71  -10.80  11.3    6.9 0.0024  0.995   0.0026  0.995   10  0.091   0.110   0.028   NA  NA  NA  0.0025  0.028   0.00152 1.51515 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  5.897342    0.214325251 4.313502    1.798165    4.313502    1.683895    5.897342    0.1000552907    8.817303    8.652486    3.498251    NA  -8.643856   -5.158429363

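An explanation of what I am trying to accomplish:

The experiment tries to determine whether a particular feeding regime (Treatment) has an effect on the subjects' post-experiment mass (ln_postmass_mg). Each individual's mass is measured twice: once at the start (ln_premass_mg) and once at the end of the feeding regime. Sex, Treatment, and Measured are all categorical variables.

I have produced some R code, but its output does not match that of the SAS code, and it shouldn't, since I don't believe it is coded for repeated measures. It is unclear to me whether I need to transpose or otherwise manipulate my data frame in R to perform the additional analysis, or something else. I seem to be reading about several different ways of handling repeated measures, and I am not sure which (if any) applies to my particular problem. If anyone could set me on the right track toward writing the additional lines needed for the R equivalent, or has suggestions, I would greatly appreciate it.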

SAS code

/* test for effect of diet regime */
/* repeated measures ANOVA for mass */
proc glm data=No_diet_lab;
class measured sex Treatment; 
model ln_premass ln_postmass=Measured Sex Treatment Measured*Sex Measured*Treatment Sex*Treatment  Measured*Sex*Treatment /nouni;
repeated time 2;

R code

options(contrasts=c("contr.sum","contr.poly"))
model <- lm(cbind(ln_premass_mg, ln_postmass_mg) ~ Sex + Treatment + Measured + Sex:Treatment + Sex:Measured + Measured:Treatment + Sex:Treatment:Measured, data = diet_lab_data, na.action=na.omit)

1 answer:

Answer 0 (score: 1)

This should hopefully replicate your SAS output:

First we put the data in long form:

df <- subset(diet_lab_data, select = c("SubjectID", "Sex", "Treatment", "Measured",
                                       "ln_premass_mg", "ln_postmass_mg"))

dfL <- reshape(df, varying = list(5:6), idvar = "SubjectID", direction = "long",
              v.names = "ln_mass_mg")
dfL$time <- factor(dfL$time, levels = 1:2, labels = c("pre", "post"))
head(dfL); tail(dfL)

        SubjectID    Sex Treatment Measured time ln_mass_mg
aF001.1     aF001 Female         a     PO_P  pre   8.143128
aF002.1     aF002 Female         a     PO_P  pre   8.223519
aF003.1     aF003 Female         a     PO_P  pre   7.889352
aF004.1     aF004 Female         a     PO_P  pre   8.521993
aF005.1     aF005 Female         a     PO_P  pre   8.335390
aF006.1     aF006 Female         a     PO_P  pre   8.259743
        SubjectID    Sex Treatment Measured time ln_mass_mg
cM033.2     cM033   Male         c  Melaniz post   8.163398
bF037.2     bF037 Female         b  Melaniz post   8.222070
cM032.2     cM032   Male         c  Melaniz post   8.422485
cF030.2     cF030 Female         c  Melaniz post   8.580447
cM039.2     cM039   Male         c  Melaniz post   8.710118
cM036.2     cM036   Male         c  Melaniz post   8.049849
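
In case the reshape step is hard to follow on the full data set, here is a minimal, self-contained sketch of the same wide-to-long conversion. The three-subject data frame `wide` and its values are made up for illustration; only the column names mirror the real data.

```r
# Toy wide-format data: one row per subject, two mass columns (values are invented)
wide <- data.frame(
  SubjectID      = c("s1", "s2", "s3"),
  Sex            = c("Female", "Male", "Female"),
  ln_premass_mg  = c(8.1, 8.8, 7.9),
  ln_postmass_mg = c(8.3, 8.5, 8.1)
)

# Stack the two mass columns into a single column; reshape() adds a
# numeric `time` column (1 = first variable in `varying`, 2 = second)
long <- reshape(wide,
                varying   = list(c("ln_premass_mg", "ln_postmass_mg")),
                v.names   = "ln_mass_mg",
                timevar   = "time",
                idvar     = "SubjectID",
                direction = "long")
long$time <- factor(long$time, levels = 1:2, labels = c("pre", "post"))

# Sanity check: each subject should now appear once per time point
table(long$SubjectID, long$time)
```

Naming the columns in `varying` instead of using positions such as `5:6` is a small design choice that keeps the call working if the column order changes.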

That's better. Now we fit the model with aov, specifying time as the within-subject factor.

aovMod <- aov(ln_mass_mg ~ Sex * Treatment * Measured * time +
              Error(SubjectID/time), data = dfL)
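
To see how the `Error()` term splits the output, here is a small sketch on simulated data (the data frame `toy`, the seed, and the effect-free random masses are all assumptions for illustration, not your data). Between-subject terms such as Treatment should be tested in the `SubjectID` stratum, and `time` with its interactions in the `SubjectID:time` stratum.

```r
set.seed(1)
n <- 12  # hypothetical number of subjects, 4 per treatment

# Simulated long-format data: every subject measured at "pre" and "post"
toy <- data.frame(
  SubjectID  = factor(rep(paste0("s", 1:n), times = 2)),
  Treatment  = factor(rep(rep(c("a", "b", "c"), each = 4), times = 2)),
  time       = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post")),
  ln_mass_mg = rnorm(2 * n, mean = 8, sd = 0.3)
)

# Same model structure as above, reduced to one between-subject factor
toyMod <- aov(ln_mass_mg ~ Treatment * time + Error(SubjectID/time), data = toy)

# One ANOVA table is printed per error stratum
summary(toyMod)
names(toyMod)
```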

All that said, I'm not sure this is the appropriate analysis, because your design is unbalanced. Consider a mixed-effects model.
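
As a rough sketch of what that could look like, the following fits a random-intercept model with `lme` from the nlme package. Everything here is an assumption for illustration: the simulated data frame `toy`, the deliberately unequal group sizes, and the choice of a simple random intercept per subject. It is not a claim about the right model for your data.

```r
library(nlme)  # recommended package that ships with standard R installations

set.seed(2)
n <- 15  # hypothetical number of subjects

# Subject-level information; group sizes 7/5/3 make the design unbalanced
SubjectID <- factor(paste0("s", 1:n))
Treatment <- factor(rep(c("a", "b", "c"), times = c(7, 5, 3)))
u         <- rnorm(n, sd = 0.2)  # simulated subject-specific offsets

# Long format: each subject contributes a "pre" and a "post" row
toy <- data.frame(
  SubjectID = rep(SubjectID, times = 2),
  Treatment = rep(Treatment, times = 2),
  time      = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post"))
)
toy$ln_mass_mg <- 8 + rep(u, times = 2) + rnorm(2 * n, sd = 0.1)

# Random intercept per subject accounts for the repeated measurements
lmeMod <- lme(ln_mass_mg ~ Treatment * time, random = ~ 1 | SubjectID, data = toy)
anova(lmeMod)
```

On the real data the fixed part would presumably be `Sex * Treatment * Measured * time` with `data = dfL`, matching the aov call above.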