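I'm trying to convert existing SAS code from a research project into R. Unfortunately, I'm completely at a loss as to how to handle a repeated-measures ANOVA, even after spending several hours looking through other people's questions on StackExchange and around the web. I suspect this is at least partly because I don't know the right questions to ask and have a limited background in statistics.
First I'll provide some sample data (tab-delimited, which I'm not sure will be preserved on SE), then explain what I'm trying to do, and then show the code I've written so far.
Sample data: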
Full data frame at: http://grandprairiefriends.org/document/data.df
Obs SbjctID Sex Treatment Measured BirthDate DateStarted DateAssayed SubjectAge_Start_days SubjectAgeAssay.d. PreMass_mg PostMass_mg DiffMass_mg PerCentMassDiff Length_mm Width_mm PO1_abs_min PO1_r2 PO2_abs_min PO2_r2 ProteinConc_ul Protein1_net_abs Protein1_mg_ml Protein1_adjusted_mg_ml Protein2_net_abs Protein2_mg_ml Protein2_adjusted_mg_ml zPO_avg_abs_min z_Protein_avg_adjusted_mg_ml POPer_ug_Protein POPer_ug_Protein_x1000 ImgDarkness1 ImgDarkness2 ImgDarkness3 ImgDarkness4 DarknessAvg AGV_1_1 AGV_1_2 AGV_2_1 AGV_2_2 AGV_12_1 AGV_12_2 z_AGV predicted_premass resid_premass predicted_premass_calculated resid_premass_calculated predicted_postmass_calculated resid_postmass_calculated predicted_postmass resid_postmass ln_premass_mg ln_postmass_mg ln_length ln_melanization ln_po sqrt_p
1 aF001 Female a PO_P 08/05/09 09/06/09 09/13/09 32 39 282.7 309.4 26.66 9.43 10.1 5.3 0.0175 0.996 0.0201 0.996 40 0.227 0.960 0.960 0.234 1.030 1.030 0.0188 0.995 0.00031 0.31491 33.7045 35.9165 28.8383 30.3763 32.2089 NA NA NA NA NA NA NA 5.660963 -0.016576413 4.077123 1.567263 4.077123 1.657382 5.660963 0.0735429694 8.143128 8.273329 3.336283 NA -5.733124 -0.007231569
2 aF002 Female a PO_P 08/02/09 09/06/09 09/13/09 35 42 298.9 313.1 14.23 4.76 10.0 5.9 0.0123 0.999 0.0134 0.996 40 0.213 0.840 0.840 0.219 0.860 0.860 0.0129 0.850 0.00025 0.25196 31.8700 31.8800 32.4680 32.3020 32.1300 NA NA NA NA NA NA NA 5.640012 0.059996453 4.056173 1.643836 4.056173 1.690350 5.640012 0.1065103847 8.223519 8.290480 3.321928 NA -6.276485 -0.234465254
3 aF003 Female a PO_P 08/03/09 09/06/09 09/13/09 34 41 237.1 270.6 33.53 14.14 9.4 5.3 0.0227 0.992 0.0248 0.994 40 0.245 1.120 1.120 0.235 1.030 1.030 0.0238 1.075 0.00037 0.36822 36.0565 41.9355 41.6260 40.0180 39.9090 NA NA NA NA NA NA NA 5.509734 -0.041209334 3.925894 1.542630 3.925894 1.674895 5.509734 0.0910560222 7.889352 8.080018 3.232661 NA -5.392895 0.104336660
82 bM001 Male b PO_P 08/02/09 08/31/09 09/07/09 29 36 468.1 371.7 -96.38 -20.59 10.7 6.8 0.0049 0.999 0.0056 1.000 40 0.228 0.350 0.350 0.222 0.330 0.330 0.0053 0.340 0.00026 0.25735 NA NA NA NA NA NA NA NA NA NA NA NA 5.782468 0.366214334 4.198628 1.950054 4.198628 1.719513 5.640012 -0.0844204671 8.870673 8.537995 3.419539 NA -7.559792 -1.556393349
157 cM022 Male c PO_P 08/03/09 10/31/09 11/07/09 89 96 451.1 402.4 -48.71 -10.80 11.3 6.9 0.0024 0.995 0.0026 0.995 10 0.091 0.110 0.028 NA NA NA 0.0025 0.028 0.00152 1.51515 NA NA NA NA NA NA NA NA NA NA NA NA 5.897342 0.214325251 4.313502 1.798165 4.313502 1.683895 5.897342 0.1000552907 8.817303 8.652486 3.498251 NA -8.643856 -5.158429363
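The sample rows above can be read into R directly, which makes it easy to try the code in this thread without the full file. A minimal sketch that keeps only the columns the models use; the values are copied from the rows shown. Renaming SbjctID to SubjectID is an assumption made so that the answer's code (which refers to SubjectID) runs unchanged. Five rows are far too few to fit the full model, so this only checks the import.

```r
# Five sample rows from the question, restricted to the columns used below
sample_txt <- "
Obs SbjctID Sex    Treatment Measured ln_premass_mg ln_postmass_mg
1   aF001   Female a         PO_P     8.143128      8.273329
2   aF002   Female a         PO_P     8.223519      8.290480
3   aF003   Female a         PO_P     7.889352      8.080018
82  bM001   Male   b         PO_P     8.870673      8.537995
157 cM022   Male   c         PO_P     8.817303      8.652486
"

# read.table() splits on any run of whitespace by default and reads "NA" as missing
diet_lab_data <- read.table(text = sample_txt, header = TRUE,
                            stringsAsFactors = TRUE)

# Assumption: the answer's code calls the subject column SubjectID
names(diet_lab_data)[names(diet_lab_data) == "SbjctID"] <- "SubjectID"

str(diet_lab_data)
```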
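An explanation of what I'm trying to accomplish:
The experiment tries to determine whether a particular feeding regime (Treatment) affects the subjects' post-experiment mass (ln_postmass_mg). Each individual's mass was measured twice: once at the start (ln_premass_mg) and once at the end of the feeding regime. Sex, Treatment, and Measured are all categorical variables.
I've written some R code, but its output doesn't match the SAS output, nor should it, since I don't believe it is set up for repeated measures. I'm unclear whether I need to transpose or otherwise reshape my data frame in R to run the analysis, or something else entirely. I keep reading about several different ways to handle repeated measures, and I'm not sure which one (if any) applies to my particular problem. If anyone could point me in the right direction, whether that's the additional lines of code needed for an R equivalent or general suggestions, I'd be very grateful.
SAS code: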
/* test for effect of diet regime */
/* repeated measures ANOVA for mass */
proc glm data=No_diet_lab;
class measured sex Treatment;
model ln_premass ln_postmass=Measured Sex Treatment Measured*Sex Measured*Treatment Sex*Treatment Measured*Sex*Treatment /nouni;
repeated time 2;
run;
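With only two occasions, the tests that repeated time 2 produces can be reproduced with two ordinary linear models: the between-subjects tests are F tests on the mean of the two responses, and the within-subjects tests are F tests on the difference post - pre, where the intercept tests the time main effect and every other term tests its interaction with time. A sketch on hypothetical simulated data (the data frame sim, the columns pre and post, and the factors A and B are placeholders, not the question's variables), using sum-to-zero contrasts and drop1(..., test = "F") for Type III-style tests. It should line up with SAS's default Type III output when every cell is non-empty, but treat that as an expectation to check rather than a verified match.

```r
# Sum-to-zero contrasts so that drop1() gives Type III-style tests
options(contrasts = c("contr.sum", "contr.poly"))

# Hypothetical unbalanced data: two between-subject factors, two measurements
set.seed(1)
sim <- data.frame(
  A = factor(rep(c("F", "M"), times = c(12, 9))),
  B = factor(c(rep(c("a", "b", "c"), c(5, 4, 3)),
               rep(c("a", "b", "c"), c(2, 3, 4))))
)
sim$pre  <- rnorm(nrow(sim), mean = 8)
sim$post <- sim$pre + 0.1 + rnorm(nrow(sim), sd = 0.2)

# Between-subjects tests: model the mean of the two measurements
between_fit <- lm(I((pre + post) / 2) ~ A * B, data = sim)
drop1(between_fit, . ~ ., test = "F")

# Within-subjects tests: model the change post - pre
# (row A here plays the role of A*time, B of B*time, A:B of A*B*time)
within_fit <- lm(I(post - pre) ~ A * B, data = sim)
drop1(within_fit, . ~ ., test = "F")

# The time main effect is the test of the intercept (F = t^2)
time_t <- coef(summary(within_fit))["(Intercept)", "t value"]
time_F <- time_t^2
time_F
```

For the real data, the right-hand side would be Sex * Treatment * Measured, with ln_premass_mg and ln_postmass_mg in place of pre and post.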
R code:
options(contrasts = c("contr.sum", "contr.poly"))  # sum-to-zero contrasts
model <- lm(cbind(ln_premass_mg, ln_postmass_mg) ~ Sex * Treatment * Measured,  # all main effects and interactions
            data = diet_lab_data, na.action = na.omit)
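The multivariate lm() above is actually close to what SAS fits: proc glm with two responses is the same multivariate linear model, and the repeated statement adds tests on a transformation of the responses. Base R's anova() method for multivariate fits (anova.mlm in the stats package) can apply that transformation: with the default idata, X = ~1 projects out the average of the two responses and leaves the post/pre contrast. A sketch on hypothetical simulated data (sim is a placeholder for diet_lab_data, and Measured is omitted to keep it small). Note that anova.mlm reports sequential (Type I) tests, so with unequal cell sizes its rows will not generally agree with SAS's Type III output.

```r
# Hypothetical simulated data standing in for diet_lab_data
set.seed(2)
sim <- data.frame(
  Sex       = factor(rep(c("Female", "Male"), times = c(12, 9))),
  Treatment = factor(c(rep(c("a", "b", "c"), c(5, 4, 3)),
                       rep(c("a", "b", "c"), c(2, 3, 4))))
)
sim$ln_premass_mg  <- rnorm(nrow(sim), mean = 8)
sim$ln_postmass_mg <- sim$ln_premass_mg + rnorm(nrow(sim), 0.1, 0.2)

# Multivariate fit: one response column per measurement occasion
mfit <- lm(cbind(ln_premass_mg, ln_postmass_mg) ~ Sex * Treatment, data = sim)

# X = ~1 removes the constant (average) direction, so each row tests the
# term's effect on the pre/post contrast: (Intercept) = time, Sex = Sex:time, ...
within_tests <- anova(mfit, X = ~1, test = "Pillai")
within_tests
```

With only two occasions the transformed response is one-dimensional, so Pillai's approximate F is exact and equals the univariate F for the difference score.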
Answer (score: 1):
This should hopefully replicate your SAS output.
First, we put the data into long format:
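df <- subset(diet_lab_data, select = c("SubjectID", "Sex", "Treatment", "Measured",
                                       "ln_premass_mg", "ln_postmass_mg"))
# One row per subject and occasion; the two mass columns stack into ln_mass_mg
dfL <- reshape(df, varying = list(c("ln_premass_mg", "ln_postmass_mg")),
               idvar = "SubjectID", direction = "long", v.names = "ln_mass_mg")
dfL$time <- factor(dfL$time, levels = 1:2, labels = c("pre", "post"))
dfL$SubjectID <- factor(dfL$SubjectID)  # treat subject as a grouping factor for Error()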
head(dfL); tail(dfL)
SubjectID Sex Treatment Measured time ln_mass_mg
aF001.1 aF001 Female a PO_P pre 8.143128
aF002.1 aF002 Female a PO_P pre 8.223519
aF003.1 aF003 Female a PO_P pre 7.889352
aF004.1 aF004 Female a PO_P pre 8.521993
aF005.1 aF005 Female a PO_P pre 8.335390
aF006.1 aF006 Female a PO_P pre 8.259743
SubjectID Sex Treatment Measured time ln_mass_mg
cM033.2 cM033 Male c Melaniz post 8.163398
bF037.2 bF037 Female b Melaniz post 8.222070
cM032.2 cM032 Male c Melaniz post 8.422485
cF030.2 cF030 Female c Melaniz post 8.580447
cM039.2 cM039 Male c Melaniz post 8.710118
cM036.2 cM036 Male c Melaniz post 8.049849
That's better. Now we fit the model with aov, specifying time as the within-subject factor.
aovMod <- aov(ln_mass_mg ~ Sex * Treatment * Measured * time +
                Error(SubjectID/time), data = dfL)
summary(aovMod)  # one table per error stratum: between subjects, then within (time)
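aov() with an Error() term uses sequential (Type I) sums of squares within each stratum, which is why an unbalanced design matters here: when cell sizes are unequal, the between-subjects tests change with the order in which terms enter the formula. A sketch with hypothetical simulated data (subj, long, and the cell counts are made up for illustration, not taken from the question) that fits the same model with two term orders and compares the between-subjects tables.

```r
# Hypothetical unbalanced design: Sex and Treatment counts are deliberately unequal
set.seed(3)
subj <- data.frame(
  SubjectID = factor(sprintf("s%02d", 1:22)),
  Sex       = factor(rep(c("Female", "Male"), each = 11)),
  Treatment = factor(c(rep(c("a", "b"), c(8, 3)), rep(c("a", "b"), c(3, 8))))
)
subj$pre  <- rnorm(22, mean = 8) + (subj$Treatment == "b") * 0.5
subj$post <- subj$pre + rnorm(22, 0.1, 0.2)

# Long format: one row per subject and occasion
long <- rbind(
  data.frame(subj[, 1:3], time = "pre",  ln_mass_mg = subj$pre),
  data.frame(subj[, 1:3], time = "post", ln_mass_mg = subj$post)
)
long$time <- factor(long$time, levels = c("pre", "post"))

# Same model, two term orders
fit1 <- aov(ln_mass_mg ~ Sex * Treatment * time + Error(SubjectID / time), data = long)
fit2 <- aov(ln_mass_mg ~ Treatment * Sex * time + Error(SubjectID / time), data = long)

# Between-subjects stratum: the Sum Sq for Sex depends on the order
between1 <- summary(fit1)[["Error: SubjectID"]][[1]]
between2 <- summary(fit2)[["Error: SubjectID"]][[1]]
between1
between2
```

summary() pads the row names of these tables with trailing spaces, so trim them (for example with trimws()) before indexing rows by term name.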
All that said, I'm not sure this is the appropriate analysis, because your design is unbalanced. Consider a mixed-effects model instead.
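A sketch of the mixed-effects alternative, assuming the nlme package (a recommended package that ships with standard R installations). A random intercept per subject plays the role of the SubjectID error stratum, and anova(..., type = "marginal") gives tests that, with sum-to-zero contrasts, are analogous to Type III. To keep the example self-contained it runs on hypothetical simulated data (subj, long, and the effect sizes are made up); for the real analysis, the long-format dfL from the answer and the full Sex * Treatment * Measured * time formula would go in their place.

```r
library(nlme)

# Sum-to-zero contrasts so the marginal tests are Type III-like
options(contrasts = c("contr.sum", "contr.poly"))

# Hypothetical unbalanced long-format data (two rows per subject)
set.seed(4)
n <- 22
subj <- data.frame(
  SubjectID = factor(sprintf("s%02d", seq_len(n))),
  Sex       = factor(rep(c("Female", "Male"), each = 11)),
  Treatment = factor(c(rep(c("a", "b"), c(8, 3)), rep(c("a", "b"), c(3, 8))))
)
subj_effect <- rnorm(n, sd = 0.5)  # between-subject variation
long <- rbind(
  data.frame(subj, time = "pre",  ln_mass_mg = 8.0 + subj_effect + rnorm(n, sd = 0.2)),
  data.frame(subj, time = "post", ln_mass_mg = 8.1 + subj_effect + rnorm(n, sd = 0.2))
)
long$time <- factor(long$time, levels = c("pre", "post"))

# Random intercept for each subject (REML fit by default)
mixed_fit <- lme(ln_mass_mg ~ Sex * Treatment * time,
                 random = ~ 1 | SubjectID, data = long, na.action = na.omit)

# Marginal F tests: each term is tested adjusted for all the others
anova(mixed_fit, type = "marginal")
```

Unlike aov() with Error(), the marginal tests do not depend on the order of terms in the formula, which is the main reason to prefer this route for unequal cell sizes.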