Avoiding 'guessing' in reshape()

Time: 2016-02-01 15:35:16

Tags: r reshape reshape2

I am trying to reshape() some time-varying data in R. I am using the following dataset:

dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")

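These are time-varying data from a longitudinal study, and a subset of a larger dataset that I imported from a source file. I want to extract the values of pop, coe and rcb for the baseline and final study visits (my full dataset has several visits in between, which I have omitted for the purposes of this question).

I can do the following: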

reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = 2:length(dframe),direction='long')

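However, the values that should end up labelled pop end up under coe. The documentation for reshape() tells me that I should reference the varying values explicitly to avoid 'guessing'. So I tried this: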

reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = c('baseline_pop','baseline_coe','baseline_rcb','final_pop','final_coe','final_rcb'),direction='long')

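This produces exactly the same output, despite naming the varying argument explicitly. What am I doing wrong? Presumably pop ends up with the values of coe because of alphabetical ordering, but I cannot see why that would happen now that I have stated the varying argument explicitly.

Edit: the expected output is as follows: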

participant_id  time    pop coe         rcb
FDVCZX          1       6   11.19       16.74
ADSCXZ          1       6   13.6        25
AESFDZC         1       7   3.96        25
ZXCV            1       6   7.64        18.37
AGS             1       6   6.12        25
AGSFV           1       6   6.92        25
FDVCZX          2       NA  NA          NA
ADSCXZ          2       NA  NA          NA
AESFDZC         2       7.1 5.926362    25
ZXCV            2       8   4.89        NA
AGS             2       6   11.98       NA
AGSFV           2       NA  NA          NA

However, as you will see, the pop values end up in the coe column and vice versa.

1 answer:

Answer 0 (score: 0)

We can use melt from data.table, where measure can take multiple columns.

library(data.table)
melt(setDT(dframe), measure = patterns('pop', 'coe', 'rcb'),
     value.name = c('pop', 'coe', 'rcb'), variable.name = 'time')
#    participant_id time pop       coe   rcb
# 1:         FDVCZX    1 6.0 11.190000 16.74
# 2:         ADSCXZ    1 6.0 13.600000 25.00
# 3:        AESFDZC    1 7.0  3.960000 25.00
# 4:           ZXCV    1 6.0  7.640000 18.37
# 5:            AGS    1 6.0  6.120000 25.00
# 6:          AGSFV    1 6.0  6.920000 25.00
# 7:         FDVCZX    2  NA        NA    NA
# 8:         ADSCXZ    2  NA        NA    NA
# 9:        AESFDZC    2 7.1  5.926362 25.00
#10:           ZXCV    2 8.0  4.890000    NA
#11:            AGS    2 6.0 11.980000    NA
#12:          AGSFV    2  NA        NA    NA
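For anyone who would rather stay in base R, here is a minimal sketch of an alternative. It rests on an assumption about the cause: as far as I can tell from the reshape() source, a flat varying vector combined with v.names is regrouped with split(), which orders the groups alphabetically (coe, pop, rcb), while v.names keeps the order you typed (pop, coe, rcb). Passing varying as a list, with one vector per long-format variable in the same order as v.names, avoids that regrouping. The data frame below is a simplified copy of the question's data, with participant_id as plain character rather than a factor.

```r
# Simplified copy of the question's data (participant_id as character, not factor).
dframe <- data.frame(
  participant_id = c("FDVCZX", "ADSCXZ", "AESFDZC", "ZXCV", "AGS", "AGSFV"),
  baseline_pop = c(6, 6, 7, 6, 6, 6),
  baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92),
  baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25),
  final_pop = c(NA, NA, 7.1, 8, 6, NA),
  final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA),
  final_rcb = c(NA, NA, 25, NA, NA, NA),
  stringsAsFactors = FALSE
)

# split() returns its groups in sorted order, so regrouping a flat vector
# by measure name yields coe, pop, rcb rather than pop, coe, rcb.
flat <- c("baseline_pop", "baseline_coe", "baseline_rcb",
          "final_pop", "final_coe", "final_rcb")
group_order <- names(split(flat, rep(c("pop", "coe", "rcb"), 2)))
print(group_order)

# varying as a list: one vector per long-format variable,
# listed in the same order as v.names.
long <- reshape(
  dframe,
  idvar = "participant_id",
  varying = list(
    c("baseline_pop", "final_pop"),
    c("baseline_coe", "final_coe"),
    c("baseline_rcb", "final_rcb")
  ),
  v.names = c("pop", "coe", "rcb"),
  timevar = "time",
  times = 1:2,
  direction = "long"
)
rownames(long) <- NULL  # drop the "id.time" row names reshape() generates
print(long)
```

With the list form, the pop column should hold the baseline_pop and final_pop values, matching the expected output in the question. The melt() approach above stays shorter when there are many visits, since patterns() picks the columns up by name.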
