I'm trying to `reshape()` some time-varying data in R. I am using the following dataset:
dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")
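These are time-varying data from a longitudinal study, and a subset of a larger dataset that I imported from a source file. I want to extract the values of `pop`, `coe` and `rcb` for the `baseline` and `final` study visits (my full dataset has several visits in between, which I have omitted for the purposes of this question).
I can do the following: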
reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = 2:length(dframe),direction='long')
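However, the values of `pop` end up labelled as `coe`. The documentation for `reshape2` tells me that I should reference the `varying` values explicitly to avoid 'guessing'. So I tried this: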
reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = c('baseline_pop','baseline_coe','baseline_rcb','final_pop','final_coe','final_rcb'),direction='long')
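This results in exactly the same output, despite explicitly naming the `varying` argument. What am I doing wrong? Presumably `pop` ends up with the values of `coe` because of alphabetisation, but I can't understand why this happens, since I have now stated the `varying` argument explicitly...
EDIT: The expected output is as follows: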
participant_id  time  pop  coe       rcb
FDVCZX          1     6    11.19     16.74
ADSCXZ          1     6    13.6      25
AESFDZC         1     7    3.96      25
ZXCV            1     6    7.64      18.37
AGS             1     6    6.12      25
AGSFV           1     6    6.92      25
FDVCZX          2     NA   NA        NA
ADSCXZ          2     NA   NA        NA
AESFDZC         2     7.1  5.926362  25
ZXCV            2     8    4.89      NA
AGS             2     6    11.98     NA
AGSFV           2     NA   NA        NA
However, as you will see, the `pop` values end up in the `coe` column and vice versa.
Answer 0 (score: 0)
We can use `melt` from `data.table`, which can take multiple `measure` columns.
library(data.table)
melt(setDT(dframe), measure=patterns('pop', 'coe', 'rcb'),
value.name = c('pop', 'coe', 'rcb'), variable.name='time')
# participant_id time pop coe rcb
# 1: FDVCZX 1 6.0 11.190000 16.74
# 2: ADSCXZ 1 6.0 13.600000 25.00
# 3: AESFDZC 1 7.0 3.960000 25.00
# 4: ZXCV 1 6.0 7.640000 18.37
# 5: AGS 1 6.0 6.120000 25.00
# 6: AGSFV 1 6.0 6.920000 25.00
# 7: FDVCZX 2 NA NA NA
# 8: ADSCXZ 2 NA NA NA
# 9: AESFDZC 2 7.1 5.926362 25.00
#10: ZXCV 2 8.0 4.890000 NA
#11: AGS 2 6.0 11.980000 NA
#12: AGSFV 2 NA NA NA
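As for why the original `reshape()` call misbehaves, a likely explanation (an assumption about how `reshape()` treats a plain vector passed to `varying`, not something confirmed in the question) is that the column names are grouped by `v.names` and the groups come back sorted alphabetically, so the `coe` columns land in the first slot while `v.names` still lists `pop` first. A minimal base R sketch that sidesteps this is to pass `varying` as a list with one element per output column, in the same order as `v.names`. The small data frame below is a hypothetical stand-in for the question's data, not the real dataset:

```r
# Hypothetical stand-in data with the same column layout as the question
dframe <- data.frame(
  participant_id = c("A", "B", "C"),
  baseline_pop = c(6, 6, 7),
  baseline_coe = c(11.19, 13.6, 3.96),
  baseline_rcb = c(16.74, 25, 25),
  final_pop    = c(NA, 7.1, 8),
  final_coe    = c(NA, 5.93, 4.89),
  final_rcb    = c(NA, 25, NA)
)

# Pass `varying` as a list: element i holds the wide columns that feed
# v.names[i], so the mapping is positional and no sorting is involved.
long <- reshape(
  dframe,
  idvar     = "participant_id",
  direction = "long",
  v.names   = c("pop", "coe", "rcb"),
  varying   = list(
    c("baseline_pop", "final_pop"),
    c("baseline_coe", "final_coe"),
    c("baseline_rcb", "final_rcb")
  ),
  timevar   = "time",
  times     = 1:2  # 1 = baseline, 2 = final
)

print(long)
```

With more visits in the full dataset, each list element would simply gain one column name per extra visit, and `times` would be extended to match.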