I'd like to be able to reshape a long-form data frame into a wide-form data frame using longitudinal clinical trial data. Below is an example of the long-form format I wish to change:
structure(list(study = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("Jones, 1996",
"Smith, 1999"), class = "factor"), group_allocation =
structure(c(2L, 1L, 2L, 3L, 1L), .Label = c("control", "intervention_1",
"intervention_2"), class = "factor"), outcome = structure(c(2L, 2L, 1L,
1L, 1L), .Label = c("anxiety", "depression"), class = "factor"), bl_mean =
c(6.5, 4.5, 3.7, 4.2, 5.3), fu_timepoint = c(6L, 6L, 12L, 12L, 12L),
fu_mean = c(5.2, 7.5, 2.5, 2.7, 6.3), mean_diff = c(-2.3, NA, -3.8, -3.6,
NA)), class = "data.frame", row.names = c(NA, -5L))
study group_allocation outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999 intervention_1 depression 6.5 6 5.2 -2.3
2 Smith, 1999 control depression 4.5 6 7.5 NA
3 Jones, 1996 intervention_1 anxiety 3.7 12 2.5 -3.8
4 Jones, 1996 intervention_2 anxiety 4.2 12 2.7 -3.6
5 Jones, 1996 control anxiety 5.3 12 6.3 NA
My problem is that I need only one row for every intervention group in the group_allocation column (labeled "intervention_1" and "intervention_2") for each study, and I need the control group data (labeled "control" in the group_allocation column) moved into separate columns in the same row as each intervention group, so that I can compare the intervention groups with the control groups across the data frame. Here is what I'm looking for:
structure(list(study = structure(c(2L, 1L, 1L), .Label = c("Jones, 1996",
"Smith, 1999"), class = "factor"), ig_group_allocation = structure(c(1L,
1L, 2L), .Label = c("intervention_1", "intervention_2"), class =
"factor"), outcome = structure(c(2L, 1L, 1L), .Label = c("anxiety",
"depression"), class = "factor"), ig_bl_mean = c(6.5, 3.7, 4.2),
fu_timepoint = c(6L, 12L, 12L), ig_fu_mean = c(5.2, 2.5, 2.7), mean_diff =
c(-2.3, -3.8, -3.6), cg_group_allocation = structure(c(1L, 1L, 1L), .Label
= "control", class = "factor"), cg_bl_mean = c(4.5, 5.3, 5.3), cg_fu_mean
= c(7.5, 6.3, 6.3)), class = "data.frame", row.names = c(NA, -3L))
study ig_group_allocation outcome ig_bl_mean fu_timepoint ig_fu_mean mean_diff cg_group_allocation cg_bl_mean cg_fu_mean
1 Smith, 1999 intervention_1 depression 6.5 6 5.2 -2.3 control 4.5 7.5
2 Jones, 1996 intervention_1 anxiety 3.7 12 2.5 -3.8 control 5.3 6.3
3 Jones, 1996 intervention_2 anxiety 4.2 12 2.7 -3.6 control 5.3 6.3
I have read through numerous other data reshaping questions on Stack Overflow, but have yet to find a solution to a problem similar to mine.
Thank you!
Answer 0 (score: 1)
Split the data into two data frames, one for the controls and one for the interventions, then merge them back together.
df
study group_allocation outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999 intervention_1 depression 6.5 6 5.2 -2.3
2 Smith, 1999 control depression 4.5 6 7.5 NA
3 Jones, 1996 intervention_1 anxiety 3.7 12 2.5 -3.8
4 Jones, 1996 intervention_2 anxiety 4.2 12 2.7 -3.6
5 Jones, 1996 control anxiety 5.3 12 6.3 NA
interventions<-df[grep("intervention", df$group_allocation),]
interventions
study group_allocation outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999 intervention_1 depression 6.5 6 5.2 -2.3
3 Jones, 1996 intervention_1 anxiety 3.7 12 2.5 -3.8
4 Jones, 1996 intervention_2 anxiety 4.2 12 2.7 -3.6
controls<-df[grep("control", df$group_allocation),]
controls
study group_allocation outcome bl_mean fu_timepoint fu_mean mean_diff
2 Smith, 1999 control depression 4.5 6 7.5 NA
5 Jones, 1996 control anxiety 5.3 12 6.3 NA
names(controls)<-paste0("cg_", names(controls)) #add cg prefix to colnames
new_df<-merge(interventions, controls, by.x="study", by.y="cg_study", all.x=TRUE)
new_df
study group_allocation outcome bl_mean fu_timepoint fu_mean mean_diff cg_group_allocation cg_outcome cg_bl_mean cg_fu_timepoint cg_fu_mean cg_mean_diff
1 Jones, 1996 intervention_1 anxiety 3.7 12 2.5 -3.8 control anxiety 5.3 12 6.3 NA
2 Jones, 1996 intervention_2 anxiety 4.2 12 2.7 -3.6 control anxiety 5.3 12 6.3 NA
3 Smith, 1999 intervention_1 depression 6.5 6 5.2 -2.3 control depression 4.5 6 7.5 NA
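If you want the result to look closer to the layout in the question (ig_/cg_ prefixes, no duplicated outcome or timepoint columns), here is a rough sketch of the same split-and-merge idea. It assumes that a control row belongs with an intervention row whenever study, outcome and fu_timepoint all match; that rule is my assumption, not something stated in the question, so adjust the `keys` vector if your real data pairs groups differently. The names `keys`, `vals`, `is_control` and `wide` are just illustrative.

```r
# Rebuild the example data from the question (strings kept as characters).
df <- data.frame(
  study = c("Smith, 1999", "Smith, 1999", "Jones, 1996", "Jones, 1996", "Jones, 1996"),
  group_allocation = c("intervention_1", "control", "intervention_1",
                       "intervention_2", "control"),
  outcome = c("depression", "depression", "anxiety", "anxiety", "anxiety"),
  bl_mean = c(6.5, 4.5, 3.7, 4.2, 5.3),
  fu_timepoint = c(6L, 6L, 12L, 12L, 12L),
  fu_mean = c(5.2, 7.5, 2.5, 2.7, 6.3),
  mean_diff = c(-2.3, NA, -3.8, -3.6, NA),
  stringsAsFactors = FALSE
)

# Assumed matching keys: columns that must agree between an intervention
# row and its control row.
keys <- c("study", "outcome", "fu_timepoint")
# Group-specific columns that receive an ig_ or cg_ prefix.
vals <- c("group_allocation", "bl_mean", "fu_mean")

is_control <- df$group_allocation == "control"

# mean_diff is kept only on the intervention side (it is NA for controls here).
interventions <- df[!is_control, c(keys, vals, "mean_diff")]
controls <- df[is_control, c(keys, vals)]

# Prefix only the group-specific columns; the keys keep their names for the merge.
names(interventions)[match(vals, names(interventions))] <- paste0("ig_", vals)
names(controls)[match(vals, names(controls))] <- paste0("cg_", vals)

# Left join: every intervention row is kept, even if no control row matches.
wide <- merge(interventions, controls, by = keys, all.x = TRUE)
wide
```

Merging on all three keys rather than on study alone should stop a control row from being paired with an intervention row for a different outcome or timepoint when a study reports more than one. Note that merge() puts the key columns first, so the column order differs slightly from the target in the question; reorder with `wide[, c(...)]` if that matters.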