Complex reshaping of data frame from long-form to wide-form using values in multiple “key” columns in R

Time: 2019-01-18 18:42:40

Tags: r reshape data-manipulation

I'd like to reshape a long-form data frame of longitudinal clinical trial data into a wide-form data frame. Below is an example of the long-form data I want to reshape:

df <- structure(list(study = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("Jones, 1996",
"Smith, 1999"), class = "factor"), group_allocation =
structure(c(2L, 1L, 2L, 3L, 1L), .Label = c("control", "intervention_1",
"intervention_2"), class = "factor"), outcome = structure(c(2L, 2L, 1L,
1L, 1L), .Label = c("anxiety", "depression"), class = "factor"), bl_mean =
c(6.5, 4.5, 3.7, 4.2, 5.3), fu_timepoint = c(6L, 6L, 12L, 12L, 12L),
fu_mean = c(5.2, 7.5, 2.5, 2.7, 6.3), mean_diff = c(-2.3, NA, -3.8, -3.6,
NA)), class = "data.frame", row.names = c(NA, -5L))

        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3
2 Smith, 1999          control depression     4.5            6     7.5        NA
3 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8
4 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6
5 Jones, 1996          control    anxiety     5.3           12     6.3        NA

My problem is that I need exactly one row per intervention group (the rows labeled "intervention_1" and "intervention_2" in the group_allocation column) for each study, and I need the control group data (labeled "control" in the group_allocation column) moved into separate columns in the same row as each intervention group, so that I can compare the intervention groups with the control groups across the data frame. Here is what I'm looking for:

structure(list(study = structure(c(2L, 1L, 1L), .Label = c("Jones, 1996", 
"Smith, 1999"), class = "factor"), ig_group_allocation = structure(c(1L, 
1L, 2L), .Label = c("intervention_1", "intervention_2"), class = 
"factor"), outcome = structure(c(2L, 1L, 1L), .Label = c("anxiety", 
"depression"), class = "factor"), ig_bl_mean = c(6.5, 3.7, 4.2), 
fu_timepoint = c(6L, 12L, 12L), ig_fu_mean = c(5.2, 2.5, 2.7), mean_diff = 
c(-2.3, -3.8, -3.6), cg_group_allocation = structure(c(1L, 1L, 1L), .Label 
= "control", class = "factor"), cg_bl_mean = c(4.5, 5.3, 5.3), cg_fu_mean 
= c(7.5, 6.3, 6.3)), class = "data.frame", row.names = c(NA, -3L))

        study ig_group_allocation    outcome ig_bl_mean fu_timepoint ig_fu_mean mean_diff cg_group_allocation cg_bl_mean cg_fu_mean
1 Smith, 1999      intervention_1 depression        6.5            6        5.2      -2.3             control        4.5        7.5
2 Jones, 1996      intervention_1    anxiety        3.7           12        2.5      -3.8             control        5.3        6.3
3 Jones, 1996      intervention_2    anxiety        4.2           12        2.7      -3.6             control        5.3        6.3

I have read through numerous other data reshaping questions on Stack Overflow, but have yet to find a solution to a problem similar to mine.

Thank you!

1 answer:

Answer 0 (score: 1)

Split the data into two data frames, one for the controls and one for the interventions, then merge them back together.

df
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3
2 Smith, 1999          control depression     4.5            6     7.5        NA
3 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8
4 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6
5 Jones, 1996          control    anxiety     5.3           12     6.3        NA

 interventions <- df[grep("intervention", df$group_allocation), ] #keep the intervention rows

 interventions
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3
3 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8
4 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6


 controls <- df[grep("control", df$group_allocation), ] #keep the control rows

 controls
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
2 Smith, 1999          control depression     4.5            6     7.5        NA
5 Jones, 1996          control    anxiety     5.3           12     6.3        NA

 names(controls)<-paste0("cg_", names(controls)) #add cg prefix to colnames

 new_df <- merge(interventions, controls, by.x="study", by.y="cg_study", all.x=TRUE) #match on study only, so this assumes one control row per study

 new_df
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff cg_group_allocation cg_outcome cg_bl_mean cg_fu_timepoint cg_fu_mean cg_mean_diff
1 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8             control    anxiety        5.3              12        6.3           NA
2 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6             control    anxiety        5.3              12        6.3           NA
3 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3             control depression        4.5               6        7.5           NA
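
The result above still carries the redundant cg_outcome, cg_fu_timepoint and cg_mean_diff columns and has no ig_ prefix on the intervention columns. Below is a minimal base R sketch of the same split-and-merge idea that gets closer to the layout asked for in the question. It rebuilds the example data with plain character columns instead of factors, and the helper names group_cols, key_cols and wide are made up for illustration. Matching on study, outcome and fu_timepoint together is an assumption on my part (the merge above matches on study only).

```r
# Example data copied from the question, with character columns instead of factors.
df <- data.frame(
  study = c("Smith, 1999", "Smith, 1999", "Jones, 1996", "Jones, 1996", "Jones, 1996"),
  group_allocation = c("intervention_1", "control", "intervention_1",
                       "intervention_2", "control"),
  outcome = c("depression", "depression", "anxiety", "anxiety", "anxiety"),
  bl_mean = c(6.5, 4.5, 3.7, 4.2, 5.3),
  fu_timepoint = c(6L, 6L, 12L, 12L, 12L),
  fu_mean = c(5.2, 7.5, 2.5, 2.7, 6.3),
  mean_diff = c(-2.3, NA, -3.8, -3.6, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper vectors:
# group_cols = columns that get an ig_/cg_ prefix,
# key_cols   = columns assumed to identify the matching control row.
group_cols <- c("group_allocation", "bl_mean", "fu_mean")
key_cols   <- c("study", "outcome", "fu_timepoint")

is_control <- df$group_allocation == "control"

# Intervention rows keep every column; control rows keep only keys + group columns,
# which drops the all-NA mean_diff of the controls.
interventions <- df[!is_control, ]
controls      <- df[is_control, c(key_cols, group_cols)]

# Prefix the group-specific columns in each half.
names(interventions)[match(group_cols, names(interventions))] <- paste0("ig_", group_cols)
names(controls)[match(group_cols, names(controls))]           <- paste0("cg_", group_cols)

# Left join: every intervention row picks up its control row.
wide <- merge(interventions, controls, by = key_cols, all.x = TRUE)

# Put the columns in the order shown in the question and sort the rows explicitly.
wide <- wide[, c("study", "ig_group_allocation", "outcome", "ig_bl_mean",
                 "fu_timepoint", "ig_fu_mean", "mean_diff",
                 "cg_group_allocation", "cg_bl_mean", "cg_fu_mean")]
wide <- wide[order(wide$study, wide$ig_group_allocation), ]
rownames(wide) <- NULL

print(wide)
```

Because the rows are sorted by study, the Jones, 1996 rows come before the Smith, 1999 row, as in new_df above. If a study reports several outcomes or follow-up timepoints, merging on all three key columns keeps each intervention row paired with the control row for the same outcome and timepoint; with by = "study" alone, every intervention row would be paired with every control row of that study.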