R reshape data from long to long

Time: 2015-12-15 07:59:59

Tags: r dplyr reshape

I have a data frame with these columns:

 [1] "drugevent"    "prr"          "prr_lowerCI"  "prr_upperCI"  "EBGM"        
[6] "EBG_lowerCI"  "EBGM_upperCI" "strata.coded" "strata"       "Reference"   

I want to make a plot for each drugevent with ggplot. To do that, I need to get my DF into this format:

[1] "drug", "event", "measurement" (prr or EBGM), "lowerCI" (for the corresponding measurement), "upperCI", "strata"

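However, despite the many posts on SO and the R tutorials out there, I can't manage to reshape the data correctly. In my last attempt, I added an ID like this: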

DF <- mutate(DF, count = 1:n())

melted the data:

melted <- melt(DF, id.vars = "count")

then took several subsets holding the values of interest:

measurement <- subset(melted, variable %in% c("prr", "EBGM"))

and did the same for the lower and upper confidence intervals, the strata, and the drugevent. But when I merge them like this:

merge(measurement, lowerCI, by="count")

I end up with duplicated values, 4 rows per count. The code is messy and the result is wrong. Can you help me?

Edit, with an example. Initial data:

drugevent       prr prr_lowerCI prr_upperCI
1 CLARITHROMYCIN-Erythema Multiforme 1.3539930   0.1903270    2.517659
2 CLARITHROMYCIN-Erythema Multiforme 1.7741342   0.6647390    2.883529
 EBGM EBG_lowerCI EBGM_upperCI      strata count
1 0.9003325   0.2128934     2.772558     Infants     1
2 1.4471096   0.5997188     3.053965    Children     2

Desired result:

    measurement     value     upperCI  strata   drug
1           prr 1.353992979  2.51765895 Infants CLARITHROMYCIN
2          EBGM  0.9009       2.77      Infants CLARITHROMYCIN
 reaction              lowerCI
1 Erythema Multiforme  2.51765895
2 Erythema Multiforme  1.447

1 answer:

Answer 0 (score: 0)

As I understand it, you want the original data frame in long format, with one set of rows for prr and one for EBGM:

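# prr rows: drop the EBGM columns and tag the measurement
dfPRR <- cbind(df[, !grepl("EBG", colnames(df))], measurement = "prr")
colnames(dfPRR)[2:4] <- c("value", "lowerCI", "upperCI")
# EBGM rows: drop the prr columns and tag the measurement
dfEBGM <- cbind(df[, !grepl("prr", colnames(df))], measurement = "EBGM")
colnames(dfEBGM)[2:4] <- c("value", "lowerCI", "upperCI")
# stack both halves into one long data frame
rbind(dfPRR, dfEBGM)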
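
The desired result in the question also has separate drug and reaction columns, which the stacking step leaves combined in drugevent. A minimal base R sketch of that split follows. It assumes the drug name itself contains no hyphen, so the first "-" is the separator; the data frame `long` is a hypothetical stand-in for the stacked result, with values copied from the first row of the example.

```r
# Toy long-format data shaped like the stacked prr/EBGM result;
# values copied from the question's first row. "long" is a hypothetical name.
long <- data.frame(
  drugevent   = "CLARITHROMYCIN-Erythema Multiforme",
  value       = c(1.353993, 0.9003325),
  lowerCI     = c(0.190327, 0.2128934),
  upperCI     = c(2.517659, 2.772558),
  strata      = "     Infants",
  measurement = c("prr", "EBGM"),
  stringsAsFactors = FALSE
)

# Assumption: the drug name contains no "-", so the first hyphen is the separator.
long$drug     <- sub("-.*$", "", long$drugevent)     # text before the first "-"
long$reaction <- sub("^[^-]*-", "", long$drugevent)  # text after the first "-"

# The strata labels in the example data carry leading spaces; strip them.
long$strata <- trimws(long$strata)

# The combined column is no longer needed.
long$drugevent <- NULL
```

If drug names can contain hyphens, a different rule for finding the separator would be needed; this sketch does not handle that case.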

Data used:

df <- structure(list(drugevent = structure(c(1L, 1L), .Label = "CLARITHROMYCIN-Erythema Multiforme", class = "factor"), 
prr = c(1.353993, 1.7741342), prr_lowerCI = c(0.190327, 0.664739
), prr_upperCI = c(2.517659, 2.883529), EBGM = c(0.9003325, 
1.4471096), EBG_lowerCI = c(0.2128934, 0.5997188), EBGM_upperCI = c(2.772558, 
3.053965), strata = structure(1:2, .Label = c("     Infants", 
"    Children"), class = "factor"), count = 1:2), .Names = c("drugevent", 
"prr", "prr_lowerCI", "prr_upperCI", "EBGM", "EBG_lowerCI", "EBGM_upperCI", 
"strata", "count"), class = "data.frame", row.names = c(NA, -2L
))
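
As for why the merge in the question gave 4 rows per count: after melting, each count appears twice in the measurement subset (prr and EBGM) and twice in the lowerCI subset, so merging on count alone pairs every row with every other row for that count. The sketch below reproduces this with the example values. The `type` column is a hypothetical helper added for illustration, not something from the question.

```r
# Two counts, each with a prr and an EBGM value after melting
# (values taken from the question's example).
measurement <- data.frame(
  count    = c(1, 2, 1, 2),
  variable = c("prr", "prr", "EBGM", "EBGM"),
  value    = c(1.3539930, 1.7741342, 0.9003325, 1.4471096),
  stringsAsFactors = FALSE
)
lowerCI <- data.frame(
  count    = c(1, 2, 1, 2),
  variable = c("prr_lowerCI", "prr_lowerCI", "EBG_lowerCI", "EBG_lowerCI"),
  value    = c(0.1903270, 0.6647390, 0.2128934, 0.5997188),
  stringsAsFactors = FALSE
)

# Merging on count alone pairs every measurement row with every lowerCI row
# that shares the count: 2 x 2 = 4 rows per count.
dup <- merge(measurement, lowerCI, by = "count")
table(dup$count)

# Adding a shared key for the measurement type restores one row per pair.
# "type" is a hypothetical helper column.
measurement$type <- measurement$variable
lowerCI$type     <- ifelse(grepl("^prr", lowerCI$variable), "prr", "EBGM")
ok <- merge(measurement, lowerCI, by = c("count", "type"),
            suffixes = c("", "_lowerCI"))
```

Stacking the prr and EBGM halves as in the answer avoids the merge altogether, which is probably the simpler route here.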