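How to melt a data frame with measured variables and associated standard deviations in two columns

Time: 2016-02-17 14:32:13

Tags: r ggplot2

I have a pre-made data frame in which each measured variable has an adjacent column holding its standard deviation: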

df <- 
structure(list(Factor = structure(1:3, .Label = c("K", "L", "M"
), class = "factor"), A = c(52127802.82, 63410325.61, 76455661.87
), SD = c(9124562.98, 21975533.21, 9864019.36), B = c(63752980.62, 
68303447.17, 73250794.15), SD.1 = c(34800000, 22600000, 6090000
), C = c(103512032.04, 65074190.8, 92686982.97), SD.2 = c(23900000, 
20800000, 38300000), D = c(100006463.22, NA, 37406494.3)), .Names = c("Factor", 
"A", "SD", "B", "SD.1", "C", "SD.2", "D"), class = "data.frame", row.names = c(NA, 
-3L))

(SD.1 and SD.2 were renamed automatically; originally they were all called "SD".) I would like to melt to long format by Factor:

library(reshape)
df.melt <- melt(df, id.vars="Factor")

However, I would like the melted object to keep each SD column attached to its associated measurement column:

Factor Variable value value.sd
K      A        52127802.82 9124562

That way I can call geom_errorbar(ymin=sd.value, ymax=sd.value) in ggplot(df.melt, aes(Factor, value)) + geom_bar(stat="identity") + facet_wrap(~variable). Is this possible, even though the SD columns have different names?

1 answer:

Answer 0: (score: 4)

First, I would remove df$D from your dataset with df$D <- NULL, since I believe it is there in error:

#   Factor        A       SD        B     SD.1         C     SD.2
# 1      K 52127803  9124563 63752981 34800000 103512032 23900000
# 2      L 63410326 21975533 68303447 22600000  65074191 20800000
# 3      M 76455662  9864019 73250794  6090000  92686983 38300000

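Then I would rename the columns (this looks much more complicated than it is, and I welcome feedback or suggestions to make this part more straightforward). The reason for renaming is so that I can use separate and spread from the package tidyr:

# Each SD column (every second column after Factor) takes the name of the preceding measurement plus "-SD"
names(df)[-1][seq(2, length(names(df)) - 1, 2)] <- paste0(names(df)[-1][seq(1, length(names(df)) - 1, 2)], "-SD")
# Each measurement column gets the suffix "-measure"
names(df)[-1][seq(1, length(names(df)) - 1, 2)] <- paste0(names(df)[-1][seq(1, length(names(df)) - 1, 2)], "-measure")

df
#   Factor A-measure     A-SD B-measure     B-SD C-measure     C-SD
# 1      K  52127803  9124563  63752981 34800000 103512032 23900000
# 2      L  63410326 21975533  68303447 22600000  65074191 20800000
# 3      M  76455662  9864019  73250794  6090000  92686983 38300000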
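In response to the request for a more straightforward renaming step, here is one possible simplification. It is only a sketch and not part of the original answer. It assumes that column D has already been dropped and that every measurement column is immediately followed by its SD column; the helper name measure_names is made up for illustration.

```r
# Minimal copy of the example data, with column D already dropped
df <- data.frame(
  Factor = factor(c("K", "L", "M")),
  A = c(52127802.82, 63410325.61, 76455661.87),
  SD = c(9124562.98, 21975533.21, 9864019.36),
  B = c(63752980.62, 68303447.17, 73250794.15),
  SD.1 = c(34800000, 22600000, 6090000),
  C = c(103512032.04, 65074190.8, 92686982.97),
  SD.2 = c(23900000, 20800000, 38300000)
)

# Measurement columns sit at positions 2, 4, 6 (assumption: each is followed by its SD)
measure_names <- names(df)[seq(2, ncol(df), by = 2)]

# Build "A-measure", "A-SD", "B-measure", "B-SD", ... in column order;
# c("measure", "SD") is recycled across the repeated letters
names(df)[-1] <- paste(rep(measure_names, each = 2), c("measure", "SD"), sep = "-")

names(df)
```

This builds all six new names in a single assignment instead of two indexed replacements, at the cost of relying on the strict measurement/SD alternation.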

This enables me to build df_clean:

library(tidyr)  # provides gather, separate and spread, and re-exports %>%

df_clean <- df %>%
  gather(measure, value, -Factor) %>%
  separate(measure, c("measure_letter", "temp_var")) %>%
  spread(temp_var, value)

df_clean
#   Factor measure_letter   measure       SD
# 1      K              A  52127803  9124563
# 2      K              B  63752981 34800000
# 3      K              C 103512032 23900000
# 4      L              A  63410326 21975533
# 5      L              B  68303447 22600000
# 6      L              C  65074191 20800000
# 7      M              A  76455662  9864019
# 8      M              B  73250794  6090000
# 9      M              C  92686983 38300000

Now that our dataset is clean and tidy, we can plot accordingly.
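A minimal sketch of the kind of plot the question describes, using the long-format table with measure and SD columns side by side. This is not the answer's original plotting code. It assumes the error bars should span measure ± SD, and it rebuilds df_clean by hand from the rounded values printed in the answer so that the example runs on its own.

```r
library(ggplot2)

# Long-format table with one row per Factor/measurement pair,
# typed in from the rounded values shown in the answer
df_clean <- data.frame(
  Factor = rep(c("K", "L", "M"), each = 3),
  measure_letter = rep(c("A", "B", "C"), times = 3),
  measure = c(52127803, 63752981, 103512032,
              63410326, 68303447, 65074191,
              76455662, 73250794, 92686983),
  SD = c(9124563, 34800000, 23900000,
         21975533, 22600000, 20800000,
         9864019, 6090000, 38300000)
)

# Bars for the measurements, error bars at measure +/- SD (assumption),
# one panel per measured variable
p <- ggplot(df_clean, aes(Factor, measure)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = measure - SD, ymax = measure + SD), width = 0.2) +
  facet_wrap(~ measure_letter)

print(p)
```

Because each SD value sits in the same row as its measurement, the error bar limits can be computed directly inside aes() without any joins.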