Question

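I couldn't find an answer in the other threads about grouped bar charts. The values for each `rename` (i.e. site name) should sum to 100%, but the bars add up to more than that. I suspect my data are not set up correctly.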

I'd also like to add error bars, but maybe once I get the replicates right I can figure that out myself.

testData <- read.csv("composition.csv")
testData$id <- as.factor(testData$rename) 
testDataMelt <- reshape2::melt(testData, rename.vars = "rename")
ggplot(testDataMelt, 
       aes(x = rename, y =value, group = replicate, fill = replicate)) + 
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Lake") + 
  ylab("% of Sediment Mass") +
  labs(fill = "") + 
  scale_fill_grey()
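A quick way to test whether the data are set up correctly is to compute the totals before plotting. This is only a sketch: composition.csv is not shown, so it uses a small hypothetical `testData`, assuming one row per lake (`rename`) and `replicate` with the four fractions in percent. It checks that each replicate sums to 100, shows that adding the raw values of all replicates of a lake gives 100 times the number of replicates, and shows that averaging each fraction over replicates first brings the total back to 100.

```r
# Hypothetical stand-in for composition.csv (column layout is an assumption):
# one row per lake and replicate, fractions in percent
testData <- data.frame(
  rename    = rep(c("HOG", "MAR"), each = 3),
  replicate = rep(1:3, times = 2),
  gravel    = c(30, 25, 35, 20, 22, 18),
  sand      = c(45, 50, 40, 50, 48, 52),
  silt      = c(5, 5, 10, 10, 10, 10),
  clay      = c(20, 20, 15, 20, 20, 20)
)
fractions <- c("gravel", "sand", "silt", "clay")

# 1. Each replicate (row) should sum to 100 on its own
row_totals <- rowSums(testData[, fractions])
stopifnot(all(row_totals == 100))

# 2. Adding raw values over all replicates of a lake gives 100 * n_replicates
raw_totals <- tapply(row_totals, testData$rename, sum)
print(raw_totals)

# 3. Averaging each fraction over replicates first brings the total back to 100
lake_means <- aggregate(testData[, fractions],
                        by = list(rename = testData$rename), FUN = mean)
mean_totals <- rowSums(lake_means[, fractions])
print(mean_totals)
```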

Answer 1

As @PoGibas suggested, here is an example that summarizes the data before passing it to ggplot.

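Since I don't have your data in an easy-to-use format, I'll make up some fake data for 3 sites (3 replicates each); as in the original data, gravel, sand, silt and clay sum to 100% in every row.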

set.seed(2018)
df <- data.frame(rename = c("HOG", "MAR", "MO BH"),
                 gravel = sample(20:40, 9),
                 sand   = sample(40:50, 9),
                 silt   = sample(0:10, 9))
df$clay <- as.integer(100 - rowSums(df[, 2:4]))  # clay is the remainder, so each row sums to 100

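Here is a data.table solution (this package deserves more publicity) that computes the mean and standard error of each variable per site; the standard errors are used for the error bars.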
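For reference, these are the usual per-group quantities, where $x_1, \dots, x_n$ are the $n$ replicate values of one fraction at one site (`.N` in the code) and $s$ is what R's `sd()` returns (it uses the $n-1$ denominator). The error bars in the plot use the normal-approximation interval on the right.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\bar{x} \pm 1.96\,\mathrm{SE}
$$

With only $n = 3$ replicates per site, the t quantile `qt(0.975, df = 2)` (about 4.30) would give a wider, more conservative 95% interval than 1.96.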

library(ggplot2)
library(data.table) # for aggregations 

# Convert to data.table object and 
# calculate the means and standard errors of each variable per site.
setDT(df)
testDataMelt <- melt(df, id.vars = "rename")
testDataMelt_agg <- testDataMelt[, .(mean = mean(value), 
                                     se = sd(value) / sqrt(.N)),  # SE = SD / sqrt(n)
                                 by = .(rename, variable)]
# The mean percent of sediments sum up to 100% for each site.
# We are ready to make the graph.
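If you would rather not depend on data.table, the same per-site means and standard errors can be computed with base R's `aggregate()`. This is a sketch, not a drop-in replacement: it regenerates the fake data from above, reshapes it to long format by hand, and the names `long`, `agg_mean`, `agg_se`, `agg` and `site_totals` are illustrative. It also confirms that the mean percentages add up to 100 for each site.

```r
set.seed(2018)
df <- data.frame(rename = c("HOG", "MAR", "MO BH"),
                 gravel = sample(20:40, 9),
                 sand   = sample(40:50, 9),
                 silt   = sample(0:10, 9))
df$clay <- as.integer(100 - rowSums(df[, 2:4]))

# Long format: one row per site, fraction and replicate
fractions <- c("gravel", "sand", "silt", "clay")
long <- data.frame(
  rename   = rep(df$rename, times = length(fractions)),
  variable = factor(rep(fractions, each = nrow(df)), levels = fractions),
  value    = unlist(df[, fractions], use.names = FALSE)
)

# Mean and standard error (SD / sqrt(n)) per site and fraction
agg_mean <- aggregate(value ~ rename + variable, data = long, FUN = mean)
agg_se   <- aggregate(value ~ rename + variable, data = long,
                      FUN = function(v) sd(v) / sqrt(length(v)))
names(agg_mean)[3] <- "mean"
names(agg_se)[3]   <- "se"
agg <- merge(agg_mean, agg_se, by = c("rename", "variable"))

# The mean percentages should add up to 100 for every site
site_totals <- tapply(agg$mean, agg$rename, sum)
print(site_totals)
```

`agg` has the same `rename`, `variable`, `mean` and `se` columns as `testDataMelt_agg`, so it can be passed to the same ggplot call.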

ggplot(testDataMelt_agg, 
       aes(x = rename, y = mean, fill = variable)) + 
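  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  # Add error bars (here +/- 1.96 SE); use the same dodge width as the bars
  # so each (narrower) error bar stays centred on its bar
  geom_errorbar(aes(ymax = mean + 1.96*se, 
                    ymin = mean - 1.96*se),
                width = 0.25,
                position = position_dodge(width = 0.9)) +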
  xlab("Lake") + 
  ylab("% of Sediment Mass") +
  labs(fill = "") + 
  scale_fill_grey()

Grouped bar chart with replicates
