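I couldn't find an answer in other questions about grouped bar charts. The values for each rename (i.e., site name) should sum to 100%, but the bars add up to more than that. I suspect my data is set up incorrectly.
I would also like to add error bars, but I can probably figure that out once I have the replicates handled correctly.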
testData <- read.csv("composition.csv")
testData$id <- as.factor(testData$rename)
# melt() takes id.vars (there is no rename.vars argument)
testDataMelt <- reshape2::melt(testData, id.vars = "rename")
ggplot(testDataMelt,
       aes(x = rename, y = value, group = replicate, fill = replicate)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Lake") +
  ylab("% of Sediment Mass") +
  labs(fill = "") +
  scale_fill_grey()
Answer 0 (score: 1)
As @PoGibas suggested, here is an example of aggregating the data before passing it to ggplot.
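Since I don't have your data in an easy-to-use format, I'll make up some fake data for 3 sites; as in the original data, the gravel, sand, silt, and clay in each row sum to 100%.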
set.seed(2018)
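# The 3 site names are recycled across the 9 rows (3 replicates per site)
df <- data.frame(rename = c("HOG", "MAR", "MO BH"),
                 gravel = sample(20:40, 9),
                 sand = sample(40:50, 9),
                 silt = sample(0:10, 9))
# Clay is whatever remains so that each row sums to 100
df$clay <- as.integer(100 - rowSums(df[, 2:4]))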
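Before aggregating, it can be worth confirming that the fake data really has the property described above. The following is only a sketch that rebuilds the same data frame; the names `fractions` and `row_totals` are introduced here for illustration and are not part of the original code.

```r
# Rebuild the fake data exactly as above
set.seed(2018)
df <- data.frame(rename = c("HOG", "MAR", "MO BH"),
                 gravel = sample(20:40, 9),
                 sand = sample(40:50, 9),
                 silt = sample(0:10, 9))
df$clay <- as.integer(100 - rowSums(df[, 2:4]))

# Hypothetical helper names, used only for this check
fractions <- c("gravel", "sand", "silt", "clay")
row_totals <- rowSums(df[, fractions])

# Every replicate (row) should sum to exactly 100%
stopifnot(all(row_totals == 100))

# Number of replicates per site
print(table(df$rename))
```

If the same check fails on the real data, the problem lies in the data itself rather than in the plotting code.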
Here is a data.table solution (this package deserves more publicity) that calculates the means and standard errors (for the error bars).
library(ggplot2)
library(data.table) # for aggregations
# Convert to data.table object and
# calculate the means and standard errors of each variable per site.
setDT(df)
testDataMelt <- melt(df, id.vars = "rename")
# Standard error of the mean = sd / sqrt(n)
testDataMelt_agg <- testDataMelt[, .(mean = mean(value),
                                     se = sd(value) / sqrt(.N)),
                                 by = .(rename, variable)]
# The mean sediment percentages sum to 100% for each site.
# We are ready to make the graph.
ggplot(testDataMelt_agg,
       aes(x = rename, y = mean, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  # Add error bars (here +/- 1.96 SE).
  # Dodge by the default bar width (0.9) so the error bars line up with the bars.
  geom_errorbar(aes(ymax = mean + 1.96 * se,
                    ymin = mean - 1.96 * se),
                position = position_dodge(width = 0.9),
                width = 0.25) +
  xlab("Lake") +
  ylab("% of Sediment Mass") +
  labs(fill = "") +
  scale_fill_grey()
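To check that the aggregated means really add up to 100% per site, the totals can be computed directly. This is a minimal sketch on the same fake data; `long`, `agg`, and `site_totals` are names introduced here for illustration, and the standard error is assumed to be sd / sqrt(n).

```r
library(data.table)

# Rebuild the fake data exactly as above
set.seed(2018)
df <- data.frame(rename = c("HOG", "MAR", "MO BH"),
                 gravel = sample(20:40, 9),
                 sand = sample(40:50, 9),
                 silt = sample(0:10, 9))
df$clay <- as.integer(100 - rowSums(df[, 2:4]))

# Long format: one row per site, replicate, and sediment fraction
setDT(df)
long <- melt(df, id.vars = "rename")

# Mean and standard error per site and sediment fraction
agg <- long[, .(mean = mean(value),
                se = sd(value) / sqrt(.N)),
            by = .(rename, variable)]

# Sum of the mean percentages within each site
site_totals <- agg[, .(total = sum(mean)), by = rename]
print(site_totals)
```

Because every site has the same number of replicates for each fraction, the sum of the means equals the mean of the row sums, so each total should be 100 up to floating-point rounding.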