Question

With the following data frame I have successfully created the plot below:

library(ggplot2)

df = read.csv("http://pastebin.com/raw.php?i=MLTKev3z")

ggplot(df,
       aes(x = factor(Identificación.con.el.barrio),
           fill = Nombre.barrio)
) +
  geom_histogram(position="dodge") +
  ggtitle("¿Te identificas con tu barrio?") +
  labs(x="Grado de identificación con el barrio", fill="Barrios")

This produces the following plot:

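However, I would like to add a new column showing the average result of the "Grado" variable over all observations (not stratified by neighbourhood, i.e. "barrio"), so that I can compare each neighbourhood's result with the city's.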

Can anyone help me achieve this?

Answer 1

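Not elegant, but it works. It involves reshaping the original data into a frequency table and then adding the overall average as an extra group. This requires using geom_bar() instead of geom_histogram() in ggplot. The resulting bars for the neighbourhoods are exactly the same.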

# Assumes the columns of df were renamed to Barrio and Grado
library(plyr)
library(ggplot2)

# Make a frequency table of your data: one row per Barrio/Grado pair
df2 <- ddply(df, .(Barrio, Grado), summarise, Freq = length(Grado))

# Make a table of averages: city-wide count per Grado divided by the
# number of barrios (3 here). Count on the raw df, not on df2, which
# already holds one row per Barrio/Grado pair.
avg <- as.data.frame(table(df$Grado) / 3, stringsAsFactors = FALSE)
names(avg)[1] <- "Grado"
avg$Barrio <- "Average"

# Combine the tables
df2 <- rbind(df2, avg)
df2$Grado <- as.character(df2$Grado)
df2[is.na(df2$Grado), "Grado"] <- "N/A"

# Plot using a bar plot instead of a histogram
ggplot(df2, aes(x = Grado, y = Freq, fill = Barrio)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  scale_x_discrete("Grado de identificación con el barrio") +
  scale_y_continuous("Count")

Note: I also changed the variable names to simpler ones, hence the scale labels.
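The renaming step is not shown in the answer, so here is a minimal sketch of how it might look, together with a base-R version of the averaging that derives the number of neighbourhoods from the data instead of hard-coding 3. The toy data frame is invented purely for illustration (its column names mirror the question's, written without the accent), and matching on an ASCII prefix is an assumption made so that the rename does not depend on how read.csv encodes "ó".

```r
# Toy stand-in for the real data (hypothetical values, three neighbourhoods)
df <- data.frame(
  Nombre.barrio = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Identificacion.con.el.barrio = c(1, 2, 2, 1, 3, 2, 2, 3, 3)
)

# Rename to the short names used in the answer; prefix matching avoids
# depending on the encoding of the accented character
names(df)[grepl("^Nombre", names(df))] <- "Barrio"
names(df)[grepl("^Identificaci", names(df))] <- "Grado"

# Counts per neighbourhood and grade (unlike the ddply step, table()
# also keeps combinations with a count of zero)
df2 <- as.data.frame(table(Barrio = df$Barrio, Grado = df$Grado),
                     stringsAsFactors = FALSE)

# City-wide average: total count per grade divided by the number of
# neighbourhoods found in the data
n_barrios <- length(unique(df$Barrio))
avg <- aggregate(Freq ~ Grado, data = df2,
                 FUN = function(x) sum(x) / n_barrios)
avg$Barrio <- "Average"

# Append the average as one more "neighbourhood"
df2 <- rbind(df2, avg[, c("Barrio", "Grado", "Freq")])
```

The resulting df2 has the same three columns (Barrio, Grado, Freq) as in the answer, so it can be passed to the same ggplot() call.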

The result looks like this:

Compare the plotted variable with the average result
