Question

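Given this data frame, obtained from a questionnaire survey of people from different neighbourhoods, I would like to create a bar chart showing the degree of identification with the neighbourhood, per neighbourhood.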

In fact, I managed to do it with the following code:

library(ggplot2)
df = read.csv("http://pastebin.com/raw.php?i=77QPBc5T")

ggplot(df,
       aes(x = factor(Identificación.con.el.barrio),
           fill = Nombre.barrio)
) +
  geom_histogram(position="dodge") +
  ggtitle("¿Te identificas con tu barrio?") +
  labs(x="Grado de identificación con el barrio", fill="Barrios")

This produces the following plot:

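However, the neighbourhoods have different populations, so the sample size also differs considerably between them (for example, Arcosur has only 24 respondents while Arrabal has 69). As a result, the plot of absolute counts can be misleading (see below):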

library(dplyr)

df = tbl_df(df)

df %>%
  group_by(Nombre.barrio) %>%
  summarise(Total = n())

Source: local data frame [10 x 2]

   Nombre.barrio Total
1       Almozara    68
2        Arcosur    24
3        Arrabal    69
4       Bombarda    20
5       Delicias    68
6          Jesús    69
7      La Bozada    32
8    Las fuentes    64
9         Oliver    68
10      Picarral    68

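For this reason, I would like to have relative values on the y axis, showing the percentage of respondents in each neighbourhood who gave each possible answer. Unfortunately, I have no idea how to achieve this, as I am quite new to R.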

Answer 1

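library(ggplot2)
library(dplyr)
df = read.csv("http://pastebin.com/raw.php?i=77QPBc5T")

# as_tibble() replaces tbl_df(), which is deprecated in current dplyr
df = as_tibble(df)

# Count responses per neighbourhood and answer. summarise() drops the last
# grouping level, so the mutate() below runs within each Nombre.barrio and
# freq is the share of that neighbourhood's respondents.
d <- df %>%
  group_by(Nombre.barrio, Identificación.con.el.barrio) %>%
  summarise(Total = n()) %>%
  mutate(freq = Total / sum(Total))

# stat = "identity" plots the precomputed freq values instead of counting rows
ggplot(d,
       aes(x = factor(Identificación.con.el.barrio),
           y = freq,
           fill = Nombre.barrio)
) +
  geom_bar(position = "dodge", stat = "identity") +
  ggtitle("¿Te identificas con tu barrio?") +
  labs(x = "Grado de identificación con el barrio", fill = "Barrios")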
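
To check the idea without downloading the survey file, here is a minimal sketch on a small made-up data frame. The column names `barrio` and `grado` and all the values are invented for illustration and are not taken from the original data. It uses `count()` followed by a grouped `mutate()`, which should give the same within-neighbourhood shares as the `summarise()` approach above, and it adds `scales::percent` as one possible way to label the y axis in percent.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: 4 respondents in neighbourhood "A", 2 in "B"
toy <- data.frame(
  barrio = c("A", "A", "A", "A", "B", "B"),
  grado  = c(1, 1, 2, 3, 2, 3)
)

# Count answers per neighbourhood, then divide by that neighbourhood's total
rel <- toy %>%
  count(barrio, grado) %>%
  group_by(barrio) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Dodged bars of the precomputed shares, y axis labelled in percent
p <- ggplot(rel, aes(x = factor(grado), y = freq, fill = barrio)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "grado", y = "Share of respondents", fill = "barrio")

print(p)
```

Because each share is computed within its own neighbourhood, the bars for one neighbourhood add up to 100% however many people answered there, which makes neighbourhoods with 24 and 69 respondents directly comparable.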

Relative y values instead of absolute values in ggplot
