Plotting 2 variables of different lengths to show proportions rather than absolute numbers

Time: 2018-05-31 12:02:04

Tags: r ggplot2

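I want to compare the responses to the same question asked in two different surveys. The results of the two surveys are stored in two data frames, DF1 and DF2, and the answers to the question are in the variable V1: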

DF1 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", NA)),
                  ID1 = factor(c("Resp1", "Resp1", "Resp3", "Resp4", "Resp5")))
DF2 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", "Option2", NA, "Option1")),
                  ID2 = factor(c("PersonA", "PersonB", "PersonC", "PersonD", "PersonE", "PersonF", "PersonG")))

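Because the two surveys received different numbers of responses, plotting the raw counts side by side produces a bar chart that is confusing and hard to interpret: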

library(ggplot2)
library(dplyr)
DF1 <- DF1 %>% group_by(V1) %>% summarize(DF="DF1", n=n())
DF2 <- DF2 %>% group_by(V1) %>% summarize(DF="DF2", n=n())
DF <- rbind(DF1, DF2) %>% 
  filter(!is.na(V1))
ggplot(DF, aes(x=V1, y=n, fill=DF)) + geom_bar(stat="identity", position="dodge")

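I would like to change the code so that the bars show, for each survey, the proportion of respondents who chose each option rather than the count. How can I do that?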

1 Answer:

Answer 0: (score: 2)

library(dplyr)
library(ggplot2)

DF1 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", NA)),
                  ID1 = factor(c("Resp1", "Resp1", "Resp3", "Resp4", "Resp5")))

DF2 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", "Option2", NA, "Option1")),
                  ID2 = factor(c("PersonA", "PersonB", "PersonC", "PersonD", "PersonE", "PersonF", "PersonG")))

# Count the responses per option, then add each survey's total number of rows.
# Note: total is computed before the NA filter below, so it includes non-responses.
DF1 <- DF1 %>% group_by(V1) %>% summarize(DF = "DF1", n = n()) %>% mutate(total = sum(n))

DF2 <- DF2 %>% group_by(V1) %>% summarize(DF = "DF2", n = n()) %>% mutate(total = sum(n))

DF <- rbind(DF1, DF2) %>%
  filter(!is.na(V1))

# Plot each option's share of its own survey (n / total) instead of the raw count
ggplot(DF, aes(x = V1, y = n / total, fill = DF)) + geom_bar(stat = "identity", position = "dodge")
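
In the code above, `total` counts every row of a survey, including the `NA` answers, so the bars within one survey do not add up to 1 (for DF1 they are 3/5 and 1/5). If the proportions should instead be taken among respondents who actually answered, one possible variation is to drop the `NA` rows before counting. The sketch below is only an illustration under that assumption: `survey_props` and `props` are hypothetical names that do not appear in the original answer, and `geom_col()` with `scales::percent` is one formatting choice among several.

```r
library(dplyr)
library(ggplot2)

DF1 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", NA)))
DF2 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", "Option2", NA, "Option1")))

# Hypothetical helper (not part of the original answer):
# per-option proportions among the non-missing answers of one survey.
survey_props <- function(df, label) {
  df %>%
    filter(!is.na(V1)) %>%                    # assumption: non-responses are excluded
    count(V1) %>%                             # one row per option, count stored in column n
    mutate(DF = label, prop = n / sum(n))     # share within this survey
}

props <- bind_rows(survey_props(DF1, "DF1"),
                   survey_props(DF2, "DF2"))

# geom_col() is shorthand for geom_bar(stat = "identity")
ggplot(props, aes(x = V1, y = prop, fill = DF)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent)
```

With the sample data this gives 3/4 and 1/4 for DF1 and 4/6 and 2/6 for DF2, so each survey's bars sum to 100%. Whether to include or exclude the `NA` answers in the denominator depends on what the comparison is meant to show.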