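I want to compare the responses to the same question asked in two different surveys. The results of the two surveys are stored in two data frames, DF1 and DF2, and the answers to the question are in the variable V1: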
DF1 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", NA)),
ID1 = factor(c("Resp1", "Resp1", "Resp3", "Resp4", "Resp5")))
DF2 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", "Option2", NA, "Option1")),
ID2 = factor(c("PersonA", "PersonB", "PersonC", "PersonD", "PersonE", "PersonF", "PersonG")))
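Because the two surveys received different numbers of responses, the bar chart I get when I plot the responses from both surveys side by side is confusing and hard to interpret: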
library(ggplot2)
library(dplyr)
DF1 <- DF1 %>% group_by(V1) %>% summarize(DF="DF1", n=n())
DF2 <- DF2 %>% group_by(V1) %>% summarize(DF="DF2", n=n())
DF <- rbind(DF1, DF2) %>%
filter(!is.na(V1))
ggplot(DF, aes(x=V1, y=n, fill=DF)) + geom_bar(stat="identity", position="dodge")
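I would like to change the code so that the bars show the proportion of respondents in each survey who chose each option, rather than the raw counts. How can I do that?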
Answer 0 (score: 2)
DF1 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", NA)),
ID1 = factor(c("Resp1", "Resp1", "Resp3", "Resp4", "Resp5")))
DF2 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", "Option2", NA, "Option1")),
ID2 = factor(c("PersonA", "PersonB", "PersonC", "PersonD", "PersonE", "PersonF", "PersonG")))
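library(dplyr)
library(ggplot2)
# Count responses per option, then store each survey's total next to the counts (the total includes NA responses)
DF1 <- DF1 %>% group_by(V1) %>% summarize(DF="DF1", n=n()) %>% mutate(total = sum(n))
DF2 <- DF2 %>% group_by(V1) %>% summarize(DF="DF2", n=n()) %>% mutate(total = sum(n))
DF <- rbind(DF1, DF2) %>%
filter(!is.na(V1))
# Plot n/total so that each bar shows a proportion of that survey's respondents
ggplot(DF, aes(x=V1, y=n/total, fill=DF)) + geom_bar(stat="identity", position="dodge")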
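The code above computes each survey's total before the NA rows are filtered out, so non-responses stay in the denominator (for DF1, Option1 is 3/5 = 0.6). If the proportions should instead be taken among respondents who actually answered, one possible variant is sketched below. It is only an illustration under that assumption: the name `props` and the column `prop` are made up for this example, and the ID columns are left out because they are not needed for the plot.

```r
library(dplyr)
library(ggplot2)

# Same answers as in the question (ID columns omitted for brevity)
DF1 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", NA)))
DF2 <- data.frame(V1 = factor(c("Option1", "Option1", "Option1", "Option2", "Option2", NA, "Option1")))

# Stack the raw responses; .id = "DF" records which survey each row came from
props <- bind_rows(DF1 = DF1, DF2 = DF2, .id = "DF") %>%
  filter(!is.na(V1)) %>%         # assumption: drop non-responses BEFORE computing shares
  count(DF, V1) %>%              # number of responses per survey and option
  group_by(DF) %>%
  mutate(prop = n / sum(n)) %>%  # share of each option within its own survey
  ungroup()

# geom_col() is equivalent to geom_bar(stat = "identity")
ggplot(props, aes(x = V1, y = prop, fill = DF)) +
  geom_col(position = "dodge")
```

With this variant the bars for each survey sum to 1, for example 3/4 and 1/4 for DF1. Which denominator is appropriate depends on whether non-response is meant to count as part of the survey population.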