Multiple-choice variable with ggplot2

Asked: 2014-03-15 17:57:17

Tags: r ggplot2

I think I'm missing something obvious here. I have a multiple-choice question (data here) with 5 answer categories.

I want to melt all 5 variables together so that I get a single plot with ggplot2. Here is my code:

mydata <- data.frame(data$Q006_01, data$Q006_02, data$Q006_03, data$Q006_04, data$Q006_05) # multiple choice question
md <- melt(mydata, id=c("data.Q006_01", "data.Q006_02", "data.Q006_03", "data.Q006_04", "data.Q006_05"))
luogo_lavoro <- factor(md[,1]) # error here?
ggplot(data, aes(x=luogo_lavoro)) + geom_histogram() + xlab("") + ylab("Number of participants") +
  ggtitle("If you had to choose now, where would you be willing to accept a job?") +
  theme(axis.text.y = element_text(colour = "black"),
        axis.text.x = element_text(colour = "black")) +
  scale_x_discrete(labels=str_wrap(c("in the district I live in",
                                     "in another district as long as reachable within a dayride",
                                     "in the north of Italy", "in the rest of Italy", "abroad", "NA"), width=30)) +
  ggsave((filename="luogo_lavoro.pdf"), scale = 1, width = par("din")[1], height = par("din")[2],
         units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE)

What am I doing wrong here?

2 answers:

Answer 0 (score: 3)

Like this?

library(ggplot2)
library(reshape2)
library(stringr)
data <- data.frame(id=1:nrow(data),data)
md <- melt(data,id="id")
ggplot(subset(md,value & !is.na(value)), aes(x=variable)) + 
  geom_histogram(colour="grey50",fill="lightgreen") + xlab("") + ylab("Number of participants") + 
  ggtitle("If you had to choose now, where would you be willing to accept a job?") + 
  theme(axis.text.y = element_text(colour = "black"), 
        axis.text.x = element_text(colour = "black")) + 
  scale_x_discrete(labels=str_wrap(c("in the district I live in", 
                                     "in another district as long as reachable within a dayride", 
                                     "in the north of Italy", "in the rest of Italy", "abroad", "NA"), width=30)) +
  coord_flip()+
  ggsave((filename="luogo_lavoro.pdf"), scale = 1, width = par("din")[1], height = par("din")[2], 
         units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE)

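In melt(...), the id=... argument must specify a column that distinguishes the different rows (the equivalent of row names). So I added an id column to data and melted on that. Now md has three columns: id, variable, and value. variable contains what used to be the column names (Q006_01 and so on), and value contains T/F depending on the response. value can also contain NA if there was no answer.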
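To make the shape of md concrete, here is a minimal sketch on made-up data. The data frame toy and its values are hypothetical (the real data has five Q006_ columns), and the melt() call simply mirrors the one above, with the argument spelled out as id.vars.

```r
library(reshape2)

# Hypothetical toy data: 4 respondents, 3 answer columns (made-up values).
# The last respondent gave no answer at all, hence the NAs.
toy <- data.frame(
  Q006_01 = c(TRUE, FALSE, TRUE, NA),
  Q006_02 = c(FALSE, TRUE, TRUE, NA),
  Q006_03 = c(FALSE, FALSE, TRUE, NA)
)

# Add a row identifier, then melt on it so every answer gets its own row
toy <- data.frame(id = seq_len(nrow(toy)), toy)
md_toy <- melt(toy, id.vars = "id")

# One row per respondent/option pair, in columns id, variable, value
str(md_toy)
```

Each of the 4 respondents now appears once per answer column, so md_toy has 4 x 3 rows.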

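So in the call to ggplot(...) we use the subset of md where the response (value) is TRUE and not NA. With that subset, geom_histogram(...) counts the number of TRUEs per option. I added coord_flip() at the end to make the labels more readable.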
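The subset-then-count step can be sketched on made-up data as follows. This is a hedged variant, not the exact code above: md_toy and its values are hypothetical, and the sketch assumes a more recent ggplot2, where, as far as I know, a discrete x axis calls for geom_bar() rather than geom_histogram(), and ggsave() is called on its own instead of being added with +.

```r
library(ggplot2)

# Hypothetical long-format data shaped like md (made-up values)
md_toy <- data.frame(
  id = rep(1:4, times = 3),
  variable = factor(rep(c("Q006_01", "Q006_02", "Q006_03"), each = 4)),
  value = c(TRUE, FALSE, TRUE, NA,
            FALSE, TRUE, TRUE, NA,
            FALSE, FALSE, TRUE, NA)
)

# Keep only the rows where the option was selected; NA rows are dropped too
selected <- subset(md_toy, !is.na(value) & value)

# Rows per option, i.e. the bar heights
counts <- table(selected$variable)

# geom_bar() counts the rows for each discrete x value
p <- ggplot(selected, aes(x = variable)) +
  geom_bar(colour = "grey50", fill = "lightgreen") +
  xlab("") +
  ylab("Number of participants") +
  coord_flip()

# Saving is a separate call; the width and height here are arbitrary choices
# ggsave("luogo_lavoro.pdf", plot = p, width = 7, height = 5)
```

Printing counts before plotting is a quick way to check that the bars will show what you expect.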

Answer 1 (score: 0)

You probably need to pass md to ggplot rather than the raw data. Also, it is better to make luogo_lavoro part of md:

md$luogo_lavoro <- factor(md[,1])