How to create a stacked bar chart of percentages for multiple variables

Time: 2012-12-28 00:51:33

Tags: r ggplot2

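I am trying to create a stacked bar chart from multiple variables, but I am stuck on two issues:

1) I can't get the flipped y-axis to show percentages instead of counts.

2) I would like to order the variables (desc) by their percentage of "strongly agree" responses.

Here is an example of what I have so far: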

require(scales)
require(ggplot2)
require(reshape2)

# create data frame
  my.df <- data.frame(replicate(10, sample(1:4, 200, rep=TRUE)))
  my.df$id <- seq(1, 200, by = 1)

# melt
  melted <- melt(my.df, id.vars="id")

# factors
  melted$value <- factor(melted$value, 
                         levels=c(1,2,3,4),
                         labels=c("strongly disagree", 
                                  "disagree", 
                                  "agree", 
                                  "strongly agree"))
# plot
  ggplot(melted) + 
    geom_bar(aes(variable, fill=value, position="fill")) +
    scale_fill_manual(name="Responses",
                      values=c("#EFF3FF", "#BDD7E7", "#6BAED6",
                               "#2171B5"),
                      breaks=c("strongly disagree", 
                               "disagree", 
                               "agree", 
                               "strongly agree"),
                      labels=c("strongly disagree", 
                               "disagree", 
                               "agree", 
                               "strongly agree")) +
    labs(x="Items", y="Percentage (%)", title="my title") +
    coord_flip()

I would like to thank several people for helping me get this far. Here are a few of the many pages Google turned up:

http://www.r-bloggers.com/fumblings-with-ranked-likert-scale-data-in-r/

Create stacked barplot where each stack is scaled to sum to 100%

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape2_1.2.2  ggplot2_0.9.2.1 scales_0.2.2   

loaded via a namespace (and not attached):
 [1] colorspace_1.2-0    dichromat_1.2-4     digest_0.6.0        grid_2.15.0         gtable_0.1.1        HH_2.3-23          
 [7] labeling_0.1        lattice_0.20-10     latticeExtra_0.6-24 MASS_7.3-22         memoise_0.1         munsell_0.4        
[13] plyr_1.7.1          proto_0.3-9.2       RColorBrewer_1.0-5  rstudio_0.97.237    stringr_0.6.1       tools_2.15.0       

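2 answers:

Answer 0: (score: 4)

Since you are working with Likert data, you may want to consider the likert() function in the HH package. (I hope it is OK to point you in another direction, given that there is already a good answer addressing your original ggplot2 approach.)

As one might hope, likert() draws the plot in an appropriate way with very little struggle. positive.order=TRUE sorts the items by how far they extend in the positive direction. The ReferenceZero argument lets you center zero in the middle of a neutral item (not needed for the data below). And as.percent=TRUE converts the counts to percentages and lists the actual counts in the margin (unless we turn that off).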

library(reshape2)
library(HH)

# create data as before
my.df <- data.frame(replicate(10, sample(1:4, 200, rep=TRUE)))
my.df$id <- seq(1, 200, by = 1)

# melt() and dcast() with reshape2 package
melted <- melt(my.df,id.var="id", na.rm=TRUE)
summd <- dcast(data=melted,variable~value, length) # note: length()
                                                   # not robust if NAs present

# give names to cols and rows for likert() to use
names(summd) <- c("Question", "strongly disagree", 
                              "disagree", 
                              "agree", 
                              "strongly agree")
rownames(summd) <- summd[,1]  # question number as rowname
summd[,1] <- NULL             

# plot
likert(summd,
       as.percent=TRUE,       # automatically scales
       main = NULL,           # or give "title",
       xlab = "Percent",      # label axis
       positive.order = TRUE, # orders by furthest right
       ReferenceZero = 2.5,   # zero point btwn levels 2&3
       ylab = "Question",     # label for left side
       auto.key = list(space = "right", columns = 1,
                     reverse = TRUE)) # make positive items on top of legend
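
The comment above notes that counting with length() is not robust when NAs are present. As a minimal sketch of one possible alternative, the same wide count table can be built with base R's table(), which skips NA values by default. The names resp_labels and counts are introduced here for illustration only, and the NA values in the simulated data are an assumption added to show the behaviour.

```r
# Sketch: per-item response counts that ignore missing answers.
# `resp_labels` and `counts` are illustrative names, not from the answer above.
set.seed(1)  # only to make the simulated data reproducible

# simulated responses, this time with some NAs mixed in (an assumption)
my.df <- data.frame(replicate(10, sample(c(1:4, NA), 200, replace = TRUE)))

resp_labels <- c("strongly disagree", "disagree", "agree", "strongly agree")

# table() on a factor counts each level and drops NA by default;
# sapply() returns levels in rows, so transpose to get one row per item
counts <- t(sapply(my.df, function(x) {
  table(factor(x, levels = 1:4, labels = resp_labels))
}))
counts <- as.data.frame(counts)

# `counts` has items as row names and one column per response level,
# the same shape as `summd` above, so it could then be passed on, e.g.:
# HH::likert(counts, as.percent = TRUE, positive.order = TRUE)
```

Each row of counts sums to the number of non-missing answers for that item, so the percentages are computed per item rather than against a fixed total.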


Answer 1: (score: 3)

For (1):
To get the percentages, you will have to create a data.frame from melted. At least that is the way I could think of.

# 200 is the total sum always. Using that to get the percentage
require(plyr)
df <- ddply(melted, .(variable, value), function(x) length(x$value)/200 * 100)
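
The division by 200 assumes that every item has exactly 200 responses. As a minimal sketch under that caveat, the percentages can instead be derived from the data itself with base R's table() and prop.table(). The names item_cols, pct_tab and pct_df are illustrative, and the long-format data is rebuilt here with base R only so that the snippet runs on its own.

```r
# Sketch: per-item percentages without hard-coding the number of respondents.
# `item_cols`, `pct_tab` and `pct_df` are illustrative names.
set.seed(1)  # only to make the simulated data reproducible
my.df <- data.frame(replicate(10, sample(1:4, 200, replace = TRUE)))
my.df$id <- seq_len(nrow(my.df))

# long format built with base R (same columns as the melt() result)
item_cols <- setdiff(names(my.df), "id")
melted <- data.frame(
  id       = rep(my.df$id, times = length(item_cols)),
  variable = factor(rep(item_cols, each = nrow(my.df)), levels = item_cols),
  value    = factor(unlist(my.df[item_cols], use.names = FALSE),
                    levels = 1:4,
                    labels = c("strongly disagree", "disagree",
                               "agree", "strongly agree"))
)

# rows = items, columns = responses; margin = 1 scales each row to sum to 1
pct_tab <- prop.table(table(melted$variable, melted$value), margin = 1) * 100

# back to a long data frame with the same column names as `df` above
pct_df <- as.data.frame(pct_tab, responseName = "V1")
names(pct_df)[1:2] <- c("variable", "value")
```

Because each row is scaled by its own total, the result stays correct if some items have fewer answers than others.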

Then supply the computed percentages as the weight aesthetic in geom_bar, like this:

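ggplot(df) + 
# position is an argument of geom_bar(), not an aesthetic, so it goes outside aes()
geom_bar(aes(variable, fill=value, weight=V1), position="fill") +
# position="fill" rescales each bar to 0-1; label that axis as percentages
scale_y_continuous(labels=scales::percent) +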
scale_fill_manual(name="Responses",
                  values=c("#EFF3FF", "#BDD7E7", "#6BAED6",
                           "#2171B5"),
                  breaks=c("strongly disagree", 
                           "disagree", 
                           "agree", 
                           "strongly agree"),
                  labels=c("strongly disagree", 
                           "disagree", 
                           "agree", 
                           "strongly agree")) +
labs(x="Items", y="Percentage (%)", title="my title") +
coord_flip()

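I don't quite understand (2). Do you want to (a) compute relative percentages (with "strongly agree" as the reference)? Or (b) do you want the plot to always show "strongly agree" first, then "agree", and so on? You can accomplish (b) simply by reordering the factor levels in df: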

df$value <- factor(df$value, levels=c("strongly agree", "agree", "disagree", 
                 "strongly disagree"), ordered = TRUE)

Edit: You can reorder the levels of variable and value as follows:

variable.order <- names(sort(daply(df, .(variable), 
                     function(x) x$V1[x$value == "strongly agree"] ), 
                     decreasing = TRUE))
value.order <- c("strongly agree", "agree", "disagree", "strongly disagree")
df$variable <- factor(df$variable, levels = variable.order, ordered = TRUE)
df$value <- factor(df$value, levels = value.order, ordered = TRUE)
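
To tie the ordering back to the plot, here is a minimal self-contained sketch that sorts the items by their share of "strongly agree" answers and then draws the 100% stacked bars. It assumes ggplot2 and scales are installed. The names long, resp_levels, sa_share and p are illustrative, and the data is simulated directly in long format rather than taken from the code above.

```r
# Sketch: order items by share of "strongly agree", then plot 100% stacked bars.
# `long`, `resp_levels`, `sa_share` and `p` are illustrative names.
library(ggplot2)

set.seed(1)  # only to make the simulated data reproducible
resp_levels <- c("strongly disagree", "disagree", "agree", "strongly agree")
long <- data.frame(
  variable = rep(paste0("X", 1:10), each = 200),
  value    = factor(sample(resp_levels, 2000, replace = TRUE),
                    levels = resp_levels)
)

# proportion of "strongly agree" answers per item
sa_share <- tapply(long$value == "strongly agree", long$variable, mean)

# after coord_flip() the first factor level is drawn at the bottom,
# so an ascending sort puts the item with the largest share at the top
long$variable <- factor(long$variable, levels = names(sort(sa_share)))

p <- ggplot(long, aes(x = variable, fill = value)) +
  geom_bar(position = "fill") +                   # each bar rescaled to 0-1
  scale_y_continuous(labels = scales::percent) +  # show 0-1 as 0%-100%
  scale_fill_manual(name = "Responses",
                    values = c("#EFF3FF", "#BDD7E7", "#6BAED6", "#2171B5")) +
  labs(x = "Items", y = "Percentage (%)", title = "my title") +
  coord_flip()
print(p)
```

Here the raw responses are passed to geom_bar() and position = "fill" does the scaling, so no precomputed percentage column is needed; the weight-based approach above reaches the same picture from the summarised data.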