Stacked bar chart with an overplotted ratio line in R

Time: 2015-05-17 03:12:25

Tags: r bar-chart stacked-chart

My data has one observation per row:

rm(list = ls(all = TRUE))
mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE),
                   var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE),
                   var2 = sample(c("yes", "no"), 100, replace = TRUE),
                   var3 = sample(c("yes", "no"), 100, replace = TRUE),
                   var4 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var5 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE),
                   var6 = sample(c("yes", "no", "yes", "no", "NA"), 100, replace = TRUE))

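What I need: a stacked bar chart with side-by-side pairs of bars, one per kind (good vs. bad), showing how many observations of each kind have 0 "yes" vars, how many have 1 "yes" var, and so on, up to "yes" for all 6 vars. Y axis = count, X axis = the seven categories (0 yes vars, 1 yes var, etc.). Each bar should be a color-coded stacked bar showing each var's contribution to the bar's total height. NA is treated as "no". In addition, an overplotted line should show the ratio count(good)/count(bad) for each of the seven X-axis categories.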

1 Answer:

Answer 0: (score: 1)

Based on your description, this is what I understand you are trying to achieve. It consists of three steps:

  1. Replace all NA values with "no".
  2. Count all the "yes" values row-wise.
  3. Actually plot the chart.

So, addressing each point in turn.

Let's assume your data looks like this:

    mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE), 
                       var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE), 
                       var2 = sample(c("yes", "no"), 100, replace = TRUE), 
                       var3 = sample(c( "yes", "no"), 100, replace = TRUE), 
                       var4 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                       var5 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                       var6 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE))
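Because `sample()` draws at random, each run of this snippet gives different data and therefore a slightly different plot. A minimal sketch of making the draw reproducible with `set.seed()`; the seed value 42 and the variable names are arbitrary choices for illustration, not something from the original post.

```r
# Fix the random number generator state before sampling.
# The seed value 42 is arbitrary (an assumption for illustration).
set.seed(42)
first_draw <- sample(c("good", "bad"), 100, replace = TRUE)

# Resetting the same seed reproduces exactly the same draw.
set.seed(42)
second_draw <- sample(c("good", "bad"), 100, replace = TRUE)

print(identical(first_draw, second_draw))
```

Calling `set.seed()` once, before the `data.frame()` call, would be enough to make the whole example repeatable.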
    

**1.**

Replacing all NA values with "no" is simply:

    mydf[is.na(mydf)] <- "no"
    

Here `is.na(mydf)` finds every NA cell in the data.frame, and the assignment operator replaces all of them at once.
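To make the indexing step concrete, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are illustrative only, and `stringsAsFactors = FALSE` is set so the columns are plain character vectors regardless of R version.

```r
# Toy data frame; the name and values are made up for illustration.
toy <- data.frame(kind = c("good", "bad", "good"),
                  var1 = c("yes", NA, "no"),
                  var2 = c(NA, "yes", "yes"),
                  stringsAsFactors = FALSE)

# is.na() on a data.frame returns a logical matrix of the same shape,
# TRUE wherever a cell is NA.
na_mask <- is.na(toy)
print(na_mask)

# Assigning through that mask replaces every NA cell in one step.
toy[na_mask] <- "no"
print(toy)
```

If the columns were factors instead, the same assignment should only work when "no" is already one of the factor's levels.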

**2.**

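To count everything row-wise I used the apply function. You can use ?apply to look up its arguments, but in short: the 1st argument is the data.frame, the 2nd is the direction (1 for row-wise, 2 for column-wise), and the 3rd is the function to apply along that direction.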

    mydf$total.yes <- apply(mydf, 1, function(x) {
      return(length(x[x=="yes"]))
    })
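As a small illustration of the direction argument, the sketch below runs `apply()` over a made-up character matrix both row-wise and column-wise. It also shows why the NA replacement matters for the `length(x[x == "yes"])` idiom used above. The matrix `m`, the `sum(..., na.rm = TRUE)` variant, and the `rowSums()` shortcut are alternatives added here for illustration, not part of the original answer.

```r
# Toy character matrix; values are made up for illustration.
m <- rbind(c("yes", "no", "yes"),
           c("no",  NA,   "yes"),
           c("no",  "no", "no"))

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column.
by_row <- apply(m, 1, function(x) sum(x == "yes", na.rm = TRUE))
by_col <- apply(m, 2, function(x) sum(x == "yes", na.rm = TRUE))

# Pitfall: subsetting with an NA index keeps an NA element, so
# length(x[x == "yes"]) also counts NA cells unless they were replaced first.
with_na <- apply(m, 1, function(x) length(x[x == "yes"]))

# rowSums() on the logical matrix gives the same row counts as by_row.
same <- rowSums(m == "yes", na.rm = TRUE)

print(by_row)
print(by_col)
print(with_na)
```

In the second row, `with_na` is one higher than `by_row` because the NA cell is counted, which is why the NA values have to be replaced before counting this way.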
    

**3.**

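Finally, the plot. The easiest and best-looking way to make it is with ggplot2. Install it by typing install.packages("ggplot2"). For bar charts, see the geom_bar [documentation](http://docs.ggplot2.org/0.9.3.1/geom_bar.html); the code looks like this: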

    library(ggplot2)
    
    ggplot(mydf, aes(total.yes, fill=kind)) +
      geom_bar(position="dodge")
    

This produces a bar chart with the good and bad bars side by side at each total.yes value.

I hope this answers your question. The complete code is below:

    mydf <- data.frame(kind = sample(c("good", "bad"), 100, replace = TRUE), 
                       var1 = sample(c("yes", "no", "yes"), 100, replace = TRUE), 
                       var2 = sample(c("yes", "no"), 100, replace = TRUE), 
                       var3 = sample(c( "yes", "no"), 100, replace = TRUE), 
                       var4 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                       var5 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE), 
                       var6 = sample(c( "yes", "no", "yes", "no", NA), 100, replace = TRUE))
    
    library(ggplot2)
    
    # replace all NA values with "no"; this step is needed because
    # x[x == "yes"] keeps NA entries, so length() below would count them
    mydf[is.na(mydf)] <- "no"
    
    # for each row figure out how many "yes" there are...
    mydf$total.yes <- apply(mydf, 1, function(x) {
      return(length(x[x=="yes"]))
    })
    
    # see example here: http://docs.ggplot2.org/0.9.3.1/geom_bar.html
    #using your data
    
    
    ggplot(mydf, aes(total.yes, fill=kind)) +
      geom_bar(position="dodge")
    

geom_bar actually stacks bars by default (see the [documentation](http://docs.ggplot2.org/0.9.3.1/geom_bar.html)). Without position="dodge", the stacked version is:

    ggplot(mydf, aes(total.yes, fill=kind)) +
      geom_bar()
    

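The plots above do not include the ratio line the question asks for. As a starting point only, here is a minimal sketch of computing count(good)/count(bad) for each of the seven categories in base R. The short `kind` and `total.yes` vectors are made-up stand-ins for `mydf$kind` and `mydf$total.yes`, and marking empty categories as NA is an assumption about how they should be handled.

```r
# Stand-in data; in the answer these would be mydf$kind and mydf$total.yes.
kind      <- c("good", "good", "bad", "good", "bad", "bad", "good")
total.yes <- c(0, 0, 0, 1, 1, 2, 2)

# Fix the factor levels so all seven categories (0 to 6) appear,
# even when a category has no observations.
counts <- table(factor(total.yes, levels = 0:6),
                factor(kind, levels = c("good", "bad")))

# Ratio count(good) / count(bad) per category. Categories with no "bad"
# rows give Inf or NaN, so mark them NA instead of plotting them.
ratio <- counts[, "good"] / counts[, "bad"]
ratio[!is.finite(ratio)] <- NA
print(ratio)
```

Drawing this ratio over the bars would still need a decision about scaling, since a ratio and a count do not share a natural axis.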