Question

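I'm plotting a categorical variable and, instead of showing the count for each category value, I'd like ggplot to display the percentage of values in each category. Of course, it is possible to create another variable holding the calculated percentage and plot that one, but I have to do this several dozen times and I hope to achieve it in one command.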

I was experimenting with something like this:

qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")

but I must be using it incorrectly, as I got errors.

To make the setup easy to reproduce, here is a simplified example:

mydata <- c ("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc");
mydataf <- factor(mydata);
qplot (mydataf); #this shows the count, I'm looking to see % displayed.

In the real case I will probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me.

I've also tried these four approaches:

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

but all 4 give:

Error: ggplot2 doesn't know how to deal with data of class factor

The same error appears for the simple case of

ggplot (data=mydataf, aes(levels(mydataf))) +
  geom_bar()

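so it is clearly something about how ggplot interacts with a single vector. I'm scratching my head; googling for that error gives a single result.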

Answer 1

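Since this question was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above: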

 require(ggplot2)
 require(scales)

 p <- ggplot(mydataf, aes(x = foo)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## version 3.0.0
        scale_y_continuous(labels=percent)

Here is a reproducible example using mtcars:

 ggplot(mtcars, aes(x = factor(hp))) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        scale_y_continuous(labels = percent) ## version 3.0.0

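This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in the comments on the accepted answer.

Remark: if hp is not set as a factor, ggplot returns a different plot (image not shown).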

Answer 2

This modified code should work:

p = ggplot(mydataf, aes(x = foo)) + 
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    scale_y_continuous(formatter = 'percent')

If your data has NAs and you don't want them to be included in the plot, pass na.omit(mydataf) as the data argument to ggplot.

Hope this helps.
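As a minimal sketch of the NA handling, assume mydataf is a data frame with a single categorical column foo (the column name used in the code above; the values below are only for illustration) and that a recent ggplot2 (3.3.0 or later) is installed, so after_stat() is used in place of the older ..count.. notation:

```r
library(ggplot2)

# Assumption: mydataf is a data frame with one categorical column named foo.
mydataf <- data.frame(
  foo = c("aa", "bb", NA, "bb", "cc", "aa", "aa", "aa", "ee", NA, "cc")
)

# na.omit() drops the rows containing NA, so they are neither drawn
# nor counted in the denominator of the percentages.
clean <- na.omit(mydataf)

p <- ggplot(clean, aes(x = foo)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)

p
```

Without na.omit(), the two NA rows would show up as their own bar and the other percentages would be computed out of 11 rows rather than 9.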

Answer 3

With ggplot2 version 2.1.0 it is

+ scale_y_continuous(labels = scales::percent)
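The labels argument accepts a function that turns break values into strings, and scales::percent is such a function. A quick illustration, assuming the scales package is installed (label_percent() needs scales 1.1.0 or later):

```r
library(scales)

# percent() multiplies by 100 and appends "%";
# accuracy = 1 rounds to whole percentages.
percent(c(0.25, 0.5), accuracy = 1)
#> [1] "25%" "50%"

# label_percent() builds a labelling function with fixed options,
# which could then be passed as labels = lab in scale_y_continuous().
lab <- label_percent(accuracy = 0.1)
lab(0.1234)
```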

Answer 4

As of March 2017, with ggplot2 2.2.1, I think the best solution is the one explained in Hadley Wickham's book R for Data Science:

ggplot(mydataf) + stat_count(mapping = aes(x=foo, y=..prop.., group=1))

stat_count computes two variables: count is used by default, but you can choose to use prop, which shows proportions.
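To see why group = 1 matters, here is a small sketch on mtcars (not the asker's data), assuming ggplot2 3.3.0 or later so that after_stat() is available. By default each x value forms its own group and prop is computed within each group, so every bar would have height 1; group = 1 puts all rows in one group.

```r
library(ggplot2)

# With group = 1, prop is each category's share of all rows.
p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(prop), group = 1)) +
  scale_y_continuous(labels = scales::percent)

# Without group = 1, every category is its own group, so prop is always 1.
p_nogroup <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(prop)))

# layer_data() exposes the values computed by the stat for inspection.
computed <- layer_data(p)[, c("x", "count", "prop")]
computed
```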

Answer 5

If you want percentages on the y-axis and also as labels on the bars:

library(ggplot2)
library(scales)
ggplot(mtcars, aes(x = as.factor(am))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) +
  scale_y_continuous(labels = percent) +
  labs(title = "Manual vs. Automatic Frequency", y = "Percent", x = "Automatic Transmission")

When adding the bar labels, you may wish to omit the y-axis for a cleaner chart by adding this to the end:

  theme(
        axis.text.y=element_blank(), axis.ticks=element_blank(),
        axis.title.y=element_blank()
  )

Answer 6

If you want percentage labels but the actual Ns on the y-axis, try this:

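library(ggplot2)
library(scales)

# Bar chart with raw counts on the y-axis and percentage labels on the bars.
perbar <- function(xx) {
  ggplot(data = data.frame(xx), aes(x = xx)) +
    geom_bar(aes(y = ..count..), fill = "orange") +
    # stat = "count" matches the stat used by geom_bar(), so the labels
    # line up with the bars in current ggplot2 versions.
    geom_text(aes(y = ..count.., label = scales::percent(..count.. / sum(..count..))),
              stat = "count", colour = "darkgreen")
}
perbar(mtcars$disp)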

Answer 7

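This is a workaround for faceted data. (The accepted answer by @Andrew does not work in this case.) The idea is to calculate the percentage values using dplyr and then create the plot using geom_col.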

library(ggplot2)
library(scales)
library(magrittr)
library(dplyr)

binwidth <- 30

mtcars.stats <- mtcars %>%
  group_by(cyl) %>%
  mutate(bin = cut(hp, breaks=seq(0,400, binwidth), 
               labels= seq(0+binwidth,400, binwidth)-(binwidth/2)),
         n = n()) %>%
  group_by(cyl, bin) %>%
  summarise(p = n()/n[1]) %>%
  ungroup() %>%
  mutate(bin = as.numeric(as.character(bin)))

ggplot(mtcars.stats, aes(x = bin, y= p)) +  
  geom_col() + 
  scale_y_continuous(labels = percent) +
  facet_grid(cyl~.)

This is the plot: (image not shown)

Answer 8

As of version 3.3 of ggplot2, we have access to the convenient after_stat() function.

We can do something similar to @Andrew's answer, but without using the .. syntax:

# original example data
mydata <- c("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc")

# display percentages
library(ggplot2)
ggplot(mapping = aes(x = mydata,
                     y = after_stat(count/sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent)

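You can find all the "computed variables" available for use in the documentation of the geom_ and stat_ functions. For example, for geom_bar(), you can access the count and prop variables. (See the documentation for computed variables.)

One comment about your NULL values: they are dropped when you create the vector (i.e. you end up with a vector of length 9, not 11). If you really want to keep track of missing data, you will have to use NA instead (ggplot2 will put the NAs at the right end of the plot):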

# use NA instead of NULL
mydata <- c("aa", "bb", NA, "bb", "cc", "aa", "aa", "aa", "ee", NA, "cc")
length(mydata)
#> [1] 11

# display percentages
library(ggplot2)
ggplot(mapping = aes(x = mydata,
                     y = after_stat(count/sum(count)))) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent)

Created on 2021-02-09 by the reprex package (v1.0.0)

(Note that using chr or fct data will not make a difference for your example.)

Answer 9

Note that if your variable is continuous, you will have to use geom_histogram(), as that function groups the variable into "bins".

df <- data.frame(V1 = rnorm(100))

ggplot(df, aes(x = V1)) +  
  geom_histogram(aes(y = (..count..)/sum(..count..))) 

# if you use geom_bar(), with factor(V1), each value of V1 will be treated as a
# different category. In this case this does not make sense, as the variable is 
# really continuous. With the hp variable of the mtcars (see previous answer), it 
# worked well since hp was not really continuous (check unique(mtcars$hp)), and one 
# can want to see each value of this variable, and not to group it in bins.
ggplot(df, aes(x = factor(V1))) +  
  geom_bar(aes(y = (..count..)/sum(..count..)))
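For newer ggplot2 versions (3.3.0 or later), the histogram above could also be sketched with after_stat() and a percent-formatted axis. The seed and the choice of bins = 20 are assumptions made only for this illustration:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(V1 = rnorm(100))

# Each bar's height is the share of observations falling in that bin,
# so the bar heights sum to 100%.
p <- ggplot(df, aes(x = V1)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), bins = 20) +
  scale_y_continuous(labels = scales::percent)

p
```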

Show % instead of counts in charts of categorical variables