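Plotting histograms of multiple datasets as percentages with ggplot2

Time: 2013-08-30 20:37:40

Tags: r ggplot2 histogram percentage

I have four datasets and I want to plot their histograms on the same plot. I have put all the data into one data frame, and I can plot the histograms on one plot. However, I cannot get percentages instead of counts. With the code below, each bar is plotted as a percentage of the sum of all counts, but I want the percentages to be relative to each dataset. Is this possible?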

all <- rbind(data.frame(fill = "A", Events = A$Events), 
    data.frame(fill = "B", Events = B$Events), 
    data.frame(fill = "C", Events = C$Events), 
    data.frame(fill = "D", Events = D$Events))
ggplot(all,aes(x=Events, fill = fill)) + 
 geom_histogram(aes(y = ..count../sum(..count..)), position = 'dodge')

Edit

Here is some sample data:

fill Events  
1   A   1  
2   A   1  
3   A   3  
4   A   1  
5   A   1  
6   A   6  
7   A   2  
8   A   1  
9   A   1  
10  A   2  
11  A   1  
12  A   1  
13  A   1  
14  A   1  
15  A   5  
16  A   1  
17  A   2  
18  A   2  
19  A   1  
20  A   1  
21  A   1  
22  A   1  
23  A   2  
24  A   1  
25  A   2  
26  A   1  
27  B   2  
28  B   3  
29  B   1  
30  B   3  
31  B   2  
32  B   5  
33  B   1  
34  B   1  
35  B   1  
36  B   2  
37  B   1  
38  B   1  
39  B   1  
40  B   1  
41  B   1  
42  B   1  
43  B   1  
44  B   1  
45  B   1  
46  B   4  
47  B   3  
48  B   3  
49  B   5  
50  B   3  
51  C   1  
52  C   2  
53  C   2  
54  C   3  
55  C   3  
56  C   9  
57  C   8  
58  C   1  
59  C   1  
60  C   2  
61  C   2  
62  C   1  
63  C   2  
64  C  39  
65  C  43  
66  C 194  
67  C 129  
68  C 186  
69  C   1  
70  C   2  
71  C   7  
72  C   4  
73  C   1   
74  D  12  
75  D   3  
76  D   2  
77  D   3  
78  D   8  
79  D  20  
80  D   5  
81  D   1  
82  D   4  
83  D   9  
84  D  51  
85  D  12  
86  D   7  
87  D   6  
88  D   7  
89  D   7  
90  D   9  
91  D  17  
92  D  18  
93  D   8  
94  D   7  
95  D   6  
96  D  10  
97  D  27  
98  D  11  
99  D  21  
100 D  89  
101 D  47  
102 D   1  

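2 answers:

Answer 0 (score: 1)

You are close, but you need to use (..density..)*binwidth instead of ..count../sum(..count..). The density is normalized within each group, so multiplying it by the bin width gives each bin's share of its own dataset.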

library(ggplot2)

# Your data:
all <- data.frame(fill=rep(LETTERS[1:4],c(26,24,23,29)),
                  Events=c(1,1,3,1,1,6,2,1,1,2,1,1,1,1,5,1,2,2,1,1,1,1,2,1,2,1,2,3,1,3,2,5,1,1,1,2,1,1,1,1,1,1,1,1,1,4,3,3,5,3,1,2,2,3,3,9,8,1,1,2,2,1,2,39,43,194,129,186,1,2,7,4,1,12,3,2,3,8,20,5,1,4,9,51,12,7,6,7,7,9,17,18,8,7,6,10,27,11,21,89,47,1))

bw <- 20 # set the binwidth

# plot
p1<-ggplot(all,aes(x=Events, fill=fill)) + 
  geom_histogram(aes(y=(..density..)*bw), position='dodge', binwidth=bw)
p1

[Figure: desired output]

Here is a check to make sure the values add up to 1 within each group:

# ggplot_build() returns the computed bar data; ymax is the height of each bar
aggregate(ymax ~ group, data = ggplot_build(p1)$data[[1]], FUN = sum)
#  group ymax
#1     1    1
#2     2    1
#3     3    1
#4     4    1
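
The same identity can be checked without ggplot2. The following is only a sketch in base R: the small data frame is a hypothetical stand-in for the question's data, and the bin edges are chosen by hand, so they will not match the bins ggplot2 picks. It shows that density times bin width equals each bin's share of its own group.

```r
# Hypothetical stand-in for the question's data (not the real values)
all <- data.frame(fill   = rep(c("A", "B"), c(6, 4)),
                  Events = c(1, 1, 3, 6, 2, 1, 2, 39, 43, 5))

bw <- 20                        # bin width
breaks <- seq(0, 60, by = bw)   # hand-picked bin edges covering all Events

# For each group, bin the values and rescale density to a proportion
props <- sapply(split(all$Events, all$fill), function(x) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  h$density * bw                # same as h$counts / length(x)
})

props           # one column per group, one row per bin
colSums(props)  # each column sums to 1
```

Because the rescaling happens inside each group, a group with many rows cannot shrink the bars of a smaller group.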

Old answer

Here is an example:

library(ggplot2)

ggplot(mtcars,aes(x=mpg, fill = as.factor(cyl))) +
  geom_histogram(aes(y = ..density..), position = 'dodge', binwidth=5)

As a check, set binwidth to 100: every column then has the value 0.01 (100% / 100 = 0.01).
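
That check can also be reproduced in base R. This is a rough sketch that assumes a single hand-picked bin from 0 to 100, which is not necessarily where ggplot2 places its bin; it only illustrates why the height is 0.01 for every cyl group.

```r
# One 100-wide bin per cyl group; all mpg values fall inside (0, 100]
dens <- sapply(split(mtcars$mpg, mtcars$cyl), function(x) {
  hist(x, breaks = c(0, 100), plot = FALSE)$density
})

dens        # density of the single bin, per cyl group
dens * 100  # multiplying by the bin width recovers a proportion of 1
```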

(Edit) Here is another example, using an oversimplified dataset to highlight the result:

library(data.table)
# Calculate the average miles per gallon by number of cylinders
mtcars_avg <- as.data.table(mtcars)[,
                                    list(mpg_avg=mean(mpg)),
                                    by=list(cyl=as.factor(cyl))][order(cyl)]
mtcars_avg
#   cyl  mpg_avg
#1:   4 26.66364
#2:   6 19.74286
#3:   8 15.10000

# OP version, with unwanted results of 33% per color (cyl)
ggplot(mtcars_avg, aes(x=mpg_avg, fill=cyl)) +
  geom_histogram(aes(y = ..count../sum(..count..)), position = 'dodge', binwidth=1)

[Figure: original]

# ..density.. version, which shows the desired results of 100% per color (cyl)
ggplot(mtcars_avg, aes(x=mpg_avg, fill=cyl)) +
  geom_histogram(aes(y = ..density..), position = 'dodge', binwidth=1)

[Figure: solution]

You may also want to consider using geom_density:

ggplot(mtcars,aes(x=mpg, fill = as.factor(cyl))) + geom_density(alpha=0.5)
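
A note for newer ggplot2 releases: assuming version 3.3.0 or later, where after_stat() is available, the density-times-binwidth idea can be written without the `..density..` notation. The sketch below uses a small hypothetical data frame rather than the question's data, and relies on the `width` variable computed by stat_bin, so treat it as one possible spelling rather than the method from this answer.

```r
# Hypothetical stand-in data; replace with your own data frame
all <- data.frame(fill   = rep(c("A", "B"), c(6, 4)),
                  Events = c(1, 1, 3, 6, 2, 1, 2, 39, 43, 5))
bw <- 20  # bin width

sums <- NULL  # stays NULL when ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # density * width is each bin's share of its own group
  p <- ggplot(all, aes(x = Events, fill = fill)) +
    geom_histogram(aes(y = after_stat(density * width)),
                   position = "dodge", binwidth = bw)
  # Sum the computed bar heights within each group
  built <- layer_data(p)
  sums <- tapply(built$ymax, built$group, sum)
}
sums
```

Using the computed `width` avoids repeating `bw` inside aes(), so the scaling stays correct if the binwidth argument changes.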

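Answer 1 (score: -1)

The binwidth factor is needed because, by definition, a density integrates to 1. Basically, increasing the binwidth by a factor of x changes y by a factor of 1/x: the bins are wider, so the heights must drop to keep the same area.

So, to recover percentages, you have to correct for this by multiplying y by bw.

As a simple example, imagine that all the data fall into a single bin:

  • With bw = 1, the basic "..density.." code gives you the percentage directly, because bw * y = 1 => y = 1 (1 * 1 = 1)
  • If you change bw to 2, the "..density.." code gives you on the y axis: bw * y = 1 => y = 1/bw = 0.5
  • To get the percentage on the y axis, you have to multiply by bw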
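
The three cases above can be reproduced numerically. This is a minimal base R sketch with hypothetical data in which every value lands in one bin, so the bar height is exactly 1/bw.

```r
x <- rep(0.5, 10)    # hypothetical data: every value falls in the first bin
bws <- c(1, 2, 4)    # bin widths to compare

# Density of the single bin (0, bw] for each bin width
heights <- sapply(bws, function(bw) {
  hist(x, breaks = c(0, bw), plot = FALSE)$density
})

heights        # 1 / bw for each bin width
heights * bws  # multiplying by bw restores the proportion
```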