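I have four data sets, and I want to plot their histograms on the same plot. I have put all the data into one data frame, and I can plot the histograms on one plot. However, I can't get it to plot percentages instead of counts. When I use the code below, the percentages are computed relative to the total of all counts, but I want them to be relative to each data set. Is this possible?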
all <- rbind(data.frame(fill = "A", Events = A$Events),
             data.frame(fill = "B", Events = B$Events),
             data.frame(fill = "C", Events = C$Events),
             data.frame(fill = "D", Events = D$Events))
ggplot(all, aes(x = Events, fill = fill)) +
  geom_histogram(aes(y = ..count../sum(..count..)), position = 'dodge')
Edit
Here is some sample data:
fill Events
1 A 1
2 A 1
3 A 3
4 A 1
5 A 1
6 A 6
7 A 2
8 A 1
9 A 1
10 A 2
11 A 1
12 A 1
13 A 1
14 A 1
15 A 5
16 A 1
17 A 2
18 A 2
19 A 1
20 A 1
21 A 1
22 A 1
23 A 2
24 A 1
25 A 2
26 A 1
27 B 2
28 B 3
29 B 1
30 B 3
31 B 2
32 B 5
33 B 1
34 B 1
35 B 1
36 B 2
37 B 1
38 B 1
39 B 1
40 B 1
41 B 1
42 B 1
43 B 1
44 B 1
45 B 1
46 B 4
47 B 3
48 B 3
49 B 5
50 B 3
51 C 1
52 C 2
53 C 2
54 C 3
55 C 3
56 C 9
57 C 8
58 C 1
59 C 1
60 C 2
61 C 2
62 C 1
63 C 2
64 C 39
65 C 43
66 C 194
67 C 129
68 C 186
69 C 1
70 C 2
71 C 7
72 C 4
73 C 1
74 D 12
75 D 3
76 D 2
77 D 3
78 D 8
79 D 20
80 D 5
81 D 1
82 D 4
83 D 9
84 D 51
85 D 12
86 D 7
87 D 6
88 D 7
89 D 7
90 D 9
91 D 17
92 D 18
93 D 8
94 D 7
95 D 6
96 D 10
97 D 27
98 D 11
99 D 21
100 D 89
101 D 47
102 D 1
Answer 0 (score: 1)
You're close, but you need to use (..density..)*binwidth instead of ..count../sum(..count..).
# Your data:
all <- data.frame(fill=rep(LETTERS[1:4],c(26,24,23,29)),
Events=c(1,1,3,1,1,6,2,1,1,2,1,1,1,1,5,1,2,2,1,1,1,1,2,1,2,1,2,3,1,3,2,5,1,1,1,2,1,1,1,1,1,1,1,1,1,4,3,3,5,3,1,2,2,3,3,9,8,1,1,2,2,1,2,39,43,194,129,186,1,2,7,4,1,12,3,2,3,8,20,5,1,4,9,51,12,7,6,7,7,9,17,18,8,7,6,10,27,11,21,89,47,1))
bw <- 20 # set the binwidth
# plot
p1<-ggplot(all,aes(x=Events, fill=fill)) +
geom_histogram(aes(y=(..density..)*bw), position='dodge', binwidth=bw)
p1
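Here is a check to make sure the values add up to 1 within each group:
# layer_data() returns the computed bar data; print(p1)$data no longer does in current ggplot2
aggregate(ymax ~ group, data = layer_data(p1), FUN = sum)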
# group ymax
#1 1 1
#2 2 1
#3 3 1
#4 4 1
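For readers on a recent ggplot2, the same idea can be sketched without passing the bin width by hand. This is only a sketch under stated assumptions: it assumes ggplot2 3.3.0 or later (for `after_stat()`), it relies on the `width` variable that `stat_bin` computes alongside `density`, and the `demo` data frame holds made-up values rather than the asker's data.

```r
library(ggplot2)

# Small stand-in data set (hypothetical values, not the asker's data)
demo <- data.frame(
  fill   = rep(c("A", "B"), times = c(6, 4)),
  Events = c(1, 1, 3, 25, 41, 6, 2, 5, 1, 33)
)

# density * width is the share of each group's observations in that bin
p <- ggplot(demo, aes(x = Events, fill = fill)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 position = "dodge", binwidth = 20) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of each data set")

# Bar heights should sum to 1 within each fill group
built <- layer_data(p)
group_totals <- tapply(built$y, built$group, sum)
group_totals
```

Because `width` comes from the stat itself, the y aesthetic stays correct if `binwidth` is changed later.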
Old answer
Here is an example:
library(ggplot2)
ggplot(mtcars,aes(x=mpg, fill = as.factor(cyl))) +
geom_histogram(aes(y = ..density..), position = 'dodge', binwidth=5)
As a check, set binwidth to 100: all values then fall into a single bin, and each bar has a height of 0.01 (100% / 100 = 0.01).
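The binwidth check described above can be reproduced numerically by inspecting the computed bar data instead of reading values off the plot. A minimal sketch, assuming ggplot2 3.3.0 or later so that `after_stat(density)` is available as the newer spelling of `..density..`:

```r
library(ggplot2)

# One very wide bin: every mpg value falls into the same bin
p_check <- ggplot(mtcars, aes(x = mpg, fill = as.factor(cyl))) +
  geom_histogram(aes(y = after_stat(density)),
                 position = "dodge", binwidth = 100)

# Keep only the non-empty bars and look at their density
bars <- layer_data(p_check)
bars[bars$count > 0, c("group", "count", "density")]
```

Each group's density should come out as 1 / 100, since the whole group sits in one bin of width 100.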
(Edit) Here is another example, using an oversimplified data set to highlight the result:
library(data.table)
# Calculate the average miles per gallon by number of cylinders
mtcars_avg <- as.data.table(mtcars)[,
list(mpg_avg=mean(mpg)),
by=list(cyl=as.factor(cyl))][order(cyl)]
mtcars_avg
# cyl mpg_avg
#1: 4 26.66364
#2: 6 19.74286
#3: 8 15.10000
# OP version, with unwanted results of 33% per color (cyl)
ggplot(mtcars_avg, aes(x=mpg_avg, fill=cyl)) +
geom_histogram(aes(y = ..count../sum(..count..)), position = 'dodge', binwidth=1)
# ..density.. version, which shows the desired results of 100% per color (cyl)
ggplot(mtcars_avg, aes(x=mpg_avg, fill=cyl)) +
geom_histogram(aes(y = ..density..), position = 'dodge', binwidth=1)
You might also want to consider using geom_density:
ggplot(mtcars,aes(x=mpg, fill = as.factor(cyl))) + geom_density(alpha=0.5)
Answer 1 (score: -1)
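The binwidth is needed because, by definition, a density integrates to 1, so the total area of the bars is 1.
Basically, increasing the binwidth along x makes y change as 1/x: the bins are wider, so the heights must drop to keep the same area.
So, to recover percentages, you have to correct for this by multiplying y by bw.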
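A simple example. Imagine all the data falls into a single bin:
If bw is 1, the "..density.." code gives you the percentage directly, because bw * p = 1, i.e. 1*1=1.
If bw is changed to 2, the "..density.." code gives you on the y axis: bw * y = 1 => y = 1/bw = 0.5.
So, to get percentages on the y axis, you have to multiply by bw.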
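The relationship between density, bin width, and share can be checked in base R without ggplot2. The sketch below is illustrative only: `share_per_bin` is a hypothetical helper, and the input is the first ten Events values of data set A from the question.

```r
# First ten Events values of data set A from the question
events <- c(1, 1, 3, 1, 1, 6, 2, 1, 1, 2)

# Hypothetical helper: bin x with width bw and compare the two ways of
# getting the share of observations per bin
share_per_bin <- function(x, bw) {
  breaks <- seq(0, ceiling(max(x) / bw) * bw, by = bw)
  h <- hist(x, breaks = breaks, plot = FALSE)
  data.frame(density          = h$density,
             density_times_bw = h$density * bw,
             share            = h$counts / sum(h$counts))
}

res1 <- share_per_bin(events, bw = 1)
res2 <- share_per_bin(events, bw = 2)

# density * bw matches counts / sum(counts) for either bin width
all.equal(res1$density_times_bw, res1$share)
all.equal(res2$density_times_bw, res2$share)
```

With bw = 1 the density and the share coincide; with bw = 2 the density is half the share, which is why the multiplication by bw is needed.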