Generate a histogram and density plot from binned data

Date: 2015-04-29 16:07:41

Tags: r ggplot2 histogram density-plot

I have binned some data and now have a data frame with two columns, one giving the bin range and the other the frequency, like this:

> head(data)
      binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16

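I want to use this to plot a histogram and a density plot, but I can't find a way to do it without generating new bins etc. Using this solution here, I tried the following: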

p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")

But it crashes. Does anyone know how to handle this?

Thanks

1 Answer:

Answer 0 (score: 3)

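The problem is that ggplot doesn't understand the data in the form you entered it, so you need to reshape it like this (I'm no regex master, so there is surely a better way):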

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

library(stringr)
library(splitstackshape)
library(ggplot2)

# strip the brackets, keeping e.g. "0,0.025"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")

# split the data on the "," into two numeric columns:
# binRange_1 for the start point and binRange_2 for the end point
df <- cSplit(df, "binRange")

# plot it, centring each bar on its bin so that it spans the bin exactly
ggplot(df, aes(x = (binRange_1 + binRange_2) / 2, y = Frequency)) +
    geom_bar(stat = "identity", width = 0.025) +
    labs(x = "binRange")
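One possibly simpler way to parse the labels, sketched here assuming every label has the form `(lower,upper]` with non-negative numbers (the pattern below does not capture minus signs), uses base R's `regmatches()` and `gregexpr()` instead of stringr and splitstackshape. It then draws each bar with `geom_rect()` from its lower to its upper edge, which would also cope with bins of unequal width:

```r
library(ggplot2)

# Same binned counts as in the question
df <- data.frame(
  binRange  = c("(0,0.025]", "(0.025,0.05]", "(0.05,0.075]",
                "(0.075,0.1]", "(0.1,0.125]", "(0.125,0.15]"),
  Frequency = c(88, 72, 92, 38, 20, 16),
  stringsAsFactors = FALSE
)

# Pull both numbers out of each label, e.g. "(0,0.025]" -> c(0, 0.025)
edges <- regmatches(df$binRange, gregexpr("[0-9.]+", df$binRange))
edges <- do.call(rbind, lapply(edges, as.numeric))
df$lower <- edges[, 1]
df$upper <- edges[, 2]

# Draw each bar from its lower edge to its upper edge
p <- ggplot(df) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = Frequency),
            fill = "grey60", colour = "black") +
  labs(x = "binRange", y = "Frequency")
print(p)
```

Because the bar positions come straight from the bin edges, no `width` has to be hard-coded.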

Or, if you don't need the bin limits to be interpreted as numbers, you can simply do:

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

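library(ggplot2)
# keep the bins in their original order instead of relying on alphabetical sorting
df$binRange <- factor(df$binRange, levels = unique(as.character(df$binRange)))
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")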

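You won't be able to draw a density plot from your data, because you only have counts per bin rather than continuous observations, so the data is effectively categorical. That is why I actually prefer the second way of displaying it.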
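If a density-scale picture is still wanted, one workaround (not a kernel density estimate) is to rescale the counts so the bars have total area 1, i.e. density = Frequency / (sum(Frequency) * bin width), and optionally join the bin midpoints with a line (a frequency polygon). The sketch below hard-codes the bin edges from the question, assuming they describe the full data:

```r
library(ggplot2)

# Bin edges and counts from the question
df <- data.frame(
  lower     = c(0, 0.025, 0.05, 0.075, 0.1, 0.125),
  upper     = c(0.025, 0.05, 0.075, 0.1, 0.125, 0.15),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

# Histogram density: count / (total count * bin width), so total bar area is 1
df$width   <- df$upper - df$lower
df$density <- df$Frequency / (sum(df$Frequency) * df$width)
df$mid     <- (df$lower + df$upper) / 2

p <- ggplot(df) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density),
            fill = "grey80", colour = "black") +
  # Frequency polygon through the bin midpoints: a rough outline of the shape,
  # still built only from the binned counts
  geom_line(aes(x = mid, y = density)) +
  labs(x = "value", y = "density")
print(p)
```

This only rescales the histogram; it cannot recover any detail finer than the bins themselves.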