I have already binned some data, and I now have a data frame with two columns: one giving the bin range and the other giving the frequency, like this:
> head(data)
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16
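I would like to draw a histogram and a density plot from this, but I can't find a way to do it without generating new bins. Following the solution here, I tried this: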
p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")
But it crashes. Does anyone know how to deal with this?
Thanks
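Answer 0 (score: 3)
The problem is that ggplot does not understand the data in the form you supplied it, so you need to reshape it like this (I am no regex master, so there are surely better ways to do this):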
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(stringr)
library(splitstackshape)
library(ggplot2)
# extract the numbers, dropping the surrounding "(" and "]"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")
# split the data at the "," into two columns:
# one for the start-point and one for the end-point
df <- cSplit(df, "binRange")
# plot it; you don't actually need the second column
# (each bar is centred on the start-point of its bin)
ggplot(df, aes(x = binRange_1, y = Frequency)) +
geom_bar(stat = "identity", width = 0.025) +
# geom_bar() has no breaks argument, so the axis breaks go on the scale
scale_x_continuous(breaks = seq(0, 0.125, by = 0.025))
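As one possible alternative to the stringr/splitstackshape route above, the bin labels can also be parsed with base R alone. The sketch below assumes every label has the form `(lower,upper]`. The column names `lower` and `upper` are made up for illustration, and `geom_rect()` is swapped in for `geom_bar()` so that each bar spans exactly its own bin instead of being centred on the start-point.

```r
library(ggplot2)

df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")

# keep only digits, dots, commas and minus signs, e.g. "(0,0.025]" -> "0,0.025"
cleaned <- gsub("[^0-9.,-]", "", as.character(df$binRange))

# split each label at the comma into its two bounds
bounds <- strsplit(cleaned, ",", fixed = TRUE)

# hypothetical column names: lower and upper edge of each bin
df$lower <- as.numeric(sapply(bounds, `[`, 1))
df$upper <- as.numeric(sapply(bounds, `[`, 2))

# draw one rectangle per bin, from its lower to its upper edge
p <- ggplot(df) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = Frequency),
            colour = "white") +
  labs(x = "bin", y = "Frequency")
```

Calling `print(p)` renders the plot. Because the bin edges come straight from the labels, this also works when the bins do not all have the same width.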
Or, if you don't want the bins to be interpreted numerically, you can simply do the following:
df <- read.table(header = TRUE, text = "
binRange Frequency
1 (0,0.025] 88
2 (0.025,0.05] 72
3 (0.05,0.075] 92
4 (0.075,0.1] 38
5 (0.1,0.125] 20
6 (0.125,0.15] 16")
library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")
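You won't be able to draw a density plot from your data, because it is not continuous but rather categorical, which is why I actually prefer the second way of displaying it.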
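A smoothed kernel density estimate does need the raw observations, which are not available here. If a histogram drawn on a density scale is close enough, though, the bar heights can be rescaled so that the bar areas sum to 1. The sketch below is only an illustration under two assumptions: the six rows shown by `head(data)` are treated as the whole data set, and the bin edges are typed in by hand under the made-up column names `lower` and `upper`.

```r
library(ggplot2)

# bin edges and counts copied from the question (only the six rows shown)
df <- data.frame(
  lower     = c(0, 0.025, 0.05, 0.075, 0.1, 0.125),
  upper     = c(0.025, 0.05, 0.075, 0.1, 0.125, 0.15),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

# density height = count / (total count * bin width)
df$width   <- df$upper - df$lower
df$density <- df$Frequency / (sum(df$Frequency) * df$width)

# sanity check: the bar areas should add up to 1
total_area <- sum(df$density * df$width)

# piecewise-constant density: one rectangle per bin
p <- ggplot(df) +
  geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density),
            colour = "white") +
  labs(x = "bin", y = "density")
```

This gives a step-shaped estimate rather than a smooth curve, so it should be read as a rescaled histogram and not as the output of `geom_density()`.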