Question

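For some discrete data, I want a nice density histogram (one whose bar heights sum to 1). I have tried a few ways to do this, but none of them was entirely satisfactory.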

Generate some data:

#data
set.seed(-999)
d.test = data.frame(score = round(rnorm(100,1)))
mean.score = mean(d.test[,1])
d1 = as.data.frame(prop.table(table(d.test)))

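The first attempt puts the bars in the right place (centered on the numbers), but the vline() is in the wrong place. This is because the x axis is discrete (a factor), so the mean is plotted against the level positions rather than the values. The mean is .89.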

ggplot(data=d1, aes(x=d.test, y=Freq)) +
  geom_bar(stat="identity", width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")
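The mismatch can be seen without plotting. Below is a minimal base-R sketch, assuming the made-up vector `vals` as a stand-in for the scores (it is not the question's data): on a discrete axis each level is drawn at its integer position, so a numeric xintercept is read on that position scale rather than on the value scale.

```r
# Illustrative values only (hypothetical, not the question's data)
vals <- c(-1, 0, 1, 2, 3)
f <- factor(vals)

# Positions used on a discrete axis: 1, 2, ..., number of levels
pos <- as.integer(f)

# The numeric values that the axis labels stand for
orig <- as.numeric(as.character(f))

# A line at xintercept = 0.89 lands just left of position 1 (the label "-1"),
# not between the labels "0" and "1" where the mean belongs
data.frame(label = levels(f), position = pos, value = orig)
```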


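The second attempt gives the correct vline() position (because the x axis is continuous), but the bars are misplaced, and the width parameter does not seem to be modifiable when the x axis is continuous (see here). I also tried the size parameter, with no effect. Same for hjust.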

ggplot(d.test, aes(x=score)) +
  geom_histogram(aes(y=..count../sum(..count..)), width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")


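Any ideas? My bad idea is to rescale the mean so that it fits the factor levels and then use the first solution. This will not work well if some factor levels are "missing", e.g. 1, 2, 4 with no level for 3 because no data point had that value. If the mean is 3.5, rescaling it becomes odd (the x axis is no longer an interval scale).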
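To make the missing-level problem concrete, here is a small base-R sketch using the hypothetical values 1, 2, 4 from the paragraph above. The factor positions are equally spaced even though the values are not, so there is no consistent place on the discrete axis for a mean such as 3.5.

```r
# Hypothetical scores with no observation equal to 3
f <- factor(c(1, 2, 4))

pos <- as.integer(f)                # positions on a discrete axis
val <- as.numeric(as.character(f))  # values behind the labels

diff(pos)  # gaps between neighbouring positions
diff(val)  # gaps between neighbouring values
```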

Another idea is this:

ggplot(d.test, aes(x=score)) +
  stat_bin(binwidth=.5, aes(y= ..density../sum(..density..)), hjust=-.5) +
  scale_x_continuous(breaks = -2:5) + #add ticks back
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed")

But this requires adjusting the breaks, and the bars are still in the wrong positions (not centered). Unfortunately, hjust does not seem to work.


How do I get everything I want?

density sums to 1
bars centered on the values
vline() at the correct number
width = .5

With base graphics, one could perhaps solve this by plotting twice on the x axis. Is there a similar approach here?

Answer 1

It sounds like you just want to make sure that your x-axis values are numeric rather than factors:

ggplot(data=d1, aes(x=as.numeric(as.character(d.test)), y=Freq)) +
  geom_bar(stat="identity", width=.5) +
  geom_vline(xintercept=mean.score, color="blue", linetype="dashed") + 
  scale_x_continuous(breaks=-2:3)
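A variant that skips the factor round trip is sketched below. It is not the answerer's method, only an alternative under stated assumptions: geom_bar() accepts a numeric x and counts observations per distinct value, and after_stat() (assumed available, i.e. ggplot2 3.3.0 or later) turns those counts into proportions. The object name `p` and the data-driven breaks are choices made here for illustration.

```r
library(ggplot2)

# Same data as in the question
set.seed(-999)
d.test <- data.frame(score = round(rnorm(100, 1)))
mean.score <- mean(d.test$score)

p <- ggplot(d.test, aes(x = score)) +
  # One bar per distinct score; height = count / total, so heights sum to 1
  geom_bar(aes(y = after_stat(count / sum(count))), width = .5) +
  # x is continuous, so the mean is drawn on the value scale
  geom_vline(xintercept = mean.score, color = "blue", linetype = "dashed") +
  # One tick per integer in the observed range
  scale_x_continuous(breaks = min(d.test$score):max(d.test$score)) +
  ylab("Freq")
p
```

Because x stays numeric throughout, a score value with no observations simply leaves a gap on the axis instead of shifting the remaining bars.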

This gives a plot with the bars centered on the values and the mean line at the correct position.

ggplot2 density histogram with width = .5, vline, and centered bar positions
