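I have a large dataset containing original and imputed values, along with the proportional difference between the two. The quantiles of the proportional difference are: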
> quantile(p$prdif, probs=c(0, 0.1, 0.2, 0.3, .4, .5,0.6, 0.7, 0.8, 0.9, 1))
0% 10% 20% 30% 40% 50% 60% 70% 80%
-0.99269227 -0.43367924 -0.22983182 -0.07498240 0.06285345 0.20829226 0.39253900 0.65837197 1.18619469
90% 100%
11.25010211 Inf
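The 100% quantile is Inf, so at least one value is non-finite. A minimal sketch of how such values could be counted is below; the vector `prdif_example` is made up for illustration and stands in for `p$prdif`.

```r
# Hypothetical stand-in for p$prdif (not the real data)
prdif_example <- c(-0.5, 0.2, 1.1, 11.3, Inf)

# Count values that are Inf, -Inf, NA or NaN
n_nonfinite <- sum(!is.finite(prdif_example))

# Range of the remaining, finite values
finite_range <- range(prdif_example[is.finite(prdif_example)])

print(n_nonfinite)
print(finite_range)
```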
I plot a histogram of the proportional difference with the following commands:
# Load the packages used below
library(dplyr)
library(ggplot2)

# Calculate means
mu <- p %>% filter(orig != 0) %>%
  summarise(mu1 = mean(orig), mu2 = mean(imp), mu3 = mean(dif), mu4 = mean(prdif))

# Histogram and density of the proportional difference, with the mean as a dashed line
ggplot(p %>% filter(orig != 0), aes(x = prdif)) +
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.4, fill = 'blue') +
  geom_density(alpha = 0.6, size = 2) +
  geom_vline(data = mu, aes(xintercept = mu4, color = "red"),
             linetype = "dashed", size = 1.5) +
  labs(title = "Differences between imputed and original values", x = "Proportional Difference", y = "Density")
The result is as follows:
To focus on the region where most of the values lie, i.e. -1 to +2, I used the coord_cartesian function as follows:
ggplot(p %>% filter(orig != 0), aes(x = prdif)) +
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.4, fill = 'blue') +
  geom_density(alpha = 0.6, size = 2) +
  geom_vline(data = mu, aes(xintercept = mu4, color = "red"),
             linetype = "dashed", size = 1.5) +
  labs(title = "Differences between imputed and original values", x = "Proportional Difference", y = "Density") +
  coord_cartesian(xlim = c(-1, 2))
The resulting plot is as follows:
I cannot understand why the plot is empty. Clearly, there are values in the range -1 to +2.
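One thing that may be worth checking is the bin width. With the default of 30 bins, geom_histogram spreads the bins over the whole range of the data, and coord_cartesian only zooms the view without recomputing them, so a handful of extreme values could make every bin far wider than the -1 to +2 window. The sketch below uses made-up data (the vector `prdif_example` and its distribution are assumptions, not the real dataset) and only approximates the default bin width as range / 30.

```r
# Hypothetical stand-in for p$prdif: most values in [-1, 2], a few huge outliers
set.seed(1)
prdif_example <- c(runif(900, min = -1, max = 2), 10^runif(100, min = 1, max = 6))

n_bins <- 30  # ggplot2's default number of bins

# Approximate bin width when the bins span the full data range
# (coord_cartesian() zooms the view but does not change the binning)
width_full <- diff(range(prdif_example)) / n_bins

# Approximate bin width when the data are restricted to the window first
in_window <- prdif_example[prdif_example >= -1 & prdif_example <= 2]
width_window <- diff(range(in_window)) / n_bins

print(c(full = width_full, window = width_window))
```

If the full-range width comes out larger than the zoom window, the whole window falls inside a single bar, which might look like an empty or flat panel. Passing an explicit binwidth to geom_histogram, or filtering the data before plotting, are two ways to test whether this is what is happening.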
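Edit:
Following the comment below, I changed the code to filter out values of 2 and above and increased the number of bins to 300. The code and output are as follows: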
ggplot(p %>% filter(orig != 0 & prdif < 2), aes(x = prdif)) +
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.4, fill = 'blue', bins = 300) +
  geom_density(alpha = 0.6, size = 2, color = "yellow") +
  geom_vline(data = mu, aes(xintercept = mu4), color = "red",
             linetype = "dashed", size = 1.5) +
  labs(title = "Differences between imputed and original values", x = "Proportional Difference", y = "Density") +
  coord_cartesian(xlim = c(-1, 2))
Again, the output looks strange to me. I expected to see something like this:
Any suggestions would be greatly appreciated.