Question

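I am using the density function in R and then computing some results from the densities it returns. Afterwards, I use ggplot2 to display the PDF of the same data.

However, the results differ slightly from what the corresponding plot shows, which I confirmed by plotting the density output directly (using plot {graphics}).

Any idea why? How can I correct this so that the results and the plot (from ggplot2) match for exactly the same data?

An example (code and image):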

srcdata = data.frame("Value" = c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126, 4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582, 3.3162, 2.2995, 4.3954, 1.8488), "Type" = c("Positive", "Negative", "Positive", "Negative", "Negative", "Positive", "Positive", "Negative", "Positive", "Positive", "Negative", "Negative", "Negative", "Negative", "Positive", "Positive", "Positive", "Negative", "Positive", "Negative"))

bwidth <- ( density ( srcdata$Value ))$bw

sample <- split ( srcdata$Value, srcdata$Type )[ 1:2 ]

xmin = min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax = max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))

densities <- lapply ( sample, density, bw = bwidth, n = 512, from = xmin, to = xmax )

#plotting densities result
plot( densities [[ 1 ]], xlim = c(xmin,xmax), col = "steelblue", main = "" )
lines ( densities [[ 2 ]], col = "orange" )

#plot using ggplot2
library(ggplot2)
ggplot(data = srcdata, aes(x = Value)) +
    geom_density(aes(group = Type, colour = Type)) +
    xlim(xmin, xmax)

#or with ggplot2 (using easyGgplot2)
library(easyGgplot2)
ggplot2.density(data = srcdata, xName = 'Value', groupName = 'Type', alpha = 0.5, xlim = c(xmin, xmax))


Answer 1

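The comments so far correctly identify that you are using two different bandwidths to compute the densities in the two plots: the plot() graph uses the bwidth you specified as the bandwidth, while the ggplot() graph uses the default bandwidth. Ideally you would pass bwidth to the ggplot graph and be done, but the comments on a related SO question suggest that you cannot pass a bandwidth parameter to stat_density or geom_density.

The simplest way to get the same output in both plots is to let density() determine the optimal bandwidth both in the manual density calculation (below) and in the ggplot graph (using the same code you already have):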
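To see the mismatch in numbers rather than in pictures, the bandwidths can be compared directly in base R. This is a minimal sketch, assuming that geom_density picks its default bandwidth separately for each group with the same rule density() uses by default (bw.nrd0). The names values, pooled_bw and group_bw are illustrative, and the Type labels are rebuilt from a threshold of 3, which happens to reproduce the labels in the question's data.

```r
# Data from the question; every "Positive" value is above 3 and every
# "Negative" value is below 3, so the labels are rebuilt from a threshold.
values <- c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488)
type <- ifelse(values > 3, "Positive", "Negative")
groups <- split(values, type)

# Bandwidth used in the question: computed once on the pooled data.
pooled_bw <- density(values)$bw

# Default bandwidth for each group taken on its own (assumed to be what
# the ggplot2 layer uses when no bandwidth is supplied).
group_bw <- sapply(groups, bw.nrd0)

print(pooled_bw)
print(group_bw)
```

The pooled bandwidth comes out considerably larger than either per-group value, because the pooled data are bimodal and therefore far more spread out than each group alone.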

densities <- lapply ( sample, density, n = 512, from = xmin, to = xmax )

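Alternatively, the actual bandwidth used in geom_density / stat_density is the predetermined bandwidth multiplied by the adjust parameter (density documentation), so you can specify an adjust value in stat_density (stat_density documentation) to try to tune the ggplot bandwidth to match your bwidth variable. I found that an adjust value of 4.5 gives a similar (but not exact) version of the original graph produced from the calculated densities: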

ggplot(data = srcdata, aes(x=Value)) + 
    geom_density(aes(group=Type, colour=Type), adjust = 4.5) +
    xlim(xmin, xmax)

Adjusted ggplot density graph
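The reason a single adjust value can only come close is visible in a short base R calculation. The sketch below assumes that the ggplot2 layer multiplies each group's own default bandwidth (bw.nrd0) by adjust; target_bw and adjust_needed are illustrative names, and the grouping is again rebuilt from a threshold of 3 that matches the question's labels.

```r
# Data from the question, split into the two groups by a threshold of 3.
values <- c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488)
groups <- split(values, ifelse(values > 3, "Positive", "Negative"))

# Bandwidth the question wants both curves to share.
target_bw <- bw.nrd0(values)

# Multiplier each group would need so that
# (group default bandwidth) * adjust equals target_bw.
adjust_needed <- target_bw / sapply(groups, bw.nrd0)
print(round(adjust_needed, 2))
```

The two groups need different multipliers, so one shared adjust value such as 4.5 cannot reproduce bwidth for both curves at once.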

Edit: If you specifically want the ggplot graph to use your bwidth variable as the bandwidth in the density smoothing, the answers to a related question may be useful.
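For readers on a more recent ggplot2 than the one discussed in this answer: current releases document a bw argument on stat_density and geom_density. A minimal sketch, assuming such a version is installed, passes the pooled bandwidth straight to the layer. The object name p is illustrative, and the Type column is rebuilt from a threshold of 3 that reproduces the question's labels.

```r
library(ggplot2)

# Data from the question.
values <- c(4.6228, 1.7942, 4.2738, 2.1502, 2.2665, 5.1717, 4.1015, 2.5126,
            4.4270, 4.4729, 2.5112, 2.3493, 2.2787, 2.0114, 4.6931, 4.6582,
            3.3162, 2.2995, 4.3954, 1.8488)
srcdata <- data.frame(Value = values,
                      Type = ifelse(values > 3, "Positive", "Negative"))

# Same bandwidth and plotting range as in the question.
bwidth <- density(srcdata$Value)$bw
xmin <- min(srcdata$Value) - 0.2 * abs(min(srcdata$Value))
xmax <- max(srcdata$Value) + 0.2 * abs(max(srcdata$Value))

# Assumes a ggplot2 version whose density layer accepts `bw`;
# it overrides the per-group default bandwidth.
p <- ggplot(srcdata, aes(x = Value)) +
  geom_density(aes(group = Type, colour = Type), bw = bwidth) +
  xlim(xmin, xmax)
print(p)
```

With a fixed bw, both curves are smoothed with the same bandwidth as the manual density() calls, so no adjust tuning is needed.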

R density plot: ggplot vs plot
