我正在尝试对密度图的x轴进行对数转换并获得意外结果。没有转换的代码工作正常:
library(ggplot2)
data = data.frame(x=c(1,2,10,11,1000))
dens = density(data$x)
densy = sapply(data$x, function(x) { dens$y[findInterval(x, dens$x)] })
ggplot(data, aes(x = x)) +
geom_density() +
geom_point(y = densy)
如果我添加scale_x_log10()
,我会得到以下结果:
除了已经重新调整的y值之外,x值似乎也发生了一些事情 - 密度函数的峰值并不是点的位置。
我在这里错误地使用了日志转换吗?
答案 0 :(得分:2)
The shape of the density curve changes after the transformation because the distribution of the data has changed and the bandwidths are different. If you set a bandwidth of (bw=1000
) prior to the transformation and 10 afterward, you will get two normal looking densities (with different y-axis values because the support will be much larger in the first case). Here is an example showing how varying bandwidths change the shape of the density.
data = data.frame(x=c(1,2,10,11,1000), y=0)
## Examine how changing bandwidth changes the shape of the curve
par(mfrow=c(2,1))
greys <- colorRampPalette(c("black", "red"))(10)
plot(density(data$x), main="No Transform")
points(data, pch=19)
plot(density(log10(data$x)), ylim=c(0,2), main="Log-transform w/ varying bw")
points(log10(data$x), data$y, pch=19)
for (i in 1:10)
points(density(log10(data$x), bw=0.02*i), col=greys[i], type="l")
legend("topright", paste(0.02*1:10), col=greys, lty=2, cex=0.8)