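I am using ggplot / easyGgplot2 to create a density plot of two groups. I would like a metric or indicator of how much the two curves overlap. I could even use a solution that does not involve curves, as long as it lets me measure which groups are more distinct (among several different groups of data).
Is there an easy way to do this in R?
For example, this sample generates this plot.
How can I estimate the percentage of area that is common to both?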
ggplot2.density(data=weight, xName='weight', groupName='sex',
legendPosition="top",
alpha=0.5, fillGroupDensity=TRUE )
Answer 0 (score: 4)
First, make some data. Here we look at the petal widths of two of the species in the built-in iris dataset.
## Some sample data from iris
dat <- droplevels(with(iris, iris[Species %in% c("versicolor", "virginica"), ]))
## make a similar graph
library(ggplot2)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_density(alpha=0.5)
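To find the area of intersection, you can use approxfun to approximate the function that describes the overlap, then integrate it to get the area. Since these are density curves, each has an area of (roughly) 1, so the integral is the overlapping fraction (multiply by 100 for a percentage).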
## Get density curves for each species
ps <- lapply(split(dat, dat$Species), function(x) {
dens <- density(x$Petal.Width)
data.frame(x=dens$x, y=dens$y)
})
## Approximate the functions and find intersection
fs <- sapply(ps, function(x) approxfun(x$x, x$y, yleft=0, yright=0))
f <- function(x) fs[[1]](x) - fs[[2]](x) # difference between the curves; its root is where they cross
meet <- uniroot(f, interval=c(1, 2))$root # intersection of the two curves
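## Find overlapping x, y values: keep the lower curve on each side of the crossing
ps1 <- is.na(cut(ps[[1]]$x, c(-Inf, meet))) # versicolor points to the right of the crossing
ps2 <- is.na(cut(ps[[2]]$x, c(Inf, meet))) # virginica points at or left of the crossing (cut() sorts its breaks)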
shared <- rbind(ps[[1]][ps1,], ps[[2]][ps2,])
## Approximate function of intersection
f <- with(shared, approxfun(x, y, yleft=0, yright=0))
## have a look
xs <- seq(0, 3, len=1000)
plot(xs, f(xs), type="l", col="blue", ylim=c(0, 2))
points(ps[[1]], col="red", type="l", lty=2, lwd=2)
points(ps[[2]], col="blue", type="l", lty=2, lwd=2)
polygon(c(xs, rev(xs)), y=c(f(xs), rep(0, length(xs))), col="orange", density=40)
## Integrate it to get the value
integrate(f, lower=0, upper=3)$value
# [1] 0.1548127
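The code above assumes the two curves cross exactly once, somewhere between 1 and 2. If you would rather not locate the crossing by hand, here is a minimal sketch of the same idea that takes the pointwise minimum of the two densities on a shared grid and integrates it with the trapezoidal rule. The helper name overlap_area and the choice to pad the grid by three bandwidths are my own assumptions, not anything from a package.

```r
## Sketch: overlap area of two samples via the pointwise minimum of their densities.
## `overlap_area` is a hypothetical helper name, not part of any package.
overlap_area <- function(a, b, n = 1024) {
  # Per-group default bandwidths, as density() would pick them
  bw_a <- bw.nrd0(a)
  bw_b <- bw.nrd0(b)
  # One shared grid wide enough to cover both curves
  pad <- 3 * max(bw_a, bw_b)
  lo <- min(a, b) - pad
  hi <- max(a, b) + pad
  da <- density(a, bw = bw_a, from = lo, to = hi, n = n)
  db <- density(b, bw = bw_b, from = lo, to = hi, n = n)
  # The lower of the two curves at each grid point bounds the shared region
  shared <- pmin(da$y, db$y)
  # Trapezoidal rule on the shared grid
  sum(diff(da$x) * (shared[-1] + shared[-length(shared)]) / 2)
}

a <- iris$Petal.Width[iris$Species == "versicolor"]
b <- iris$Petal.Width[iris$Species == "virginica"]
overlap_area(a, b)
```

Because pmin() picks the lower curve everywhere, this should still behave if the densities cross more than once. Expect a value in the same neighborhood as the integral above, though not identical, since the grid and the integration rule differ.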
Answer 1 (score: 2)
I like the previous answer, but this one may be a bit more intuitive, and I also made sure to use a common bandwidth:
library ( "caTools" )
# Extract common bandwidth
Bw <- ( density ( iris$Petal.Width ))$bw
# Get iris data
Sample <- with ( iris, split ( Petal.Width, Species ))[ 2:3 ]
# Estimate kernel densities using common bandwidth
Densities <- lapply ( Sample, density,
bw = Bw,
n = 512,
from = -1,
to = 3 )
# Plot
plot( Densities [[ 1 ]], xlim = c ( -1, 3 ),
col = "steelblue",
main = "" )
lines ( Densities [[ 2 ]], col = "orange" )
# Overlap
X <- Densities [[ 1 ]]$x
Y1 <- Densities [[ 1 ]]$y
Y2 <- Densities [[ 2 ]]$y
Overlap <- pmin ( Y1, Y2 )
polygon ( c ( X, X [ 1 ]), c ( Overlap, Overlap [ 1 ]),
lwd = 2, col = "hotpink", border = NA, density = 20)
# Integrate: overlap as a share of the two curves' combined area
Total <- trapz ( X, Y1 ) + trapz ( X, Y2 )
(Surface <- trapz ( X, Overlap ) / Total)
SText <- paste ( sprintf ( "%.3f", 100*Surface ), "%" )
text ( X [ which.max ( Overlap )], 1.2 * max ( Overlap ), SText )
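Since the question mentions several groups, the same common-bandwidth idea can be extended to every pair at once. The following is only a sketch in base R (no caTools), and pairwise_overlap is a hypothetical helper name of my own. Note that it reports the raw overlap area for each pair, not the overlap divided by the summed area as in the code above.

```r
## Sketch: pairwise overlap for every pair of groups, using one shared bandwidth.
## `pairwise_overlap` is a hypothetical helper, not a function from any package.
pairwise_overlap <- function(values, groups, n = 512) {
  bw <- bw.nrd0(values)            # common bandwidth from the pooled data
  from <- min(values) - 3 * bw     # shared grid limits (assumed padding: 3 bandwidths)
  to <- max(values) + 3 * bw
  dens <- lapply(split(values, groups), density,
                 bw = bw, n = n, from = from, to = to)
  x <- dens[[1]]$x                 # all densities share this grid
  trap <- function(y) sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
  k <- length(dens)
  out <- matrix(NA_real_, k, k, dimnames = list(names(dens), names(dens)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      # Area under the lower of the two curves
      out[i, j] <- trap(pmin(dens[[i]]$y, dens[[j]]$y))
    }
  }
  out
}

round(pairwise_overlap(iris$Petal.Width, iris$Species), 3)
```

The diagonal comes out close to 1 (each curve overlaps itself completely), and smaller off-diagonal values indicate more distinct groups. Keep in mind that a bandwidth chosen from pooled, multimodal data can be fairly wide, which tends to inflate the overlap.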