How to ... from R

Time: 2018-09-18 18:05:04

Tags: r density-plot

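I'm not sure how to approach this. I have a data distribution where selecting points by standard deviation does not capture all the points I need, because the data are more variable at one end than the other. When I draw a density plot, however, I can see that everything outside the 8th blue ring is exactly what I want to select.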

Example code:

library(ggplot2)

x <- sort(rnorm(1300, mean = 0, sd = 1))
y <- rnorm(1300, mean = 0, sd = 1)
x <- c(x, rnorm(300, mean = 4, sd = 2), rnorm(600, mean = -2, sd = 2))
y <- c(y, rnorm(300, mean = 3, sd = 4), rnorm(600, mean = -2, sd = 2))

mydata <- data.frame(x,y)

ggplot(data = mydata, aes(x = x, y = y)) +
  geom_point(cex = 0.5) +
  geom_density_2d()

1 answer:

Answer 0 (score: 2)

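I adapted this from http://slowkow.com/notes/ggplot2-color-by-density/. Internally, geom_density_2d uses the MASS::kde2d function, so we can apply the same function to the underlying data and subset the points by density.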

library(ggplot2)
library(dplyr)  # provides %>% and filter(), used further down

set.seed(42)
x <- sort(rnorm(1300, mean = 0, sd = 1))
y <- rnorm(1300, mean = 0, sd = 1)
x <- c(x, rnorm(300, mean = 4, sd = 2), rnorm(600, mean = -2, sd = 2))
y <- c(y, rnorm(300, mean = 3, sd = 4), rnorm(600, mean = -2, sd = 2))

mydata <- data.frame(x,y) 

# Copied from http://slowkow.com/notes/ggplot2-color-by-density/
get_density <- function(x, y, n = 100) {
  dens <- MASS::kde2d(x = x, y = y, n = n)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}
mydata$density <- get_density(mydata$x, mydata$y)
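
The lookup at the end of get_density relies on indexing a matrix with a two-column matrix of (row, column) pairs, which returns one cell per point. Here is a toy sketch of that step using made-up grid values; the names grid_x, grid_y, z, px, py and picked are hypothetical and stand in for dens$x, dens$y, dens$z and the data.

```r
# Toy illustration of the lookup inside get_density(); every name here is hypothetical.
grid_x <- c(0, 1, 2)          # grid breakpoints along x (plays the role of dens$x)
grid_y <- c(0, 10, 20)        # grid breakpoints along y (plays the role of dens$y)
z <- matrix(1:9, nrow = 3)    # z[i, j] is the value at (grid_x[i], grid_y[j])

px <- c(0.5, 2.0)             # two sample points
py <- c(15, 0)

ix <- findInterval(px, grid_x)  # row index of the grid cell for each point
iy <- findInterval(py, grid_y)  # column index of the grid cell for each point

# Indexing with a two-column matrix picks z[ix[k], iy[k]] for each point k.
picked <- z[cbind(ix, iy)]
picked
# [1] 4 3
```

Each point therefore gets the density of the grid cell it falls into, not an interpolated value, so a larger n gives a finer approximation.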

Selecting points based on an arbitrary contour

Edit: changed to allow selection based on contour level

# First create plot with geom_density
gg <- ggplot(data = mydata, aes(x = x, y = y)) +
  geom_point(cex = 0.5) +
  geom_density_2d(size = 1, n = 100)
gg

# Extract levels denoted by contours by going into the 
#   ggplot build object. I found these coordinates by 
#   examining the object in RStudio; Note, the coordinates 
#   would change if the layer order were altered.
gb <- ggplot_build(gg)
contour_levels <- unique(gb[["data"]][[2]][["level"]])
# contour_levels
# [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08

# Add layer that relies on given contour level
gg2 <- gg +
  geom_point(data = mydata %>% 
               filter(density <= contour_levels[1]), 
             color = "red", size = 0.5)
gg2
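
The hard-coded [[2]] above depends on the layer order, as the comment notes. A possible way around that is to look the contour layer up by its geom class instead. This is only a sketch on smaller stand-in data: it assumes the layer created by geom_density_2d carries a geom inheriting from "GeomDensity2d", and the names d, p, is_contour, layer_idx, levels_found and outside are hypothetical.

```r
library(ggplot2)

set.seed(42)
d <- data.frame(x = rnorm(500), y = rnorm(500))  # stand-in data, not the question's

# Per-point density from the same kind of kernel estimate the contours use.
dens <- MASS::kde2d(d$x, d$y, n = 100)
d$density <- dens$z[cbind(findInterval(d$x, dens$x),
                          findInterval(d$y, dens$y))]

p <- ggplot(d, aes(x = x, y = y)) +
  geom_point(size = 0.5) +
  geom_density_2d(n = 100)

# Assumption: the contour layer's geom inherits from "GeomDensity2d".
is_contour <- vapply(p$layers,
                     function(l) inherits(l$geom, "GeomDensity2d"),
                     logical(1))
layer_idx <- which(is_contour)[1]

# layer_data() returns the computed data for a single layer of the plot.
levels_found <- sort(unique(layer_data(p, i = layer_idx)$level))

# Points at or below the lowest drawn level, i.e. outside the outermost ring.
outside <- d[d$density <= min(levels_found), ]
```

Replacing min(levels_found) with levels_found[k] would select everything outside the k-th ring counted from the outside in, which mirrors contour_levels[1] in the answer's code.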
