Detecting the outer rows in a dataset

Time: 2017-03-01 09:38:28

Tags: r

My dataset contains the positions of objects:

library(dplyr)    # mutate() and %>%
library(ggplot2)  # ggplot() and geom_point()

so <- data.frame(x = rep(1:5, each = 5), y = rep(1:5, 5))
so1 <- so %>% mutate(x = x + 5, y = y + 2)
so2 <- rbind(so, so1) %>% mutate(x = x + 13, y = y + 7)
so3 <- so2 %>% mutate(x = x + 10)
ggplot(aes(x = x, y = y), data = rbind(so, so1, so2, so3)) + geom_point()

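What I would like to know is whether there is a method in R to detect the objects located on the outer rows of the dataset, because I have to exclude these objects from the analysis. The objects I want to exclude are the ones shown in red in the picture.

So far I have used min, max and ifelse, but that approach is clumsy, and I could not turn it into something that generalizes to different datasets with different x and y designs. Is there a package that does this? And/or is there a general way to solve this kind of problem?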

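1 answer:

Answer 0 (score: 4)

Could you use a "spatial" approach? If you view your data as a spatial object, your problem becomes one of removing the borders of the patches...

With the raster package this is quite straightforward: find the border cells with boundaries() and then mask() the data accordingly.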

library(dplyr)
library(raster)

# Your reproducible example
myDF = rbind(so,so1,so2,so3)
myDF$z = 1 # there may actually be more 'z' variables

# Rasterize your data
r = rasterFromXYZ(myDF) # if there are more vars, this will be a RasterBrick
par(mfrow=c(2,2))
plot(r, main='Original data')

# Here I artificially add 1 row above and down and 1 column left and right,
# This is a trick needed to make sure to also remove the cells that are
# located at the border of your raster with `boundaries` in the next step.
newextent = extent(r) + c(-res(r)[1], res(r)[1], -res(r)[2], res(r)[2] )
r = extend(r, newextent)
plot(r, main='Artificially extended')
plot(rasterToPoints(r, spatial=T), add=T, col='blue', pch=20, cex=0.3)

# Get the cells to remove, i.e. the boundaries
bounds = boundaries(r[[1]], asNA=T) #[[1]]: in case r is a RasterBrick
plot(bounds, main='Cells to remove (where 1)')
plot(rasterToPoints(bounds, spatial=T), add=T, col='red', pch=20, cex=0.3)

# Then mask your data (i.e. subset to remove boundaries)
subr = mask(r, bounds, maskvalue=1)
plot(subr, main='Resulting data')
plot(rasterToPoints(subr, spatial=T), add=T, col='blue', pch=20, cex=0.3)

# This is your new data. rasterToPoints() skips NA cells, so the
# artificially added border is not returned. The result is a matrix.
myDF2 = rasterToPoints(subr)
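
For a regular grid like the one in the question, the same idea can also be sketched without raster. This is only an illustration, not part of the original answer: it assumes the points sit on a grid with a known, constant spacing (the `step` argument), and `is_border()` is a hypothetical helper name. A point is treated as a border point when at least one of its 8 neighbours is missing, which is meant to mirror the 8-direction default of boundaries().

```r
# Rebuild the example data from the question using base R only.
so  <- data.frame(x = rep(1:5, each = 5), y = rep(1:5, 5))
so1 <- transform(so, x = x + 5, y = y + 2)
so2 <- transform(rbind(so, so1), x = x + 13, y = y + 7)
so3 <- transform(so2, x = x + 10)
pts <- rbind(so, so1, so2, so3)

# Hypothetical helper: TRUE for every point that lacks at least one of its
# 8 neighbours. Assumes a regular grid with spacing `step` in x and y.
is_border <- function(df, step = 1) {
  key <- paste(df$x, df$y)                      # lookup key for each point
  offsets <- expand.grid(dx = c(-step, 0, step),
                         dy = c(-step, 0, step))
  offsets <- offsets[!(offsets$dx == 0 & offsets$dy == 0), ]  # drop the point itself
  has_all <- rep(TRUE, nrow(df))
  for (i in seq_len(nrow(offsets))) {
    neighbour <- paste(df$x + offsets$dx[i], df$y + offsets$dy[i])
    has_all <- has_all & (neighbour %in% key)   # neighbour must exist in the data
  }
  !has_all
}

pts$border <- is_border(pts)
interior   <- pts[!pts$border, c("x", "y")]     # points kept for the analysis
```

The string keys keep the sketch short but rely on the coordinates being exact grid values; for coordinates with floating-point noise, the raster approach above is likely the safer choice.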


Does that help?