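I need to determine what fraction of a set of xy coordinates lies inside a given rectangle. The rectangle is defined as the region whose sides are a given distance from the edges of the coordinate system (in this case the coordinate system is roughly bounded by (-50,-20), (-50,20), (50,20), (50,-20)). I would also like to be able to test the result for rectangles at different distances from the edges. My approach is as follows: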
# set initial limits to the coordinate system
lim.xleft = -50
lim.xright = 50
lim.ybottom = -20
lim.ytop = 20
frac.near.edge <- function(coord.pairs, tolerance){
  # set the coordinates of the rectangle of interest
  exclude.xleft = lim.xleft + tolerance
  exclude.xright = lim.xright - tolerance
  exclude.ybottom = lim.ybottom + tolerance
  exclude.ytop = lim.ytop - tolerance
  out <- vector()
  # loop through the pairs testing whether the point is inside the rectangle or outside
  for(i in 1:nrow(coord.pairs)){
    if(coord.pairs[i, 1] > exclude.xleft & coord.pairs[i, 1] < exclude.xright & coord.pairs[i, 2] > exclude.ybottom & coord.pairs[i, 2] < exclude.ytop){
      out[i] <- "in"
    } else {
      out[i] <- "out"
    }
  }
  # return how many points were inside the rectangle and how many were outside
  return(table(out))
}
# try it out on something much bigger!
foo <- data.frame(x = runif(100), y = runif(100))
system.time(frac.near.edge(foo, tolerance = 5))
For large datasets this is very slow (mine contains about 10^5 xy pairs). How can I speed it up? Is there a way around the loop?
Answer 0: (score: 1)
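This may be better suited to SE Code Review (https://codereview.stackexchange.com/questions/tagged/r). I don't know whether this code will be useful, but your question is really about code improvement rather than programming. I also generated a better dataset, since every point in yours gives the same result.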
foo <- data.frame(x = sample(-100:100, 100, replace=TRUE),
y = sample(-100:100, 100, replace=TRUE))
xleft = -50
xright = 50
ybottom = -20
ytop = 20
foo$x >= xleft & foo$x <= xright & foo$y >= ybottom & foo$y <= ytop
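To get from that logical vector to the counts and the fraction the question asks for, one option is to wrap the comparison in a function. This is a minimal sketch, assuming the limits given in the question; the name `frac.in.rect` and its arguments are made up for illustration, and it uses the strict inequalities from the question's loop rather than `>=` and `<=`:

```r
# Hypothetical helper: name and arguments are illustrative, not from the original post.
frac.in.rect <- function(x, y, tolerance,
                         xleft = -50, xright = 50, ybottom = -20, ytop = 20) {
  # One vectorised comparison over all points instead of a loop
  inside <- x > xleft + tolerance & x < xright - tolerance &
            y > ybottom + tolerance & y < ytop - tolerance
  list(
    # Fixing the levels keeps both categories in the table even if one is empty
    counts = table(factor(inside, levels = c(FALSE, TRUE), labels = c("out", "in"))),
    # The mean of a logical vector is the proportion of TRUE values
    fraction = mean(inside)
  )
}

set.seed(1)
foo <- data.frame(x = runif(1e5, -50, 50), y = runif(1e5, -20, 20))
res <- frac.in.rect(foo$x, foo$y, tolerance = 5)
res$counts
res$fraction
```

Because `tolerance` is an argument, rectangles at different distances from the edges can be compared with something like `sapply(c(1, 5, 10), function(tol) frac.in.rect(foo$x, foo$y, tol)$fraction)`.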
Answer 1: (score: 1)
exclude.xleft = lim.xleft + tolerance
exclude.xright = lim.xright - tolerance
exclude.ybottom = lim.ybottom + tolerance
exclude.ytop = lim.ytop - tolerance
out <- c("out", "in")[1+( findInterval(coord.pairs[ , 1], c(exclude.xleft, exclude.xright))==1 &
findInterval(coord.pairs[ , 2], c(exclude.ybottom, exclude.ytop))==1)]
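The `== 1` test works because `findInterval(x, c(lo, hi))` returns 0 for values below `lo`, 1 for values in `[lo, hi)`, and 2 for values at or above `hi`. A small illustration with made-up values:

```r
# Made-up values chosen to hit each case, including both boundaries
x <- c(-60, -40, 0, 40, 60)
idx <- findInterval(x, c(-40, 40))
idx
# 0 1 1 2 2: intervals are closed on the left and open on the right
```

One consequence is that a point lying exactly on the lower limit counts as "in" here, whereas the loop in the question, which uses strict `>`, would count it as "out". For continuous data this rarely matters.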
On a 50K test case this takes 0.01 seconds, versus 19 seconds for your method:
coord.pairs <- cbind(rnorm(50000, 0, 50), rnorm(50000, 0, 20)); tolerance = 10
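To reproduce a timing like this, the `findInterval` lines can be wrapped in a function with the same interface as the question's `frac.near.edge`. This is only a sketch: the name `frac.near.edge.fi` is hypothetical, the limits are the ones from the question, and timings will vary by machine, so none are quoted here.

```r
# Limits of the coordinate system, as in the question
lim.xleft <- -50; lim.xright <- 50; lim.ybottom <- -20; lim.ytop <- 20

# Hypothetical wrapper name; the body is the findInterval approach shown above
frac.near.edge.fi <- function(coord.pairs, tolerance) {
  xin <- findInterval(coord.pairs[, 1],
                      c(lim.xleft + tolerance, lim.xright - tolerance)) == 1
  yin <- findInterval(coord.pairs[, 2],
                      c(lim.ybottom + tolerance, lim.ytop - tolerance)) == 1
  # FALSE -> index 1 -> "out", TRUE -> index 2 -> "in"
  table(c("out", "in")[1 + (xin & yin)])
}

set.seed(42)
coord.pairs <- cbind(rnorm(50000, 0, 50), rnorm(50000, 0, 20))
system.time(res <- frac.near.edge.fi(coord.pairs, tolerance = 10))
res
```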