Question

我有两个数据框：

points包含一系列带有x, y坐标的点。
poly包含两个多边形的坐标（实际上我有100多个，但在这里保持简单）。

我希望能够在数据框points中添加一个名为Area的额外列，其中包含该点所在多边形的名称。

poly <- data.frame(
pol= c("P1", "P1","P1","P1","P1","P2","P2","P2","P2", "P2"),
x=c(4360, 7273, 7759, 4440, 4360, 8720,11959, 11440,8200, 8720),
y=c(1009, 9900,28559,28430,1009,9870,9740,28500,28040,9870))

points <- data.frame(
       object = c("P1", "P1","P1","P2","P2","P2"),
       timestamp= c(1485670023468,1485670023970, 1485670024565, 1485670025756,1485670045062, 1485670047366),
       x=c(6000, 6000, 6050, 10000, 10300, 8000),
       y=c(10000, 20000,2000,5000,20000,2000))

plot(poly$x, poly$y, type = 'l')
text(points$x, points$y, labels=points$object )

所以基本上在这个例子中前两行应该有Area= "P1"，而最后一个点应该是空白的，因为该点不包含在任何多边形中。

我尝试使用函数in.out，但我们已经无法按照我的描述构建数据框了。

非常感谢任何帮助！

Answer 1

虽然这是使用for循环，但它实际上非常快。

library(mgcv)

x <- split(poly$x, poly$pol)
y <- split(poly$y, poly$pol)

todo <- 1:nrow(points)
Area <- rep.int("", nrow(points))
pol <- names(x)

# loop through polygons
for (i in 1:length(x)) {
  # the vertices of i-th polygon
  bnd <- cbind(x[[i]], y[[i]])
  # points to allocate
  xy <- with(points, cbind(x[todo], y[todo]))
  inbnd <- in.out(bnd, xy)
  # allocation
  Area[todo[inbnd]] <- pol[i]
  # update 'todo'
  todo <- todo[!inbnd]
  }

points$Area <- Area

效率的两个原因：

for循环是通过多边形，而不是点。因此，如果要分配100个多边形和100000个点，则循环只有100次迭代而不是100000次。在每次迭代中，C函数in.out的矢量化能力被利用;
它以渐进的方式运作。一旦分配了一个点，它将在以后被排除在分配之外。 todo变量控制通过循环分配的点。随着时间的推移，工作集正在减少。

Answer 2

您可以使用包point.in.polygon中的sp函数：

points$Area = apply(points, 1, function(p)ifelse(point.in.polygon(p[3], p[4], poly$x[which(poly$pol==p[1])], poly$y[which(poly$pol==p[1])]), p[1], NA))

给你

  object   timestamp     x     y Area
1     P1 1.48567e+12  6000 10000   P1
2     P1 1.48567e+12  6000 20000   P1
3     P1 1.48567e+12  6050  2000 <NA>
4     P2 1.48567e+12 10000  5000 <NA>
5     P2 1.48567e+12 10300 20000   P2
6     P2 1.48567e+12  8000  2000 <NA>

将多边形分配给R数据帧中的数据点

2 个答案: