Finding the nearest neighbour of points with the same value when comparing 2 different datasets in R

Date: 2016-02-04 14:41:02

Tags: r nearest-neighbor spatstat

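I have 2 data frames (df1 and df2), each made up of three columns: x coordinate, y coordinate, and category (5 levels, A-E). So I basically have two sets of point data, with every point assigned to a category,

e.g.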

X    Y    Cat
1    1.5  A
2    1.5  B
3.3  1.9  C

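...etc. (although both of my data frames have 100 points)

For each point in the first data frame (df1), I want to find the nearest neighbour of the same category from the second data frame (df2).

I have used nncross from the spatstat package to find the nearest neighbour in df2 of each point in df1, and then list each distance, as follows: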

library(spatstat)

# Convert the data frames to ppp objects (observation window: [0, 10] x [0, 10])

df1.ppp <- ppp(df1$X, df1$Y, c(0, 10), c(0, 10), marks = df1$Cat)
df2.ppp <- ppp(df2$X, df2$Y, c(0, 10), c(0, 10), marks = df2$Cat)

# Produce an output that lists the distance from each point in df1 to its nearest neighbour in df2

out <- nncross(X = df1.ppp, Y = df2.ppp, what = c("dist", "which"))

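But I am struggling to work out how to use the category labels stored in the ppp objects (defined by the marks) to find the nearest neighbour within the same category. I am sure it should be fairly straightforward, but if anyone has any suggestions, or an alternative way of achieving the same result, I would be very grateful.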

2 Answers:

Answer 0: (score: 0)

First, some artificial data to work with:

# Separate patterns for each type:
X1list <- split(X1)
X2list <- split(X2)

# For each point in X1 find nearest neighbour of same type in X2:
out <- list()
for(i in 1:5){
  out[[i]] <- nncross(X1list[[i]], X2list[[i]], what=c("dist","which"))
}
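One caveat with the per-type loop above: the `which` values returned by nncross index points within each sub-pattern `X2list[[i]]`, not within the full pattern `X2`. The following is a minimal sketch of mapping them back to the original indices. The toy patterns `X1` and `X2` (50 uniform points each on the unit square, with types assigned cyclically so that every type is present) and the result columns `id1` and `id2` are assumptions made for illustration, not part of the original answer.

```r
library(spatstat)

# Hypothetical toy data: two marked patterns on the unit square, types A-E.
set.seed(1)
types <- LETTERS[1:5]
X1 <- runifpoint(50)
marks(X1) <- factor(rep(types, length.out = npoints(X1)), levels = types)
X2 <- runifpoint(50)
marks(X2) <- factor(rep(types, length.out = npoints(X2)), levels = types)

# Separate patterns for each type (list elements are named by type).
X1list <- split(X1)
X2list <- split(X2)

# Original point indices grouped by type, in the same order that split() keeps.
idx1 <- split(seq_len(npoints(X1)), marks(X1))
idx2 <- split(seq_len(npoints(X2)), marks(X2))

# One row per point of X1: distance to, and original index of, its
# nearest neighbour of the same type in X2.
res <- data.frame(id1 = seq_len(npoints(X1)), dist = NA_real_, id2 = NA_integer_)
for (lev in types) {
  nn <- nncross(X1list[[lev]], X2list[[lev]], what = c("dist", "which"))
  res$dist[idx1[[lev]]] <- nn$dist
  # nn$which refers to positions inside X2list[[lev]]; translate to X2.
  res$id2[idx1[[lev]]] <- idx2[[lev]][nn$which]
}

head(res)
```

With this mapping, `marks(X2)[res$id2]` should equal `marks(X1)` point for point, which is a quick sanity check that the neighbours really are of the same type.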

Then a simple solution (but it loses the id information):

# Make separate marks for pattern 1 and 2 and collect into one pattern
marks(X1) <- factor(paste0(marks(X1), "1"))
marks(X2) <- factor(paste0(marks(X2), "2"))
X <- superimpose(X1, X2)

# For each point get the nearest neighbour of each type from both X1 and X2
# (both dist and index)
nnd <- nndist(X, by = marks(X))
nnw <- nnwhich(X, by = marks(X))

# Type to look for. I.e. the mark with 1 and 2 swapped
# (with 0 as intermediate step)
type <- marks(X)
type <- gsub("1", "0", type)
type <- gsub("2", "1", type)
type <- gsub("0", "2", type)

# Result
rslt <- cbind(as.data.frame(X), dist = 0, which = 0)
for(i in 1:nrow(rslt)){
  rslt$dist[i] <- nnd[i, type[i]]
  rslt$which[i] <- nnw[i, type[i]]
}

# Separate results
rslt1 <- rslt[1:npoints(X1),]
rslt2 <- rslt[npoints(X1) + 1:npoints(X2),]
rslt1$which <- rslt1$which - npoints(X1)

Finally, an ugly solution that recovers the identity of the neighbours:

{{1}}

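Answer 1: (score: 0)

I ended up solving this a different way: I found a simple solution by building a distance matrix from the original data frames using the geosphere package.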
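The code for this answer is not shown, so the following is only a sketch of the distance-matrix idea, not the answerer's actual code. It swaps geosphere (whose distance functions are designed for longitude/latitude coordinates) for plain Euclidean distances in base R, which suits the planar X/Y coordinates in the question. The toy data frames and the column names `nn_id` and `nn_dist` are assumptions.

```r
# Hypothetical toy data with the same layout as df1 and df2 (columns X, Y, Cat).
df1 <- data.frame(X = c(1, 2, 3.3, 5),
                  Y = c(1.5, 1.5, 1.9, 4),
                  Cat = c("A", "B", "C", "A"))
df2 <- data.frame(X = c(1.2, 4, 2.1, 3, 6),
                  Y = c(1, 4.5, 2, 2, 6),
                  Cat = c("A", "A", "B", "C", "B"))

# Pairwise Euclidean distances: rows index df1, columns index df2.
d <- sqrt(outer(df1$X, df2$X, "-")^2 + outer(df1$Y, df2$Y, "-")^2)

# Exclude pairs whose categories differ by setting their distance to Inf.
same_cat <- outer(as.character(df1$Cat), as.character(df2$Cat), "==")
d[!same_cat] <- Inf

# Row-wise minimum: index (row number in df2) and distance of the
# nearest same-category neighbour of each point in df1.
nn_id <- apply(d, 1, which.min)
nn_dist <- d[cbind(seq_len(nrow(df1)), nn_id)]

# A point whose category does not occur in df2 has no valid neighbour.
no_match <- !is.finite(nn_dist)
nn_id[no_match] <- NA
nn_dist[no_match] <- NA

result <- cbind(df1, nn_id = nn_id, nn_dist = nn_dist)
print(result)
```

The full matrix has nrow(df1) x nrow(df2) entries, which is negligible for two sets of 100 points but grows quickly for large data sets, where the spatstat approach is likely the better choice.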
