Question

I have a data frame like the following:

C <- data.frame(A_Latitude  = c(48.4459, 48.7,    49.0275, 49.0275, 49.0275, 49.0275, 48.4459),
                A_Longitude = c(9.989,   8.15,    8.7539,  8.7539,  8.7539,  8.7539,  9.989),
                B_Latitude  = c(49.0275, 48.4734, 48.4459, 48.9602, 48.9602, 48.4459, 49.0275),
                B_Longitude = c(8.7539,  9.227,   9.989,   9.2058,  9.2058,  9.989,   8.7539))

The data frame consists of latitude/longitude coordinates for a pair of locations, A and B (i.e. A_Latitude/A_Longitude and B_Latitude/B_Longitude).

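I want to remove duplicates based on the set of combinations, i.e. drop rows where location A/location B is equivalent to location B/location A. In other words, a row counts as a duplicate when its A_Latitude/A_Longitude/B_Latitude/B_Longitude equals the B_Latitude/B_Longitude/A_Latitude/A_Longitude of another row.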

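The answers to [Finding unique combinations irrespective of position [duplicate]] and [Removing duplicate combinations (irrespective of order)] did not help, because those solutions do not account for groups of columns that belong together within a combination. That matters when working with locations around the globe, where a latitude/longitude pair together identifies a single place.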

Thanks in advance for your help.

Answer 1

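One idea is to treat each latitude/longitude pair as a string, using toString(...). For each row, sort the two resulting strings so that A/B and B/A give the same 2-element character vector, then use those sorted vectors to check for duplicates.

Applied to every row of C: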

# Build an order-independent key per row: the two "lat, long" strings, sorted
keys <- lapply(seq_len(nrow(C)), function(i) sort(c(toString(C[i, 1:2]), toString(C[i, 3:4]))))
ans <- C[!duplicated(keys), ]
ans
#   A_Latitude A_Longitude B_Latitude B_Longitude
# 1    48.4459      9.9890    49.0275      8.7539
# 2    48.7000      8.1500    48.4734      9.2270
# 4    49.0275      8.7539    48.9602      9.2058
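
To see why this works, it can help to build the key for a single row by hand. The following is a minimal sketch, assuming the same C as in the question; the names a_key, b_key, key_row1 and key_row3 are illustrative only. Row 3 is row 1 with A and B swapped, so both rows should end up with the same sorted key.

```r
C <- data.frame(A_Latitude  = c(48.4459, 48.7, 49.0275, 49.0275, 49.0275, 49.0275, 48.4459),
                A_Longitude = c(9.989, 8.15, 8.7539, 8.7539, 8.7539, 8.7539, 9.989),
                B_Latitude  = c(49.0275, 48.4734, 48.4459, 48.9602, 48.9602, 48.4459, 49.0275),
                B_Longitude = c(8.7539, 9.227, 9.989, 9.2058, 9.2058, 9.989, 8.7539))

# Step 1: collapse each location of row 1 into a single string
a_key <- toString(C[1, 1:2])   # "48.4459, 9.989"
b_key <- toString(C[1, 3:4])   # "49.0275, 8.7539"

# Step 2: sort the two strings so the A/B order no longer matters
key_row1 <- sort(c(a_key, b_key))

# Row 3 holds the same two locations in the opposite order
key_row3 <- sort(c(toString(C[3, 1:2]), toString(C[3, 3:4])))

# Both rows produce the same key, so duplicated() flags row 3
identical(key_row1, key_row3)
```

Because the comparison is done on strings, two coordinates only match when they print identically; values that differ by floating-point noise would need rounding first.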
