Find the highest value among duplicate rows - R

Time: 2015-08-08 16:30:19

Tags: r

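I have a table of localities, cities, and listing counts. Some locality-city pairings are wrong; they are junk data. There is a simple rule for identifying the dummy rows:

  • If a locality appears more than once, the real city is the one with the most listings; the remaining rows are fake.

I want a clean data frame with the dummy rows removed.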
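As a minimal sketch of that rule in base R, the following flags each row whose listing count equals the maximum within its locality. The data frame `toy` and the column `is_real` are made up for illustration and are not part of the question's data; note that a tie would flag more than one row per locality.

```r
# Hypothetical toy data: "Xloc" appears twice, "Yloc" once
toy <- data.frame(
  Locality = c("Xloc", "Xloc", "Yloc"),
  City     = c("A", "B", "C"),
  Listings = c(10, 3, 7),
  stringsAsFactors = FALSE
)

# ave() applies max() within each Locality group and returns a vector
# aligned with the original rows
toy$is_real <- toy$Listings == ave(toy$Listings, toy$Locality, FUN = max)

# Keep only the rows flagged as real
clean <- toy[toy$is_real, ]
print(clean)
```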

Sample data below:

Locality <- c("Aloc", "Bloc", "Cloc", "Dloc", "Aloc", "ALoc", "Bloc", "Bloc", "Bloc", "Cloc", "Dloc", "Dloc")
City <- c("A", "B", "C", "D", "B", "C", "A", "C", "D", "D", "A", "B")
Listings <- c(25, 100, 150, 30, 2, 1, 2, 3, 2, 1, 1, 1)
l <- data.frame(Locality = Locality, City = City, Listings = Listings)

The result I want is:

[Image: table of the expected result]

1 answer:

Answer 0 (score: 1)

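# Index of the first row whose Locality repeats an earlier row (0 if none)
dups <- anyDuplicated(l$Locality)

while (dups != 0){
    # All rows sharing that Locality; target[1] is the earlier occurrence
    target <- which(l$Locality == l$Locality[dups])
    if (l$Listings[target[1]] >= l$Listings[dups]){
        # The earlier row has at least as many listings: drop the later one
        l <- l[-dups, ]
    } else {
        # The later row has more listings: drop only the earlier one
        l <- l[-target[1], ]
    }
    dups <- anyDuplicated(l$Locality)
}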
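A loop-free alternative, offered as a sketch and not as the answerer's method: sort the rows by listing count in descending order, then keep the first occurrence of each locality with `duplicated()`. The names `sorted` and `clean` are introduced here for illustration. When counts tie, the row that appears first in the original data is the one kept.

```r
Locality <- c("Aloc", "Bloc", "Cloc", "Dloc", "Aloc", "ALoc",
              "Bloc", "Bloc", "Bloc", "Cloc", "Dloc", "Dloc")
City     <- c("A", "B", "C", "D", "B", "C", "A", "C", "D", "D", "A", "B")
Listings <- c(25, 100, 150, 30, 2, 1, 2, 3, 2, 1, 1, 1)
l <- data.frame(Locality = Locality, City = City, Listings = Listings,
                stringsAsFactors = FALSE)

# Sort the whole table so that higher listing counts come first
sorted <- l[order(-l$Listings), ]

# duplicated() marks every occurrence of a Locality after the first one;
# after sorting, the first occurrence is the one with the most listings
clean <- sorted[!duplicated(sorted$Locality), ]

# Restore the original row order using the preserved row names
clean <- clean[order(as.integer(rownames(clean))), ]
print(clean)
```

On the sample data this keeps the same five rows as the loop, including the differently cased "ALoc" row.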

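This yields the following. Row 6 remains because "ALoc" differs in case from "Aloc", and the comparison is case-sensitive: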

> l
  Locality City Listings
1     Aloc    A       25
2     Bloc    B      100
3     Cloc    C      150
4     Dloc    D       30
6     ALoc    C        1
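If "ALoc" is meant to be the same locality as "Aloc" (an assumption; the question does not say so), one possible sketch is to group on a lower-cased key and take the row with the most listings from each group. The names `key`, `groups`, `best`, and `clean` are illustrative only.

```r
Locality <- c("Aloc", "Bloc", "Cloc", "Dloc", "Aloc", "ALoc",
              "Bloc", "Bloc", "Bloc", "Cloc", "Dloc", "Dloc")
City     <- c("A", "B", "C", "D", "B", "C", "A", "C", "D", "D", "A", "B")
Listings <- c(25, 100, 150, 30, 2, 1, 2, 3, 2, 1, 1, 1)
l <- data.frame(Locality = Locality, City = City, Listings = Listings,
                stringsAsFactors = FALSE)

# Assumption: names that differ only in letter case are the same locality
key <- tolower(l$Locality)

# Split the data frame into one piece per normalized locality
groups <- split(l, key)

# which.max() returns the first row with the largest count in each piece
best <- lapply(groups, function(g) g[which.max(g$Listings), ])

# Stack the winning rows back into a single data frame
clean <- do.call(rbind, best)
rownames(clean) <- NULL
print(clean)
```

Under that assumption only four rows remain, one per locality.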