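I have a table of localities, cities, and listing counts. Some locality-city pairs are wrong and contain junk data. There is a simple algorithm to identify the dummy rows:
I want a clean data frame with the dummy rows removed.
Sample data: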
Locality <- c("Aloc", "Bloc", "Cloc", "Dloc", "Aloc", "ALoc", "Bloc", "Bloc", "Bloc", "Cloc", "Dloc", "Dloc")
City <- c("A","B","C","D","B","C","A","C","D","D","A","B")
Listings <- c(25,100,150,30,2,1,2,3,2,1,1,1)
l <- data.frame(Locality=Locality, City = City,Listings=Listings )
The result I want is:
Answer 0 (score: 1)
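# Index of the first row whose Locality already appeared earlier (0 if none)
dups <- anyDuplicated(l$Locality)
while (dups != 0){
  # All rows sharing that Locality; target[1] is its first occurrence
  target <- which(l$Locality == l$Locality[dups])
  if (l$Listings[target[1]] >= l$Listings[dups]){
    # The first occurrence has at least as many listings: drop the duplicate
    l <- l[-dups, ]
  } else {
    # Drop only the first occurrence (-target would remove every matching row)
    l <- l[-target[1], ]
  }
  dups <- anyDuplicated(l$Locality)
}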
which yields
> l
Locality City Listings
1 Aloc A 25
2 Bloc B 100
3 Cloc C 150
4 Dloc D 30
6 ALoc C 1
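
Note that row 6 survives because "ALoc" and "Aloc" are different strings: the comparison is case-sensitive. As a possible alternative to the loop, here is a minimal vectorized sketch. It assumes the rule is "for each Locality, keep the row with the most listings", which is what the loop above does on the sample data. The names `sorted`, `clean`, and `clean_ci` are made up for this illustration.

```r
# Sample data from the question.
l <- data.frame(
  Locality = c("Aloc", "Bloc", "Cloc", "Dloc", "Aloc", "ALoc",
               "Bloc", "Bloc", "Bloc", "Cloc", "Dloc", "Dloc"),
  City     = c("A", "B", "C", "D", "B", "C", "A", "C", "D", "D", "A", "B"),
  Listings = c(25, 100, 150, 30, 2, 1, 2, 3, 2, 1, 1, 1)
)

# Sort by Listings, largest first, so the first row seen for each
# Locality is the one with the most listings.
sorted <- l[order(-l$Listings), ]

# Keep the first occurrence of each Locality, then restore the
# original row order using the preserved row names.
clean <- sorted[!duplicated(sorted$Locality), ]
clean <- clean[order(as.integer(rownames(clean))), ]

# Assumption: if "ALoc" is just a typo for "Aloc", compare names
# case-insensitively so that row is treated as a duplicate too.
clean_ci <- sorted[!duplicated(tolower(sorted$Locality)), ]
clean_ci <- clean_ci[order(as.integer(rownames(clean_ci))), ]
```

On the sample data, `clean` contains the same five rows as the output above, while `clean_ci` also drops the "ALoc" row. Whether that row should be dropped depends on whether the capitalization is intentional, which the question does not say.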