Question

For some reason I'm having a mental block when filtering patent data. So imagine you have:

expl <- data.frame(PatNr=c(1,1,1,2,2,2,2,2), Country=c("AZ","AZ","PE","AZ","PS","HQ","HQ","PV"))

#>   PatNr    Country
#> 1       1        AZ
#> 2       1        AZ
#> 3       1        PE
#> 4       2        AZ
#> 5       2        PS
#> 6       2        HQ
#> 7       2        HQ
#> 8       2        PV

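What I want is to keep only those PatNr cases in my data.frame that contain both AZ and PS. All other PatNr cases can be removed. So in the given example, I want the script to remove all PatNr = 1 rows and keep the PatNr = 2 rows.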

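Subsetting down to just the two matching rows would be tricky in this case, because the real data has nine more important variables attached, and these differ in every row.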

Answer 1

Using dplyr:

library(dplyr)


expl2 <- expl %>% 
  group_by(PatNr) %>% 
  filter(all(c("AZ","PS") %in% Country)) 
expl2

Answer 2

Using base R:

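# Split by PatNr, then keep a group's rows only if it contains every required country
res <- lapply(split(expl, expl$PatNr), function(y, lvls) {
  y[all(lvls %in% y$Country), ]  # TRUE keeps all rows of the group, FALSE keeps none
}, lvls = c("AZ", "PS"))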
do.call(rbind, res)
    PatNr Country
2.4     2      AZ
2.5     2      PS
2.6     2      HQ
2.7     2      HQ
2.8     2      PV

Answer 3

Here is a messy for loop that solves the problem. I'm sure there is a better way, but this should work:

todelete=numeric(0)
for(i in unique(expl$PatNr)){
  countries = as.character(unique(expl$Country[expl$PatNr==i]))
  if(!all(c("AZ", "PS") %in% countries)){
    todelete=c(todelete, i)
  }
}


expl[!expl$PatNr %in% todelete,]
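
For what it's worth, the same idea can probably be written without the explicit loop. The following is only a sketch in base R under the assumptions of the question's example data; the names `required`, `has_all`, `keep_ids` and `kept` are made up for illustration and are not part of the original answer.

```r
# Same example data as in the question
expl <- data.frame(
  PatNr   = c(1, 1, 1, 2, 2, 2, 2, 2),
  Country = c("AZ", "AZ", "PE", "AZ", "PS", "HQ", "HQ", "PV")
)

# Hypothetical name: the countries every kept PatNr must contain
required <- c("AZ", "PS")

# For each PatNr, check whether all required countries occur in that group
has_all <- tapply(as.character(expl$Country), expl$PatNr,
                  function(x) all(required %in% x))

# PatNr values (as character labels) whose group passed the check
keep_ids <- names(has_all)[has_all]

# Keep every row belonging to a passing PatNr, with all other columns intact
kept <- expl[as.character(expl$PatNr) %in% keep_ids, ]
kept
```

Because whole groups are kept or dropped, any extra columns in the real data come along unchanged, which is what the question asks for.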

Filter matching rows in a single data.frame
