Filter groups by whether certain string values are present

Date: 2018-05-09 14:21:30

Tags: r filter grouping

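This seems simple, but somehow I can't figure out how to solve it. What is the best way to detect whether two specific string values from another column are both present within a group?

Example df: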

library(tidyverse)

tribble(
  ~city, ~var,
  "A", "PVDA",
  "A", "GL",
  "A", "GMBL",
  "B", "GL",
  "B", "VVD",
  "C", "CDA",
  "C", "VVD"
)

What I'd like to do is something like this:

join_anp_vgn_sf %>%
  group_by(city) %>%
  filter(grepl("^PVDA$&^GL$", var))

But this doesn't work, because the code tests each value individually rather than the group as a whole.

Desired output:

  city  var  
  <chr> <chr>
1 A     PVDA 
2 A     GL
3 A     GMBL 

3 answers:

Answer 0 (score: 3)

Using dplyr:

df <- tribble(
  ~city, ~var,
  "A", "PVDA",
  "A", "GL",
  "B", "GL",
  "B", "VVD",
  "C", "CDA",
  "C", "VVD"
)

df %>% 
  group_by(city) %>% 
  filter(all(c("PVDA","GL") %in% var))

# A tibble: 2 x 2
# Groups:   city [1]
#   city  var  
#   <chr> <chr>
# 1 A     PVDA 
# 2 A     GL   

Edit

With the updated example:

df <- tribble(
  ~city, ~var,
  "A", "PVDA",
  "A", "GL",
  "A", "GMBL",
  "B", "GL",
  "B", "VVD",
  "C", "CDA",
  "C", "VVD"
)

df %>% 
  group_by(city) %>% 
  filter(all(c("PVDA","GL") %in% var))

# A tibble: 3 x 2
# Groups:   city [1]
#   city  var  
#   <chr> <chr>
# 1 A     PVDA 
# 2 A     GL   
# 3 A     GMBL 
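
To see why this works, it can help to evaluate the condition once per group outside of dplyr. The following is a minimal base R sketch (the names `wanted`, `has_both` and `result` are mine, not from the answer above). `all(wanted %in% x)` collapses to a single TRUE or FALSE per city, and a grouped `filter()` applies that single value to every row of the group, which is why the GMBL row is kept as well.

```r
# Same data as the updated example, as a plain data frame
df <- data.frame(
  city = c("A", "A", "A", "B", "B", "C", "C"),
  var  = c("PVDA", "GL", "GMBL", "GL", "VVD", "CDA", "VVD"),
  stringsAsFactors = FALSE
)

wanted <- c("PVDA", "GL")

# One logical value per city: TRUE only if every wanted value occurs in that group
has_both <- sapply(split(df$var, df$city), function(x) all(wanted %in% x))
print(has_both)

# Keep all rows of the cities that passed the test
result <- df[df$city %in% names(has_both)[has_both], ]
print(result)
```

Only city A passes, so all three of its rows are returned.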

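Answer 1 (score: 1)

Use grepl to find the cities that have both a PVDA and a GL value, then select the matching rows from the original tibble.

# Cities with at least one value starting with "PVDA" / "GL"
PVDA <- as.character(unlist(df[grepl("^PVDA", df$var), "city"]))
GL <- as.character(unlist(df[grepl("^GL", df$var), "city"]))

# %in% rather than ==, so this also works when several cities qualify
df[df$city %in% PVDA[PVDA %in% GL], ]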
# A tibble: 2 x 2
  city  var  
  <chr> <chr>
1 A     PVDA 
2 A     GL 
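
When several cities contain both values, the final selection needs a set-membership test (`%in%`) rather than an element-wise comparison, which would recycle the vector of city names. Here is a minimal sketch of the same idea using `intersect()`. The data frame `df3` and the variable names are invented for illustration, and the patterns are anchored at both ends so that only exact values match.

```r
# Invented data in which two cities (A and B) contain both values
df3 <- data.frame(
  city = c("A", "A", "B", "B", "C", "C"),
  var  = c("PVDA", "GL", "GL", "PVDA", "CDA", "VVD"),
  stringsAsFactors = FALSE
)

# Cities that contain each value at least once
pvda_cities <- unique(df3$city[grepl("^PVDA$", df3$var)])
gl_cities   <- unique(df3$city[grepl("^GL$", df3$var)])

# Cities that contain both values
both <- intersect(pvda_cities, gl_cities)

# Select every row belonging to one of those cities
selected <- df3[df3$city %in% both, ]
print(selected)
```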

Answer 2 (score: 1)

If you prefer, you can still use grepl, which lets you use partial string matching:

Dplyr:

df %>%
  group_by(city) %>%
  filter(sum(grepl("PVDA|GL", unique(var))) >= 2)

# A tibble: 2 x 2
# Groups:   city [1]
#   city  var  
#   <chr> <chr>
# 1 A     PVDA 
# 2 A     GL   

Base R:

df[as.logical(ave(df$var, df$city, FUN = function(x) sum(grepl("PVDA|GL", unique(x))) >= 2)), ]
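
One thing to watch with the `sum(grepl("PVDA|GL", ...)) >= 2` idea: it counts distinct values that match either pattern, so a group with two different GL-like values and no PVDA would also pass. Below is a hedged sketch of a stricter variant that tests each pattern separately. The data frame `df2` and the helper `has_all_patterns` are invented for illustration and are not part of the answer above.

```r
# Invented data: city D has two GL-like values but no PVDA
df2 <- data.frame(
  city = c("A", "A", "D", "D"),
  var  = c("PVDA", "GL", "GL", "GL2"),
  stringsAsFactors = FALSE
)

patterns <- c("PVDA", "GL")

# TRUE only if every pattern matches at least one value in x
has_all_patterns <- function(x, patterns) {
  all(vapply(patterns, function(p) any(grepl(p, x)), logical(1)))
}

groups <- split(df2$var, df2$city)

# Counting approach: number of distinct values matching either pattern
counting <- sapply(groups, function(x) sum(grepl("PVDA|GL", unique(x))) >= 2)

# Stricter approach: each pattern must be matched somewhere in the group
strict <- sapply(groups, has_all_patterns, patterns = patterns)

print(counting)
print(strict)
```

With this data the counting approach accepts both cities, while the stricter check accepts only city A.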