Question

I have two datasets, and I'm trying to create one dataset based on a partial match between two fields, one from each dataset.

I'm using a filter and need to look for a partial match in each row. I tried using dplyr, but it doesn't seem to work on table$col.

Minimal reproducible example:

library(dplyr)

# Table whose codes contain the partial values
id <- c('1','2','3')
code <- c('a1231','b3211','c9871985')

tbl <- data.frame(id, code)

# Partial codes to look for in tbl$code
other_cd <- c('a123','b321','c987')
other_cd <- data.frame(other_cd)

# %in% tests exact equality, so this finds no rows
match <- tbl %>% dplyr::filter(code %in% other_cd$other_cd) %>%
  dplyr::summarise(count = n_distinct(id))

Below is my attempt using str_detect():

library(stringr)

fuzzy_match <- tbl %>%
  dplyr::filter(code %in% str_detect(other_cd$other_cd, "^[other_cd$other_cd]")) %>%
  dplyr::summarise(count = n_distinct(id))

I want fuzzy_match to contain the 3 rows whose items partially match.
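The attempt above fails for two reasons. str_detect() returns TRUE/FALSE, so code %in% ... compares the codes against logical values. Also, "^[other_cd$other_cd]" is a literal string: its [...] is a regex character class, not the column's values. The base R sketch below shows the difference between exact and prefix matching. It assumes "partial match" means the code begins with one of the shorter values; the ^ in the attempt suggests this, but it is an assumption.

```r
codes <- c("a1231", "b3211", "c9871985")
partial <- c("a123", "b321", "c987")

# %in% tests exact equality, so none of the longer codes is found
exact_hit <- codes %in% partial

# startsWith() checks whether a string begins with a prefix;
# any() asks whether the code begins with at least one partial value
prefix_hit <- vapply(codes, function(x) any(startsWith(x, partial)),
                     logical(1), USE.NAMES = FALSE)

exact_hit
prefix_hit
```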

Answer 1

We can paste the elements of other_cd together with | as the separator, which gives a regex that matches any one of them:

library(dplyr)
library(stringr)

# Keep rows whose code contains any of the other_cd values, then count distinct ids
tbl %>%
  filter(str_detect(code, str_c(other_cd$other_cd, collapse = "|"))) %>%
  summarise(count = n_distinct(id))
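Here is a self-contained version of the approach above that keeps the generated pattern in a variable so it can be inspected. It also includes an anchored variant, which assumes "partial" means "starts with". The parentheses in that variant matter: without them, "^a123|b321|c987" would anchor only the first alternative.

```r
library(dplyr)
library(stringr)

tbl <- data.frame(id = c("1", "2", "3"),
                  code = c("a1231", "b3211", "c9871985"),
                  stringsAsFactors = FALSE)
other_cd <- data.frame(other_cd = c("a123", "b321", "c987"),
                       stringsAsFactors = FALSE)

# One regex with alternatives, matching anywhere in code: "a123|b321|c987"
pattern <- str_c(other_cd$other_cd, collapse = "|")

# Same alternatives anchored to the start of code: "^(a123|b321|c987)"
prefix_pattern <- str_c("^(", pattern, ")")

fuzzy_match <- tbl %>%
  filter(str_detect(code, pattern)) %>%
  summarise(count = n_distinct(id))

prefix_match <- tbl %>%
  filter(str_detect(code, prefix_pattern)) %>%
  summarise(count = n_distinct(id))
```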

Update

In the updated post, the OP wants to create a new column from other_cd. In that case, we can use str_extract:

# Extract the part of code that matches one of the other_cd values
tbl %>%
  mutate(other_cd = str_extract(code, str_c(other_cd$other_cd, collapse = "|")))
#   id     code other_cd
#1  1    a1231     a123
#2  2    b3211     b321
#3  3 c9871985     c987
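Both str_c(..., collapse = "|") solutions treat each other_cd value as a regular expression. A value such as "a1.3" would therefore also match "a123". If the partial codes might contain regex metacharacters, one alternative is literal matching with base R's grepl(fixed = TRUE). The sketch below uses a hypothetical helper, first_fixed_match(), which is not part of any package.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps plain character columns)
tbl <- data.frame(id = c("1", "2", "3"),
                  code = c("a1231", "b3211", "c9871985"),
                  stringsAsFactors = FALSE)
other_cd <- data.frame(other_cd = c("a123", "b321", "c987"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: return the first candidate (in the order given) that
# occurs in x as a literal substring, or NA if none does
first_fixed_match <- function(x, candidates) {
  for (cand in candidates) {
    if (grepl(cand, x, fixed = TRUE)) return(cand)
  }
  NA_character_
}

# fixed = TRUE means characters such as "." or "+" are matched literally
tbl$other_cd <- vapply(tbl$code, first_fixed_match, character(1),
                       candidates = other_cd$other_cd, USE.NAMES = FALSE)
tbl
```

One difference from str_extract(): this helper returns the first candidate in other_cd order that appears anywhere in the code. The regex returns whichever alternative matches earliest in the code. The two approaches only disagree when a code contains more than one candidate.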

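Or, if the two tables have the same number of rows and each code should only be checked against the value in the same row of other_cd: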

# str_detect() pairs code[i] with other_cd$other_cd[i]
tbl %>%
  filter(str_detect(code, as.character(other_cd$other_cd)))
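To illustrate the row-by-row pairing, here is a small sketch that uses a hypothetical lookup whose third value does not match its row. The first two rows pass and the third is dropped, because each code is tested only against the value at the same position.

```r
library(dplyr)
library(stringr)

tbl <- data.frame(id = c("1", "2", "3"),
                  code = c("a1231", "b3211", "c9871985"),
                  stringsAsFactors = FALSE)
# Hypothetical lookup: "zzz" does not occur in the third code
other_cd <- data.frame(other_cd = c("a123", "b321", "zzz"),
                       stringsAsFactors = FALSE)

# Element-wise test: code[i] against other_cd$other_cd[i]
pairwise <- str_detect(tbl$code, other_cd$other_cd)

rowwise_match <- tbl %>%
  filter(str_detect(code, other_cd$other_cd))
```

This only makes sense when the two tables line up row for row. Otherwise, the collapsed "|" pattern is the safer choice.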

How can I use dplyr to match on partial field matches between two R datasets?
