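To solve a tag migration problem, I need to compare two character columns and determine whether they overlap, i.e. whether each row shares at least one tag between the two columns.
In short, given a data frame like this: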
old_tags         new_tags
burger           burger, american
italian, pizza   italian
latin, peruvian  peruvian, latin
french           pizza
I would like to add a third column, like this:
old_tags         new_tags          match
burger           burger, american  TRUE
italian, pizza   italian           TRUE
latin, peruvian  peruvian, latin   TRUE
french           pizza             FALSE
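So far I have had no luck with functions such as str_match and str_detect. They often return TRUE when comparing pairs of strings that should actually be FALSE, such as the example I entered in [3,].
Thanks a lot.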
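Answer 0: (score: 2)
A base R approach is to split the strings on commas, use Map to find the words that both sides have in common, and create a logical value that is TRUE when at least one value intersects.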
df$match <- lengths(Map(intersect, strsplit(df$old_tags, ", "),
                        strsplit(df$new_tags, ", "))) > 0
df
#         old_tags         new_tags match
#1          burger burger, american  TRUE
#2  italian, pizza          italian  TRUE
#3 latin, peruvian  peruvian, latin  TRUE
#4          french            pizza FALSE
Data
df <- structure(list(old_tags = c("burger", "italian, pizza", "latin, peruvian",
"french"), new_tags = c("burger, american", "italian", "peruvian, latin",
"pizza")), row.names = c(NA, -4L), class = "data.frame")
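
The split-and-intersect idea above can also be wrapped in a small function. The following is only a sketch: `tags_overlap` and `split_tags` are hypothetical names that do not appear in the answer, and splitting on `\\s*,\\s*` assumes that tags may carry stray spaces around the commas.

```r
# Hypothetical helper (not part of the original answer): returns TRUE for each
# pair of comma-separated tag strings that share at least one tag.
tags_overlap <- function(old, new) {
  # Trim the ends and split on commas, tolerating spaces around each comma.
  split_tags <- function(x) strsplit(trimws(x), "\\s*,\\s*")
  # Intersect the two tag sets row by row and check for a non-empty result.
  lengths(Map(intersect, split_tags(old), split_tags(new))) > 0
}

old_tags <- c("burger", "italian, pizza", "latin, peruvian", "french")
new_tags <- c("burger, american", "italian", "peruvian, latin", "pizza")

res <- tags_overlap(old_tags, new_tags)
print(res)
```

With the data frame from the question this could be used as `df$match <- tags_overlap(df$old_tags, df$new_tags)`.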
Answer 1: (score: 1)
A possibility combining tidyverse and base R:
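library(dplyr)
library(stringr)
library(purrr)  # provides map_chr()

df %>%
  # collapse each row's old tags into a regex alternation, e.g. "italian|pizza"
  mutate(patterns = map_chr(strsplit(old_tags, ", "), paste, collapse = "|"),
         Match = str_detect(new_tags, patterns)) %>%
  select(-patterns)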
         old_tags         new_tags Match
1          burger burger, american  TRUE
2  italian, pizza          italian  TRUE
3 latin, peruvian  peruvian, latin  TRUE
4          french            pizza FALSE
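
One caveat with matching by regex alternation is that it matches substrings, so a tag such as "latin" would also be found inside "latino". The sketch below illustrates this with made-up tags that are not in the question's data, and uses base `grepl` in place of `str_detect` so that it runs without extra packages. Wrapping the pattern in `\\b` word boundaries is one possible fix; it assumes the tags contain no regex metacharacters.

```r
# Illustrative data (an assumption, not from the question).
old_tags <- c("latin", "italian, pizza")
new_tags <- c("latino, mexican", "pizza")

# Build one alternation pattern per row, as in the answer above.
patterns <- vapply(strsplit(old_tags, ", "), paste, character(1), collapse = "|")

# Plain alternation matches substrings, so "latin" is found inside "latino".
loose <- mapply(grepl, patterns, new_tags, USE.NAMES = FALSE)

# Word boundaries around the alternation accept whole words only.
bounded <- paste0("\\b(", patterns, ")\\b")
strict <- mapply(grepl, bounded, new_tags,
                 MoreArgs = list(perl = TRUE), USE.NAMES = FALSE)

print(loose)
print(strict)
```

Here `loose` flags the first row as a match even though the two sides share no tag, while `strict` does not.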
Answer 2: (score: 0)
Or we can use str_extract_all together with any:
library(tidyverse)
df %>%
  mutate(match = map2_lgl(str_extract_all(old_tags, "\\w+"),
                          str_extract_all(new_tags, "\\w+"), ~ any(.x %in% .y)))
#         old_tags         new_tags match
#1          burger burger, american  TRUE
#2  italian, pizza          italian  TRUE
#3 latin, peruvian  peruvian, latin  TRUE
#4          french            pizza FALSE
df <- structure(list(old_tags = c("burger", "italian, pizza", "latin, peruvian",
"french"), new_tags = c("burger, american", "italian", "peruvian, latin",
"pizza")), row.names = c(NA, -4L), class = "data.frame")
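
Extracting `\\w+` compares individual words rather than whole tags, which matters if a tag can contain more than one word. The sketch below uses made-up multi-word tags (an assumption, since the question's data has only single-word tags) and base `regmatches`/`gregexpr` as a stand-in for `str_extract_all`, so it runs without packages.

```r
# Illustrative multi-word tags (not from the question).
old_tags <- "fast food"
new_tags <- "food truck, vegan"

# Word-level comparison: every run of word characters is its own token.
words_old <- regmatches(old_tags, gregexpr("\\w+", old_tags))[[1]]
words_new <- regmatches(new_tags, gregexpr("\\w+", new_tags))[[1]]
by_word <- any(words_old %in% words_new)

# Tag-level comparison: split on ", " so that "fast food" stays one tag.
tags_old <- strsplit(old_tags, ", ")[[1]]
tags_new <- strsplit(new_tags, ", ")[[1]]
by_tag <- any(tags_old %in% tags_new)

print(c(by_word = by_word, by_tag = by_tag))
```

The word-level comparison reports a match through the shared word "food", while the tag-level comparison does not. Which behaviour is wanted depends on how the tags are defined.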