I have a vector of strings:
keywords <- c("kw 1", "kw2", "kw3", "kw4", "kw5", "kw6", "kw7", "kw8",
"kw 9 kw", "kw10", "kw11", "kw12", "kw13", "kw14", "kw15")
and a data frame with an empty Keyword column:
df <- data.frame("Description" = c("blabla kw10", "blabla kw15","blabla kw 1",
"blabla kw13", "blabla kw7", "kw2 bla", "kw8 blabla","bla kw11 bla",
"blabla kw10","blakw 9 kw", "blablakw4", "blakw 1bla"),
"Keyword" = NA)
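I need a way to find the string in the keywords vector that partially matches the value of the Description variable, and to return that matched string from the keywords vector as the value of the Keyword column in the df data frame.
This is the result I need: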
df <- data.frame("Description" = c("blabla kw10", "blabla kw15","blabla kw 1",
"blabla kw13", "blabla kw7", "kw2 bla", "kw8 blabla","bla kw11 bla",
"blabla kw10","blakw 9 kw", "blablakw4", "blakw 1bla"),
"Keyword" = c("kw10", "kw15", "kw 1", "kw13", "kw7", "kw2", "kw8", "kw11", "kw10", "kw 9 kw", "kw4", "kw 1"))
Could you suggest a solution for this?
Edit:
A reproducible example with the keywords2 vector and the df2 data frame:
keywords2 <- c("cartucho", "MOLDE", "FILTRO", "BOMBA", "MOTOR")
df2 <- data.frame("Description" = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS",
" CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR", "MOLDES PARA QUESO",
"BOMBA PERISTALTICA", "MOLDE CON TAPA Y DESUERADOR",
"APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
"Keyword" = NA)
Expected result:
df2 <- data.frame("Description" = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS",
" CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR", "MOLDES PARA QUESO",
"BOMBA PERISTALTICA", "MOLDE CON TAPA Y DESUERADOR",
"APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
"Keyword" = c("MOTOR", "BOMBA", "cartucho", "FILTRO", "MOLDE", "BOMBA", "MOLDE", "FILTRO", "BOMBA"))
Answer 0 (score: 1)
We can use str_extract from the stringr package, collapsing the keywords into a single alternation pattern:
library(stringr)
df$Keyword <- str_extract(df$Description, paste(keywords, collapse='|'))
df$Keyword
#[1] "kw10" "kw15" "kw 1" "kw13" "kw7" "kw2" "kw8"
#[8] "kw11" "kw10" "kw 9 kw" "kw4" "kw 1"
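One caveat with this kind of pattern: regex alternation tries the alternatives from left to right at the first position where any of them can match, so if one keyword were a prefix of another, the shorter one could win. The keywords in the question do not overlap this way, so the following is only a sketch under a hypothetical pair ("kw1" and "kw10", which are not from the question), sorting the keywords longest first before collapsing them.

```r
library(stringr)

# Hypothetical keywords where one is a prefix of the other (not from the question)
kw <- c("kw1", "kw10")
x  <- c("blabla kw10", "blabla kw1")

# Alternatives are tried in order, so "kw1" is matched inside "kw10"
naive <- str_extract(x, paste(kw, collapse = "|"))

# Sorting longest first lets the longer keyword be tried before its prefix
kw_sorted <- kw[order(nchar(kw), decreasing = TRUE)]
res <- str_extract(x, paste(kw_sorted, collapse = "|"))
res
```

Keywords that contain regex metacharacters (such as "." or "(") would also need escaping before being pasted into a pattern; none of the keywords in the question do.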
With the new dataset and keywords, convert 'keywords2' to uppercase, then paste it into a single string to use as the pattern of str_extract.
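The call itself is not shown here, so the following is a minimal sketch of what the described step might look like, using df2 and keywords2 from the question. Uppercasing makes the match return "CARTUCHO" rather than the "cartucho" listed in the expected result, so the final step maps each match back to its original spelling with match(). That mapping and the upper_kw and found names are my assumptions, not part of the answer.

```r
library(stringr)

keywords2 <- c("cartucho", "MOLDE", "FILTRO", "BOMBA", "MOTOR")
df2 <- data.frame(
  Description = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS",
                  " CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR",
                  "MOLDES PARA QUESO", "BOMBA PERISTALTICA",
                  "MOLDE CON TAPA Y DESUERADOR",
                  "APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
  Keyword = NA,
  stringsAsFactors = FALSE
)

# Uppercase the keywords so they match the uppercase descriptions
upper_kw <- toupper(keywords2)
found <- str_extract(df2$Description, paste(upper_kw, collapse = "|"))

# Assumption: map each uppercase match back to the keyword as originally written
df2$Keyword <- keywords2[match(found, upper_kw)]
df2$Keyword
```

Descriptions with no matching keyword stay NA, because both str_extract and match() return NA when nothing is found.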