Return a string from a vector that partially matches a variable's value

Time: 2018-06-04 18:13:29

Tags: r regex string match

I have a vector of strings:

keywords <- c("kw 1", "kw2", "kw3", "kw4", "kw5", "kw6", "kw7", "kw8", 
              "kw 9 kw", "kw10", "kw11", "kw12", "kw13", "kw14", "kw15")

and a data frame with an empty Keyword column:

df <- data.frame("Description" = c("blabla kw10", "blabla kw15","blabla kw 1", 
                                   "blabla kw13", "blabla kw7", "kw2 bla", "kw8 blabla","bla kw11 bla", 
                                   "blabla kw10","blakw 9 kw", "blablakw4", "blakw 1bla"),
                 "Keyword" = NA)

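I need a way to find the string in the keywords vector that partially matches the value in the Description variable, and to return that matching string from the keywords vector as the value of the Keyword column in the df data frame.

This is the result I need: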

df <- data.frame("Description" = c("blabla kw10", "blabla kw15","blabla kw 1", 
                                   "blabla kw13", "blabla kw7", "kw2 bla", "kw8 blabla","bla kw11 bla", 
                                   "blabla kw10","blakw 9 kw", "blablakw4", "blakw 1bla"),
                 "Keyword" = c("kw10", "kw15", "kw 1", "kw13", "kw7", "kw2", "kw8", "kw11", "kw10", "kw 9 kw", "kw4", "kw 1"))

Could you suggest a solution for this?

Edited:

A reproducible example with the keywords2 vector and the df2 data frame:

keywords2 <- c("cartucho", "MOLDE", "FILTRO", "BOMBA", "MOTOR")

df2 <- data.frame("Description" = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS", 
    " CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR", "MOLDES PARA QUESO", 
    "BOMBA PERISTALTICA", "MOLDE CON TAPA Y DESUERADOR", 
    "APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
              "Keyword" = NA)

Expected result:

df2 <- data.frame("Description" = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS", 
    " CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR", "MOLDES PARA QUESO", 
    "BOMBA PERISTALTICA", "MOLDE CON TAPA Y DESUERADOR", 
    "APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
              "Keyword" = c("MOTOR", "BOMBA", "cartucho", "FILTRO", "MOLDE", "BOMBA", "MOLDE", "FILTRO", "BOMBA"))

1 Answer:

Answer 0: (score: 1)

We can use str_extract, with the keywords collapsed into a single regex alternation pattern:

library(stringr)
df$Keyword <- str_extract(df$Description, paste(keywords, collapse='|'))
df$Keyword
#[1] "kw10"    "kw15"    "kw 1"    "kw13"    "kw7"     "kw2"     "kw8"    
#[8] "kw11"    "kw10"    "kw 9 kw" "kw4"     "kw 1"   
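
To see why this works, it may help to look at the pattern that paste builds. The sketch below uses a small hypothetical vector (kw_demo, which is not part of the question) and illustrates one caveat that does not seem to affect the question's data: the alternatives are tried from left to right, so when one keyword is a prefix of another, the shorter one can win unless the longer keywords come first.

```r
library(stringr)

# Hypothetical keywords for illustration only; "kw1" is a prefix of "kw10"
kw_demo <- c("kw1", "kw10")

# paste() with collapse = "|" builds a regex alternation
pattern <- paste(kw_demo, collapse = "|")
pattern
#[1] "kw1|kw10"

# Alternatives are tried in order, so the shorter keyword matches first
str_extract("blabla kw10", pattern)
#[1] "kw1"

# Putting longer keywords first avoids the truncated match
kw_sorted <- kw_demo[order(nchar(kw_demo), decreasing = TRUE)]
pattern_sorted <- paste(kw_sorted, collapse = "|")
str_extract("blabla kw10", pattern_sorted)
#[1] "kw10"
```

If the keywords could contain regex metacharacters, they would also need escaping before being pasted together; the keywords in the question are plain text, so that step is skipped here.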

Update

With the new dataset and keywords, convert 'keywords2' to upper case, then paste it into a single string to use as the pattern of str_extract.
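
The code that accompanied this update did not survive in this copy. A minimal sketch of what the sentence describes might look like the following; it is a reconstruction, not necessarily the answerer's exact code. Two things here are assumptions added for illustration: stringsAsFactors = FALSE, so that Description is a character vector regardless of R version, and the final match() step, which maps each hit back to the spelling used in keywords2 so that "cartucho" stays lower case as in the expected result.

```r
library(stringr)

keywords2 <- c("cartucho", "MOLDE", "FILTRO", "BOMBA", "MOTOR")

df2 <- data.frame(
  Description = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS",
                  " CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR",
                  "MOLDES PARA QUESO", "BOMBA PERISTALTICA",
                  "MOLDE CON TAPA Y DESUERADOR",
                  "APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
  Keyword = NA,
  stringsAsFactors = FALSE  # assumption: keep Description as character
)

# Upper-case the keywords and collapse them into one alternation pattern
pattern2 <- paste(toupper(keywords2), collapse = "|")

# First keyword found in each description (always upper case at this point)
found <- str_extract(df2$Description, pattern2)

# Assumption: map each hit back to its original spelling in keywords2
df2$Keyword <- keywords2[match(found, toupper(keywords2))]
df2$Keyword
#[1] "MOTOR"    "BOMBA"    "cartucho" "FILTRO"   "MOLDE"    "BOMBA"    "MOLDE"   
#[8] "FILTRO"   "BOMBA"   
```

Without the match() step, the third value would be "CARTUCHO", because str_extract returns the text as it appears in Description.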