How do I extract a partially matched string into a new column?

Asked: 2014-08-26 09:36:43

Tags: r

I have this data.table with strings:

dt = tbl_dt(data.table(x=c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")))

                   x
1    book|ball|apple
2  flower|orange|cup
3 banana|bandana|pen

I also have a reference string that I want to match against the strings in the data.table, extracting the matching word if there is one, like this:

fruits = "apple|banana|orange"

str_match(fruits, "flower|orange|cup")
>"orange"

How can I do this for the whole data.table? This is what I tried:

require(dplyr)
require(stringr)

dt %>%
   mutate (fruit = str_match(fruits, x))

Error in rep(NA_character_, n) : invalid 'times' argument
In addition: Warning message:
In regexec(c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen" :
argument 'pattern' has length > 1 and only the first element will be used

What I want:

                   x       fruit
1    book|ball|apple       apple
2  flower|orange|cup      orange
3 banana|bandana|pen      banana

3 Answers:

Answer 0 (score: 2)

Alternatively (to avoid warnings, it is better to work with a plain data.table rather than a tbl_dt):

dt[, fruits := mapply(str_match, fruits, x)]
dt
##                     x fruits
## 1:    book|ball|apple  apple
## 2:  flower|orange|cup orange
## 3: banana|bandana|pen banana

Or you could do something similar to @akrun's answer, such as

dt[, fruits := lapply(x, str_match, fruits)]
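
Both variants above call str_match once per row. As a hedged alternative, stringr's str_extract is vectorised over its string argument, so the whole column can be handled in one call. This is a minimal sketch, not part of the original answer; it assumes stringr is installed and uses a plain character vector in place of the data.table column. The fourth element is a hypothetical row added to show the no-match case.

```r
library(stringr)

# Sample data from the question, plus one hypothetical row with no fruit.
x <- c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen", "cup|pen")
fruits <- "apple|banana|orange"   # used as a regex alternation

# str_extract returns the first match of the pattern in each string,
# or NA where nothing matches.
fruit <- str_extract(x, fruits)

result <- data.frame(x = x, fruit = fruit, stringsAsFactors = FALSE)
print(result)
```

Because the reference string is treated as a regular expression, it matches substrings: a word such as "pineapple" would also yield "apple".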

Answer 1 (score: 1)

dt$fruit <- unlist(lapply(dt$x, str_match, fruits))

dt
#Source: local data table [3 x 2]
#
#                   x  fruit
#1    book|ball|apple  apple
#2  flower|orange|cup orange
#3 banana|bandana|pen banana
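
The per-element lapply call above can also be replaced by a single vectorised pass in base R. The following is a sketch under the assumption that only the first match per row is wanted; regexpr and regmatches are base R functions, while the variable names and the fourth (non-matching) row are illustrative only.

```r
# Sample data from the question, plus one hypothetical row with no fruit.
x <- c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen", "cup|pen")
fruits <- "apple|banana|orange"   # used as a regex alternation

# regexpr gives the position of the first match in each string (-1 if none).
m <- regexpr(fruits, x)
hit <- as.vector(m) != -1

# regmatches returns matches only for the strings that matched,
# so fill a vector of NAs at the matching positions.
fruit <- rep(NA_character_, length(x))
fruit[hit] <- regmatches(x, m)
print(fruit)
```

Pre-filling with NA keeps the result the same length as the input, so it can be assigned as a column even when some rows contain no match.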

Answer 2 (score: 0)

A solution using base R, without str_match:

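# ddf: a data frame holding the strings in column x (called dt in the question)
fruit=NULL
reflist = unlist(strsplit(fruits, '\\|'))    # reference words
for(xx in ddf$x){
    ss = unlist(strsplit(xx,'\\|'))          # words in this row
    # append every word that appears in the reference list
    # (assumes exactly one match per row, otherwise lengths will not line up)
    for(s in ss) if(s %in% reflist) fruit[length(fruit)+1]=s
}
ddf$fruit = fruit
ddf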
#                   x  fruit
#1    book|ball|apple  apple
#2  flower|orange|cup orange
#3 banana|bandana|pen banana
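
The loop above relies on every row containing exactly one reference word. A hedged variant of the same whole-word idea, written as a function that returns NA for rows without a match and keeps only the first hit, might look like the sketch below. The function name first_fruit and the fourth sample row are hypothetical and not part of the original answer.

```r
# Return, for each string in x, the first "|"-separated word that also
# appears in the "|"-separated reference string, or NA if there is none.
first_fruit <- function(x, reference) {
  ref <- strsplit(reference, "|", fixed = TRUE)[[1]]
  vapply(strsplit(x, "|", fixed = TRUE), function(tokens) {
    hit <- tokens[tokens %in% ref]
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1))
}

fruits <- "apple|banana|orange"
ddf <- data.frame(
  x = c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen", "cup|pen"),
  stringsAsFactors = FALSE
)
ddf$fruit <- first_fruit(ddf$x, fruits)
print(ddf)
```

Unlike the regex-based approaches, this compares whole words, so "bandana" or a longer word containing "apple" would not count as a match.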