I have this data.table with strings:
dt = tbl_dt(data.table(x=c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")))
x
1 book|ball|apple
2 flower|orange|cup
3 banana|bandana|pen
I also have a reference string that I want to match against the strings in the data.table, extracting the word if it is present, like this:
fruits = "apple|banana|orange"
str_match(fruits, "flower|orange|cup")
>"orange"
How can I do this for the whole data.table?
require(dplyr)
require(stringr)
dt %>%
mutate (fruit = str_match(fruits, x))
Error in rep(NA_character_, n) : invalid 'times' argument
In addition: Warning message:
In regexec(c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen" :
argument 'pattern' has length > 1 and only the first element will be used
What I want:
x fruit
1 book|ball|apple apple
2 flower|orange|cup orange
3 banana|bandana|pen banana
Answer 0: (score: 2)
Alternatively (to avoid the warning, it is better to use tbl_dt in place of data.table):
dt[, fruits := mapply(str_match, fruits, x)]
dt
## x fruits
## 1: book|ball|apple apple
## 2: flower|orange|cup orange
## 3: banana|bandana|pen banana
Or you could do something similar to @akrun's answer, such as
dt[, fruits := lapply(x, str_match, fruits)]
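The two calls above pass the arguments in opposite roles: with mapply, fruits is the string and each x is the pattern, while with lapply each x is the string and fruits is the pattern. A minimal base R sketch of the difference follows. The helper first_match is hypothetical (it is not a stringr function) and only approximates str_match with regexpr and regmatches.

```r
# Hypothetical helper (not part of stringr): return the first regex match
# of `pattern` in `string`, or NA if there is none.
first_match <- function(string, pattern) {
  m <- regexpr(pattern, string)
  if (m[1] == -1L) NA_character_ else regmatches(string, m)
}

fruits <- "apple|banana|orange"
x <- c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")

# mapply-style order: fruits is the string, each element of x is the pattern
a <- vapply(x, function(p) first_match(fruits, p), character(1), USE.NAMES = FALSE)

# lapply-style order: each element of x is the string, fruits is the pattern
b <- vapply(x, function(s) first_match(s, fruits), character(1), USE.NAMES = FALSE)

identical(a, b)  # TRUE for this data

# Both orders are substring matches, so they can disagree on other data:
first_match(fruits, "pineapple|cup")  # NA: neither alternative occurs in fruits
first_match("pineapple|cup", fruits)  # "apple": found inside "pineapple"
```

For the sample data both orders return apple, orange, banana, but neither checks whole words, so it is worth choosing the order deliberately if the words can contain one another.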
Answer 1: (score: 1)
dt$fruit <- unlist(lapply(dt$x, str_match, fruits))
dt
#Source: local data table [3 x 2]
#
# x fruit
#1 book|ball|apple apple
#2 flower|orange|cup orange
#3 banana|bandana|pen banana
Answer 2: (score: 0)
A solution using base R and without str_match:
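# Assumption: ddf is a plain data.frame holding the same x column as dt
ddf = data.frame(x = dt$x, stringsAsFactors = FALSE)
fruit = NULL
# split the reference string into individual words
reflist = unlist(strsplit(fruits, '\\|'))
for(xx in ddf$x){
  # split the current row into words and keep those found in reflist
  ss = unlist(strsplit(xx,'\\|'))
  for(s in ss) if(s %in% reflist) fruit[length(fruit)+1]=s
}
# this works only if every row has exactly one match; otherwise the lengths differ
ddf$fruit = fruit
ddf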
# x fruit
#1 book|ball|apple apple
#2 flower|orange|cup orange
#3 banana|bandana|pen banana
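The loop above appends one element per match, so it lines up with the rows only when each row contains exactly one fruit. As a rough sketch (the helper first_fruit and the extra row "cup|pen" are made up for illustration), the same whole-word idea can be written so that every row yields exactly one value, with NA when nothing matches and the first hit when several do:

```r
fruits <- "apple|banana|orange"
# The last element is an invented row with no fruit, to show the NA case
x <- c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen", "cup|pen")

# Split the reference string into individual words (fixed = TRUE: "|" is literal)
reflist <- strsplit(fruits, "|", fixed = TRUE)[[1]]

# Hypothetical helper: first word of `s` that is in reflist, or NA
first_fruit <- function(s) {
  tokens <- strsplit(s, "|", fixed = TRUE)[[1]]
  hits <- tokens[tokens %in% reflist]
  if (length(hits) == 0L) NA_character_ else hits[1]
}

# vapply guarantees one character value per row
fruit <- vapply(x, first_fruit, character(1), USE.NAMES = FALSE)
fruit
```

Because the comparison is on whole words rather than on a regular expression, a word such as pineapple would not be reported as apple here.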