String matching in RStudio

Time: 2018-09-06 16:23:36

Tags: r string

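I am having trouble with the following situation. I have a data frame df whose column var1 contains multi-word strings. I want to keep a word from var1 only if it also appears in chr. For example, the first row of var1 is "car tv dog", and I want to drop the word "dog" because it is not in chr.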

My data frame:

id <- c(1,2,3)
var1 <- c("car tv dog","cat water mouse","pen wire fish")
df <- data.frame(id,var1)

The words I want to keep:

chr<-"car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"

Desired result:

want <- c("car tv","cat","pen fish")
dfWant <- data.frame(id, var1, want) 

Any help would be greatly appreciated.

1 answer:

Answer 0: (score: 1)

Code:

# example data
df <- data.frame(
    id = 1:3,
    var1 = c("car tv dog", "cat water mouse", "pen wire fish"),
    stringsAsFactors = FALSE
)

# strings to search for (save each word as an element of a vector)
chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"
chr_vec <- unique(unlist(strsplit(chr, " ")))

# split var1 into words, check if word is in chr_vec, 
# keep only if true, re-combine into multi-word string
df$result <- unlist(lapply(strsplit(df$var1, " "), function(x) paste(x[x %in% chr_vec], collapse = " ")))

Result:

> df
  id            var1   result
1  1      car tv dog   car tv
2  2 cat water mouse      cat
3  3   pen wire fish pen fish
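
The same split, filter, and paste steps can also be wrapped in a small reusable function. This is only a sketch, not part of the original answer: the names keep_words and allowed are made up for illustration. It assumes words are separated by whitespace, and it converts its input with as.character() so that it also works when var1 is stored as a factor, which is what data.frame(id, var1) in the question produces in R versions before 4.0.

```r
# Hypothetical helper (name chosen for illustration): for each string in x,
# keep only the words that appear in the character vector `allowed`.
keep_words <- function(x, allowed) {
  # as.character() handles factor input; trimws() avoids empty leading tokens
  words <- strsplit(trimws(as.character(x)), "\\s+")
  # vapply() always returns a character vector with one element per input
  vapply(
    words,
    function(w) paste(w[w %in% allowed], collapse = " "),
    character(1)
  )
}

chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"
allowed <- unique(strsplit(chr, " ", fixed = TRUE)[[1]])

var1 <- c("car tv dog", "cat water mouse", "pen wire fish")
result <- keep_words(var1, allowed)
result
# expected values: "car tv", "cat", "pen fish"
```

With the data frame from the answer this would be used as df$result <- keep_words(df$var1, allowed). Compared with unlist(lapply(...)), vapply() fails loudly if any element does not come back as a single string, which can make mistakes easier to spot.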