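I am having trouble with the following situation. I have a data frame df that has multi-word strings in var1. I want to keep only the words in var1 that also appear in chr. For example, the first row of var1 is "car tv dog", and I want to remove the word "dog" because it is not in chr.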
My data frame:
id <- c(1,2,3)
var1 <- c("car tv dog","cat water mouse","pen wire fish")
df <- data.frame(id,var1)
The words I want to keep:
chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"
Desired result:
want <- c("car tv","cat","pen fish")
dfWant <- data.frame(id, var1, want)
Any help would be greatly appreciated.
Answer 0 (score: 1)
Code:
# example data
df <- data.frame(
  id = 1:3,
  var1 = c("car tv dog", "cat water mouse", "pen wire fish"),
  stringsAsFactors = FALSE
)
# strings to search for (save each word as an element of a vector)
chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"
chr_vec <- unique(unlist(strsplit(chr, " ")))
# split var1 into words, check if word is in chr_vec,
# keep only if true, re-combine into multi-word string
df$result <- unlist(lapply(
  strsplit(df$var1, " "),
  function(x) paste(x[x %in% chr_vec], collapse = " ")
))
Result:
> df
  id            var1   result
1  1      car tv dog   car tv
2  2 cat water mouse      cat
3  3   pen wire fish pen fish
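If you need this in more than one place, the same split, filter, and paste steps can be wrapped in a small function. The following is only a sketch: keep_allowed_words and allowed are hypothetical names that do not appear in the question or the answer. It also assumes you want to split on any run of whitespace and to coerce var1 with as.character(), which guards against the column being a factor (the default for data.frame() in R versions before 4.0).

```r
# Hypothetical helper (not from the original answer): keep only the words
# of each string that appear in the vector of allowed words.
keep_allowed_words <- function(strings, allowed) {
  # as.character() guards against factor columns;
  # "\\s+" splits on any run of whitespace, not just a single space
  words_per_row <- strsplit(trimws(as.character(strings)), "\\s+")
  vapply(
    words_per_row,
    function(words) paste(words[words %in% allowed], collapse = " "),
    character(1)
  )
}

chr <- "car aaa bbb ccc ddd qqq www eee rrr pen cat ttt fish tv"
allowed <- unique(strsplit(chr, "\\s+")[[1]])

df <- data.frame(
  id = 1:3,
  var1 = c("car tv dog", "cat water mouse", "pen wire fish"),
  stringsAsFactors = FALSE
)
df$result <- keep_allowed_words(df$var1, allowed)
```

Using vapply() with character(1) instead of unlist(lapply(...)) makes the function fail loudly if any row does not reduce to exactly one string. A row with no matching words becomes an empty string.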