Keep strings that partially match

Date: 2020-09-04 12:21:05

Tags: r

I want to keep only those strings in one vector that partially match a string in another vector.

Consider this example:

> dput(cc)
c("BLANK_0", "Greg_10", "Luke_40", "Luke_10", "Mark_10", "NA_40", "BLANK_10", "Joe_15", "Jane_10", "BLANK_40", "Greg_40", "Hvserk_40", "NA_10")

I want to keep the strings that start with one of the elements of the following vector:

> dput(vec_all_compounds)
c("Greg", "Luke", "Mark", "Joe", "Jane", "Hvserk")

This means that Greg_10, Luke_10, Hvserk_40, etc. should all be kept and left unchanged. Is this possible?

3 answers:

Answer 0: (score: 4)

I would suggest the following approach, which uses grepl() to build a logical index into the vector:

#Code
cc[grepl(pattern = paste0(vec_all_compounds,collapse = '|'),cc)]

Output:

[1] "Greg_10"   "Luke_40"   "Luke_10"   "Mark_10"   "Joe_15"    "Jane_10"   "Greg_40"   "Hvserk_40"
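
The question asks for strings that start with one of the names, whereas the pattern above matches a name anywhere in the string. Both give the same result for the example data. Below is a minimal sketch of an anchored variant, assuming the names contain no regex metacharacters. The vector cc_extra and its extra element "BLANK_Joe" are hypothetical, added only to show the difference:

```r
cc <- c("BLANK_0", "Greg_10", "Luke_40", "Luke_10", "Mark_10", "NA_40",
        "BLANK_10", "Joe_15", "Jane_10", "BLANK_40", "Greg_40",
        "Hvserk_40", "NA_10")
vec_all_compounds <- c("Greg", "Luke", "Mark", "Joe", "Jane", "Hvserk")

# Hypothetical extra element: contains a name, but does not start with it
cc_extra <- c(cc, "BLANK_Joe")

# Unanchored: "Greg|Luke|..." matches a name anywhere in the string
pattern_any <- paste0(vec_all_compounds, collapse = "|")

# Anchored: the parentheses make "^" apply to every alternative,
# not only to the first one
pattern_start <- paste0("^(", pattern_any, ")")

kept_any   <- cc_extra[grepl(pattern_any, cc_extra)]    # includes "BLANK_Joe"
kept_start <- cc_extra[grepl(pattern_start, cc_extra)]  # excludes "BLANK_Joe"

print(setdiff(kept_any, kept_start))
# [1] "BLANK_Joe"
```

Without the parentheses, "^Greg|Luke|..." would anchor only "Greg" and leave the other alternatives unanchored.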

Answer 1: (score: 3)

You can also use grep with value = TRUE:

grep(paste0(vec_all_compounds, collapse = "|"), cc, value = TRUE)
#[1] "Greg_10"   "Luke_40"   "Luke_10"   "Mark_10"   "Joe_15"    "Jane_10"   "Greg_40"   "Hvserk_40"

The same with stringr::str_subset:

stringr::str_subset(cc, paste0(vec_all_compounds, collapse = "|"))
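
As a rough illustration of how the base R forms relate: grep() returns integer positions by default, grep(..., value = TRUE) returns the matching elements themselves, and grepl() returns a logical vector. The sketch below uses only base R and the data from the question. The variable names pattern, idx, vals and lgl are my own:

```r
cc <- c("BLANK_0", "Greg_10", "Luke_40", "Luke_10", "Mark_10", "NA_40",
        "BLANK_10", "Joe_15", "Jane_10", "BLANK_40", "Greg_40",
        "Hvserk_40", "NA_10")
vec_all_compounds <- c("Greg", "Luke", "Mark", "Joe", "Jane", "Hvserk")

pattern <- paste0(vec_all_compounds, collapse = "|")

idx  <- grep(pattern, cc)                # integer positions of the matches
vals <- grep(pattern, cc, value = TRUE)  # the matching strings themselves
lgl  <- grepl(pattern, cc)               # one TRUE/FALSE per element of cc

print(idx)
# [1]  2  3  4  5  8  9 11 12

# All three routes select the same elements
print(identical(cc[idx], vals))
# [1] TRUE
print(identical(cc[lgl], vals))
# [1] TRUE
```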

Answer 2: (score: 2)

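You can use gsub + %in%: strip everything from the first underscore onward, then test the remaining prefix for exact membership in vec_all_compounds.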

> cc[gsub("_.*","",cc) %in% vec_all_compounds]
[1] "Greg_10"   "Luke_40"   "Luke_10"   "Mark_10"   "Joe_15"    "Jane_10"
[7] "Greg_40"   "Hvserk_40"
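
Unlike the regex answers, this approach compares the whole prefix before the first underscore, so it is an exact match rather than a partial one. The sketch below shows where the two can disagree. The vector cc_extra and its extra element "Gregory_10" are hypothetical and not part of the question's data. It uses sub() instead of gsub(), which gives the same result here because "_.*" can match at most once per string:

```r
cc <- c("BLANK_0", "Greg_10", "Luke_40", "Luke_10", "Mark_10", "NA_40",
        "BLANK_10", "Joe_15", "Jane_10", "BLANK_40", "Greg_40",
        "Hvserk_40", "NA_10")
vec_all_compounds <- c("Greg", "Luke", "Mark", "Joe", "Jane", "Hvserk")

# Hypothetical extra element whose prefix merely begins with "Greg"
cc_extra <- c(cc, "Gregory_10")

# Exact match on the part before the first underscore
prefix     <- sub("_.*", "", cc_extra)
kept_exact <- cc_extra[prefix %in% vec_all_compounds]

# Anchored regex match: any string that starts with one of the names
pattern    <- paste0("^(", paste0(vec_all_compounds, collapse = "|"), ")")
kept_regex <- grep(pattern, cc_extra, value = TRUE)

print(setdiff(kept_regex, kept_exact))
# [1] "Gregory_10"
```

Which behaviour is wanted depends on whether a name such as "Gregory" should count as matching "Greg".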