Question

For example, here I want to remove the elements of text that contain either http or america.

> text <- c("One word@", "112a httpSentenceamerica", "you and meamerica", "three two one")

So I use the logical operator |.

> pattern <- "http|america"

This works, because the whole string is treated as a single pattern.

> grep(pattern, text, invert = TRUE, value = TRUE)
[1] "One word@"     "three two one"

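What if I have a long list of words that I want to use in the pattern? How can I do that? I don't think I can keep typing the logical operator over and over.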

Thanks in advance!

Answer 1

In general, as @akrun said:

text <- c("One word@", "112a httpSentenceamerica", "you and meamerica", "three two one")
pattern = c("http", "america")
grep(paste(pattern, collapse = "|"), text, invert = TRUE, value = TRUE)
# [1] "One word@"     "three two one"
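
To make the step above concrete, here is a minimal sketch that shows the intermediate string `paste()` builds and wraps the call in a small function. The helper name `drop_matching` is hypothetical, not part of base R. Note that each word is still interpreted as a regular expression, so characters such as `.` or `+` inside a word keep their special meaning.

```r
# Hypothetical helper: drop the elements of x that match any of the words.
drop_matching <- function(x, words) {
  # Join the words into one alternation, e.g. "http|america"
  pattern <- paste(words, collapse = "|")
  grep(pattern, x, invert = TRUE, value = TRUE)
}

text <- c("One word@", "112a httpSentenceamerica", "you and meamerica", "three two one")
words <- c("http", "america")

paste(words, collapse = "|")
# [1] "http|america"
drop_matching(text, words)
# [1] "One word@"     "three two one"
```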

You wrote that your list of words is "long." Not surprisingly, this solution doesn't scale indefinitely:

long_pattern = paste(rep(pattern, 1300), collapse = "|")
nchar(long_pattern)
# [1] 16899
grep(long_pattern, text, invert = TRUE, value = TRUE)
# Error in grep(long_pattern, text, invert = TRUE, value = TRUE) :

But if necessary, you could take a MapReduce-style approach, starting with something along these lines:

# Map over the individual patterns rather than the single collapsed string
text[Reduce(`&`, Map(function(p) !grepl(p, text), rep(pattern, 1300)))]
# [1] "One word@"     "three two one"
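
Testing one word at a time means one `grepl()` call per word. A possible middle ground, sketched below under stated assumptions, is to test the words in chunks so that no single regular expression grows too long. The function name `matches_none` and the default `chunk_size = 100` are illustrative choices, not values from the answer above; tune the chunk size to your own word list.

```r
# Hypothetical helper: TRUE for the elements of x that match none of the words.
# Words are tested in chunks so no single regular expression gets too long.
matches_none <- function(x, words, chunk_size = 100) {
  # Assign each word to a chunk: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_along(words) / chunk_size)
  chunks <- split(words, chunk_id)
  # One logical vector per chunk: TRUE where no word in the chunk matches
  no_match <- lapply(chunks, function(w) !grepl(paste(w, collapse = "|"), x))
  # Keep an element only if every chunk reports no match;
  # the initial value also covers the case of an empty word list
  Reduce(`&`, no_match, rep(TRUE, length(x)))
}

text <- c("One word@", "112a httpSentenceamerica", "you and meamerica", "three two one")
words <- rep(c("http", "america"), 1300)  # a long word list, 2600 entries
text[matches_none(text, words)]
# [1] "One word@"     "three two one"
```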

R: Can grep() include multiple patterns?
