Question

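I need to extract whole sentences that begin with a specific word in R. Below is the code I tried, but it does not give the result I want. I am new to regular expressions in R. I want to extract the sentences that start with 'database'.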

 sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
 m <- gregexpr('database.*', sent)
 regmatches(sent, m)

The code above returns 'database' and the words after it, wherever it occurs in the string. But the output I want is:

 "database connection", "database connection is good"

Thanks for your help!

Answer 1

Using stringr:

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
stringr::str_subset(sent, "^database.*")
#> [1] "database connection"         "database connection is good"

In base R:

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
grep("^database.*", sent, value = TRUE)
#> [1] "database connection"         "database connection is good"
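
As a hedged aside, two base R variations on the same idea are sketched below. The first uses `startsWith()` for a literal prefix test with no regex at all; the second adds a word boundary (`\\b`) so that a string such as "databases are down" would not count. Whether that stricter behavior is wanted is an assumption, since the question does not say; the names `prefix_hits` and `word_hits` are only illustrative.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good")

# Literal prefix test: TRUE where the string begins with "database"
prefix_hits <- sent[startsWith(sent, "database")]

# Regex anchored at the start, with a word boundary after "database"
# (assumption: "databases ..." should NOT match)
word_hits <- grep("^database\\b", sent, value = TRUE)

print(prefix_hits)
print(word_hits)
```

For this particular `sent` vector both approaches select the same two strings; they differ only on inputs where "database" is the start of a longer word.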

Answer 2

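You did not anchor the regular expression to the start of the string. If you use the start anchor (^), you will get the result you want. Here is what your code should look like: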

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
m <- gregexpr('^database.*', sent)
regmatches(sent, m)

If you want to remove the character(0) elements from the result, you can change the last line to

r <- regmatches(sent, m)
r <- r[lengths(r) > 0]
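
A possible shortcut, offered as a sketch rather than as part of the original answer: because the pattern is anchored with ^, each string can match at most once, so `regexpr()` can stand in for `gregexpr()`. With `regexpr()` match data, `regmatches()` returns a plain character vector and leaves out the non-matching elements, so no filtering step is needed. Flattening the `gregexpr()` result with `unlist()` gives the same vector. The names `res_single` and `res_flat` are only illustrative.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good")

# regexpr() finds at most one match per string (-1 where there is none);
# regmatches() then returns only the matched substrings
m1 <- regexpr("^database.*", sent)
res_single <- regmatches(sent, m1)

# Alternative: keep gregexpr() and flatten the list; character(0)
# entries vanish when the list is unlisted
res_flat <- unlist(regmatches(sent, gregexpr("^database.*", sent)))

print(res_single)
print(res_flat)
```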
