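Need to extract whole sentences that start with a specific word in R

Time: 2016-11-03 16:15:38

Tags: r regex

I need to extract whole sentences that start with a specific word in R. Below is the code I tried, but it does not give the result I need. I am new to regular expressions in R. I want to extract the sentences that start with 'database'.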

 sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
 m <- gregexpr('database.*', sent)
 regmatches(sent, m)

The code above returns the remaining words from 'database' onward, wherever it occurs in a sentence. But the output I want is:

 "database connection", "database connection is good"

Thanks for your help!

2 Answers:

Answer 0: (score: 3)

Using stringr:

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
stringr::str_subset(sent, "^database.*")
#> [1] "database connection"         "database connection is good"

In base R:

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
grep("^database.*", sent, value = TRUE)
#> [1] "database connection"         "database connection is good"
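
As a side note to the two calls above, here is a minimal sketch of some base R variants. It assumes R 3.3.0 or later for startsWith(); the variable names hits_regex, hits_prefix and hits_word are made up for illustration. The trailing ".*" is optional when the goal is only to select elements, and a literal prefix test avoids regular expressions altogether.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good")

# "^" anchors the match to the start of each string; the trailing ".*"
# is not needed when we only want to select the matching elements.
hits_regex <- grep("^database", sent, value = TRUE)

# startsWith() performs a literal prefix test, with no regex involved.
hits_prefix <- sent[startsWith(sent, "database")]

# Adding \\b requires a word boundary after "database", so a string
# beginning with "databases" would not be selected.
hits_word <- grep("^database\\b", sent, value = TRUE)
```

All three variants select the same two elements for this input; which one to prefer depends on whether the prefix is a fixed string or a pattern.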

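Answer 1: (score: 1)

You are not anchoring the regular expression to the start of the string. If you use the start anchor (^), you will get the desired result. Here is what your code should look like: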

sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
m <- gregexpr('^database.*', sent)
regmatches(sent, m)
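
To make the shape of that result concrete, the sketch below inspects what the anchored version returns. The name res is only illustrative. Because gregexpr() is used, regmatches() gives back a list with one element per input string, and strings without a match contribute an empty character vector.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good")

m <- gregexpr("^database.*", sent)
res <- regmatches(sent, m)

# One list element per input string; non-matching strings yield character(0).
n_matches <- lengths(res)
```

Here n_matches holds the number of matches per input string, so the zeros mark the strings that do not start with 'database'.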

If you want to remove the character(0) elements from the result, you can replace the last line with

r <- regmatches(sent, m)
r <- r[lengths(r) > 0]
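
If a plain character vector is wanted, as in the output shown in the question, one possible sketch is to flatten the list with unlist(), or to use regexpr() instead of gregexpr() so that regmatches() returns only the matched strings. The names out_flat and out_single are hypothetical.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good")

# Option 1: flatten the list; empty character(0) elements disappear.
r <- regmatches(sent, gregexpr("^database.*", sent))
out_flat <- unlist(r)

# Option 2: with regexpr() (first match only), regmatches() returns a
# character vector that already omits the strings without a match.
out_single <- regmatches(sent, regexpr("^database.*", sent))
```

Since the pattern is anchored at the start, each string can match at most once, so both options give the same vector here.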