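I need to extract whole sentences that start with a particular word in R. Below is the code I tried, but I can't get the desired result. I am new to regular expressions in R. I want to extract the sentences that start with 'database'.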
sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
m <- gregexpr('database.*', sent)
regmatches(sent, m)
The code above gives the rest of the words after 'database'. But my desired output is:
"database connection", "database connection is good"
Thanks for your help!
Answer 0 (score: 3)
Using stringr:
sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
stringr::str_subset(sent, "^database.*")
#> [1] "database connection" "database connection is good"
In base R:
sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
grep("^database.*", sent, value = TRUE)
#> [1] "database connection" "database connection is good"
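If the prefix is a fixed string rather than a pattern, a regular expression is not strictly needed. The following is a minimal sketch, not part of the original answer: the fifth element ("databases are fun") is an assumed extra input added only to show the difference between a plain prefix test and an anchored pattern with a word boundary.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good",
          "databases are fun")  # hypothetical extra element, for illustration only

# Literal prefix test: keeps every string that begins with "database",
# including "databases are fun"
prefix_hits <- sent[startsWith(sent, "database")]

# Anchored regex with a word boundary: "database" must be a whole word
# at the start of the string, so "databases are fun" is dropped
word_hits <- grep("^database\\b", sent, value = TRUE, perl = TRUE)

print(prefix_hits)
print(word_hits)
```

Which of the two is appropriate depends on whether strings such as "databases ..." should count as starting with 'database'.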
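Answer 1 (score: 1)
You did not anchor the regular expression to the start of the string. If you use the start anchor (^), you will get the desired result. Here is what your code should look like: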
sent <- c("database connection","connection database fail", "fail connection database","database connection is good")
m <- gregexpr('^database.*', sent)
regmatches(sent, m)
If you want to remove the character(0) elements from the result, you can replace the last line with:
r <- regmatches(sent, m)
r <- r[lengths(r) > 0]
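An alternative to filtering the list afterwards, sketched here on the assumption that at most one match per string is needed: with match data from regexpr() instead of gregexpr(), regmatches() returns a plain character vector that contains only the matched strings, so no character(0) elements appear in the first place.

```r
sent <- c("database connection", "connection database fail",
          "fail connection database", "database connection is good")

# regexpr() finds only the first match in each string; the anchor ^
# restricts that match to the start of the string
m <- regexpr("^database.*", sent)

# With regexpr() match data, regmatches() drops the non-matching
# elements and returns a character vector rather than a list
res <- regmatches(sent, m)
print(res)
#> [1] "database connection"         "database connection is good"
```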