Question

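I have a text object and I only want to extract consecutive words that start with a capital letter (for example, John Rye). I tried using regmatches() and gregexpr(), but I get an error. How can I fix this?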

txt<-"This is John Rye walking."

regmatches(txt, gregexpr('(.*)\s(.*)', txt, perl=T))[[1]]
Error: '\s' is an unrecognized escape in character string starting "'(.*)\s"

I also tried:

regmatches(txt, gregexpr('(^[A-Z][-a-zA-Z]+$)', txt, perl=T))[[1]]

But I got this result:

character(0)

Answer 1

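^ and $ are start-of-string and end-of-string anchors; you are probably confusing them with word boundaries (\b, written as \\b when escaped inside an R string). The - inside the character class also looks out of place.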

So the regular expression should instead be

\\b[A-Z][a-zA-Z]+\\b
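
Applied to the example string, the corrected pattern can be sketched as below. In an R string literal every backslash has to be doubled, which is also why '\s' in the first attempt was rejected as an unrecognized escape (it would need to be written '\\s'). On its own this pattern matches every capitalized word, including This. The second pattern, which requires a run of two or more capitalized words separated by whitespace, is an assumption about what "consecutive" means in the question and is not part of the original answer.

```r
txt <- "This is John Rye walking."

# Corrected pattern from the answer: word boundaries instead of string anchors.
single <- regmatches(txt, gregexpr("\\b[A-Z][a-zA-Z]+\\b", txt, perl = TRUE))[[1]]
print(single)
# [1] "This" "John" "Rye"

# Assumed extension (not from the answer): a capitalized word followed by
# one or more further capitalized words, separated by whitespace.
run_pattern <- "\\b[A-Z][a-zA-Z]+(?:\\s+[A-Z][a-zA-Z]+)+\\b"
consecutive <- regmatches(txt, gregexpr(run_pattern, txt, perl = TRUE))[[1]]
print(consecutive)
# [1] "John Rye"
```

The non-capturing group (?:...) needs perl = TRUE, which the question already uses.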

Answer 2

This also does the job:

 txt<-"This is John Rye walking."
 regmatches(txt, gregexpr('(([A-Z])\\w+\\b ){2}', txt))[[1]]
 [1] "John Rye "
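
Two properties of this pattern are worth keeping in mind: the match keeps its trailing space, and each of the two words must be followed by a literal space. A minimal sketch of both points follows; the second input string is a made-up example, not taken from the question.

```r
txt <- "This is John Rye walking."
pattern <- "(([A-Z])\\w+\\b ){2}"

# The raw match is "John Rye " with a trailing space; trimws() strips it.
m <- regmatches(txt, gregexpr(pattern, txt))[[1]]
trimmed <- trimws(m)
print(trimmed)
# [1] "John Rye"

# Hypothetical input: here the name is followed by "." rather than a space,
# so the pattern finds nothing.
txt2 <- "I met John Rye."
no_match <- regmatches(txt2, gregexpr(pattern, txt2))[[1]]
print(no_match)
# character(0)
```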
