Question

Continuing from a previous question, I am still trying to work out the behaviour of the regex tools:

R regmatches() and stringr str_extract() dragging whitespaces along

When I change the pattern to extract the first number and the date-time:

pattern="^ *(\\d+) +(\\d+ +\\d+ +\\d+ +\\d+ +\\d+ +[\\.\\d]+) +"
test=" 2 15  3 23 12  0  0.18 "

str_match() gives the expected result:

f=str_match(test,pattern)
>f
     [,1]                       [,2] [,3]                  
[1,] " 2 15  3 23 12  0  0.18 " "2"  "15  3 23 12  0  0.18"

while regmatches() finds no match:

d=regmatches(test,gregexpr(pattern,test))
> d
[[1]]
character(0)

Output of gregexpr:

 > gregexpr(pattern[2],test[1])
[[1]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE

I have no idea why this happens. Replacing gregexpr with regexec or regexpr gives the same result.

Answer 1

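The point is that you need to pass a POSIX 1003.2 extended regular expression to the regexec function:

pattern="^ *([[:digit:]]+) +([[:digit:]]+ +[[:digit:]]+ +[[:digit:]]+ +[[:digit:]]+ +[[:digit:]]+ +[.[:digit:]]+) +"

or

pattern="^ *(\\d+) +(\\d+ +\\d+ +\\d+ +\\d+ +\\d+ +[.0-9]+) +"

or a shortened version with a limiting quantifier and a group:

pattern="^ *(\\d+) +((?:\\d+ +){5}[.0-9]+) +"

The problem arises because you used the escaped \. and the shorthand class \d inside a character class ([\\.\\d]). That only works in Perl-style (PCRE) regular expressions, and at the time of writing regexec does not support the perl=T option. From the documentation:

Each of these functions (apart from regexec, which currently does not support Perl-style regular expressions) operates in one of three modes

With the corrected pattern, regexec finds the match and both submatches:

> test=" 2 15  3 23 12  0  0.18 "
> pattern="^ *(\\d+) +(\\d+ +\\d+ +\\d+ +\\d+ +\\d+ +[.0-9]+) +"
> regmatches(test,regexec(pattern,test))
[[1]]
[1] " 2 15  3 23 12  0  0.18 " "2"
[3] "15  3 23 12  0  0.18"

FYI, gregexpr (which does allow the perl=T option) is not a drop-in replacement for regexec here: regmatches() on a gregexpr result returns only the whole matches, not the submatches.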
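
To make the difference concrete, here is a small sketch that compares the default engine with perl=TRUE on a one-character test and then on the question's input. The variable names bad, good and m are made up for illustration, and the outputs noted in the comments are what I would expect from base R, so treat them as a guide rather than a guarantee.

test <- " 2 15  3 23 12  0  0.18 "

# Inside [...] the default (TRE) engine does not treat \d as "digit",
# while PCRE does.
grepl("[\\d]", "5")               # expected FALSE with the default engine
grepl("[\\d]", "5", perl = TRUE)  # expected TRUE with PCRE

# Pattern from the question (bad) and the corrected one (good).
bad  <- "^ *(\\d+) +(\\d+ +\\d+ +\\d+ +\\d+ +\\d+ +[\\.\\d]+) +"
good <- "^ *(\\d+) +(\\d+ +\\d+ +\\d+ +\\d+ +\\d+ +[.0-9]+) +"

as.integer(regexpr(bad, test))               # -1: no match, as in the question
as.integer(regexpr(bad, test, perl = TRUE))  # 1: PCRE accepts [\.\d]

# regexec returns the positions of the submatches, so regmatches can extract them.
m <- regmatches(test, regexec(good, test))[[1]]
m[2]  # "2"
m[3]  # "15  3 23 12  0  0.18"

If you only need the whole match, regexpr(bad, test, perl=TRUE) together with regmatches is enough; the corrected pattern matters when you want the groups through regexec.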

R str_match() does what regmatches() don't