Can Perl-compatible regular expressions compare two numbers?

Time: 2016-04-22 17:43:06

Tags: regex pcre

Find Benjamin Button

Given...

Born,Died
1852,1891
1862,1862
1902,1785

...is there syntax in Perl-compatible regular expressions that matches the fourth line, where the first value is greater than the second?

I'm guessing it would be a combination of...

(\d+),(\d+)

...and...

(??{$1>$2})

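...but perhaps this is impossible, because regular expressions are lexical and the comparison is arithmetic.

Edit: This is restricted to pcre-regex because the environment accepts PCRE patterns but forbids Perl programs.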

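2 answers:

Answer 0: (score: 1)

Summary

This regex assumes your source numbers are 4-digit strings. It finds the cases where the first comma-separated number is numerically greater than the second. As written, the regex assumes you use the "x" flag so that whitespace and line breaks in the pattern are ignored.

Regular expression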

^(?=\d{4},\d{4}(?:\D|\Z))(?:(?:
[9]\d*,[012345678]\d*|
[89]\d*,[01234567]\d*|
[789]\d*,[0123456]\d*|
[6789]\d*,[012345]\d*|
[56789]\d*,[01234]\d*|
[456789]\d*,[0123]\d*|
[3456789]\d*,[012]\d*|
[23456789]\d*,[01]\d*|
[123456789]\d*,[0]\d*
)|
(?<a>\d{1})(?:
[9]\d*,\k<a>[012345678]\d*|
[89]\d*,\k<a>[01234567]\d*|
[789]\d*,\k<a>[0123456]\d*|
[6789]\d*,\k<a>[012345]\d*|
[56789]\d*,\k<a>[01234]\d*|
[456789]\d*,\k<a>[0123]\d*|
[3456789]\d*,\k<a>[012]\d*|
[23456789]\d*,\k<a>[01]\d*|
[123456789]\d*,\k<a>[0]\d*
)|
(?<b>\d{2})(?:
[9]\d*,\k<b>[012345678]\d*|
[89]\d*,\k<b>[01234567]\d*|
[789]\d*,\k<b>[0123456]\d*|
[6789]\d*,\k<b>[012345]\d*|
[56789]\d*,\k<b>[01234]\d*|
[456789]\d*,\k<b>[0123]\d*|
[3456789]\d*,\k<b>[012]\d*|
[23456789]\d*,\k<b>[01]\d*|
[123456789]\d*,\k<b>[0]\d*
)|
(?<c>\d{3})(?:
[9]\d*,\k<c>[012345678]\d*|
[89]\d*,\k<c>[01234567]\d*|
[789]\d*,\k<c>[0123456]\d*|
[6789]\d*,\k<c>[012345]\d*|
[56789]\d*,\k<c>[01234]\d*|
[456789]\d*,\k<c>[0123]\d*|
[3456789]\d*,\k<c>[012]\d*|
[23456789]\d*,\k<c>[01]\d*|
[123456789]\d*,\k<c>[0]\d*
))

Example

http://www.rubular.com/r/XjBNBQIzGP

Sample text

Born,Died
1852,1891
1862,1862
1902,1785
1111,1111
1111,1110
2222,2202
3333,3033
4444,0444
123,456
1234,567
123,4567
456,123
4567,123
456,1234
4567,1234

Sample captures

[0][0] = 1902,1785
[0][a] = 1
[0][b] = 
[0][c] = 
[1][0] = 1111,1110
[1][a] = 
[1][b] = 
[1][c] = 111
[2][0] = 2222,2202
[2][a] = 
[2][b] = 22
[2][c] = 
[3][0] = 3333,3033
[3][a] = 3
[3][b] = 
[3][c] = 
[4][0] = 4444,0444
[4][a] = 
[4][b] = 
[4][c] = 
[5][0] = 4567,1234
[5][a] = 
[5][b] = 
[5][c] = 

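Brief explanation

The regex starts with a lookahead that verifies we really have two 4-digit numbers.

Four blocks then test each digit position in turn to check whether a digit in the first number is greater than the digit at the same position in the second. The second, third and fourth blocks contain named backreferences (a, b and c respectively). These backreferences ensure that the leading digits before the compared position are identical in both numbers.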
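The four-block structure can be reproduced programmatically. The following is only a sketch in Python, under a few assumptions: Python's re module is not PCRE, so the named groups are written as (?P<a>...) and (?P=a) instead of (?<a>...) and \k<a>; the digit lists are shortened to ranges such as [0-8]; and the helper names are made up for illustration. It builds the pattern and runs it against the sample text from this answer.

```python
import re


def greater_digit_alternatives(prefix: str) -> str:
    """Alternatives for 'digit d, then <prefix> and a digit below d'."""
    return "|".join(
        rf"{d}\d*,{prefix}[0-{d - 1}]\d*" for d in range(9, 0, -1)
    )


def build_pattern() -> str:
    # Block 1: the two numbers already differ in their first digit.
    blocks = [f"(?:{greater_digit_alternatives('')})"]
    # Blocks 2-4: a shared prefix of 1, 2 or 3 digits is captured as a, b or c
    # and must reappear at the start of the second number.
    for name, width in (("a", 1), ("b", 2), ("c", 3)):
        backref = f"(?P={name})"
        blocks.append(
            rf"(?P<{name}>\d{{{width}}})(?:{greater_digit_alternatives(backref)})"
        )
    # Same leading lookahead as the answer: two 4-digit numbers.
    return r"^(?=\d{4},\d{4}(?:\D|\Z))(?:" + "|".join(blocks) + ")"


SAMPLE = """Born,Died
1852,1891
1862,1862
1902,1785
1111,1111
1111,1110
2222,2202
3333,3033
4444,0444
123,456
1234,567
123,4567
456,123
4567,123
456,1234
4567,1234
"""

pattern = re.compile(build_pattern(), re.MULTILINE)
hits = list(pattern.finditer(SAMPLE))
for hit in hits:
    print(hit.group(0), hit.groupdict())
```

Each printed line shows the matched row together with whichever of a, b or c held the shared prefix, which can be compared with the sample captures listed above.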

More detailed explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
    ,                        ','
----------------------------------------------------------------------
    \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      \D                       non-digits (all but 0-9)
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      \Z                       before an optional \n, and the end of
                               the string
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      [9]                      any character of: '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [012345678]              any character of: '0', '1', '2', '3',
                               '4', '5', '6', '7', '8'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [89]                     any character of: '8', '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [01234567]               any character of: '0', '1', '2', '3',
                               '4', '5', '6', '7'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [789]                    any character of: '7', '8', '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [0123456]                any character of: '0', '1', '2', '3',
                               '4', '5', '6'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [6789]                   any character of: '6', '7', '8', '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [012345]                 any character of: '0', '1', '2', '3',
                               '4', '5'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [56789]                  any character of: '5', '6', '7', '8',
                               '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [01234]                  any character of: '0', '1', '2', '3',
                               '4'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [456789]                 any character of: '4', '5', '6', '7',
                               '8', '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [0123]                   any character of: '0', '1', '2', '3'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [3456789]                any character of: '3', '4', '5', '6',
                               '7', '8', '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [012]                    any character of: '0', '1', '2'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [23456789]               any character of: '2', '3', '4', '5',
                               '6', '7', '8', '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [01]                     any character of: '0', '1'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [123456789]              any character of: '1', '2', '3', '4',
                               '5', '6', '7', '8', '9'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      [0]                      any character of: '0'
----------------------------------------------------------------------
      \d*                      digits (0-9) (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping

Answer 1: (score: 1)

A working pattern using the Perl "delayed execution assertion", (??{code}), is:

^(\d{4}),(\d{4})(??{ $1 > $2 ? "" : "(?!)"})$

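The delayed execution assertion inserts the value returned by the code into the regex. The effect here is that the pattern becomes ^(\d{4}),(\d{4})$ if the first number is greater than the second (and so it matches), and ^(\d{4}),(\d{4})(?!)$ otherwise. (?!) is a negative lookahead assertion that can never succeed, because Perl treats the empty pattern as always matching.

Another option for Perl is to use a "conditional expression", (?(condition)yes-pattern), together with a "code evaluation assertion", (?{code}):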

^(\d{4}),(\d{4})(?(?{ $1 <= $2 })(?!))$

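This has the effect of adding the never-matching (?!) to the pattern if the first number is less than or equal to the second. My tests show that this is significantly faster than the first pattern above.

See the perlre manual page for details of all Perl regular expression features.

See the Regex Arcana article by Jeff Pinyan for an excellent tutorial on the (??{code}) and (?{code}) patterns.

However, the patterns above do not work with the PCRE library. A comment by @Sebastian suggested a possible solution (which does not work):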

^(\d*)(\d)\d*,\1[^\D\2-9]\d*$

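This tries to find pairs of numbers that share a common prefix, where the first differing character in the second number is not a non-digit (i.e. it is a digit) and is not equal to or greater than the corresponding digit in the first number (i.e. it is less than that digit). Unfortunately, it does not work. The reason is explained in General approach for (equivalent of) "backreferences within character class"?. Basically, backreferences do not work inside character classes. The idea can be implemented with a delayed execution assertion (^(\d*)(\d)\d*,\1(??{"[^\D${2}-9]"})\d*$), but that is still no good for PCRE.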
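The failure is easy to observe. The check below is a sketch that uses Python's re module rather than PCRE, an assumption made only so that it can be run as is. In Python, \2 inside a character class is parsed as an octal escape for the character with code 2, not as a backreference, so [^\D\2-9] excludes every ASCII digit and the pattern cannot match any of the rows from the question.

```python
import re

# The suggested pattern, unchanged. Inside [...] the \2 is not a backreference.
attempt = re.compile(r"^(\d*)(\d)\d*,\1[^\D\2-9]\d*$")

rows = ["1852,1891", "1862,1862", "1902,1785"]
matched = [row for row in rows if attempt.search(row)]
print(matched)
```

The third row is the one that should have matched, yet the list stays empty.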

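One PCRE-compatible option is a brute-force version of the idea of checking the first differing digit. Look for 1 followed by 0, or 2 followed by 0 or 1, or 3 followed by 0, 1 or 2, and so on. This Bash code generates the regex: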

regex='^(?=\d{4},\d{4}$)'   # Match only lines of the form 'dddd,dddd'
regex+='(\d*)'              # Prefix of both numbers
regex+='(1\d*,\1[0]'        # 1 (followed by digits+','+prefix) followed by 0
for (( i=2 ; i<=9 ; i++ )) ; do
    regex+="|$i\d*,\1[0-$((i-1))]"  # or $i (...) followed by a lesser digit
done
regex+=')\d*$'
printf '%s\n' "$regex"

It also adds, at the start of the regex, the same positive lookahead assertion that @Denomales used in the first posted answer. The generated regex is:

^(?=\d{4},\d{4}$)(\d*)(1\d*,\1[0]|2\d*,\1[0-1]|3\d*,\1[0-2]|4\d*,\1[0-3]|5\d*,\1[0-4]|6\d*,\1[0-5]|7\d*,\1[0-6]|8\d*,\1[0-7]|9\d*,\1[0-8])\d*$
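To check the generated regex without a PCRE environment, the generator can be ported to Python. This is a sketch under assumptions: the function name build_regex is invented for illustration, and Python's re module stands in for PCRE (the generated pattern only uses a lookahead, groups and a numbered backreference, which both support). re.MULTILINE is needed so that ^ and $ apply to each line of the sample text.

```python
import re


def build_regex() -> str:
    """Python port of the Bash generator above."""
    # 1 (followed by digits, ',' and the prefix) followed by 0
    parts = [r"1\d*,\1[0]"]
    # or i (...) followed by a lesser digit, for i = 2..9
    for i in range(2, 10):
        parts.append(rf"{i}\d*,\1[0-{i - 1}]")
    # Lookahead restricts matches to 'dddd,dddd'; (\d*) is the shared prefix.
    return r"^(?=\d{4},\d{4}$)(\d*)(" + "|".join(parts) + r")\d*$"


regex = build_regex()
print(regex)

sample = """Born,Died
1852,1891
1862,1862
1902,1785
1111,1111
1111,1110
2222,2202
3333,3033
4444,0444
123,456
1234,567
123,4567
456,123
4567,123
456,1234
4567,1234
"""

found = [m.group(0) for m in re.finditer(regex, sample, re.MULTILINE)]
print(found)
```

The printed pattern should be identical to the one shown above, and the matched rows should be exactly the 4-digit pairs whose first number is larger.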

As @ThisSuitIsBlackNot pointed out in the comments, a regex is not the best approach here. See also What is meant by "Now you have two problems"?
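For comparison, where a program is allowed, the arithmetic version is only a few lines. The sketch below is in Python purely as an illustration; the function name first_greater is hypothetical, and it does not help in the PCRE-only environment described in the question.

```python
def first_greater(line: str) -> bool:
    """Return True if the line is 'N,M' with integer N greater than M."""
    born, sep, died = line.partition(",")
    # Reject headers and malformed rows: both fields must be ASCII digits.
    for field in (born, died):
        if not (field.isascii() and field.isdigit()):
            return False
    return bool(sep) and int(born) > int(died)


rows = ["Born,Died", "1852,1891", "1862,1862", "1902,1785"]
selected = [row for row in rows if first_greater(row)]
print(selected)
```

Unlike the regexes above, this comparison is not limited to 4-digit numbers.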