How does the "\b" word boundary affect the output in Perl?

Time: 2015-12-02 09:45:45

Tags: regex perl

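I want to modify a string. My regex should take the number

12365478965412365

and split it into groups of three digits, so that the output looks like

12,365,478,965,412,365

We can achieve this with a lookbehind and a lookahead: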

s/(?<=\d)(?=(\d\d\d)+\b)/\,/g

But when I remove the \b:

s/(?<=\d)(?=(\d\d\d)+)/\,/g

I get this output:

1,2,3,6,5,4,7,8,9,6,5,4,1,2,365.

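How does \b affect the positions at which the "," is inserted?

Does the regex read ahead all the way to the word boundary before the lookbehind is tested?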

1 Answer:

Answer 0: (score: 4)

\b matches the boundary between a word and a non-word character, and it is zero-width. From perlre:

  

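A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.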
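To make that definition concrete, here is a small sketch that lists the offsets at which \b succeeds. The helper name boundary_offsets and the second sample string are only illustrative assumptions, not part of the question. For a string made up entirely of digits, the only boundaries are the very start and the very end.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset in $s at which the
# zero-width assertion \b succeeds.
sub boundary_offsets {
    my ($s) = @_;
    my @offsets;
    while ($s =~ /\b/g) {
        push @offsets, pos($s);   # pos() is the offset of the zero-width match
    }
    return @offsets;
}

print join(',', boundary_offsets('12365478965412365')), "\n";  # 0,17
print join(',', boundary_offsets('12 345')), "\n";             # 0,2,3,6
```

Since there is no boundary anywhere inside the 17-digit string, a \b placed after (\d\d\d)+ can only be satisfied at offset 17, the end of the string.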

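The problem with what you're trying to do is that positioning the commas is a right-to-left operation: you don't know whether it should be 10,000 or 100,000 until you've seen the total number of digits in the string.

So I'd suggest this gets much easier if you don't do it with a 'straight' regex and lookaheads, and use reverse instead: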

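my $str = '12365478965412365';
# Reverse the string, put a comma after each group of three digits,
# then reverse it back. The (?=\d) stops a comma from being added after
# the final group when the digit count is a multiple of 3. /r needs Perl 5.14+.
my $comma_sep_str = reverse( reverse($str) =~ s/(\d{3})(?=\d)/$1,/gr );
print $comma_sep_str;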

Reverse it, group from left to right, then reverse it back again.
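As a sketch of the same reverse-group-reverse idea wrapped in a reusable function: the name commify is hypothetical, and the (?=\d) guard is an addition that avoids a stray comma when the number of digits is a multiple of three.

```perl
use strict;
use warnings;

# Hypothetical helper: insert thousands separators into a string of digits.
sub commify {
    my ($digits) = @_;
    my $rev = reverse $digits;        # scalar context: reverses the characters
    $rev =~ s/(\d{3})(?=\d)/$1,/g;    # comma after each full group that has more digits after it
    return scalar reverse $rev;       # reverse back to the original order
}

print commify('12365478965412365'), "\n";  # 12,365,478,965,412,365
print commify('123456'), "\n";             # 123,456
print commify('12'), "\n";                 # 12
```

Working on the reversed string means the groups of three are counted from what was originally the right-hand end, so no anchor is needed.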

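If you're having trouble working out what a regex is doing, the usual trick is to turn on use re 'debug';

I won't reproduce the full output, because it's long. But what's happening is that the pattern is using \b to anchor to the end of the line.

If you take away the g flag, you can see this more clearly: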

Compiling REx "(?<=\d)(?=(\d\d\d)+\b)"
Final program:
   1: IFMATCH[-1] (6)
   3:   POSIXU[\d] (4)
   4:   SUCCEED (0)
   5: TAIL (6)
   6: IFMATCH[0] (22)
   8:   CURLYM[1] {1,32767} (19)
  12:     POSIXU[\d] (13)
  13:     POSIXU[\d] (14)
  14:     POSIXU[\d] (17)
  17:     SUCCEED (0)
  18:   NOTHING (19)
  19:   BOUND (20)
  20:   SUCCEED (0)
  21: TAIL (22)
  22: END (0)
minlen 0 
Matching REx "(?<=\d)(?=(\d\d\d)+\b)" against "12365478965412365"
   0 <> <1236547896>         |  1:IFMATCH[-1](6)
                                  failed...
   1 <1> <2365478965>        |  1:IFMATCH[-1](6)
   0 <> <1236547896>         |  3:  POSIXU[\d](4)
   1 <1> <2365478965>        |  4:  SUCCEED(0)
                                    subpattern success...
   1 <1> <2365478965>        |  6:IFMATCH[0](22)
   1 <1> <2365478965>        |  8:  CURLYM[1] {1,32767}(19)
   1 <1> <2365478965>        | 12:    POSIXU[\d](13)
   2 <12> <3654789654>       | 13:    POSIXU[\d](14)
   3 <123> <6547896541>      | 14:    POSIXU[\d](17)
   4 <1236> <5478965412>     | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 1 times, len=3...
   4 <1236> <5478965412>     | 12:    POSIXU[\d](13)
   5 <12365> <4789654123>    | 13:    POSIXU[\d](14)
   6 <23654> <7896541236>    | 14:    POSIXU[\d](17)
   7 <36547> <8965412365>    | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 2 times, len=3...
   7 <36547> <8965412365>    | 12:    POSIXU[\d](13)
   8 <65478> <965412365>     | 13:    POSIXU[\d](14)
   9 <54789> <65412365>      | 14:    POSIXU[\d](17)
  10 <47896> <5412365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 3 times, len=3...
  10 <47896> <5412365>       | 12:    POSIXU[\d](13)
  11 <478965> <412365>       | 13:    POSIXU[\d](14)
  12 <4789654> <12365>       | 14:    POSIXU[\d](17)
  13 <47896541> <2365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 4 times, len=3...
  13 <47896541> <2365>       | 12:    POSIXU[\d](13)
  14 <478965412> <365>       | 13:    POSIXU[\d](14)
  15 <4789654123> <65>       | 14:    POSIXU[\d](17)
  16 <47896541236> <5>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 5 times, len=3...
  16 <47896541236> <5>       | 12:    POSIXU[\d](13)
  17 <478965412365> <>       | 13:    POSIXU[\d](14)
                                      failed...
                                    CURLYM trying tail with matches=5...
  16 <47896541236> <5>       | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=4...
  13 <47896541> <2365>       | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=3...
  10 <47896> <5412365>       | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=2...
   7 <36547> <8965412365>    | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=1...
   4 <1236> <5478965412>     | 19:    BOUND(20)
                                      failed...
                                    failed...
                                  failed...
   2 <12> <3654789654>       |  1:IFMATCH[-1](6)
   1 <1> <2365478965>        |  3:  POSIXU[\d](4)
   2 <12> <3654789654>       |  4:  SUCCEED(0)
                                    subpattern success...
   2 <12> <3654789654>       |  6:IFMATCH[0](22)
   2 <12> <3654789654>       |  8:  CURLYM[1] {1,32767}(19)
   2 <12> <3654789654>       | 12:    POSIXU[\d](13)
   3 <123> <6547896541>      | 13:    POSIXU[\d](14)
   4 <1236> <5478965412>     | 14:    POSIXU[\d](17)
   5 <12365> <4789654123>    | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 1 times, len=3...
   5 <12365> <4789654123>    | 12:    POSIXU[\d](13)
   6 <23654> <7896541236>    | 13:    POSIXU[\d](14)
   7 <36547> <8965412365>    | 14:    POSIXU[\d](17)
   8 <65478> <965412365>     | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 2 times, len=3...
   8 <65478> <965412365>     | 12:    POSIXU[\d](13)
   9 <54789> <65412365>      | 13:    POSIXU[\d](14)
  10 <47896> <5412365>       | 14:    POSIXU[\d](17)
  11 <478965> <412365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 3 times, len=3...
  11 <478965> <412365>       | 12:    POSIXU[\d](13)
  12 <4789654> <12365>       | 13:    POSIXU[\d](14)
  13 <47896541> <2365>       | 14:    POSIXU[\d](17)
  14 <478965412> <365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 4 times, len=3...
  14 <478965412> <365>       | 12:    POSIXU[\d](13)
  15 <4789654123> <65>       | 13:    POSIXU[\d](14)
  16 <47896541236> <5>       | 14:    POSIXU[\d](17)
  17 <478965412365> <>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 5 times, len=3...
  17 <478965412365> <>       | 12:    POSIXU[\d](13)
                                      failed...
                                    CURLYM trying tail with matches=5...
  17 <478965412365> <>       | 19:    BOUND(20)
  17 <478965412365> <>       | 20:    SUCCEED(0)
                                      subpattern success...
   2 <12> <3654789654>       | 22:END(0)
Match successful!
Freeing REx: "(?<=\d)(?=(\d\d\d)+\b)"

12,365478965412365
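A minimal sketch of how a trace like this can be produced. The trace is written to STDERR, and its exact layout may differ between Perl versions. The re pragma is lexically scoped, so only regexes compiled inside the block are traced.

```perl
use strict;
use warnings;

my $str = '12365478965412365';
{
    # Lexically scoped: only regexes compiled inside this block are traced.
    use re 'debug';
    $str =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/;   # no /g, so only the first match is replaced
}
print "$str\n";   # 12,365478965412365
```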

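Because of the lookaround assertions, there are a lot of steps in this single iteration of the regex, because what it has to match first is:

 (\d\d\d)+\b

That is one or more groups of exactly 3 digits, anchored by a 'boundary'. There is no word boundary anywhere inside the digits, so the only one it can use is the end of the line.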

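What isn't obvious here is that \b is really acting as if it were $. It serves as an anchor on the right-hand side of the pattern. Your pattern has to read that far and then backtrack, so that it can match (\d\d\d)+ counting from the right. Without it, your pattern isn't anchored, so it matches at any position that has a digit before it and at least three digits after it (in effect, any 4-digit substring). And because the lookarounds don't consume anything, it matches after every digit except the last 3. (Which is what's happening.)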
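The difference can be checked by listing the offsets at which each version of the pattern matches. This is only a sketch: match_offsets is a hypothetical helper, and the capture group is written as non-capturing (?:...) because only the positions matter here.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the offsets at which a zero-width pattern matches in $s.
sub match_offsets {
    my ($s, $re) = @_;
    my @offsets;
    push @offsets, pos($s) while $s =~ /$re/g;
    return @offsets;
}

my $str = '12365478965412365';
my @anchored   = match_offsets($str, qr/(?<=\d)(?=(?:\d\d\d)+\b)/);
my @unanchored = match_offsets($str, qr/(?<=\d)(?=(?:\d\d\d)+)/);

print "with \\b:    @anchored\n";    # 2 5 8 11 14
print "without \\b: @unanchored\n";  # 1 2 3 4 5 6 7 8 9 10 11 12 13 14
```

With \b, a position qualifies only if the number of digits remaining to its right is a multiple of three. Without it, any position with at least three digits remaining qualifies.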

Your pattern would work the same way if you used $. Hopefully that makes it clearer what's going on?

my $str = '12365478965412365';
# Same substitution as before, anchored with $ instead of \b.
$str =~ s/(?<=\d)(?=(\d\d\d)+$)/\,/g;
print $str;
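One caveat, as a hedged aside: the two anchors behave the same only because this string is nothing but digits. If the number sits inside longer text, \b still finds the end of the number, while $ only succeeds at the end of the string. The sample text below is a made-up input for illustration.

```perl
use strict;
use warnings;

my $text = 'total 1234567 items';   # hypothetical input, not from the question

# Copy the text, then apply each substitution to its own copy.
(my $with_b      = $text) =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;
(my $with_dollar = $text) =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;

print "$with_b\n";       # total 1,234,567 items
print "$with_dollar\n";  # total 1234567 items
```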