Question

I want to modify a string. My regex should change the number

12365478965412365

by inserting commas between groups of three digits, so that the output looks like

12,365,478,965,412,365

We can achieve this with a lookahead and a lookbehind:

s/(?<=\d)(?=(\d\d\d)+\b)/\,/g

But when I remove the \b:

s/(?<=\d)(?=(\d\d\d)+)/\,/g

the output I get is

1,2,3,6,5,4,7,8,9,6,5,4,1,2,365

How does \b affect the positions where the "," gets inserted?

Does the regex first test all the way up to the word boundary at the end, and only then test behind?

Answer 1

\b matches at a boundary between words. Apart from that it is zero-width and consumes nothing. From perlre:

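A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.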
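Here is a small sketch that makes the definition concrete. It lists every offset where \b matches. The helper name boundary_positions is made up for illustration. For the all-digit string from the question it finds only the start and the end. That is why, in the question's pattern, \b can only mean "the end of the number".

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which \b matches in $s.
# \b is zero-width, so pos() after each match is the offset of that boundary.
sub boundary_positions {
    my ($s) = @_;
    my @positions;
    while ($s =~ /\b/g) {
        push @positions, pos($s);
    }
    return @positions;
}

print join(',', boundary_positions('12365478965412365')), "\n";   # 0,17
print join(',', boundary_positions('ab 12.3')), "\n";             # 0,2,3,5,6,7
```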

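The problem with what you are trying to do is that placing commas is a right-to-left operation. You don't know whether it should be 10,000 or 100,000 until you have seen the total number of digits in the string.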
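The point about needing the total digit count can be shown without any lookarounds. The sketch below works left to right, but only because it first measures the length to find how many digits belong in the leading group. commify_by_length is a hypothetical name, and the sketch assumes the input is a non-empty string of plain digits.

```perl
use strict;
use warnings;

# Hypothetical helper; assumes $digits is a non-empty string of ASCII digits.
sub commify_by_length {
    my ($digits) = @_;
    # The leading group holds length % 3 digits, or a full 3 when the length divides evenly.
    my $head = length($digits) % 3 || 3;
    # Every remaining digit falls into a group of exactly three.
    my @rest = substr($digits, $head) =~ /(\d{3})/g;
    return join ',', substr($digits, 0, $head), @rest;
}

print commify_by_length('12365478965412365'), "\n";   # 12,365,478,965,412,365
print commify_by_length('100000'), "\n";              # 100,000
print commify_by_length('10000'), "\n";               # 10,000
```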

So I'd suggest it is much easier if you don't try to do this with a straight regex and lookaheads, and use reverse instead:

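use strict; use warnings;
my $str = '12365478965412365';
# (?=\d) avoids a leading comma when the digit count is a multiple of 3; s///r needs Perl 5.14+.
my $comma_sep_str = reverse( scalar(reverse $str) =~ s/(\d{3})(?=\d)/$1,/rg );
print "$comma_sep_str\n";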

Reverse it, group from left to right, then reverse it again.
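The sketch below wraps the reverse, group and reverse idea in subs and shows one pitfall. The names commify_rev_naive and commify_rev are hypothetical, and s///r needs Perl 5.14 or later. Without a (?=\d) guard, the last group of three gets a comma after it. Once the string is reversed back, that comma becomes a leading comma for numbers such as 123456, whose digit count is a multiple of 3. Inside a sub, `scalar reverse` is also needed, because a bare reverse takes on the caller's context.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison; s///r requires Perl 5.14+.
# "scalar reverse" matters here: in list context reverse would return its
# argument list unchanged instead of reversing the string.
sub commify_rev_naive {
    my ($digits) = @_;
    return scalar reverse( scalar(reverse $digits) =~ s/(\d{3})/$1,/rg );
}

sub commify_rev {
    my ($digits) = @_;
    # Only add a comma when another digit follows the group of three.
    return scalar reverse( scalar(reverse $digits) =~ s/(\d{3})(?=\d)/$1,/rg );
}

print commify_rev_naive('123456'), "\n";          # ,123,456
print commify_rev('123456'), "\n";                # 123,456
print commify_rev('12365478965412365'), "\n";     # 12,365,478,965,412,365
```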

If you're having trouble working out what a regex is doing, the usual trick is to turn on use re 'debug';.

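I won't reproduce the output here because it is long. But what is happening is that the \b anchors the pattern to the end of the string.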

You can see this more clearly if you take away the g flag:

Compiling REx "(?<=\d)(?=(\d\d\d)+\b)"
Final program:
   1: IFMATCH[-1] (6)
   3:   POSIXU[\d] (4)
   4:   SUCCEED (0)
   5: TAIL (6)
   6: IFMATCH[0] (22)
   8:   CURLYM[1] {1,32767} (19)
  12:     POSIXU[\d] (13)
  13:     POSIXU[\d] (14)
  14:     POSIXU[\d] (17)
  17:     SUCCEED (0)
  18:   NOTHING (19)
  19:   BOUND (20)
  20:   SUCCEED (0)
  21: TAIL (22)
  22: END (0)
minlen 0 
Matching REx "(?<=\d)(?=(\d\d\d)+\b)" against "12365478965412365"
   0 <> <1236547896>         |  1:IFMATCH[-1](6)
                                  failed...
   1 <1> <2365478965>        |  1:IFMATCH[-1](6)
   0 <> <1236547896>         |  3:  POSIXU[\d](4)
   1 <1> <2365478965>        |  4:  SUCCEED(0)
                                    subpattern success...
   1 <1> <2365478965>        |  6:IFMATCH[0](22)
   1 <1> <2365478965>        |  8:  CURLYM[1] {1,32767}(19)
   1 <1> <2365478965>        | 12:    POSIXU[\d](13)
   2 <12> <3654789654>       | 13:    POSIXU[\d](14)
   3 <123> <6547896541>      | 14:    POSIXU[\d](17)
   4 <1236> <5478965412>     | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 1 times, len=3...
   4 <1236> <5478965412>     | 12:    POSIXU[\d](13)
   5 <12365> <4789654123>    | 13:    POSIXU[\d](14)
   6 <23654> <7896541236>    | 14:    POSIXU[\d](17)
   7 <36547> <8965412365>    | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 2 times, len=3...
   7 <36547> <8965412365>    | 12:    POSIXU[\d](13)
   8 <65478> <965412365>     | 13:    POSIXU[\d](14)
   9 <54789> <65412365>      | 14:    POSIXU[\d](17)
  10 <47896> <5412365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 3 times, len=3...
  10 <47896> <5412365>       | 12:    POSIXU[\d](13)
  11 <478965> <412365>       | 13:    POSIXU[\d](14)
  12 <4789654> <12365>       | 14:    POSIXU[\d](17)
  13 <47896541> <2365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 4 times, len=3...
  13 <47896541> <2365>       | 12:    POSIXU[\d](13)
  14 <478965412> <365>       | 13:    POSIXU[\d](14)
  15 <4789654123> <65>       | 14:    POSIXU[\d](17)
  16 <47896541236> <5>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 5 times, len=3...
  16 <47896541236> <5>       | 12:    POSIXU[\d](13)
  17 <478965412365> <>       | 13:    POSIXU[\d](14)
                                      failed...
                                    CURLYM trying tail with matches=5...
  16 <47896541236> <5>       | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=4...
  13 <47896541> <2365>       | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=3...
  10 <47896> <5412365>       | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=2...
   7 <36547> <8965412365>    | 19:    BOUND(20)
                                      failed...
                                    CURLYM trying tail with matches=1...
   4 <1236> <5478965412>     | 19:    BOUND(20)
                                      failed...
                                    failed...
                                  failed...
   2 <12> <3654789654>       |  1:IFMATCH[-1](6)
   1 <1> <2365478965>        |  3:  POSIXU[\d](4)
   2 <12> <3654789654>       |  4:  SUCCEED(0)
                                    subpattern success...
   2 <12> <3654789654>       |  6:IFMATCH[0](22)
   2 <12> <3654789654>       |  8:  CURLYM[1] {1,32767}(19)
   2 <12> <3654789654>       | 12:    POSIXU[\d](13)
   3 <123> <6547896541>      | 13:    POSIXU[\d](14)
   4 <1236> <5478965412>     | 14:    POSIXU[\d](17)
   5 <12365> <4789654123>    | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 1 times, len=3...
   5 <12365> <4789654123>    | 12:    POSIXU[\d](13)
   6 <23654> <7896541236>    | 13:    POSIXU[\d](14)
   7 <36547> <8965412365>    | 14:    POSIXU[\d](17)
   8 <65478> <965412365>     | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 2 times, len=3...
   8 <65478> <965412365>     | 12:    POSIXU[\d](13)
   9 <54789> <65412365>      | 13:    POSIXU[\d](14)
  10 <47896> <5412365>       | 14:    POSIXU[\d](17)
  11 <478965> <412365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 3 times, len=3...
  11 <478965> <412365>       | 12:    POSIXU[\d](13)
  12 <4789654> <12365>       | 13:    POSIXU[\d](14)
  13 <47896541> <2365>       | 14:    POSIXU[\d](17)
  14 <478965412> <365>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 4 times, len=3...
  14 <478965412> <365>       | 12:    POSIXU[\d](13)
  15 <4789654123> <65>       | 13:    POSIXU[\d](14)
  16 <47896541236> <5>       | 14:    POSIXU[\d](17)
  17 <478965412365> <>       | 17:    SUCCEED(0)
                                      subpattern success...
                                    CURLYM now matched 5 times, len=3...
  17 <478965412365> <>       | 12:    POSIXU[\d](13)
                                      failed...
                                    CURLYM trying tail with matches=5...
  17 <478965412365> <>       | 19:    BOUND(20)
  17 <478965412365> <>       | 20:    SUCCEED(0)
                                      subpattern success...
   2 <12> <3654789654>       | 22:END(0)
Match successful!
Freeing REx: "(?<=\d)(?=(\d\d\d)+\b)"

12,365478965412365
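A script along these lines should produce a trace like the one above and then print the single-substitution result. The exact trace format varies between Perl versions. use re 'debug' writes to STDERR and is lexically scoped, so wrapping it in a block limits the trace to the one substitution inside it.

```perl
use strict;
use warnings;

my $str = '12365478965412365';
{
    # Dump how the regex compiles and runs (to STDERR) for this block only.
    use re 'debug';
    # No /g flag: only the first matching position gets a comma.
    $str =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/;
}
print "$str\n";   # 12,365478965412365
```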

Because of the lookaround assertions, this single iteration of the regex takes many steps. The first thing it has to match is:

 (\d\d\d)+\b

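That is one or more groups of exactly three digits, followed by a word boundary. There is no boundary inside the run of digits, so the only one available is the end of the string.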

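What isn't obvious here is that \b is effectively acting like $: it serves as an anchor on the right-hand side of the pattern. Your pattern has to read that far and then backtrack, so that it can match (\d\d\d)+ counting from the right. Without it, the pattern isn't anchored. It then matches at any position with one digit behind it and at least three digits ahead (any 4-digit window). Lookarounds don't consume anything, so that condition holds after every digit except the last three. That is exactly the output you got.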
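One way to see this is to list the offsets where each version of the pattern matches. The sketch below collects them with a //g loop; match_offsets is a hypothetical helper. With \b, only offsets 2, 5, 8, 11 and 14 qualify, because each of them is followed by a multiple of three digits. Without \b, every offset from 1 to 14 qualifies.

```perl
use strict;
use warnings;

my $str = '12365478965412365';

# Hypothetical helper: offsets where a zero-width pattern matches in $s.
sub match_offsets {
    my ($s, $re) = @_;
    my @offsets;
    push @offsets, pos($s) while $s =~ /$re/g;
    return @offsets;
}

my @anchored   = match_offsets($str, qr/(?<=\d)(?=(\d\d\d)+\b)/);
my @unanchored = match_offsets($str, qr/(?<=\d)(?=(\d\d\d)+)/);

print "with \\b:    @anchored\n";     # 2 5 8 11 14
print "without \\b: @unanchored\n";   # 1 2 3 4 5 6 7 8 9 10 11 12 13 14
```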

Your pattern works the same way if you anchor it with $ instead. Hopefully that makes it clearer what is going on:

my $str = '12365478965412365';
$str =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "$str\n";
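The two anchors are interchangeable here only because the number runs to the end of the string. The sketch below shows where they differ; the sample sentence is made up. \b also works for a number embedded in text, while $ matches only at the end of the string (or before a final newline).

```perl
use strict;
use warnings;

my $text = 'Total 1234567 items';

# \b version: the boundary after the last digit ends the run of digits.
(my $with_b = $text) =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;

# $ version: the digits must run all the way to the end of the string.
(my $with_dollar = $text) =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;

print "$with_b\n";        # Total 1,234,567 items
print "$with_dollar\n";   # Total 1234567 items
```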

How does the "\b" word boundary affect the output in Perl?
