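I want to modify a string. My regex should take the number
12365478965412365
and insert a , between groups of three digits, so that the output looks like
12,365,478,965,412,365
We can use lookahead and lookbehind to achieve this: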
s/(?<=\d)(?=(\d\d\d)+\b)/\,/g
But when I remove the \b:
s/(?<=\d)(?=(\d\d\d)+)/\,/g
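I get this output:
1,2,3,6,5,4,7,8,9,6,5,4,1,2,365
How does \b affect the positions where the "," is applied?
Does the regex test for the word boundary at the end before it tests the lookbehind?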
Answer 0 (score: 4)
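\b matches at the boundary between a word character and a non-word character. It is zero-width, so it consumes nothing. From perlre:
A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.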
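As a small illustration of that definition, the sketch below lists the offsets at which \b matches in a string. The helper name boundary_positions is made up for this example and is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which \b matches.
sub boundary_positions {
    my ($text) = @_;
    my @positions;
    # \b is zero-width, so pos() reports where each boundary sits.
    while ($text =~ /\b/g) {
        push @positions, pos($text);
    }
    return @positions;
}

# A run of digits has boundaries only at its two ends.
print join(',', boundary_positions('12365478965412365')), "\n";   # 0,17
print join(',', boundary_positions('ab 12')), "\n";               # 0,2,3,5
```

For the all-digit string in the question, the only boundary to the right of any digit is the end of the string.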
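The problem with what you're trying to do is that placing the commas is a right-to-left operation: you don't know whether it should be 10,000 or 100,000 until you've seen the total number of digits in the string.
So I'd suggest this is much easier if you skip the 'straight' regex with lookaheads and use reverse instead: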
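my $str = '12365478965412365';
# Reverse, put a comma after every 3 digits that are followed by another digit, reverse back.
# The (?=\d) avoids a leading comma when the digit count is a multiple of 3; /r needs Perl 5.14+.
my $comma_sep_str = reverse ( reverse ($str) =~ s/(\d{3})(?=\d)/$1,/rg );
print $comma_sep_str;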
Reverse it, group from left to right, then reverse it back.
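The reverse, group, reverse idea can also be wrapped in a small sub. This is only a sketch: the name commify_reversed is invented here, it assumes the input is a plain string of digits, and it adds a (?=\d) lookahead so that inputs whose length is a multiple of three do not pick up a leading comma.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): comma-separate a
# string of digits by reversing, grouping, and reversing back.
sub commify_reversed {
    my ($digits) = @_;
    my $reversed = scalar reverse $digits;
    # After reversing, groups of three are counted from what was the
    # right-hand end. (?=\d) adds a comma only if another digit follows.
    (my $grouped = $reversed) =~ s/(\d{3})(?=\d)/$1,/g;
    return scalar reverse $grouped;
}

print commify_reversed($_), "\n" for qw(12365478965412365 123456 1000 999);
# 12,365,478,965,412,365
# 123,456
# 1,000
# 999
```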
If you're unsure what a regex is doing, the usual trick is to turn on use re 'debug';.
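A minimal sketch of switching the tracing on for a single substitution follows. It assumes a reasonably recent Perl, where the pragma is lexically scoped, and the trace should go to STDERR rather than STDOUT.

```perl
use strict;
use warnings;

my $str = '12365478965412365';
{
    # Only regexes compiled inside this block are traced.
    use re 'debug';
    $str =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;
}
print "$str\n";   # 12,365,478,965,412,365
```

Redirecting STDERR to a file (for example with 2> trace.txt in a shell) keeps the trace separate from the program's normal output.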
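I won't reproduce the full output, because it's long. But what's happening is that the pattern uses \b to anchor at the end of the line.
If you take off the g flag, you can see this more clearly: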
Compiling REx "(?<=\d)(?=(\d\d\d)+\b)"
Final program:
1: IFMATCH[-1] (6)
3: POSIXU[\d] (4)
4: SUCCEED (0)
5: TAIL (6)
6: IFMATCH[0] (22)
8: CURLYM[1] {1,32767} (19)
12: POSIXU[\d] (13)
13: POSIXU[\d] (14)
14: POSIXU[\d] (17)
17: SUCCEED (0)
18: NOTHING (19)
19: BOUND (20)
20: SUCCEED (0)
21: TAIL (22)
22: END (0)
minlen 0
Matching REx "(?<=\d)(?=(\d\d\d)+\b)" against "12365478965412365"
0 <> <1236547896> | 1:IFMATCH[-1](6)
failed...
1 <1> <2365478965> | 1:IFMATCH[-1](6)
0 <> <1236547896> | 3: POSIXU[\d](4)
1 <1> <2365478965> | 4: SUCCEED(0)
subpattern success...
1 <1> <2365478965> | 6:IFMATCH[0](22)
1 <1> <2365478965> | 8: CURLYM[1] {1,32767}(19)
1 <1> <2365478965> | 12: POSIXU[\d](13)
2 <12> <3654789654> | 13: POSIXU[\d](14)
3 <123> <6547896541> | 14: POSIXU[\d](17)
4 <1236> <5478965412> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 1 times, len=3...
4 <1236> <5478965412> | 12: POSIXU[\d](13)
5 <12365> <4789654123> | 13: POSIXU[\d](14)
6 <23654> <7896541236> | 14: POSIXU[\d](17)
7 <36547> <8965412365> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 2 times, len=3...
7 <36547> <8965412365> | 12: POSIXU[\d](13)
8 <65478> <965412365> | 13: POSIXU[\d](14)
9 <54789> <65412365> | 14: POSIXU[\d](17)
10 <47896> <5412365> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 3 times, len=3...
10 <47896> <5412365> | 12: POSIXU[\d](13)
11 <478965> <412365> | 13: POSIXU[\d](14)
12 <4789654> <12365> | 14: POSIXU[\d](17)
13 <47896541> <2365> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 4 times, len=3...
13 <47896541> <2365> | 12: POSIXU[\d](13)
14 <478965412> <365> | 13: POSIXU[\d](14)
15 <4789654123> <65> | 14: POSIXU[\d](17)
16 <47896541236> <5> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 5 times, len=3...
16 <47896541236> <5> | 12: POSIXU[\d](13)
17 <478965412365> <> | 13: POSIXU[\d](14)
failed...
CURLYM trying tail with matches=5...
16 <47896541236> <5> | 19: BOUND(20)
failed...
CURLYM trying tail with matches=4...
13 <47896541> <2365> | 19: BOUND(20)
failed...
CURLYM trying tail with matches=3...
10 <47896> <5412365> | 19: BOUND(20)
failed...
CURLYM trying tail with matches=2...
7 <36547> <8965412365> | 19: BOUND(20)
failed...
CURLYM trying tail with matches=1...
4 <1236> <5478965412> | 19: BOUND(20)
failed...
failed...
failed...
2 <12> <3654789654> | 1:IFMATCH[-1](6)
1 <1> <2365478965> | 3: POSIXU[\d](4)
2 <12> <3654789654> | 4: SUCCEED(0)
subpattern success...
2 <12> <3654789654> | 6:IFMATCH[0](22)
2 <12> <3654789654> | 8: CURLYM[1] {1,32767}(19)
2 <12> <3654789654> | 12: POSIXU[\d](13)
3 <123> <6547896541> | 13: POSIXU[\d](14)
4 <1236> <5478965412> | 14: POSIXU[\d](17)
5 <12365> <4789654123> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 1 times, len=3...
5 <12365> <4789654123> | 12: POSIXU[\d](13)
6 <23654> <7896541236> | 13: POSIXU[\d](14)
7 <36547> <8965412365> | 14: POSIXU[\d](17)
8 <65478> <965412365> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 2 times, len=3...
8 <65478> <965412365> | 12: POSIXU[\d](13)
9 <54789> <65412365> | 13: POSIXU[\d](14)
10 <47896> <5412365> | 14: POSIXU[\d](17)
11 <478965> <412365> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 3 times, len=3...
11 <478965> <412365> | 12: POSIXU[\d](13)
12 <4789654> <12365> | 13: POSIXU[\d](14)
13 <47896541> <2365> | 14: POSIXU[\d](17)
14 <478965412> <365> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 4 times, len=3...
14 <478965412> <365> | 12: POSIXU[\d](13)
15 <4789654123> <65> | 13: POSIXU[\d](14)
16 <47896541236> <5> | 14: POSIXU[\d](17)
17 <478965412365> <> | 17: SUCCEED(0)
subpattern success...
CURLYM now matched 5 times, len=3...
17 <478965412365> <> | 12: POSIXU[\d](13)
failed...
CURLYM trying tail with matches=5...
17 <478965412365> <> | 19: BOUND(20)
17 <478965412365> <> | 20: SUCCEED(0)
subpattern success...
2 <12> <3654789654> | 22:END(0)
Match successful!
Freeing REx: "(?<=\d)(?=(\d\d\d)+\b)"
12,365478965412365
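Because of the lookaround assertions, there are a lot of steps in this single iteration of the regex. What the lookahead tries to match first is:
(\d\d\d)+\b
That is one or more groups of exactly three digits, anchored by a 'boundary'. There is no boundary inside the run of digits, so it ends up using the end of the line.
What's not obvious here is that \b effectively acts as if it were $: it serves as an anchor on the right-hand side of the pattern. Your pattern has to read that far and then backtrack, so that (\d\d\d)+ is matched counting from the right. Without it, your pattern isn't anchored, so it matches at any position that has a digit before it and at least three digits after it. Since lookarounds don't consume anything, it matches after every digit except the last three. (Which is what's happening.)
If you use $, your pattern works the same way. Hopefully that makes it clearer what's going on?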
my $str = '12365478965412365';
$str =~ s/(?<=\d)(?=(\d\d\d)+$)/\,/g;
print $str;
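To put the three variants discussed in this thread side by side, here is a short sketch. The sub names are invented for the comparison, and the last two calls use a made-up input with surrounding text to show where \b and $ stop behaving alike.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per lookahead variant from the thread.
sub commas_with_boundary {
    my ($text) = @_;
    $text =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;   # anchored by \b
    return $text;
}

sub commas_unanchored {
    my ($text) = @_;
    $text =~ s/(?<=\d)(?=(\d\d\d)+)/,/g;     # no anchor at all
    return $text;
}

sub commas_with_eos {
    my ($text) = @_;
    $text =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;    # anchored by $
    return $text;
}

my $num = '12365478965412365';
print commas_with_boundary($num), "\n";   # 12,365,478,965,412,365
print commas_unanchored($num), "\n";      # 1,2,3,6,5,4,7,8,9,6,5,4,1,2,365
print commas_with_eos($num), "\n";        # 12,365,478,965,412,365

# With text after the number, \b still finds the end of the digit run,
# while $ no longer matches anywhere inside it.
print commas_with_boundary('total 1234567 units'), "\n";   # total 1,234,567 units
print commas_with_eos('total 1234567 units'), "\n";        # total 1234567 units
```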