How does regex substitution work in Perl?

Time: 2017-09-13 09:57:10

Tags: regex perl

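I am trying to remove the duplicates from the string "a","b","b","a","c"; after removal the result is "a","b","c",. I have achieved this, but I have a doubt about how the regex substitution works.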

use warnings;
use strict;
my $s = q+"a","b","b","a","c"+;

 $s=~s/ ("\w"),? / ($s=~s|($1)||g)?"$1,":"" /xge;
#^                   ^
#|                   Consider this as s2
#Consider this as s1

print "\n$s\n\n";

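The s1 value holds the string "a","b","b","a","c".

Step 1

After substitution:

My guess is that the s2 variable now holds ,"b","b",,"c", derived from the s1 data "a","b","b","a","c". Is that right?

I ran the regex with an eval group: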

$s=~s/ ("\w"),? (?{print "$s\n"})/ ($s=~s|($1)||g)?"$1,":"" /xge;

The result is

"a","b","b","a","c"
,"b","b",,"c"  #This is from after substitution
,,,,"c"
,,,,"c"
,,,,"c"

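Now my doubt is this: s2 modifies the variable $s, so why is that change not combined with s1? I would expect the result at the second step to be "a","b","b","c" (every "a" replaced with the empty string, and "a", added back into $s).

Edited

The result of the eval group (?{print $s}) is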

"a","b","b","a","c"
,"b","b",,"c" 
,,,,"c"
,,,,"c"
,,,,"c"

After the substitution line I printed the $s variable, and it gives "a","b","c",. How is this output produced?
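One way to read the trace above, offered as a sketch rather than a statement about Perl's internals: the outer s///e appears to keep scanning the text it started with (it still finds five elements even after $s has been changed), the inner s|||g edits $s in the meantime, and the string the outer substitution assembles is what finally lands in $s. The hypothetical helper `emulate_one_liner` below makes that reading explicit with separate variables. It assumes the input consists only of "x", items, so nothing lies between matches.

```perl
use strict;
use warnings;

# Hypothetical helper: emulates the one-liner with an explicit snapshot.
# This mirrors the observed trace; it is not taken from Perl's documentation.
sub emulate_one_liner {
    my ($input) = @_;
    my $work   = $input;   # plays the role of $s as edited by the inner s|||g
    my $result = '';       # what the outer s///e assembles and finally stores

    # Scan the unmodified input, as the trace suggests the outer match does.
    while ($input =~ / ("\w"),? /xg) {
        my $element = $1;  # save $1 before the inner substitution resets it

        # Inner step: delete every copy of the element from the working string.
        # s///g returns the number of replacements, so it is true only the
        # first time an element is encountered.
        my $removed = ($work =~ s/\Q$element\E//g);

        # Replacement value: keep the element on first sight, drop it otherwise.
        $result .= "$element," if $removed;
    }
    return $result;
}

print emulate_one_liner(q+"a","b","b","a","c"+), "\n";   # "a","b","c",
```

Under this reading, the intermediate values ,"b","b",,"c" and ,,,,"c" are only the working copy; they are overwritten when the outer substitution stores its own result, which is why the trailing comma of "c", survives.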

2 answers:

Answer 0 (score: 6)

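Regular expressions are (in my opinion) the wrong tool to use here. I would

  • split the string on commas
  • remove duplicates from the list returned by split
  • join the list back into a string

Like this: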

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

my $str = q["a","b","b","a","c"];

my %seen;

$str = join ',',
       grep { ! $seen{$_}++ }
       split /,/, $str;

say $str;
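If the same steps are needed in more than one place, they could be wrapped in a small subroutine. The following is a minimal sketch; the name `dedupe_list` is hypothetical and not part of the answer, and it assumes that no element contains a comma of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas, keep the first occurrence of each
# element, and join the survivors back together.
sub dedupe_list {
    my ($str) = @_;
    my %seen;   # lexical to each call, so repeated calls start fresh

    # Assumption: elements never contain a comma. Quoted fields with embedded
    # commas would need a real CSV parser instead of a plain split.
    return join ',', grep { !$seen{$_}++ } split /,/, $str;
}

print dedupe_list(q["a","b","b","a","c"]), "\n";   # "a","b","c"
```

Keeping %seen inside the subroutine avoids state leaking between calls. Note that split discards trailing empty fields, so a trailing comma in the input does not survive the round trip.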

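Answer 1 (score: 2)

The correct solution to this is to split, filter, and rejoin, as @Dave Cross has already demonstrated.

...

However, the following regex solution does work, and will hopefully demonstrate the superiority of Dave's solution.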

#!/usr/bin/env perl

use v5.10;
use strict;
use warnings;

my $str = q{"a","b","b","a","c"};

1 while $str =~ s{
    \A
    (?: (?&element) , )*
    ( (?&element) )           # Capture in \1
    (?: , (?&element) )*
    \K
    ,
    \1                        # Remove the duplicate along with preceding comma
    (?= \z | , )

    (?(DEFINE)
        (?<element>
            "
            \w
            "
        )
    )
}{}xg;

say $str;

Output:

"a","b","c"