Question

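I have a comma-delimited string with fields enclosed in double quotes, as shown below, and I want to split it using Perl and a regex.

I took the first two capture groups from the Perl Cookbook, 2nd Edition, and tried to modify the regex so that it also captures NULL values at the beginning, middle, and end of the line.

There should be two NULL values at the end, but I only get one, because I use a lookbehind to check for a preceding comma.

Is it possible to write a fifth capture group along the lines of ,\s*($) (a comma, maybe some whitespace, then end of line) that captures only the end of line?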

#!/usr/bin/perl
use strict;
use warnings;

my $text = qq(,,," test ",ing,,"hello "", world","some "","" text w/ comma",,,end_test,,);
my @colArr;

while ( $text =~ /([^",]+)|"((?:[^"]|"")*)"|(?<=^)([,])|(?<=[,])([,])/gx ) {

    my $field = '';

    if ( defined $1 ) {
        $field = $1;
    }
    elsif ( defined $2  ) {
        ( $field = $2 ) =~ s/""/"/g;
    }

    # For $3 and $4, a comma is captured but should be treated as a NULL field
    # Hoping to capture end of line as $5 when there is a comma right before it

    push @colArr, $field;

}

push @colArr, '' if ( $text =~ /[,]$/ or $text eq '' ); # New: Capture final trailing NULL value

for ( my $i = 0; $i < @colArr; $i++ ) {
    print "[$i]\t: $colArr[$i]\n";
}

=pod
Expected:
[0]     :
[1]     :
[2]     :
[3]     :  test
[4]     : ing
[5]     :
[6]     : hello ", world
[7]     : some "," text w/ comma
[8]     :
[9]     :
[10]    : end_test
[11]    :
[12]    :

Actual:
[0]     :
[1]     :
[2]     :
[3]     :  test
[4]     : ing
[5]     :
[6]     : hello ", world
[7]     : some "," text w/ comma
[8]     :
[9]     :
[10]    : end_test
[11]    :
=cut

Please let me know if anyone has a better idea. Also, I cannot install CPAN modules because of user restrictions.

Answer 1

Sorry for taking up your time, but I think I have solved my own problem by adding an extra push after the while loop.

#!/usr/bin/perl
use strict;
use warnings;

my $text = qq(,,," test ",ing,,"hello "", world","some "","" text w/ comma",,,end_test,,);
my @colArr;

while ( $text =~ /([^",]+)|"((?:[^"]|"")*)"|(?<=^)([,])|(?<=[,])([,])/gx ) {

    my $field = '';

    if ( defined $1 ) {
        $field = $1;
    }
    elsif ( defined $2  ) {
        ( $field = $2 ) =~ s/""/"/g;
    }

    # For $3 and $4, a comma is captured but is treated as a NULL field
    # The final trailing NULL is not matched here; the push after the loop adds it

    push @colArr, $field;

}

push @colArr, '' if ( $text =~ /[,]$/ or $text eq '' ); # New: Capture final trailing NULL value

for ( my $i = 0; $i < @colArr; $i++ ) {
    print "[$i]\t: $colArr[$i]\n";
}
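
As for capturing the end of the line itself: one possible sketch, not from the original post, is to add a fifth alternative made of an empty capture group anchored at $ behind a comma lookbehind. The helper name split_fields is made up for illustration, and the sketch assumes the line has already been chomped and is not the empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): the same four
# alternatives as above, plus a fifth that matches the empty string at
# the end of the line when a comma comes right before it.
sub split_fields {
    my ($text) = @_;
    my @fields;

    while (
        $text =~ /
            ([^",]+)            # group 1: unquoted field
          | "((?:[^"]|"")*)"    # group 2: quoted field, doubled quotes inside
          | (?<=^)(,)           # group 3: comma at start of line
          | (?<=,)(,)           # group 4: comma right after a comma
          | (?<=,)() $          # group 5: end of line right after a comma
        /gx
      )
    {
        my $field = '';

        if ( defined $1 ) {
            $field = $1;
        }
        elsif ( defined $2 ) {
            ( $field = $2 ) =~ s/""/"/g;    # unescape doubled quotes
        }

        # Groups 3, 4 and 5 all stand for an empty (NULL) field.
        push @fields, $field;
    }

    return @fields;
}

my @cols = split_fields(
    q(,,," test ",ing,,"hello "", world","some "","" text w/ comma",,,end_test,,)
);

for my $i ( 0 .. $#cols ) {
    print "[$i]\t: $cols[$i]\n";
}
```

Since the empty group can only match at the very end and only right after a comma, the separate push for a trailing comma should not be needed with this variant. An empty input string would still need its own check.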

Can the end-of-string anchor "$" be used as a regex capture group?

1 answer: