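I have a comma-separated string whose fields may be enclosed in double quotes, as shown below, and I want to split it using Perl and a regex.
I took the first two capture groups from the second edition of the Perl Cookbook, and I am trying to extend that regex so that it also captures NULL (empty) values at the beginning, middle, and end of the line.
There should be 2 NULL values at the end, but I only get 1, because I use a lookbehind to check for a preceding comma.
Is it possible to write a fifth capture group with a pattern like ,\s*($) (a comma, possibly some whitespace, then end of line) that captures only the end of line?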
#!/usr/bin/perl
use strict;
use warnings;
my $text = qq(,,," test ",ing,,"hello "", world","some "","" text w/ comma",,,end_test,,);
my @colArr;
while ( $text =~ /([^",]+)|"((?:[^"]|"")*)"|(?<=^)([,])|(?<=[,])([,])/gx ) {
    my $field = '';
    if ( defined $1 ) {
        $field = $1;
    }
    elsif ( defined $2 ) {
        ( $field = $2 ) =~ s/""/"/g;
    }
    # For $3 and 4, comma will be captured but should be treated as NULL
    # Hoping to capture End of line as $5 where exists a comma behind it
    push @colArr, $field;
}
push @colArr, '' if ( $text =~ /[,]$/ or $text eq '' ); # New: Capture final trailing NULL value
for ( my $i = 0; $i < @colArr; $i++ ) {
    print "[$i]\t: $colArr[$i]\n";
}
=pod
Expected:
[0] :
[1] :
[2] :
[3] : test
[4] : ing
[5] :
[6] : hello ", world
[7] : some "," text w/ comma
[8] :
[9] :
[10] : end_test
[11] :
[12] :
Actual:
[0] :
[1] :
[2] :
[3] : test
[4] : ing
[5] :
[6] : hello ", world
[7] : some "," text w/ comma
[8] :
[9] :
[10] : end_test
[11] :
=cut
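Please let me know if anyone has a better idea. Also, due to user restrictions, I cannot install CPAN modules.
Answer 0 (score: -1)
Sorry for taking up your time, but I think I solved my own problem by adding an extra push after the while loop.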
#!/usr/bin/perl
use strict;
use warnings;
my $text = qq(,,," test ",ing,,"hello "", world","some "","" text w/ comma",,,end_test,,);
my @colArr;
while ( $text =~ /([^",]+)|"((?:[^"]|"")*)"|(?<=^)([,])|(?<=[,])([,])/gx ) {
    my $field = '';
    if ( defined $1 ) {
        $field = $1;
    }
    elsif ( defined $2 ) {
        ( $field = $2 ) =~ s/""/"/g;
    }
    # For $3 and 4, comma will be captured but should be treated as NULL
    # Hoping to capture End of line as $5 where exists a comma behind it
    push @colArr, $field;
}
push @colArr, '' if ( $text =~ /[,]$/ or $text eq '' ); # New: Capture final trailing NULL value
for ( my $i = 0; $i < @colArr; $i++ ) {
    print "[$i]\t: $colArr[$i]\n";
}
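For what it's worth, the fifth capture group asked about in the question does seem possible, so the post-loop push could be folded into the regex itself. The sketch below is only one way to do it, not something from the Cookbook: it adds a zero-width alternative, (?<=[,])($), which matches the empty string at the end of the line when a comma precedes it. The helper name split_fields is made up for illustration, and well-formed input (balanced quotes, no trailing newline) is assumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the original post).
# Same four alternatives as the loop above, plus a fifth, zero-width one.
sub split_fields {
    my ($text) = @_;
    my @fields;
    while (
        $text =~ /
              ([^",]+)            # group 1: unquoted, non-empty field
            | "((?:[^"]|"")*)"    # group 2: quoted field, doubled quote is an escape
            | (?<=^)([,])         # group 3: comma at start of line, empty field
            | (?<=[,])([,])       # group 4: comma right after a comma, empty field
            | (?<=[,])($)         # group 5: end of line right after a comma, empty field
        /gx
      )
    {
        my $field = '';
        if ( defined $1 ) {
            $field = $1;
        }
        elsif ( defined $2 ) {
            ( $field = $2 ) =~ s/""/"/g;    # unescape doubled quotes
        }
        # Groups 3, 4 and 5 all stand for an empty (NULL) field.
        push @fields, $field;
    }
    return @fields;
}

my $text = qq(,,," test ",ing,,"hello "", world","some "","" text w/ comma",,,end_test,,);
my @colArr = split_fields($text);
printf "[%d]\t: %s\n", $_, $colArr[$_] for 0 .. $#colArr;
```

One difference from the post-loop push: an entirely empty $text yields no fields here, because there is no comma for the lookbehind to find. If an empty line should count as a single NULL field, that case still needs its own check.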