Efficient way to keep a matched substring with a Perl regex?

Date: 2016-01-18 21:37:48

Tags: regex perl

I'm trying to clean up some sloppy address fields in a CSV file.

start_of_lineA,="123456789",end_of_lineA
start_of_lineB,="234560000",end_of_lineB
start_of_lineC,34567,end_of_lineC

I want to clean them up to:

start_of_lineA,12345,end_of_lineA
start_of_lineB,23456,end_of_lineB
start_of_lineC,34567,end_of_lineC

Some street-address entries contain commas, and everything from the first comma on can be discarded:

start_of_lineD,"123 Foo St, #1",End_of_lineD
start_of_lineE,"456 Bar Lane, suite A, B",End_of_lineE

so that they become:

start_of_lineD,"123 Foo St",End_of_lineD
start_of_lineE,"456 Bar Lane",End_of_lineE

What I've come up with so far:

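  while (<>) {
    chomp;
    # ="123456789" -> 12345: keep the first five digits of the ZIP code
    if ($_ =~ m/="/)
    {
      $_ =~ s/="\d{5}\K\d*"//g;
      $_ =~ s/="//g;
    }
    # "123 Foo St, #1" -> 123 Foo St: cut at the first comma, then strip the quotes
    if ($_ =~ m/"[^",]+,[^"]*"/)
    {
      $_ =~ s/"[^",]+\K,[^"]*"//g;
      $_ =~ s/"//g;
    }
    my @line = split(/,/, $_);
    # ... further processing of @line
  }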

It works, but it doesn't feel very elegant. Is there a cleaner way?

1 Answer

Answer 0 (score: 1)

Well, for starters:

$_ =~

is usually redundant, because m// and s/// operate on $_ by default.
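A minimal sketch of the question's cleanup with the redundant bindings dropped, assuming input arrives one CSV line at a time; the sub name clean_line is hypothetical. Bare /.../, s/.../.../ and chomp all act on $_, and local $_ keeps the caller's $_ untouched. The class [^",] excludes only quotes and commas (inside a character class, | is literal, and ^ only negates when it comes first). Like the original code, it strips the quotes rather than keeping "123 Foo St".

```perl
use strict;
use warnings;

# Hypothetical helper: the question's cleanup with the "$_ =~" bindings dropped.
# Without =~, m//, s/// and chomp all operate on $_.
sub clean_line {
    local $_ = shift;    # localized copy, so the caller's $_ is untouched
    chomp;

    # ="123456789" -> 12345: keep the first five digits of the ZIP code
    if (/="/) {
        s/="\d{5}\K\d*"//g;
        s/="//g;
    }

    # "123 Foo St, #1" -> 123 Foo St: cut at the first comma, strip the quotes
    if (/"[^",]+,[^"]*"/) {
        s/"[^",]+\K,[^"]*"//g;
        s/"//g;
    }
    return $_;
}

my @input = (
    qq{start_of_lineA,="123456789",end_of_lineA\n},
    qq{start_of_lineD,"123 Foo St, #1",End_of_lineD\n},
);
print clean_line($_), "\n" for @input;
```

The regexes are unchanged from the question; only the explicit $_ =~ bindings are gone.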

Otherwise, use Text::CSV and parse it properly, so that a quoted address containing commas stays in a single field:

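use strict;
use warnings;
use Text::CSV;

# allow_loose_quotes accepts fields like ="123456789"; eol ends each printed row
my $csv = Text::CSV->new({ binary => 1, allow_loose_quotes => 1, eol => "\n" })
    or die Text::CSV->error_diag;
while (my $row = $csv->getline(\*STDIN)) {
    $row->[1] =~ s/^="(\d{5})\d*"$/$1/;   # ="123456789" -> 12345
    $row->[1] =~ s/,.*//s;                # 123 Foo St, #1 -> 123 Foo St
    $csv->print(\*STDOUT, $row);
}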
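Why a CSV-aware parser matters here: a plain split(/,/) cuts the quoted address in two, so the cleanup has to be done with regexes before splitting. The sketch below is only an illustration and swaps in the core module Text::ParseWords (its parse_line function) as a stand-in, assuming Text::CSV is not installed; it is not what the answer uses. parse_line follows shell-style quoting, so it also treats single quotes as quote characters, and an unquoted field such as O'Brien St can confuse it.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Core-Perl stand-in for Text::CSV, used here only for illustration.
my $line = q{start_of_lineD,"123 Foo St, #1",End_of_lineD};

# A plain split cuts the quoted address in two: 4 pieces instead of 3 fields.
my @naive = split /,/, $line;

# parse_line(DELIMITER, KEEP, LINE) honours the double quotes; KEEP = 0 strips them.
my @fields = parse_line(',', 0, $line);

# With the address in one field, cleaning it is a single substitution.
$fields[1] =~ s/,.*//s;

print scalar(@naive), " pieces vs ", scalar(@fields), " fields\n";
print join('|', @fields), "\n";
```

For real CSV input, Text::CSV (or Text::CSV_XS) remains the better fit, since it implements CSV quoting rules rather than shell-style ones.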