I'm trying to clean up some sloppy address fields in a CSV file. Some fields look like this:
start_of_lineA,="123456789",end_of_lineA
start_of_lineB,="234560000",end_of_lineB
start_of_lineC,34567,end_of_lineC
I'm cleaning them up to:
start_of_lineA,12345,end_of_lineA
start_of_lineB,23456,end_of_lineB
start_of_lineC,34567,end_of_lineC
There are also some street address entries containing commas, where I can drop everything after the first comma:
start_of_lineD,"123 Foo St, #1",End_of_lineD
start_of_lineE,"456 Bar Lane, suite A, B",End_of_lineE
to get:
start_of_lineD,"123 Foo St",End_of_lineD
start_of_lineE,"456 Bar Lane",End_of_lineE
Here's what I've come up with so far:
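chomp;
# Fields like ="123456789": keep the first five digits, drop the rest
if($_ =~ m/="/)
{
    $_ =~ s/="\d{5}\K\d*"//g;
    $_ =~ s/="//g;
}
# Quoted fields containing a comma: drop everything from the first comma on.
# Note: inside a character class, | and ^ are literal, so [^",] is what is meant.
if($_ =~ m/"[^",]+,[^"]*"/)
{
    $_ =~ s/"[^",]+\K,[^"]*"//g;
    $_ =~ s/"//g;
}
@line = split(/,/,$_);
# etc.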
It works, but it doesn't seem very elegant. Is there a cleaner way?
Answer 0 (score: 1)
Well, for starters:
$_ =~
is usually redundant, because m// and s/// operate on $_ by default.
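To illustrate the point about the implicit target, here is a minimal sketch using two of the sample lines from the question. The combined substitution is just one possible way to write it, not necessarily what the asker should use verbatim:

```perl
use strict;
use warnings;

# Sample lines copied from the question.
my @lines = (
    'start_of_lineA,="123456789",end_of_lineA',
    'start_of_lineC,34567,end_of_lineC',
);

for (@lines) {               # each element is aliased to $_
    s/="(\d{5})\d*"/$1/g;    # no "$_ =~" needed: s/// acts on $_ by default
    print "$_\n";
}
```

Lines without a `="` field pass through unchanged, so the surrounding `if` check is not needed either.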
Beyond that, use Text::CSV
and parse the file with it:
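use Text::CSV;

# allow_loose_quotes lets unquoted fields such as ="123456789" parse;
# eol makes print() end each output row with a newline.
my $csv = Text::CSV->new( { binary => 1, allow_loose_quotes => 1, eol => "\n" } );
while ( my $row = $csv->getline($filehandle) ) {
    $row->[1] =~ s/^="(\d{5})\d*"$/$1/;    # keep only the first five digits
    $row->[1] =~ s/,.*//;                  # drop everything after the first comma
    $csv->print( \*STDOUT, $row );
}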
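The per-field cleanup in the loop above can also be pulled out into a small helper, which makes it easy to check against the sample values without a CSV file. This is only a sketch: `clean_field` is a hypothetical name (not part of Text::CSV), and it assumes the goal stated in the question, namely five-digit codes and street addresses cut at the first comma. It works on an already-parsed field, so the surrounding quotes are gone.

```perl
use strict;
use warnings;

# clean_field is a hypothetical helper, not a Text::CSV function.
sub clean_field {
    my ($field) = @_;
    $field =~ s/^="(\d{5})\d*"$/$1/;    # ="123456789" becomes 12345
    $field =~ s/,.*//s;                 # cut at the first comma, if any
    return $field;
}

# Field values taken from the sample lines in the question.
print clean_field($_), "\n"
    for '="123456789"', '34567', '123 Foo St, #1', '456 Bar Lane, suite A, B';
```

Inside the loop it would be called as `$row->[1] = clean_field( $row->[1] );`.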