Parsing a CSV file with rows of varying length

Posted: 2012-04-22 20:48:22

Tags: parsing csv

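I'm calling a web service that returns a comma-delimited dataset with a varying number of columns and multiple text-qualified rows (the first row holds the column names). I need to insert each row into a database, concatenating the columns that vary.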

Returned data:

"Email Address","First Name","Last Name", "State","Training","Suppression","Events","MEMBER_RATING","OPTIN_TIME","CLEAN_CAMPAIGN_ID"

"scott@example.com","Scott","Staph","NY","Campaigns and activism","Social Media","Fundraiser",1,"2012-03-08 17:17:42","Training"

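There can be up to 60 columns between State and MEMBER_RATING. The data in those columns should be concatenated and inserted into a single database column. The first four and the last three fields are always the same. I'm not sure what the best approach is.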
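Because the first four and last three positions are fixed, the variable part is simply everything in between, whatever the total width. A minimal sketch of that index arithmetic, assuming each row has already been parsed into a Perl array; the sub name collapse_row and the " - " separator are illustrative choices, not part of the web service or any library:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $head and last $tail fields and
# join everything in between with $sep into one field.
sub collapse_row {
    my ($fields, $head, $tail, $sep) = @_;
    my $n = scalar @$fields;
    die "row has only $n fields, need at least ", $head + $tail, "\n"
        if $n < $head + $tail;

    my @first  = @$fields[0 .. $head - 1];
    my @last   = @$fields[$n - $tail .. $n - 1];
    my @middle = @$fields[$head .. $n - $tail - 1];   # empty list if no variable columns

    return [ @first, join($sep, @middle), @last ];
}

my @row = ('scott@example.com', 'Scott', 'Staph', 'NY',
           'Campaigns and activism', 'Social Media', 'Fundraiser',
           1, '2012-03-08 17:17:42', 'Training');

my $collapsed = collapse_row(\@row, 4, 3, ' - ');
print join('|', @$collapsed), "\n";
```

Every collapsed row has exactly 4 + 1 + 3 = 8 fields, so it maps onto a fixed database table no matter how many middle columns the service returned.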

1 answer:

Answer 0 (score: 1)

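I'm not sure whether this solution fits your needs, but I hope it does. It's a Perl script that joins all fields except the first four and the last three into one field, separated by a hyphen surrounded by spaces (" - "). It uses the non-core module Text::CSV_XS, which has to be installed from CPAN or with a similar tool.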

Contents of infile:

"Email Address","First Name","Last Name","State","Training","Suppression","Events","MEMBER_RATING","OPTIN_TIME","CLEAN_CAMPAIGN_ID"
"scott@example.com","Scott","Staph","NY","Campaigns and activism","Social Media","Fundraiser",1,"2012-03-08 17:17:42","Training"

Contents of script.pl:

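use warnings;
use strict;
use Text::CSV_XS;

# allow_whitespace: tolerate spaces around separators (e.g. "Last Name", "State")
# binary: accept non-ASCII data and newlines inside quoted fields
# eol: end every printed record with a newline
my $csv = Text::CSV_XS->new({
        allow_whitespace => 1,
        binary           => 1,
        eol              => "\n",
}) or die "" . Text::CSV_XS->error_diag;

open my $fh, q[<], $ARGV[0] or die qq[Open: $!\n];

while ( my $row = $csv->getline( $fh ) ) {
        # A row needs at least the 4 leading + 3 trailing fixed fields.
        if ( @$row < 7 ) {
                warn qq[Skipping short row (@{[ scalar @$row ]} fields)\n];
                next;
        }
        # Indexes 4 .. $#$row - 3 are the variable middle columns.
        my $concat = join q[ - ], @$row[4 .. $#$row - 3];
        # Replace those middle columns with the single joined field.
        splice @$row, 4, @$row - (4 + 3), $concat;
        $csv->print( \*STDOUT, $row );
}
$csv->eof or $csv->error_diag;    # report a parse error, if any
close $fh;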

Run it like this:

perl script.pl infile

This produces the following output:

"Email Address","First Name","Last Name",State,"Training - Suppression - Events",MEMBER_RATING,OPTIN_TIME,CLEAN_CAMPAIGN_ID
scott@example.com,Scott,Staph,NY,"Campaigns and activism - Social Media - Fundraiser",1,"2012-03-08 17:17:42",Training
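If installing Text::CSV_XS from CPAN is not an option, a rough fallback is parse_line from Text::ParseWords, which ships with Perl. This is a sketch under loud assumptions, not a CSV parser: parse_line does not understand CSV's doubled-quote escape (""), cannot handle newlines inside quoted fields, and treats unquoted apostrophes and backslashes as quoting and escaping characters. The sub name collapse_line is hypothetical, and the sample is read from an in-memory string so the demo is self-contained; in real use you would open the file returned by the web service instead.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical core-only fallback: parse one line, then collapse the
# middle columns. Not a full CSV parser (see limitations above).
sub collapse_line {
    my ($line, $head, $tail, $sep) = @_;
    $line =~ s/\r?\n\z//;                       # strip LF or CRLF line ending
    my @f = parse_line('\s*,\s*', 0, $line);    # keep=0 removes the quotes
    my $n = @f;
    die "too few fields ($n)\n" if $n < $head + $tail;
    my @middle = @f[$head .. $n - $tail - 1];
    return (@f[0 .. $head - 1], join($sep, @middle), @f[$n - $tail .. $n - 1]);
}

# Sample data from the question (note the space before "State").
my $sample = <<'CSV';
"Email Address","First Name","Last Name", "State","Training","Suppression","Events","MEMBER_RATING","OPTIN_TIME","CLEAN_CAMPAIGN_ID"
"scott@example.com","Scott","Staph","NY","Campaigns and activism","Social Media","Fundraiser",1,"2012-03-08 17:17:42","Training"
CSV

open my $fh, '<', \$sample or die "open: $!\n";   # in-memory file for the demo
my @rows;
while (my $line = <$fh>) {
    next unless $line =~ /\S/;                     # skip blank lines
    push @rows, [ collapse_line($line, 4, 3, ' - ') ];
}
close $fh;

print join('|', @$_), "\n" for @rows;
```

The delimiter is passed as the regex '\s*,\s*' so that stray spaces around commas, as in the question's header row, are dropped rather than kept as part of a field.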