Perl - CSV parsing - rearranging CSV data when the fields are dynamic

Time: 2014-02-19 22:09:35

Tags: perl csv

Using Perl, I need to parse and rearrange a CSV file that has some dynamic fields (devices and their associated values).

Here is the original CSV (the header is shown for reference only):

DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,sda,sda1,sda2,sda3,sdb,sdb1,sdb2,sdb3
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,0.0,0.0,0.0,0.0,18.0,0.0,18.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:49,T0002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:51,T0003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:53,T0004,0.0,0.0,0.0,0.0,369.8,0.0,369.8,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:55,T0005,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0

I need to transform it into:

DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,device,value
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,sda,0.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,sda1,0.0

... and so on.

Here is sample code that generates the CSV file from the raw data:

if (rindex($l, "DISKBUSY,") > -1) {
    # Open the destination file in append mode
    open(my $out, '>>', $dstfile_DISKBUSY) or exit(1);

    my @line = split(",", $l);

    my $section = "DISKBUSY";
    # Fixed leading fields, followed by every remaining field of the raw line
    my $write = join(",", $section, $SerialNumber, $hostnameT,
                     $timestamp, @line[1 .. $#line]);
    print {$out} $write, "\n";

    close($out);
}

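I need to rearrange it as described so that I can process the data in a generic way, but the dynamic fields (the device names) are driving me crazy :-)

Thank you very much for your help!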

2 answers:

Answer 0 (score: 2)

You can use Text::CSV:

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({
    binary => 1,
    auto_diag => 1,
    eol => "\n"
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, '<', 'file.csv' or die $!;

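# The first row is the header: five fixed columns followed by the device names
my @columns = @{ $csv->getline($fh) };
my @device_columns = @columns[5..$#columns];

# New header: the fixed columns plus a generic device/value pair
my @header = (@columns[0..4], "device", "value");
$csv->print(\*STDOUT, \@header);

# Emit one output row per device for every input row
while (my $row = $csv->getline($fh)) {
    foreach my $i (0..$#device_columns) {
        my @output = (@$row[0..4], $device_columns[$i], $row->[5+$i]);
        $csv->print(\*STDOUT, \@output);
    }
}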

close $fh;

Output:

DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,device,value
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda1,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda2,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sda3,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb,18.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb1,0.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb2,18.0
DISKBSIZE,sn_unknown,host001,"19-FEB-2014 20:55:47",T0001,sdb3,0.0

(This is only the output for the first row of the input data.)
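
The timestamps in the output above are quoted because Text::CSV, by default, quotes any field that contains a space. If the unquoted form shown in the question is wanted, the quote_space attribute can be switched off. The following is only a minimal sketch under that assumption: the in-memory filehandles and the two-device sample are illustrative stand-ins for file.csv and STDOUT, not part of the original answer.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use IO::Handle;   # makes sure lexical filehandles have a print method on older perls
use Text::CSV;

# quote_space => 0: do not quote a field merely because it contains a space
my $csv = Text::CSV->new({
    binary      => 1,
    auto_diag   => 1,
    eol         => "\n",
    quote_space => 0,
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

# Illustrative sample: the question's layout, trimmed to two devices
my $input = <<'END';
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,sda,sdb
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,0.0,18.0
END

# In-memory filehandles stand in for the input file and STDOUT
open my $in, '<', \$input or die $!;
my $output = '';
open my $out, '>', \$output or die $!;

my @columns = @{ $csv->getline($in) };
my @devices = @columns[5 .. $#columns];

$csv->print($out, [ @columns[0 .. 4], "device", "value" ]);

while (my $row = $csv->getline($in)) {
    for my $i (0 .. $#devices) {
        $csv->print($out, [ @$row[0 .. 4], $devices[$i], $row->[5 + $i] ]);
    }
}

close $in;
close $out;

print $output;
```

Fields that contain a comma, a quote character, or a newline are still quoted, so the output stays valid CSV.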

A better solution

The following uses getline_hr to return each row of the input CSV as a hashref, which makes the code cleaner:

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({
    binary => 1,
    auto_diag => 1,
    eol => "\n"
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, '<', 'file.csv' or die $!;

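# Register the header row as the column names used by getline_hr
$csv->column_names($csv->getline($fh));

# splice removes the device names from @cols, leaving the five fixed columns
my @cols = ( $csv->column_names );
my @devices = splice @cols, 5;

my @header = ( @cols, "device", "value" );
$csv->print(\*STDOUT, \@header);

# Each row is a hashref keyed by column name; emit one output row per device
while (my $hr = $csv->getline_hr($fh)) {
    foreach my $device (@devices) {
        my @output = ( @$hr{@cols}, $device, $hr->{$device} );
        $csv->print(\*STDOUT, \@output);
    }
}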

close $fh;
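
Two idioms do most of the work in the script above: splice, which removes the device names from @cols and returns them, and the hash slice @$hr{@cols}, which pulls the fixed fields out of a row in column order. The sketch below shows both in core Perl only. The hand-built hashref is an assumption standing in for what getline_hr would return for the question's first data row, trimmed to two devices, and join is used purely for display.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Header names as they would come back from $csv->column_names (trimmed sample)
my @cols = qw(DISKBSIZE sn_unknown hostname timestamp origin-timestamp sda sdb);

# splice removes everything from index 5 onward and returns it,
# so @cols keeps the five fixed names and @devices gets the device names
my @devices = splice @cols, 5;

# Hypothetical row, shaped like a getline_hr result: keys are the column names
my $hr = {
    'DISKBSIZE'        => 'DISKBSIZE',
    'sn_unknown'       => 'sn_unknown',
    'hostname'         => 'host001',
    'timestamp'        => '19-FEB-2014 20:55:47',
    'origin-timestamp' => 'T0001',
    'sda'              => '0.0',
    'sdb'              => '18.0',
};

my @lines;
for my $device (@devices) {
    # @$hr{@cols} is a hash slice: the values for the fixed columns, in @cols order
    my @output = ( @$hr{@cols}, $device, $hr->{$device} );
    push @lines, join(",", @output);
}

print "$_\n" for @lines;
```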

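Answer 1 (score: 1)

Use the Text::CSV module.

You can set the header names with $csv->column_names(@column_names) and then use $csv->getline_hr to fetch each row as a hash reference keyed by those column names. This makes the file much easier to parse.

You don't have to use Text::CSV to write your file back out (although it does ensure the file is written correctly), but you should use it to parse your data.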
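
As a rough sketch of that approach, the following parses with Text::CSV but builds the output lines with a plain join. It assumes that no field contains a comma, a quote character, or a newline, which is the case where hand-written output is safe. The in-memory sample data is illustrative only.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Illustrative sample: the question's layout, trimmed to two devices
my $input = <<'END';
DISKBSIZE,sn_unknown,hostname,timestamp,origin-timestamp,sda,sdb
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:47,T0001,0.0,18.0
DISKBSIZE,sn_unknown,host001,19-FEB-2014 20:55:53,T0004,0.0,369.8
END

open my $fh, '<', \$input or die $!;

# Read the header row and register it as the column names
my @column_names = @{ $csv->getline($fh) };
$csv->column_names(@column_names);

my @fixed   = @column_names[0 .. 4];
my @devices = @column_names[5 .. $#column_names];

# Output is assembled by hand; this is only safe for simple field values
my @lines = ( join(",", @fixed, "device", "value") );

while (my $hr = $csv->getline_hr($fh)) {
    for my $device (@devices) {
        push @lines, join(",", @{$hr}{@fixed}, $device, $hr->{$device});
    }
}

close $fh;

print "$_\n" for @lines;
```

Because the output never goes through Text::CSV, the timestamps are written without quotes. The trade-off is that nothing escapes a field that happens to contain a comma or a quote.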