Perl - Reading specific lines from a CSV file

Date: 2014-06-04 15:05:30

Tags: perl csv text-parsing

I want to read one particular "category" from a .csv file that looks like this:

Category 1, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...,
Category 2, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...,
Category 3, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...

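Suppose I want to print only the data for one specific "category"... how would I go about doing this?

i.e. if I want to print the Category 2 data, the output should look like this: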

Category 2, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...

2 answers:

Answer 0 (score: 1)

Unless your data contains quoted fields, such as a,b,c,"complicated field, quoted",e,f,g, there is no advantage in using Text::CSV over a simple split /,/.
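To show why quoted fields are the deciding factor, here is a small comparison on the example line above. It swaps in the core module Text::ParseWords (its parse_line function) as a stand-in for Text::CSV, which is a CPAN module; this is an illustration under that assumption, not the answer's own code, and Text::CSV remains the more complete CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# The quoted-field example from the answer
my $line = 'a,b,c,"complicated field, quoted",e,f,g';

# Naive split: the comma inside the quotes also splits, giving 8 fields
my @naive = split /,/, $line;

# parse_line honours double quotes; $keep = 0 strips the quote characters
my @parsed = parse_line(',', 0, $line);

printf "split /,/  -> %d fields\n", scalar @naive;
printf "parse_line -> %d fields\n", scalar @parsed;
print "4th field: $parsed[3]\n";
```

For data shaped like the question's sample, which has no quotes, both approaches give the same fields.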

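The program below sorts the data into a hash that you can then access simply and directly. I use Data::Dump only to display the contents of the resulting data structure.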

use strict;
use warnings;
use autodie;

open my $fh, '<', 'mydata.csv';

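my $category;    # the most recent non-blank category label
my %data;        # category label => list of rows (each row an array ref)
while (<$fh>) {
  chomp;
  my @data = split /,/;
  my $cat = shift @data;               # first column holds the category, if any
  $category = $cat if $cat =~ /\S/;    # blank first column: keep the previous category
  push @{ $data{$category} }, \@data;
}

use Data::Dump;
dd \%data;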

Output

{
  "Category 1" => [
                    [" header1", " header2", " header3", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                  ],
  "Category 2" => [
                    [" header1", " header2", " header3", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                  ],
  "Category 3" => [
                    [" header1", " header2", " header3", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                    [" data", " data", " data", "..."],
                  ],
}
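Once the hash is built, fetching one category really is a single lookup. A minimal sketch, assuming a small made-up sample held in memory instead of mydata.csv, that builds %data the same way and then prints the rows of Category 2 joined back with commas:

```perl
use strict;
use warnings;

# Hypothetical sample in the question's layout, read from an in-memory string
my $csv = <<'END';
Category 1, header1, header2,
          , a1, b1,
Category 2, header1, header2,
          , a2, b2,
          , c2, d2,
END

open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";

# Same grouping as the answer: carry the last non-blank label forward
my ($category, %data);
while (<$fh>) {
  chomp;
  my @fields = split /,/;
  my $cat = shift @fields;
  $category = $cat if $cat =~ /\S/;
  push @{ $data{$category} }, \@fields;
}
close $fh;

# Direct access: one hash lookup returns every row of the wanted category
my $wanted = 'Category 2';
my @rows = @{ $data{$wanted} || [] };
print join(',', @$_), "\n" for @rows;
```

Because the category label was shifted off each row, the printed rows start at header1; prepend $wanted when printing if you need the original first column back.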

Update

If you only want to extract one section of the file, there is no need to put it into a hash. This program will do what you want.

#!/usr/bin/perl

use strict;
use warnings;
use autodie;

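my ($file, $wanted) = @ARGV;
die "Usage: $0 FILE CATEGORY\n" unless defined $wanted;

open my $fh, '<', $file;

my $category;    # the most recent non-blank category label

while (<$fh>) {
  my ($cat) = /\A([^,]*)/;             # text before the first comma
  $category = $cat if $cat =~ /\S/;    # a non-blank label starts a new category
  print if defined $category and $category eq $wanted;
}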

Run it from the command line like this:

get_category.pl mydata.csv 'Category 2' > cat2.csv

Output

Category 2, header1, header2, header3,...,
          , data, data, data,...,
          , data, data, data,...,
          , data, data, data,...
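The program above keeps reading to the end of the file even after the wanted section has been printed. A minimal sketch of a variant that stops early, written as a hypothetical extract_category subroutine so it can run against an in-memory sample; it assumes each category appears in a single contiguous block, as in the question's data:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $wanted from an open filehandle,
# stopping as soon as the next category header appears.
sub extract_category {
  my ($fh, $wanted) = @_;
  my ($category, @out);
  while (my $line = <$fh>) {
    my ($cat) = $line =~ /\A([^,]*)/;    # text before the first comma
    if ($cat =~ /\S/) {
      # A new header right after our section: nothing more to collect
      last if defined $category && $category eq $wanted;
      $category = $cat;
    }
    push @out, $line if defined $category && $category eq $wanted;
  }
  return @out;
}

my $csv = <<'END';
Category 1, header1, header2,
          , x1, y1,
Category 2, header1, header2,
          , x2, y2,
Category 3, header1, header2,
          , x3, y3
END

open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";
my @lines = extract_category($fh, 'Category 2');
print @lines;
```

The early exit only matters for large files; if a category can reappear later in the file, drop the last and keep scanning as the original program does.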

Answer 1 (score: 0)

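If that output is exactly what you want, you can do this with a Perl one-liner (the single quotes stop a Unix shell from expanding $p; under Windows cmd.exe use double quotes instead):

perl -ne '$p = 0 if /^Category/; $p = 1 if /^Category 2,/; print if $p' myfile.csv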
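The $p flag in the one-liner is a hand-rolled version of what Perl's flip-flop (range) operator tracks for you. A minimal sketch using the three-dot form inside a hypothetical category_lines subroutine; the "E0" check skips the line that closes the range, which is the next category's header. Since the flip-flop keeps its state between calls, this assumes one call per file.

```perl
use strict;
use warnings;

# Hypothetical helper: collect one category's lines with the flip-flop operator
sub category_lines {
  my ($fh, $wanted) = @_;
  my @out;
  while (my $line = <$fh>) {
    # False until the wanted header matches, then true until a LATER line
    # starts with "Category" ("..." does not test the right side on the
    # line that opened the range).
    my $seq = ($line =~ /^\Q$wanted\E,/) ... ($line =~ /^Category/);
    # The closing value has "E0" appended; skip it so the next header is excluded
    push @out, $line if $seq && $seq !~ /E0\z/;
  }
  return @out;
}

my $csv = <<'END';
Category 1, header1, header2,
          , x1, y1,
Category 2, header1, header2,
          , x2, y2,
Category 3, header1, header2,
          , x3, y3
END

open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";
my @lines = category_lines($fh, 'Category 2');
print @lines;
```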