Reading a CSV file in Perl

Asked: 2011-12-20 03:37:22

Tags: perl csv

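I have read files in Perl before, but never when the values I need are spread across different lines of the CSV file. I assume I have to create an array mixed with hash keys, but I am out of my depth here.

Basically, my CSV file has the following columns: branch, job, timePeriod, periodType, day1Value, day2Value, day3Value, day4Value, day5Value, day6Value, day7Value

The day* values hold the value of that row's periodType for each day of the week, respectively.

For example -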

East,Banker,9AM-12PM,Overtime,4.25,0,0,1.25,1.5,1.5,0,0
West,Electrician,12PM-5PM,Regular,4.25,0,0,-1.25,-1.5,-1.5,0,0
North,Janitor,5PM-12AM,Variance,-4.25,0,0,-1.25,-1.5,-1.5,0,0
South,Manager,12A-9AM,Overtime,77.75,14.75,10,10,10,10,10,

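I need to output a file that takes this data and keys it on branch, job, timePeriod, and day. My output would list every periodType value for one particular day, rather than one periodType value for all seven days.

For example -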

South,Manager,12A-9AM,77.75,14.75,16

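In the line above, the last 3 values are the day1Values of the three periodTypes (Overtime, Regular, and Variance).

As you can see, my problem is that I don't know how to load the data into memory in a way that lets me pull values from different lines and output them successfully. I have only ever parsed single lines before.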

2 Answers:

Answer 0: (score: 15)

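Unless you like pain, use Text::CSV and its relatives Text::CSV_XS and Text::CSV_PP.

However, that may be the easier part of this problem. Once you have read a line and validated that it is complete, you need to add the relevant information to correctly keyed hashes. You will probably also have to get quite familiar with references.

You might create a hash %BranchData keyed by branch. Each element of that hash would be a reference to a hash keyed by job; each element of that would be a reference to a hash keyed by timePeriod; and each element of that would be a reference to an array indexed by day number (using indexes 1..7; this over-allocates space slightly, but the chances of getting it right are much greater; do not mess with $[ though!). Each element of the array would in turn be a reference to a hash keyed by the three period types. Ouch!

If everything is working well, a prototypical assignment might look something like: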

$BranchData{$row{branch}}->{$row{job}}->{$row{period}}->[1]->{$row{p_type}} +=
    $row{day1};

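You would be iterating over elements 1..7 and over 'day1'..'day7'; there is some design clean-up to do there.

You have to worry about initializing things correctly (or maybe you don't; Perl will do that for you). I am assuming that the row is returned as a plain hash (rather than a hash reference), with keys for branch, job, period, period type (p_type), and each day ('day1', ... 'day7').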
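To illustrate the "Perl will do that for you" remark, here is a minimal sketch that relies on autovivification instead of explicit initialization. The helper name add_row is hypothetical, the second row is made up for illustration, and rows are passed as plain hashes as assumed above.

```perl
use strict;
use warnings;

# Hypothetical helper: fold one parsed row (a plain hash) into the nested structure.
sub add_row {
    my ($data, %row) = @_;
    for my $day (1 .. 7) {
        # The intermediate hashes and the array are created on first use
        # (autovivification), and += on a not-yet-defined slot starts from 0.
        # "|| 0" turns an empty or missing field into 0 to avoid warnings.
        $data->{ $row{branch} }{ $row{job} }{ $row{period} }[$day]{ $row{p_type} }
            += $row{"day$day"} || 0;
    }
    return;
}

my %BranchData;

# First row: the first seven day values of the East sample line from the question.
add_row(\%BranchData,
    branch => 'East', job => 'Banker', period => '9AM-12PM', p_type => 'Overtime',
    day1 => 4.25, day2 => 0, day3 => 0, day4 => 1.25, day5 => 1.5, day6 => 1.5, day7 => 0,
);
# Second row: made up for illustration, to show values from two lines landing together.
add_row(\%BranchData,
    branch => 'East', job => 'Banker', period => '9AM-12PM', p_type => 'Regular',
    day1 => 8, day2 => 8, day3 => 8, day4 => 8, day5 => 8, day6 => 0, day7 => 0,
);

my $day1 = $BranchData{East}{Banker}{'9AM-12PM'}[1];
printf "Day 1: Overtime=%s Regular=%s\n", $day1->{Overtime}, $day1->{Regular};
```

The trade-off is that period types which never appear for a given key are simply absent rather than 0, so any code that prints the data has to supply a default.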

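If you know in advance which day you need, you can avoid accumulating all the days. However, more general reporting may be simpler if you always read and accumulate all the data, and then let the printing code deal with whatever subset of it needs to be processed.


This was an intriguing enough problem that I hacked together the code below. I doubt it is optimal, but it does work.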

#!/usr/bin/env perl
#
# SO 8570488

use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
use constant debug => 0;

my $file = "input.csv";
my $csv = Text::CSV->new({ binary => 1, eol => $/ })
                   or die "Cannot use CSV: ".Text::CSV->error_diag();
my @headings = qw( branch job period p_type day1 day2 day3 day4 day5 day6 day7 );
my @days     = qw( day0 day1 day2 day3 day4 day5 day6 day7 );
my %BranchData;

open my $in, '<', $file or die "Unable to open $file for reading ($!)";

$csv->column_names(@headings);
while (my $row = $csv->getline_hr($in))
{
    print Dumper($row) if debug;
    my %r = %$row;  # Not for efficiency; for notational compactness
    $BranchData{$r{branch}} = { } if !defined $BranchData{$r{branch}};
    my $branch = $BranchData{$r{branch}};
    $branch->{$r{job}} = { } if !defined $branch->{$r{job}};
    my $job = $branch->{$r{job}};
    $job->{$r{period}} = [ ] if !defined $job->{$r{period}};
    my $period = $job->{$r{period}};
    for my $day (1..7)
    {
        # Assume that Overtime, Regular and Variance are the only types
        # Otherwise, you need yet another level of checking whether elements exist...
        $period->[$day] = { Overtime => 0, Regular => 0, Variance => 0} if !defined $period->[$day];
        $period->[$day]->{$r{p_type}} += $r{$days[$day]};
    }
}

print Dumper(\%BranchData);

Given your sample data, the output from this is:

$VAR1 = {
    'West' => {
        'Electrician' => {
            '12PM-5PM' => [
                undef,
                {
                    'Regular'  => '4.25',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => '-1.25',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => '-1.5',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => '-1.5',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                }
            ]
        }
    },
    'South' => {
        'Manager' => {
            '12A-9AM' => [
                undef,
                {
                    'Regular'  => 0,
                    'Overtime' => '77.75',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '14.75',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                }
            ]
        }
    },
    'North' => {
        'Janitor' => {
            '5PM-12AM' => [
                undef,
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-4.25'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-1.25'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-1.5'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-1.5'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                }
            ]
        }
    },
    'East' => {
        'Banker' => {
            '9AM-12PM' => [
                undef,
                {
                    'Regular'  => 0,
                    'Overtime' => '4.25',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '1.25',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '1.5',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '1.5',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                }
            ]
        }
    }
};

Have fun taking it from here!
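As one possible next step, the sketch below walks a structure shaped like the dump above and emits one line per branch, job, and period for a chosen day, in the Overtime, Regular, Variance order the question asks for. The helper name day_report is hypothetical, and the small hand-built %BranchData is only a stand-in for the hash the script produces.

```perl
use strict;
use warnings;

# Hypothetical helper: build "branch,job,period,Overtime,Regular,Variance"
# lines for one day number (1..7) from the nested structure.
sub day_report {
    my ($data, $day) = @_;
    my @lines;
    for my $branch (sort keys %$data) {
        for my $job (sort keys %{ $data->{$branch} }) {
            for my $period (sort keys %{ $data->{$branch}{$job} }) {
                # Fall back to an empty hash if that day was never filled in.
                my $types = $data->{$branch}{$job}{$period}[$day] || {};
                push @lines, join ',', $branch, $job, $period,
                    map { $types->{$_} // 0 } qw(Overtime Regular Variance);
            }
        }
    }
    return @lines;
}

# Stand-in for %BranchData, holding only the day-1 values of two entries.
my %BranchData = (
    South => { Manager => { '12A-9AM' => [ undef,
        { Overtime => '77.75', Regular => 0, Variance => 0 } ] } },
    West => { Electrician => { '12PM-5PM' => [ undef,
        { Overtime => 0, Regular => '4.25', Variance => 0 } ] } },
);

print "$_\n" for day_report(\%BranchData, 1);
```

Sorting the keys is only there to make the output order predictable; drop it if order does not matter.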

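Answer 1: (score: 4)

I have no first-hand experience with it, but you could use DBD::CSV and then pass it the relatively simple SQL query needed to compute the aggregation you want.

If you insist on doing it the hard way, however, you can loop over the rows and collect the data in a hash of hash references like this: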

(
  "branch1,job1,timeperiod1"=>
    {
      "overtime"=>"overtimeday1value1",
      "regular"=>"regulartimeday1value1",
      "variance"=>"variancetimeday1value1"
    },
  "branch2,job2,timeperiod2"=>
    {
      "overtime"=>"overtimeday1value2",
      "regular"=>"regulartimeday1value2",
      "variance"=>"variancetimeday1value2"
    },
  #etc
);

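Then just loop over the keys accordingly. This approach does, however, rely on the keys being formatted consistently (for example, "East,Banker,9AM-12PM" is not the same as "East, Banker, 9AM-12PM"), so you would have to check for consistent formatting (and enforce it) while building the hash above.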
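A minimal sketch of that idea, assuming only the day-1 value is needed: the hypothetical helper make_key trims whitespace around each part before joining, which is one way to enforce consistent keys. The second row is made up for illustration and deliberately contains stray spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: build a canonical "branch,job,period" key by trimming
# whitespace around each part, so "East, Banker" and "East,Banker" match.
sub make_key {
    my @parts = @_;
    s/^\s+|\s+$//g for @parts;
    return join ',', @parts;
}

my %day1;   # key => { periodType => accumulated day-1 value }

# Rows as [branch, job, timePeriod, periodType, day1Value].
my @rows = (
    [ 'East',  'Banker',  '9AM-12PM', 'Overtime', 4.25 ],
    [ 'East ', ' Banker', '9AM-12PM', 'Regular',  8 ],   # made-up row
);

for my $row (@rows) {
    my ($branch, $job, $period, $type, $value) = @$row;
    # Lower-case the period type to match the key names in the hash above.
    $day1{ make_key($branch, $job, $period) }{ lc $type } += $value;
}

for my $key (sort keys %day1) {
    my $t = $day1{$key};
    print join(',', $key, map { $t->{$_} // 0 } qw(overtime regular variance)), "\n";
}
```

Both rows end up under the single key "East,Banker,9AM-12PM" despite the inconsistent spacing in the input.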