Building an array or hash with Spreadsheet::ParseExcel

Date: 2015-10-30 01:09:44

Tags: perl

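I'm new to Spreadsheet::ParseExcel. I have a space-delimited file that I opened in Microsoft Excel and saved as an XLS file.

I installed Spreadsheet::ParseExcel and used the sample code from the documentation to print the contents of the file. My goal is to build arrays of some of the data to write to a database. I only need a little help building the arrays; I'll figure out writing to the database later.

I'm having a hard time understanding this module. I did read the documentation, but because of my inexperience I can't make sense of it.

Below is the code I'm using to produce the output.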

#!/usr/bin/perl

use warnings;
use strict;

use Data::Dumper;
use Spreadsheet::ParseExcel;

my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse( 'file.xls' );

if ( !defined $workbook ) {
    die $parser->error(), ".\n";
}

for my $worksheet ( $workbook->worksheets() ) {

    my ( $row_min, $row_max ) = $worksheet->row_range();
    my ( $col_min, $col_max ) = $worksheet->col_range();

    for my $row ( $row_min .. $row_max ) {

        for my $col ( $col_min .. $col_max ) {

            my $cell = $worksheet->get_cell( $row, $col );
            next unless $cell;

            print "Row, Col    = ($row, $col)\n";
            print "Value       = ", $cell->value(),       "\n";
            print "Unformatted = ", $cell->unformatted(), "\n";
            print "\n";
        }
    }
}

Here is some of the output:

Row, Col    = (0, 0)
Value       = NewRecordFlag
Unformatted = NewRecordFlag

Row, Col    = (0, 1)
Value       = AgencyName
Unformatted = AgencyName

Row, Col    = (0, 2)
Value       = CredentialIdnt
Unformatted = CredentialIdnt

Row, Col    = (0, 3)
Value       = ContactIdnt
Unformatted = ContactIdnt

Row, Col    = (0, 4)
Value       = AgencyRegistryCardNumber
Unformatted = AgencyRegistryCardNumber

Row, Col    = (0, 5)
Value       = Description
Unformatted = Description

Row, Col    = (0, 6)
Value       = CredentialStatusDescription
Unformatted = CredentialStatusDescription

Row, Col    = (0, 7)
Value       = CredentialStatusDate
Unformatted = CredentialStatusDate

Row, Col    = (0, 8)
Value       = CredentialIssuedDate
Unformatted = CredentialIssuedDate

My goal is to build an array of CredentialIssuedDate, AgencyRegistryCardNumber, and AgencyName. Once I grasp the concept of doing that, I can go to town with this great module.

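2 answers:

Answer 0 (score: 1)

Here is a quick example that should do what you need. For each worksheet it builds an array @rows holding the values of the three required fields, and displays each result using Data::Dumper. I haven't been able to test it, but it looks right and it compiles.

It starts by building a hash %headers that relates the column heading strings to column numbers, based on the first row in each worksheet.

Then the second row onwards is processed: the cells in the columns named in the @wanted array are extracted and their values are placed in the array @row, which is pushed onto @rows as each one is completed.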

#!/usr/bin/perl

use strict;
use warnings;

use Spreadsheet::ParseExcel;
use Data::Dumper;

my @wanted = qw/
    CredentialIssuedDate
    AgencyRegistryCardNumber
    AgencyName
/;

my $parser   = Spreadsheet::ParseExcel->new;
my $workbook = $parser->parse('file.xls');

if ( not defined $workbook ) {
    die $parser->error, ".\n";
}

for my $worksheet ( $workbook->worksheets ) {

    my ( $row_min, $row_max ) = $worksheet->row_range;
    my ( $col_min, $col_max ) = $worksheet->col_range;

    my %headers;

    # Map each column heading in the first row to its column number
    for my $col ( $col_min .. $col_max ) {
        my $cell = $worksheet->get_cell($row_min, $col);
        next unless $cell;
        $headers{ $cell->value } = $col;
    }

    my @rows;

    for my $row ( $row_min + 1 .. $row_max ) {

        my @row;

        for my $name ( @wanted ) {
            my $col = $headers{$name};
            # A missing heading or an empty cell yields an empty string
            my $cell = defined $col ? $worksheet->get_cell($row, $col) : undef;
            push @row, $cell ? $cell->value : "";
        }

        push @rows, \@row;
    }

    print Dumper \@rows;
}
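
Since the question title also mentions hashes, here is a minimal sketch of the same idea producing one hash per row instead of an array. To keep it runnable without an XLS file, it works on an in-memory array of rows; @sheet and its values are made-up sample data, not taken from the asker's file. With a real worksheet, the values would come from get_cell as in the code above.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

# Hypothetical sample data standing in for one worksheet: the first
# row holds the headings, the remaining rows hold the cell values.
my @sheet = (
    [ 'NewRecordFlag', 'AgencyName', 'AgencyRegistryCardNumber', 'CredentialIssuedDate' ],
    [ 'Y', 'Agency A', '1001', '2015-01-15' ],
    [ 'N', 'Agency B', '1002', '2015-02-20' ],
);

my @wanted = qw/
    CredentialIssuedDate
    AgencyRegistryCardNumber
    AgencyName
/;

# Map each heading to its column index, as %headers does above
my ( $header_row, @data_rows ) = @sheet;
my %headers;
@headers{ @$header_row } = 0 .. $#$header_row;

# Build one hash per data row, keyed by heading, using hash slices
my @records;
for my $data_row ( @data_rows ) {
    my %record;
    @record{ @wanted } = @{ $data_row }[ @headers{ @wanted } ];
    push @records, \%record;
}

# Sort the keys so the dump is stable from run to run
$Data::Dumper::Sortkeys = 1;
print Dumper \@records;
```

Each element of @records can then be read by name, for example $records[0]{AgencyName}, which may be more convenient than positional arrays when the rows are later written to a database.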

Answer 1 (score: 0)

I was able to solve this using the Spreadsheet::BasicReadNamedCol module:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Spreadsheet::BasicReadNamedCol;

my $xlsFileName = 'file.xls';
my @columnHeadings = (
'AgencyName',
'eMail',
'PhysicalAddress1',
'PhysicalAddress2'
);

my $ss = new Spreadsheet::BasicReadNamedCol($xlsFileName) ||
die "Could not open '$xlsFileName': $!";
$ss->setColumns(@columnHeadings);

# Print each row of the spreadsheet in the order defined in
# the columnHeadings array
my $row = 0;
while (my $data = $ss->getNextRow())
{
   $row++;
   print join('|', $row, @$data), "\n";
}