Getting empty cells in an XLS file using Spreadsheet::ParseExcel

Time: 2013-04-09 16:07:54

Tags: perl parsing xls

First, please refer to the Stack Overflow question Using Spreadsheet::WriteExcel.

The data structure is similar to the following:

      col1      col2    col3   col4   col5
row1  School    1
row2  Dean      John
row3  No.stu.   55
row4  some irrelevant stuff
row5  School2   2
row6  Dean      Tony
row7  No. stu.  60
row8  some irrelevant stuff
row9  School    3
row10 Dean      James
row11 No.stu.   56
row12 No. teacher 20
row13 School    4
row14 Dean      Tom
row15 No.stu.   79
row16 No. teacher 21
row17 course
row18           math    2
row19           eng     4
row20 teacher   name    age   gender   race
row21           Jane    20    female   white
row22 student   name    Lee
row23           SAT     1434
row24           gender  male

The output I want to achieve is:

      col1  col2  col3     col4          col5         col6          col7        col8       col9
row1 School Dean No.stu.  No. teacher  course_math  course_eng  teacher_name  teacher_age  teacher_gender    teacher_race    student_name   student_SAT   student_gender
row2 1      John  55
row3 2      Tony  60
row4 3      James 56       20
row5 4      Tom   79       21              2           4            Jane        20          female                white         Lee         1434          male

Thanks to gangabass, the code I was given is:

use strict;
use warnings;

use Spreadsheet::ParseExcel;
use FindBin qw($Bin);

my ($infile) = @ARGV;

my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$Bin/Test.xls");
die $parser->error unless defined $workbook;
my ($worksheet) = $workbook->worksheets();

my %data;
my $row    = 0;
my $school = "";
while (1) {
    my $cell = $worksheet->get_cell($row, 0);
    last unless defined($cell);

    my $key = $cell->value();
    my $value = $worksheet->get_cell($row++, 1)->value();

    if ($key eq "School") {

        $school = $value;
        next;
    }

    $data{$school}->{$key} = $value;
}
sleep 1;

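I started by parsing row17-row19. The first problem I ran into is the empty cell at (row17, col3). The code fails when it reaches that cell. Knowing that Excel distinguishes between 'blank' and 'empty' cells, I can work around it by setting the cell format in the original XLS file to something other than "General". However, that is only a temporary fix. I would like to know whether there is a call I can use to read empty cells. I have tried unformatted() by adding: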

my $unformattedvalue = $worksheet->get_cell( $row++, 1 )->unformatted();
However, it did not work.
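
A note on why unformatted() does not help: per the Spreadsheet::ParseExcel documentation, get_cell() returns undef when no cell exists at the given position, so any method call on the result fails before the value is ever read. Below is a minimal sketch of a guard around that call. The helper name cell_value is hypothetical, and FakeCell and FakeSheet are stand-ins (not part of the module) that only exist so the sketch runs without an .xls file; with a real worksheet object the helper would be called the same way.

```perl
use strict;
use warnings;

# Stand-in classes (assumptions for this sketch only). They mimic the two
# calls used below: get_cell() returning undef for an empty cell, and
# value() on a cell object.
package FakeCell {
    sub new   { my ($class, $v) = @_; return bless { v => $v }, $class }
    sub value { return $_[0]{v} }
}
package FakeSheet {
    sub new { my ($class, $grid) = @_; return bless { grid => $grid }, $class }
    sub get_cell {
        my ($self, $row, $col) = @_;
        my $v = $self->{grid}[$row][$col];
        return defined $v ? FakeCell->new($v) : undef;
    }
}

package main;

# Hypothetical helper: the cell's formatted value, or $default when
# get_cell() found no cell at ($row, $col).
sub cell_value {
    my ($worksheet, $row, $col, $default) = @_;
    my $cell = $worksheet->get_cell($row, $col);
    return defined $cell ? $cell->value() : $default;
}

# Rows 17-18 of the sample: "course" has nothing in the second column.
my $sheet = FakeSheet->new([ [ 'course' ], [ undef, 'math', 2 ] ]);

print cell_value($sheet, 0, 1, '(empty)'), "\n";   # prints "(empty)"
print cell_value($sheet, 1, 1, '(empty)'), "\n";   # prints "math"
```

Returning a caller-supplied default keeps the decision about what an empty cell means (empty string, undef, 0) at the call site rather than inside the helper.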

Then I tried to handle the data structure under the 'course' state with the following code:

my %data;
my $row    = 0;
my $school = "";
my $course = ""; #Initial value for the state of course
while (1) {
    my $cell = $worksheet->get_cell($row, 0);
    last unless defined($cell);

    my $key = $cell->value();
    my $value  = $worksheet->get_cell( $row++, 1 )->value();
    my $value1 = $worksheet->get_cell( $row++, 2 )->value(); #Fetching the value in column 3

    if ($key eq "School") {

        $school = $value;
        next;
    }

    if ($key eq "course") { #Just mimicking the how we construct the structure of 'School'

         $course = $value1;
         next;
    }

    $data{$school}->{$key} = $value;  #Must be something wrong here, but can not figure out
}

The code fails with Can't call method "value" on an undefined value at xxx line, on 'my $value1 = $worksheet->get_cell( $row++, 2 )->value()';
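
Part of what goes wrong in that loop can be shown without a spreadsheet. Each $row++ returns the current value and then advances the counter, so two of them in one iteration address two different rows and skip a row on the next pass. The sketch below only illustrates that side effect; the variable @seen is made up for the demonstration.

```perl
use strict;
use warnings;

my $row = 0;
my @seen;

# Two post-increments in one loop iteration, as in the code above:
push @seen, $row++;    # the column-1 read uses row 0, then $row becomes 1
push @seen, $row++;    # the column-2 read uses row 1, then $row becomes 2

print "rows read: @seen; next iteration starts at row $row\n";
# prints "rows read: 0 1; next iteration starts at row 2"
```

Separately from the counter problem, most rows in the sample have nothing in the third column, so get_cell() returns undef there and the chained ->value() call dies even when the row index is right.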

In short, my questions are:

  1. In general, how can I read empty cells in an XLS file without manual intervention?
  2. How can I use Spreadsheet::ParseExcel to parse a structure like the following?

    row17 course
    row18           math    2
    row19           eng     4
    

2 answers:

Answer 0: (score: 2)

You should not use $row++ inside the get_cell() calls; increment $row once, at the end of each loop iteration.

Try this code as a starting point:

my %data;
my $state = "";
my $school = "";
my $student = "";
my ( $row_min, $row_max ) = $worksheet->row_range();
my $row = $row_min;
while ($row <= $row_max) {
    my $cell0 = $worksheet->get_cell( $row, 0 );
    my $cell1 = $worksheet->get_cell( $row, 1 );

    if (defined($cell0)) {
        my $key = $cell0->value();
        if ($key eq 'School') {
            $state = 'school';
            $school = $cell1->value();
        } elsif ($key eq 'course') {
            $state = 'course';
        } elsif ($key eq 'teacher') {
            $state = 'teacher';
        } elsif ($key eq 'student') {
            $state = 'student';
            $student = $worksheet->get_cell( $row, 2 )->value();
        } else {
            # Skip rows whose second column is empty (get_cell() returned undef)
            $data{$school}{$key} = $cell1->value() if defined($cell1);
        }
    } elsif ($state eq 'course') {
        # process columns for course
    } elsif ($state eq 'teacher') {
        # process columns for teacher
    } elsif ($state eq 'student') {
        # process columns for student
    }
    $row++;
}
use Data::Dumper;
print Dumper(\%data);

Update

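To process the teacher rows, I would first declare a %teacher_columns hash outside the first while loop, and get $col_min and $col_max with the col_range() method. Then, in the $key eq 'teacher' elsif clause, I would do something like this: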

%teacher_columns = (); # clear it out in case column names are different for this school
for my $col (2 .. $col_max) {
    my $cell = $worksheet->get_cell( $row, $col );
    $teacher_columns{$cell->value()} = $col if defined($cell) and $cell->value();
}

Then, in the $state eq 'teacher' elsif clause, I would do something like this:

foreach my $key (keys %teacher_columns) {
    my $cell = $worksheet->get_cell($row, $teacher_columns{$key});
    if (defined($cell)) {
      # store cell data into proper location of your data hash
    }
}
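
To show the two steps working together, here is a minimal sketch that replaces the worksheet with plain arrays (undef marks an empty cell) so it runs without an .xls file. The storage location $data{$school}{teacher}{...} and the hard-coded $school are assumptions for illustration. The header scan starts at column index 1 because in the sample layout the first header, name, sits in the second column.

```perl
use strict;
use warnings;

# Rows 20-21 of the sample sheet as plain arrays (undef = empty cell).
my @rows = (
    [ 'teacher', 'name', 'age', 'gender', 'race'  ],
    [ undef,     'Jane', 20,    'female', 'white' ],
);

my (%data, %teacher_columns);
my $school = 4;     # assumed to have been set by an earlier 'School' row
my $state  = '';

for my $row (@rows) {
    if (defined $row->[0] && $row->[0] eq 'teacher') {
        # Header row: remember which column holds which attribute.
        $state = 'teacher';
        %teacher_columns = ();
        for my $col (1 .. $#$row) {
            $teacher_columns{ $row->[$col] } = $col if defined $row->[$col];
        }
    }
    elsif ($state eq 'teacher') {
        # Data row: look each attribute up through the recorded column.
        for my $name (keys %teacher_columns) {
            my $value = $row->[ $teacher_columns{$name} ];
            $data{$school}{teacher}{$name} = $value if defined $value;
        }
    }
}

print join(', ', map { "$_=$data{$school}{teacher}{$_}" }
                 sort keys %{ $data{$school}{teacher} }), "\n";
# prints "age=20, gender=female, name=Jane, race=white"
```

With a real worksheet, each $row->[$col] lookup would become a get_cell($row, $col) call followed by a defined() check before calling value().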

Answer 1: (score: 0)

Here is my work:

I just added the processing for the columns.

elsif ($state eq 'student') {
        my $key = $cell1->value();
        $data{$school}{$student}{$key} =$cell2->value();
}  elsif ($state eq 'course') {
       my $key = $cell1->value();
        $data{$school}{$course}{$key} =$cell2->value();

where $cell2 is defined as:

my $cell2 = $worksheet->get_cell( $row, 2 );

It gives me:

$VAR1 = {
          '4' => {
                   'course' => {
                                 'math' => '2',
                                 'eng' => '4'
                               },
                   'No.Stu' => '79',
                   'No.Teacher' => '21',
                   'Lee' => {
                              'SAT' => '1434',
                              'gender' => 'male'
                            },
                   'Dean' => 'Tom'
                 },
          '1' => {
                   'No.Stu' => '55',
                   'Dean' => 'John'
                 },
          '3' => {
                   'No.Stu' => '56',
                   'No.Teacher' => '20',
                   'Dean' => 'James'
                 },
          '2' => {
                   'No.Stu' => '60',
                   'Dean' => 'Tony'
                 }
        };

Looks good.

But I don't know how to process the columns for teacher, because it has a somewhat different structure:

        col1      col2    col3   col4   col5
  row20 teacher   name    age   gender   race
  row21           Jane    20    female   white

I tried to loop over the columns to use the cell values as keys, without success:

my ( $col_min, $col_max ) = $worksheet->col_range();
my $col = $col_min;
my $cell3 = $worksheet->get_cell(++$row, $col );

elsif ($state eq 'teacher') {
        while ($col <= $col_max) {
            my $key = $worksheet->get_cell($row, $col++ );
        }
           $data{$school}{$teacher}{$key} =$cell3->value();
    }

Any suggestions?

Update: I followed @imran's suggestion and it works well. I then used the following code to write what I had parsed into a new XLS file:

use Spreadsheet::WriteExcel;

$workbook = Spreadsheet::WriteExcel->new('Result.xls');
$worksheet = $workbook->add_worksheet();

$col = 0;
$row = 0;

$worksheet->write( $row++, $col,
    ["School", "No.Stu", "No.Teacher", "Dean","Course_math", "Course_eng", "student_SAT", "student_name", "student_gender", "teacher_race", "teacher_name", "teacher_age", "teacher_gender"] );

foreach my $school( sort keys %data ) {

     $worksheet->write( $row++, $col,
        [ $school, @{$data{$school}}{ "No.Stu", "No.Teacher", "Dean"}, @{$data{$school}{course}}{ "math", "eng"}, @{$data{$school}{student}}{ "SAT", "name", "gender"}, @{$data{$school}{teacher}}{ "race", "name", "age", "gender"} ]) ;
}

$workbook->close();

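Everything works except for one small thing. The teacher_age cell is empty, even though the hash shows that the value was parsed. I can't understand it...

The teacher part of the hash is as follows: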

           'teacher' => {
                          'race' => 'white',
                          'name' => 'Jane',
                          'age ' => '25',
                          'gender' => ' female'
                        },

The code I used is exactly what you suggested, but the 'teacher_age' cell in the rewritten xls file is empty.
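
One likely cause, judging only from the dump above: the key is stored as 'age ' with a trailing space, so the hash slice that asks for "age" finds nothing and the corresponding cell is left empty. The sketch below reproduces that with the dumped values and shows one possible fix, trimming whitespace from header names and values before storing them. The %clean hash is an assumption for illustration.

```perl
use strict;
use warnings;

# The teacher hash exactly as dumped above, including the stray spaces.
my %teacher = (
    'race'   => 'white',
    'name'   => 'Jane',
    'age '   => '25',
    'gender' => ' female',
);

# "age" and "age " are different keys, so this lookup fails.
print( exists $teacher{age} ? "age found\n" : "no 'age' key\n" );
# prints "no 'age' key"

# Possible fix: strip leading and trailing whitespace from keys and values.
my %clean;
for my $key (keys %teacher) {
    my ($k, $v) = ($key, $teacher{$key});
    s/^\s+|\s+$//g for $k, $v;
    $clean{$k} = $v;
}

print "age=$clean{age}, gender=$clean{gender}\n";
# prints "age=25, gender=female"
```

In the parsing loop, the same substitution could be applied to each header value at the point where %teacher_columns is filled, so that the keys match the names used later in the hash slice.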

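Update 2: Going back to the question about 'state': does parsing always start from the following row? What if I have the following data structure: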

 School    1
 course    math
           eng
 ...
 School     2
 course    phy
 ...
 School     3
 course    chem
           gym
           music

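If I use the old code and define the course state, I can only parse eng for School 1 and gym and music for School 3, and nothing for School 2. I have been thinking about defining the course state one row earlier, but without any luck. Any suggestions?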

Update

I followed @imran's suggestion by changing the course code to:

} elsif ($key eq 'course') {
            $state = 'course';
            $course = $worksheet->get_cell( $row, 1 )->value();
}

Correspondingly, the hash assignment is:

} elsif ($state eq 'course') {
            my $key = $cell1->value();
            $data{$school}{$course}{$key} =$cell1->value();
}

However, it only parses:

 '1' => {
                   'math' => {
                               'eng' => 'eng'
                             },
       }
 '3' => {
   'chem' => {
                               'gym' => 'gym',
                               'music' => 'music'
                             },
      }

School 2 is not parsed.

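I solved the problem by storing the course as soon as the course key is reached, i.e. instead of having a separate block that only defines the course keys and then moves on to the course rows. I now have: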

  } elsif ($key eq 'course') {
            $state = 'course';
            $course = $state;
            my $key = $cell1->value();
            $data{$school}{$course}{$key} = $key;
}

That does the trick. But I am still curious about how it works.
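
A possible explanation, sketched on plain arrays instead of a worksheet (undef marks an empty cell): in this layout the row that says course also carries the first course name, while the following rows have an empty first column and are handled only because the state is still 'course'. The earlier version stored nothing on the header row, so a school with a single course (School 2) never reached the continuation branch. The array layout and variable names below are assumptions for illustration.

```perl
use strict;
use warnings;

# The "Update 2" layout as plain arrays (undef = empty cell).
my @rows = (
    [ 'School', 1 ], [ 'course', 'math' ], [ undef, 'eng' ],
    [ 'School', 2 ], [ 'course', 'phy'  ],
    [ 'School', 3 ], [ 'course', 'chem' ], [ undef, 'gym' ], [ undef, 'music' ],
);

my (%data, $school);
my $state = '';

for my $row (@rows) {
    my ($key, $value) = @$row;
    if (defined $key) {
        if ($key eq 'School') {
            $state  = 'school';
            $school = $value;
        }
        elsif ($key eq 'course') {
            $state = 'course';
            # The header row itself carries the first course: store it now.
            $data{$school}{course}{$value} = $value;
        }
    }
    elsif ($state eq 'course' && defined $value) {
        # Continuation row: column 0 is empty, so the last state applies.
        $data{$school}{course}{$value} = $value;
    }
}

print join(',', sort keys %{ $data{$_}{course} }), "\n" for sort keys %data;
# prints:
# eng,math
# phy
# chem,gym,music
```

The state variable only records which section the loop is in; whether a given row contributes data depends on each branch storing whatever that row carries, including the header row.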