This is a follow-up to the Stack Overflow question "Using Spreadsheet::WriteExcel".
The data is laid out roughly like this:
col1 col2 col3 col4 col5
row1 School 1
row2 Dean John
row3 No.stu. 55
row4 some irrelevant stuff
row5 School 2
row6 Dean Tony
row7 No. stu. 60
row8 some irrelevant stuff
row9 School 3
row10 Dean James
row11 No.stu. 56
row12 No. teacher 20
row13 School 4
row14 Dean Tom
row15 No.stu. 79
row16 No. teacher 21
row17 course
row18 math 2
row19 eng 4
row20 teacher name age gender race
row21 Jane 20 female white
row22 student name Lee
row23 SAT 1434
row24 gender male
The output I want is:
col1 col2 col3 col4 col5 col6 col7 col8 col9
row1 School Dean No.stu. No. teacher course_math course_eng teacher_name teacher_age teacher_gender teacher_race student_name student_SAT student_gender
row2 1 John 55
row3 2 Tony 60
row4 3 James 56 20
row5 4 Tom 79 21 2 4 Jane 20 female white Lee 1434 male
Thanks to gangabass, the code I have so far is:
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use FindBin qw($Bin);
my ($infile) = @ARGV;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$Bin/Test.xls");
die $parser->error unless defined $workbook;
my ($worksheet) = $workbook->worksheets();
my %data;
my $row = 0;
my $school = "";
while (1) {
my $cell = $worksheet->get_cell($row, 0);
last unless defined($cell);
my $key = $cell->value();
my $value = $worksheet->get_cell($row++, 1)->value();
if ($key eq "School") {
$school = $value;
next;
}
$data{$school}->{$key} = $value;
}
sleep 1;
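I started by parsing row17 to row19. The first problem I ran into is the empty cell at (row17, col3): the code dies when it gets there. Knowing that Excel distinguishes between an 'empty' cell and a 'blank' cell, I can work around this by setting the cell format in the original XLS file to something other than "General". That is only a temporary fix, though. Is there a call I can use to fetch an empty cell? I have already tried unformatted()
by adding: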
my $unformattedvalue = $worksheet->get_cell( $row++, 1 )->unformatted();
However, that did not work.
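For context, Spreadsheet::ParseExcel's get_cell() returns undef when a cell holds nothing at all, so unformatted() fails in the same way value() does: there is no cell object to call it on. A minimal sketch of guarding the lookup follows. The helper name cell_text and the MockSheet/MockCell packages are hypothetical stand-ins, used only so the snippet runs without an .xls file; with a real worksheet only the helper would be needed.

```perl
use strict;
use warnings;

# Stand-in classes (hypothetical) that mimic the two calls used here:
# $sheet->get_cell($row, $col) and $cell->value().
package MockCell {
    sub new   { my ($class, $v) = @_; return bless { v => $v }, $class }
    sub value { return $_[0]{v} }
}
package MockSheet {
    sub new { my ($class, $rows) = @_; return bless { rows => $rows }, $class }
    sub get_cell {
        my ($self, $row, $col) = @_;
        my $v = $self->{rows}[$row][$col];
        return defined $v ? MockCell->new($v) : undef;   # undef for an empty cell
    }
}
package main;

# Hypothetical helper: return the cell text, or a default when the cell is empty.
sub cell_text {
    my ($sheet, $row, $col, $default) = @_;
    my $cell = $sheet->get_cell($row, $col);
    return defined $cell ? $cell->value() : $default;
}

my $sheet = MockSheet->new([
    [ 'course' ],            # like row17: nothing in the second column
    [ undef, 'math', 2 ],    # like row18
]);

my $empty = cell_text($sheet, 0, 1, '');   # empty cell, falls back to ''
my $math  = cell_text($sheet, 1, 1, '');   # populated cell
print "empty=[$empty] math=[$math]\n";
```

The same guard works for unformatted(): test the return value of get_cell() with defined before calling any method on it.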
Next, I tried to handle the data under a "course" state with the following code:
my %data;
my $row = 0;
my $school = "";
my $course = ""; #Initial value for the state of course
while (1) {
my $cell = $worksheet->get_cell($row, 0);
last unless defined($cell);
my $key = $cell->value();
my $value = $worksheet->get_cell( $row++, 1 )->value();
my $value1 = $worksheet->get_cell( $row++, 2 )->value(); #Fetching the value in column 3
if ($key eq "School") {
$school = $value;
next;
}
if ($key eq "course") { #Just mimicking the how we construct the structure of 'School'
$course = $value1;
next;
}
$data{$school}->{$key} = $value; #Must be something wrong here, but can not figure out
}
The code fails with: Can't call method "value" on an undefined value at xxx line, pointing at 'my $value1 = $worksheet->get_cell( $row++, 2 )->value()';
In short, my question is:
How can I use Spreadsheet::ParseExcel to parse a structure like the following?
row17 course
row18 math 2
row19 eng 4
Answer 0 (score: 2)
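You should not do $row++ inside the get_cell() calls: every post-increment advances the row, so the next lookup in the same pass lands on a different row.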
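To make the effect concrete, here is a small sketch that replays the question's lookup pattern on a plain array of arrays. The @rows data is a made-up stand-in for the worksheet, not the asker's file.

```perl
use strict;
use warnings;

# Stand-in for three worksheet rows (assumed layout: key, value).
my @rows = ( [ 'School', 1 ], [ 'Dean', 'John' ], [ 'No.stu.', 55 ] );

# Pattern from the question: the row advances between lookups.
my $row    = 0;
my $key    = $rows[$row][0];      # row 0, column 0
my $value  = $rows[$row++][1];    # row 0, column 1; $row is now 1
my $value1 = $rows[$row++][2];    # row 1, column 2 (empty); $row is now 2
my $row_after = $row;             # the 'Dean' row was never read as a key

print "key=$key value=$value value1=", ($value1 // 'undef'), " next row=$row_after\n";

# Safer pattern: read every column of the current row, then advance once.
my %pair;
$row = 0;
while ($row < @rows) {
    my ($k, $v) = @{ $rows[$row] }[0, 1];
    $pair{$k} = $v;
    $row++;
}
print join(', ', map { "$_ => $pair{$_}" } sort keys %pair), "\n";
```

With a real worksheet, the third lookup is where get_cell() returns undef and the call to value() dies.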
Try this code as a starting point:
my %data;
my $state = "";
my $school = "";
my $student = "";
my ( $row_min, $row_max ) = $worksheet->row_range();
my $row = $row_min;
while ($row <= $row_max) {
my $cell0 = $worksheet->get_cell( $row, 0 );
my $cell1 = $worksheet->get_cell( $row, 1 );
if (defined($cell0)) {
my $key = $cell0->value();
if ($key eq 'School') {
$state = 'school';
$school = $cell1->value();
} elsif ($key eq 'course') {
$state = 'course';
} elsif ($key eq 'teacher') {
$state = 'teacher';
} elsif ($key eq 'student') {
$state = 'student';
$student = $worksheet->get_cell( $row, 2 )->value();
} else {
$data{$school}{$key} = $cell1->value();
}
} elsif ($state eq 'course') {
# process columns for course
} elsif ($state eq 'teacher') {
# process columns for teacher
} elsif ($state eq 'student') {
# process columns for student
}
$row++;
}
use Data::Dumper;
print Dumper(\%data);
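One way the placeholder branches for course and student might be filled in is sketched below. It assumes that continuation rows leave the first column empty and carry a key in the second column and a value in the third. The @rows array is a made-up stand-in for the worksheet, and storing the student under a fixed 'student' key is a choice made for this sketch.

```perl
use strict;
use warnings;

# Stand-in for the worksheet: each inner array is one row (assumed layout).
my @rows = (
    [ 'School', 4 ],
    [ 'Dean', 'Tom' ],
    [ 'course' ],
    [ undef, 'math', 2 ],
    [ undef, 'eng',  4 ],
    [ 'student', 'name', 'Lee' ],
    [ undef, 'SAT', 1434 ],
    [ undef, 'gender', 'male' ],
);

my %data;
my ($state, $school) = ('', '');

for my $r (@rows) {
    my ($c0, $c1, $c2) = @$r;
    if (defined $c0) {
        # A value in the first column starts a new record or section.
        if ($c0 eq 'School') {
            ($state, $school) = ('school', $c1);
        }
        elsif ($c0 eq 'course') {
            $state = 'course';
        }
        elsif ($c0 eq 'student') {
            $state = 'student';
            # The header row itself already carries one pair (name => Lee).
            $data{$school}{student}{$c1} = $c2 if defined $c1;
        }
        else {
            $data{$school}{$c0} = $c1;
        }
    }
    elsif (($state eq 'course' || $state eq 'student') && defined $c1) {
        # Continuation row: key in the second column, value in the third.
        $data{$school}{$state}{$c1} = $c2;
    }
}

print "$_ => $data{4}{course}{$_}\n" for sort keys %{ $data{4}{course} };
```

Because course and student rows share the same shape, a single continuation branch keyed on $state covers both.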
Update:
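To handle the teacher rows, I would first declare a %teacher_columns hash outside the while loop, and get $col_min and $col_max from the col_range() method. Then, in the $key eq 'teacher' elsif clause, I would do something like this: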
%teacher_columns = (); # clear it out in case column names are different for this school
for my $col (2 .. $col_max) {
my $cell = $worksheet->get_cell( $row, $col );
$teacher_columns{$cell->value()} = $col if defined($cell) and $cell->value();
}
Then, in the $state eq 'teacher' elsif clause, I would do something like this:
foreach my $key (keys %teacher_columns) {
my $cell = $worksheet->get_cell($row, $teacher_columns{$key});
if (defined($cell)) {
# store cell data into proper location of your data hash
}
}
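Put together, the two fragments amount to building a header-name-to-column map and then reading each data row through it. A self-contained sketch on stand-in data is shown here. The @rows array is made up, and the column range starts at index 1 only because of the layout assumed for this example; with a real sheet the range would follow col_range().

```perl
use strict;
use warnings;

# Stand-in for the two teacher rows (assumed layout).
my @rows = (
    [ 'teacher', 'name', 'age', 'gender', 'race' ],    # header row
    [ undef,     'Jane', 20,    'female', 'white' ],   # data row
);

my (%teacher_columns, %teacher);
my $state = '';

for my $r (@rows) {
    if (defined $r->[0] && $r->[0] eq 'teacher') {
        $state = 'teacher';
        %teacher_columns = ();               # headers may differ per school
        for my $col (1 .. $#$r) {
            my $name = $r->[$col];
            $teacher_columns{$name} = $col if defined $name && length $name;
        }
    }
    elsif ($state eq 'teacher') {
        # Look up each remembered column in the current data row.
        for my $name (keys %teacher_columns) {
            my $v = $r->[ $teacher_columns{$name} ];
            $teacher{$name} = $v if defined $v;
        }
    }
}

print join(', ', map { "$_ => $teacher{$_}" } sort keys %teacher), "\n";
```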
Answer 1 (score: 0)
Here is what I worked out:
I only added the handling for the columns.
elsif ($state eq 'student') {
my $key = $cell1->value();
$data{$school}{$student}{$key} =$cell2->value();
} elsif ($state eq 'course') {
my $key = $cell1->value();
$data{$school}{$course}{$key} =$cell2->value();
where $cell2 is defined as:
my $cell2 = $worksheet->get_cell( $row, 2 );
It gives me:
$VAR1 = {
'4' => {
'course' => {
'math' => '2',
'eng' => '4'
},
'No.Stu' => '79',
'No.Teacher' => '21',
'Lee' => {
'SAT' => '1434',
'gender' => 'male'
},
'Dean' => 'Tom'
},
'1' => {
'No.Stu' => '55',
'Dean' => 'John'
},
'3' => {
'No.Stu' => '56',
'No.Teacher' => '20',
'Dean' => 'James'
},
'2' => {
'No.Stu' => '60',
'Dean' => 'Tony'
}
};
That looks good.
But I don't know how to handle the teacher columns, because they are laid out differently:
col1 col2 col3 col4 col5
row20 teacher name age gender race
row21 Jane 20 female white
I tried looping over the columns to use the cell values as keys, but without success:
my ( $col_min, $col_max ) = $worksheet->col_range();
my $col = $col_min;
my $cell3 = $worksheet->get_cell(++$row, $col );
elsif ($state eq 'teacher') {
while ($col <= $col_max) {
my $key = $worksheet->get_cell($row, $col++ );
}
$data{$school}{$teacher}{$key} =$cell3->value();
}
Any suggestions?
Update: I followed @imran's suggestion and it worked well. I then wrote the parsed data out to a new XLS file with the following code:
use Spreadsheet::WriteExcel;
$workbook = Spreadsheet::WriteExcel->new('Result.xls');
$worksheet = $workbook->add_worksheet();
$col = 0;
$row = 0;
$worksheet->write( $row++, $col,
["School", "No.Stu", "No.Teacher", "Dean","Course_math", "Course_eng", "student_SAT", "student_name", "student_gender", "teacher_race", "teacher_name", "teacher_age", "teacher_gender"] );
foreach my $school( sort keys %data ) {
$worksheet->write( $row++, $col,
[ $school, @{$data{$school}}{ "No.Stu", "No.Teacher", "Dean"}, @{$data{$school}{course}}{ "math", "eng"}, @{$data{$school}{student}}{ "SAT", "name", "gender"}, @{$data{$school}{teacher}}{ "race", "name", "age", "gender"} ]) ;
}
$workbook->close();
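Everything works except for one small thing: the teacher_age cell is empty, even though the hash shows that the value was parsed. I can't figure out why...
The teacher part of the hash looks like this: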
'teacher' => {
'race' => 'white',
'name' => 'Jane',
'age ' => '25',
'gender' => ' female'
},
The code I used is exactly what you suggested, yet the 'teacher_age' cell in the rewritten .xls file is empty.
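The dump above hints at a likely cause: the key is stored as 'age ' with a trailing space, so the slice lookup for "age" finds nothing, and ' female' carries a leading space as well. A minimal sketch of normalising keys and values before writing follows. The trim helper is a hypothetical name, and the hash is copied from the dump rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    return $s unless defined $s;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# The teacher hash as dumped above, including the stray spaces.
my %teacher = (
    'race'   => 'white',
    'name'   => 'Jane',
    'age '   => '25',
    'gender' => ' female',
);

# The exact key 'age' is absent, which is why the written cell comes out empty.
my $before = exists $teacher{age} ? 'found' : 'missing';

# Rebuild the hash with trimmed keys and values.
my %clean = map { ( trim($_), trim($teacher{$_}) ) } keys %teacher;

my @cells = @clean{qw(race name age gender)};
print "before: $before; after: @cells\n";
```

Trimming at parse time, where the header names are first read from the cells, would keep the stray spaces out of the hash altogether.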
Update 2: Going back to the question about 'state': does parsing always start from the following row? What if I have this data structure:
School 1
course math
eng
...
School 2
course phy
...
School 3
course chem
gym
music
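If I use the old code and define a course state, I can only parse eng from school 1 and the courses gym and music from school 3. I have been trying to set the course state one row earlier, but without any luck. Any suggestions?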
Update:
I followed @imran's suggestion by changing the course code to:
} elsif ($key eq 'course') {
$state = 'course';
$course = $worksheet->get_cell( $row, 1 )->value();
}
Correspondingly, the hash assignment became:
} elsif ($state eq 'course') {
my $key = $cell1->value();
$data{$school}{$course}{$key} =$cell1->value();
}
However, it only parses:
'1' => {
'math' => {
'eng' => 'eng'
},
}
'3' => {
'chem' => {
'gym' => 'gym',
'music' => 'music'
},
}
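School 2 is not parsed at all.
I solved the problem by storing the course as soon as the course key is reached, that is, instead of leaving it to a separate block that only handles the rows after the course key. I now have: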
} elsif ($key eq 'course') {
$state = 'course';
$course = $state;
my $key = $cell1->value();
$data{$school}{$course}{$key} = $key;
}
That did the trick. But I am still curious about how it works.
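A plausible explanation: a state only influences rows that are processed after it has been set. The row that contains the 'course' key is handled by the branch for a defined first column, so a value sitting on that same row has to be stored right there, while later courses arrive on continuation rows. School 2 has only the same-row course, so nothing ever reached the continuation branch. The sketch below replays this on stand-in data. The @rows array is a made-up approximation of the layout shown earlier, with the elided "..." rows left out.

```perl
use strict;
use warnings;

# Stand-in for the worksheet rows (assumed layout: key, first value).
my @rows = (
    [ 'School', 1 ], [ 'course', 'math' ], [ undef, 'eng' ],
    [ 'School', 2 ], [ 'course', 'phy' ],
    [ 'School', 3 ], [ 'course', 'chem' ], [ undef, 'gym' ], [ undef, 'music' ],
);

my %data;
my ($state, $school) = ('', '');

for my $r (@rows) {
    my ($c0, $c1) = @$r;
    if (defined $c0) {
        if ($c0 eq 'School') {
            ($state, $school) = ('school', $c1);
        }
        elsif ($c0 eq 'course') {
            $state = 'course';
            # The first course shares its row with the 'course' key.
            $data{$school}{course}{$c1} = $c1 if defined $c1;
        }
    }
    elsif ($state eq 'course' && defined $c1) {
        # Later courses arrive on continuation rows.
        $data{$school}{course}{$c1} = $c1;
    }
}

print "$_: ", join(', ', sort keys %{ $data{$_}{course} }), "\n" for sort keys %data;
```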