Extracting multiple HTML tables by table name with Perl

Date: 2015-01-02 01:40:37

Tags: perl

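I have some HTML files, and I want to extract two tables from each file. Is it possible to extract both tables in a single pass?

The column headers differ slightly. This script works, but it seems rather long. Is there a way to use 'Schedule Name|Node Name' as the header for the last column and get both tables in one go? The tables are at depth/count 2.1 and 2.2.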

<code>
#!/usr/bin/perl
use strict;
use warnings;
#use diagnostics;
use HTML::TableExtract;
use Text::Table;

##my $sched = qr/Schedule Name|Node Name/;
my $html = "c:\\Testin.htm";
my $out  = "c:\\Testout.csv";
open( my $ofh, ">", $out ) or die "Cannot open $out: $!";

# First pass: the table whose last column is 'Schedule Name'
my $headers       = [ 'Status', 'Results', 'Schedule Name' ];
my $table_extract = HTML::TableExtract->new( headers => $headers );
my $table_output  = Text::Table->new();
$table_extract->parse_file($html);
my ($table) = $table_extract->tables or die "no emails to process\n";

foreach my $row ( $table->rows ) {
    $table_output->load($row);
    print "   ", join( ',', grep defined, @$row ), "\n";
    print $ofh "   ", join( ',', grep defined, @$row ), "\n";
}

# Second pass: the table whose last column is 'Node Name'
$headers       = [ 'Status', 'Results', 'Node Name' ];
$table_extract = HTML::TableExtract->new( headers => $headers );
$table_output  = Text::Table->new();

$table_extract->parse_file($html);
($table) = $table_extract->tables or die "no 'Node Name' table found\n";

foreach my $row ( $table->rows ) {
    $table_output->load($row);
    print "   ", join( ',', grep defined, @$row ), "\n";
    print $ofh "   ", join( ',', grep defined, @$row ), "\n";
}

</code>

1 Answer:

Answer 0: (score: 0)

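You don't say what you mean by "name" (HTML <table> elements can't have a name attribute), but if the two tables' headers are as shown in your code, then you can simply write

<code>
my $table_extract = HTML::TableExtract->new(headers => [qw/ status Results Name /]);
</code>

A column header matches if it contains any of the strings in the headers array, so Name matches both Schedule Name and Node Name. The match is also case-insensitive.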
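To make the single-pass idea concrete, here is a minimal sketch. It assumes both tables carry the headers shown in the question; the inline HTML and its cell values are hypothetical stand-ins for the real file, and the variable names `$te` and `@lines` are illustrative only. The one extractor is parsed once, and the loop then walks every table that `tables` returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample input standing in for c:\Testin.htm
my $html = <<'HTML';
<table>
  <tr><th>Status</th><th>Results</th><th>Schedule Name</th></tr>
  <tr><td>Completed</td><td>0</td><td>DAILY_BACKUP</td></tr>
</table>
<table>
  <tr><th>Status</th><th>Results</th><th>Node Name</th></tr>
  <tr><td>Failed</td><td>12</td><td>NODE_A</td></tr>
</table>
HTML

# 'Name' is matched as an unanchored, case-insensitive pattern,
# so it selects both the 'Schedule Name' and the 'Node Name' column.
my $te = HTML::TableExtract->new( headers => [ 'Status', 'Results', 'Name' ] );
$te->parse($html);
$te->eof;

# Collect the rows of every matching table from the single parse.
my @lines;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        push @lines, join ',', grep { defined } @$row;
    }
}

print "$_\n" for @lines;
```

For a file on disk, `$te->parse_file($html)` as in the question should work the same way. If `Name` turns out to be too loose for your real data, the module's documentation says header entries may also be `qr//` patterns, so something like `qr/(?:Schedule|Node) Name/` is worth trying.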