I am trying to use Perl to extract a table from this HTML file.
I tried this:
my $te = HTML::TableExtract->new();
$te->parse_file($g_log);
print "=====TE: $te ======\n";
The output is:
HTML::TableExtract=HASH(0x266f5f)
I tried iterating over $te but found nothing. Can anyone guide me on what to do next? I am new to this.
Here is the HTML file:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://exslt.org/math"
xmlns:testng="http://testng.org">
<head xmlns="">
<title>TestNG Results</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
<meta http-equiv="pragma" content="no-cache"></meta>
<meta http-equiv="cache-control" content="max-age=0"></meta>
<meta http-equiv="cache-control" content="no-cache"></meta>
<meta http-equiv="cache-control" content="no-store"></meta>
<LINK rel="stylesheet" href="style.css"></LINK>
<script type="text/javascript" src="main.js"></script>
</head>
<body>
<h2>Test suites overview</h2>
<table width="100%">
<tr>
<td align="center" id="chart-container"><script type="text/javascript">
renderSvgEmbedTag(600, 200);
</script></td>
</tr>
</table>
</body>
</html>
Answer 0 (score: 2)
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
my $filename = "testfile.html";
my $te = HTML::TableExtract->new();
$te->parse_file($filename);
# Walk every table that was found, then every row of that table.
foreach my $ts ($te->tables) {
    print "Table found at ", join(',', $ts->coords), ":\n";
    foreach my $row ($ts->rows) {
        # Each row is an array reference holding the cell texts.
        print " ", join(',', @$row), "\n";
    }
}
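As a variation on the loop above, here is a minimal sketch of limiting the extraction to a single table with the `depth` and `count` constructor options. The two-row sample table is hypothetical and stands in for real report data; the table in the question holds only a script call, so it would yield little text. Cells without text may come back undefined, so the sketch substitutes an empty string before joining.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample document: one top-level table with two rows.
my $html = <<'EOT';
<html><body>
<table>
<tr><td>suite</td><td>passed</td></tr>
<tr><td>smoke</td><td>12</td></tr>
</table>
</body></html>
EOT

# depth => 0, count => 0 selects the first table at the top nesting level.
my $te = HTML::TableExtract->new(depth => 0, count => 0);
$te->parse($html);
$te->eof;    # signal that the whole document has been supplied

my @rows;
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        # Replace undefined cells with '' to avoid "uninitialized" warnings.
        push @rows, join(',', map { defined $_ ? $_ : '' } @$row);
    }
}
print "$_\n" for @rows;
```

Collecting the joined rows in `@rows` before printing keeps the extraction step separate from the output step, which makes it easier to reuse the data elsewhere.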
Note that HTML::TableExtract can also be invoked in 'tree' mode, in which the resulting HTML and the extracted tables are encoded as HTML::Element tree structures.
use HTML::TableExtract 'tree';
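A rough sketch of what tree mode might look like in practice, assuming HTML::TreeBuilder and HTML::ElementTable are installed (tree mode depends on both). The sample markup is a simplified, hypothetical version of the table in the question, with placeholder text instead of the script call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract 'tree';    # tree mode: tables become element trees

# Hypothetical, simplified version of the table from the question.
my $html = <<'EOT';
<html><body>
<table width="100%">
<tr><td align="center" id="chart-container">chart goes here</td></tr>
</table>
</body></html>
EOT

my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;    # finish the parse so the tree is complete

# Take the first table found and get its element tree.
my $table      = $te->first_table_found;
my $table_tree = $table->tree;

# Serialize just that table back to HTML, attributes included.
my $table_html = $table_tree->as_HTML;
print $table_html, "\n";
```

Tree mode is mainly useful when the markup itself matters, for example attributes such as `id="chart-container"`, which the plain text rows do not carry.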
Answer 1 (score: 1)
I am not sure what you want to get out of the table, but I strongly recommend using Data::Dumper to look inside the hash.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;
my $html = <<'EOT';
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://exslt.org/math"
xmlns:testng="http://testng.org">
<head xmlns="">
<title>TestNG Results</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8"></meta>
<meta http-equiv="pragma" content="no-cache"></meta>
<meta http-equiv="cache-control" content="max-age=0"></meta>
<meta http-equiv="cache-control" content="no-cache"></meta>
<meta http-equiv="cache-control" content="no-store"></meta>
<LINK rel="stylesheet" href="style.css"></LINK>
<script type="text/javascript" src="main.js"></script>
</head>
<body>
<h2>Test suites overview</h2>
<table width="100%">
<tr>
<td align="center" id="chart-container"><script type="text/javascript">
renderSvgEmbedTag(600, 200);
</script></td>
</tr>
</table>
</body>
</html>
EOT
my $te = HTML::TableExtract->new();
$te->parse($html);
print Dumper($te);
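For context on the output in the question: a string like `HTML::TableExtract=HASH(0x266f5f)` is how Perl stringifies any blessed hash reference, so it only says that `$te` is an object, not whether any tables were found. The sketch below illustrates this with a hypothetical stand-in class, `My::Extractor`, which is not part of HTML::TableExtract. It also uses `$Data::Dumper::Maxdepth` to keep the dump short, since a full dump of a parser object can be long.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(blessed reftype);

{
    # Hypothetical stand-in for any hash-based object.
    package My::Extractor;
    sub new {
        my $class = shift;
        return bless { tables => [], options => { depth => 0 } }, $class;
    }
}

my $obj = My::Extractor->new;

# Interpolating an object gives CLASS=TYPE(address), not its contents.
my $as_string = "$obj";
print "$as_string\n";

print "class:   ", blessed($obj), "\n";    # package the reference is blessed into
print "storage: ", reftype($obj), "\n";    # underlying data type

# Limit the dump to the top level of the structure.
local $Data::Dumper::Maxdepth = 1;
my $dump = Dumper($obj);
print $dump;
```

Once the dump shows which keys exist, the object's documented methods (such as `tables` and `rows` for HTML::TableExtract) are usually a safer way to read the data than reaching into the hash directly.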