I am trying to extract table contents from an HTML file using HTML::TableExtract. My problem is that my HTML file is structured like this:
<!DOCTYPE html>
<html>
<body>
<h4>One row and three columns:</h4>
<table border="1">
<tr>
<td>
<p> 100 </p></td>
<td>
<p> 200 </p></td>
<td>
<p> 300 </p></td>
</tr>
<tr>
<td>
<p> 400 </p></td>
<td>
<p> 500 </p></td>
<td>
<p> 600 </p></td>
</tr>
</table>
</body>
</html>
Because of this structure, my output looks like this:
100|
200|
300|
400|
500|
600|
But I want the output to look like this:
100|200|300|
400|500|600|
My code:
use strict;
use warnings;
use HTML::TableExtract;
my $te = HTML::TableExtract->new();
$te->parse_file('Table_One.html');
open (DATA2, ">TableOutput.txt")
or die "Can't open file";
foreach my $ts ($te->tables()) {
foreach my $row ($ts->rows()) {
my $Final = join('|', @$row );
print DATA2 "$Final";
}
}
close (DATA2);
Answer 0 (score: 1)
sub trim(_) { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }
foreach my $ts ($te->tables()) {
foreach my $row ($ts->rows()) {
print DATA2 join('|', map trim, @$row), "\n";
}
}
Or, if you really want the trailing "|",
sub trim(_) { my ($s) = @_; $s =~ s/^\s+//; $s =~ s/\s+\z//; $s }
foreach my $ts ($te->tables()) {
foreach my $row ($ts->rows()) {
print DATA2 (map { trim($_).'|' } @$row), "\n";
}
}
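
To see why the trim matters without needing the HTML file, here is a minimal sketch that runs on its own. The `@rows` data is a hypothetical stand-in for what `$ts->rows()` returns: it assumes each cell keeps the newline and spaces around the `<p>` text, which would explain the one-value-per-line output in the question. `format_row` is just an illustrative helper name, and `trim` is written without the `(_)` prototype so it is called with an explicit argument.

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace (including newlines) from one string.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Hypothetical stand-in for the rows of the question's table:
# each cell is assumed to carry the whitespace that surrounds the <p> text.
my @rows = (
    [ "\n 100 ", "\n 200 ", "\n 300 " ],
    [ "\n 400 ", "\n 500 ", "\n 600 " ],
);

# Build one output line per row, with a trailing '|' after every cell.
sub format_row {
    my ($row) = @_;
    return join('', map { trim($_) . '|' } @$row);
}

print format_row($_), "\n" for @rows;
```

With the sample data above this prints `100|200|300|` and `400|500|600|` on separate lines. In the real script you would pass each `$row` from `$ts->rows()` to `format_row` and print to your output filehandle instead of STDOUT.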