Adding my own markers to HTML blocks in Perl

Time: 2017-12-04 23:09:44

Tags: html perl html-parsing

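Is there a way to take an HTML page, find certain blocks, and wrap the text inside them with my own characters?

For example, given the following HTML: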

<html>
<head>
    <title>Appleseed Farm</title>
</head>
<body>
<table>
    <tr>
        <td>Col1</td>
        <td>Col2</td>
        <td><img src="blah/blah.jpg"></td> 
        <td></td>
    </tr>
</table>
<div>Some random text</div>
<p>Random image of the day: <img src="random.jpg"></p>
</body>
</html>

Wrapping the text with my '@' marker should turn it into the following:

<html>
<head>
    <title>@Appleseed Farm@</title>
</head>
<body>
<table>
    <tr>
        <td>@Col1@</td>
        <td>@Col2@</td>
        <td><img src="blah/blah.jpg"></td> 
        <td></td>
    </tr>
</table>
<div>@Some random text@</div>
<p>@Random image of the day:@ <img src="random.jpg"></p>
</body>
</html>

1 answer:

Answer 0: (score: 0)

Sorry for the poor question. After reading through the HTML::Element documentation, I did figure it out.

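use strict;
use warnings;
use feature ':5.10';
use HTML::TreeBuilder;

# Parse the sample document that follows __DATA__
my $root = HTML::TreeBuilder->new_from_file(\*DATA);
$root->elementify;

# Visit every element whose text should be wrapped
for my $e ($root->look_down(_tag => qr/^(?:td|div|p|title|span)$/)) {
    # In-place replacement: text children are plain strings,
    # element children (such as <img>) are references and are skipped
    foreach my $item_r ($e->content_refs_list) {
        next if ref $$item_r;
        # Keep leading/trailing whitespace outside the '@' markers
        $$item_r =~ s/^(\s*)(.*\S)(\s*)$/$1\@$2\@$3/s;
    }
}

say $root->as_HTML(undef, "  ", {});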

__DATA__
<html>
<head>
    <title>Appleseed Farm</title>
</head>
<body>
<table>
    <tr>
        <td>Col1</td>
        <td>Col2</td>
        <td><img src="blah/blah.jpg">Col 3</td> 
        <td>cat</td>
    </tr>
</table>
<div>Some random text</div>
<p>Random image of the day: <img src="random.jpg"></p>
</body>
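
The per-node substitution can also be pulled out into a small standalone helper, which makes the whitespace handling easy to check without parsing any HTML. This is only a minimal sketch: `wrap_text` is a hypothetical name (not part of HTML::Element), and it assumes the marker should hug the visible text, as in the question's expected `@Random image of the day:@ <img ...>`, with whitespace-only text left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Element): wrap the visible part of
# a text node in $marker, leaving leading/trailing whitespace outside it.
sub wrap_text {
    my ($text, $marker) = @_;
    $marker = '@' unless defined $marker;

    # Whitespace-only (or empty) text has nothing to wrap.
    return $text unless $text =~ /\S/;

    # $1 = leading whitespace, $2 = visible text, $3 = trailing whitespace
    $text =~ s/\A(\s*)(.*\S)(\s*)\z/$1$marker$2$marker$3/s;
    return $text;
}

print wrap_text('Col1'), "\n";                               # @Col1@
print wrap_text('Random image of the day: '), "<img>\n";     # @Random image of the day:@ <img>
```

Inside the loop over `content_refs_list`, the call would then be `$$item_r = wrap_text($$item_r);` for each non-reference item.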