How can I search for and replace a specific src="url" in an HTML tag using Perl?

Asked: 2015-12-14 18:22:55

Tags: html regex perl

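Suppose I have a variable containing a large block of text, including URLs inside ordinary HTML tags. In particular, I'm interested in the src= attribute of those tags. Let's say I know the exact src= string I want to find in that block of text, and I want to replace it with some other text. Here is some of what I've tried (pseudocode):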

my $bunchotxt = << 'END_MESSAGE';
<a href="http://link.com/image.gif"><img class="alignleft size-thumbnail wp-image-295" src="http://link.com/image.gif" alt="shredding" width="150" height="150" /></a>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis convallis fringilla dui eget cursus. Nullam in mauris viverra elit pharetra fringilla. Pellentesque gravida ligula sit amet magna blandit, semper luctus enim semper. Nam a sem ut ex aliquam consectetur. Nulla enim metus, porta at elementum non, facilisis ullamcorper nisl. Vestibulum sed iaculis ante. Nullam mollis luctus posuere.

Suspendisse ipsum odio, iaculis in malesuada id, varius
END_MESSAGE

my $parser = HTML::TokeParser::Simple->new(
    string => $bunchotxt
);

while ( my $tag = $parser->get_tag('img') ) {
    #print $tag->as_is, "\n";
    for my $attr ( qw( src ) ) {
        $replaceStr = sprintf qq{%s="%s"\n}, $attr, $tag->get_attr($attr);
        $parsedtag =~ s/"//g;
        my @bits = $url->path_segments( );
        $cidreplace{$unparsedtag} = $path;
    }
    my $replaceStr = "src:\"replaced\"";
    $bunchotxt =~ s/$findURL/$replaceStr/g;
    print "$buchotxt\n";
}

1 answer:

Answer 0 (score: 0)

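First, we need to boil your question down to the part we actually care about. Your sample code isn't great because it contains a lot of unrelated errors, so I've taken some liberties and stripped out everything I didn't think was strictly necessary to address the problem. I've also added some line breaks to the HTML to cut down on horizontal scrolling.

That leaves us with this: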

use strict;
use warnings;

use HTML::TokeParser::Simple;

my $bunchotxt = << 'END_MESSAGE';
<a href="http://link.com/image.gif">
  <img
      class="alignleft size-thumbnail wp-image-295"
      src="http://link.com/image.gif"
      alt="shredding"
      width="150"
      height="150" />
</a>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis convallis
fringilla dui eget cursus. Nullam in mauris viverra elit pharetra fringilla.
Pellentesque gravida ligula sit amet magna blandit, semper luctus enim semper.
Nam a sem ut ex aliquam consectetur. Nulla enim metus, porta at elementum non,
facilisis ullamcorper nisl. Vestibulum sed iaculis ante. Nullam mollis luctus
posuere.

Suspendisse ipsum odio, iaculis in malesuada id, varius
END_MESSAGE

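# Parse the HTML directly from the string.
my $parser = HTML::TokeParser::Simple->new(string => $bunchotxt);

# Visit every <img> start tag and replace its exact src="..." text.
while (my $tag = $parser->get_tag('img')) {
    my $src = $tag->get_attr('src');
    next unless defined $src;    # skip <img> tags without a src attribute
    # \Q...\E escapes regex metacharacters in the URL (such as '.').
    $bunchotxt =~ s/\Qsrc="$src"\E/src:"replaced"/g;
    print "$bunchotxt\n";
}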

The result begins with (shown here on a single line):

<a href="http://link.com/image.gif"><img class="alignleft size-thumbnail wp-image-295" src:"replaced" ...
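
Note that the href keeps its URL because the pattern only matches the literal text src="...". The `\Q...\E` pair matters as well, since URLs usually contain regex metacharacters such as `.` and `?`. Here is a minimal sketch of the difference, using core Perl only; the query-string URL, the second `<img>` line, and the names `$find`, `$text`, `$unquoted`, and `$quoted` are made up for illustration, and the replacement is written as `src="replaced"` rather than `src:"replaced"`:

```perl
use strict;
use warnings;

# Hypothetical URL to look for; note the '.' and '?' metacharacters.
my $find = 'http://link.com/image.gif?size=150';

my $text = <<'END_TEXT';
<img src="http://link.com/image.gif?size=150" alt="wanted" />
<img src="http://linkXcom/image.gif?size=150" alt="other" />
END_TEXT

# Without \Q...\E the URL is interpolated as a pattern: the '?' makes the
# preceding 'f' optional instead of matching a literal '?', so nothing matches.
(my $unquoted = $text) =~ s/src="$find"/src="replaced"/g;

# With \Q...\E every metacharacter is escaped, so only the exact string matches.
(my $quoted = $text) =~ s/\Qsrc="$find"\E/src="replaced"/g;

print "unquoted pattern changed the text: ", ($unquoted ne $text ? "yes" : "no"), "\n";
print $quoted;
```

In this sketch the unquoted substitution silently does nothing, while the quoted one rewrites only the tag whose src is exactly `$find`. For anything beyond an exact, known string, rewriting the attribute through an HTML parser is likely to be more robust than a regex over the raw text.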