How can I find the following div using a regular expression?
I have to use a regular expression because the software I'm using limits what I can use: http://community.autoblogged.com/entries/344640-common-search-and-replace-patterns
<div class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fjumpinblack.com%2F2011%2F11%2F25%2Fdrake-and-rick-ross-you-only-live-once-ep-mixtape-2011-download%2F"><br /> <img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fjumpinblack.com%2F2011%2F11%2F25%2Fdrake-and-rick-ross-you-only-live-once-ep-mixtape-2011-download%2F&source=jumpinblack1&style=compact&b=2" height="61" width="50" /><br /> </a> </div>
I tried using:
<div class="tweetmeme_button" style="float: right; margin-left: 10px;">.*<\/div>
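One likely problem with the pattern above is that `.*` is greedy, so it runs on to the last `</div>` in the input rather than the first, and without a single-line flag `.` does not cross newlines. The sketch below contrasts the greedy pattern with a non-greedy one. It assumes a Perl-compatible regex engine (whether the tool in question supports `.*?` and the `s` flag is an assumption), and the sample markup and variable names are made up for illustration. It is only safe here because the target div contains no nested div.

```perl
use strict;
use warnings;

# Hypothetical sample input: the button div followed by another div.
my $html = '<p>before</p>'
  . '<div class="tweetmeme_button" style="float: right; margin-left: 10px;">'
  . ' <a href="#"><br /> <img src="button.gif" /><br /> </a> </div>'
  . '<div class="post">kept</div>';

# Greedy: .* runs to the LAST </div>, so the following div is removed too.
( my $greedy = $html ) =~ s{<div class="tweetmeme_button"[^>]*>.*</div>}{}s;

# Non-greedy: .*? stops at the FIRST </div>; [^>]* tolerates other attributes;
# the s flag lets . match newlines.
my $pattern = qr{<div\b[^>]*\bclass="tweetmeme_button"[^>]*>.*?</div>}s;
( my $cleaned = $html ) =~ s/$pattern//g;

print "greedy:     $greedy\n";
print "non-greedy: $cleaned\n";
```

If the button div could ever contain another div, a non-greedy match would stop too early, which is one reason the answers below recommend a parser.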
Answer 0 (score: 1)
Use an HTML parser to parse HTML.
HTML::TokeParser::Simple or HTML::TreeBuilder::XPath, among many others.
E.g.:
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
my $parser = HTML::TokeParser::Simple->new( ... );
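while (my $div = $parser->get_tag) {
    next unless $div->is_start_tag('div');
    {
        # get_attr returns undef for a missing attribute, so silence that warning
        no warnings 'uninitialized';
        # 'next' inside this bare block leaves the block, which ends the iteration
        next unless $div->get_attr('class') eq 'tweetmeme_button';
        next unless $div->get_attr('style') eq 'float: right; margin-left: 10px;';
        # now do what you want until the next </div>
    }
}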
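As a rough sketch of the "do what you want until the next </div>" step, the loop below drops the button div and keeps everything else. It assumes the input is passed as a string (the constructor arguments are elided above), uses `get_token` instead of `get_tag` so that text is preserved, and matches on the class only. The sample markup and the `$depth` counter are illustrative additions, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input.
my $html = '<p>before</p>'
  . '<div class="tweetmeme_button" style="float: right; margin-left: 10px;">'
  . ' <a href="#"><img src="button.gif" /></a> </div>'
  . '<p>after</p>';

my $parser = HTML::TokeParser::Simple->new( string => $html );

my $output = '';
my $depth  = 0;    # greater than 0 while inside the div being dropped

while ( my $token = $parser->get_token ) {
    if ( $depth == 0 ) {
        if ( $token->is_start_tag('div')
            && ( $token->get_attr('class') // '' ) eq 'tweetmeme_button' )
        {
            $depth = 1;    # start skipping tokens
            next;
        }
        $output .= $token->as_is;    # keep everything outside the div
    }
    else {
        # Track nested divs so skipping ends at the matching </div>.
        $depth++ if $token->is_start_tag('div');
        $depth-- if $token->is_end_tag('div');
    }
}

print $output, "\n";
```

Counting depth is what a plain regular expression cannot do reliably, so this version still works if the button ever wraps another div.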
Answer 1 (score: 1)
Processing HTML with regular expressions is a bad idea. I use HTML::TreeBuilder::XPath instead.
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get("http://www.someURL.com");
my $tree = HTML::TreeBuilder::XPath->new_from_content( $mech->content() );
my $div = $tree->findnodes( '//div[@class="tweetmeme_button"]')->[0];
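Once the node is found, one possible next step is to delete it and serialize what remains. The sketch below assumes the goal is to strip the button from the page; it parses a made-up string instead of fetching a URL so that it runs offline, and it removes every matching div, not only the first.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample input standing in for the fetched page.
my $html = '<html><body><p>before</p>'
  . '<div class="tweetmeme_button" style="float: right; margin-left: 10px;">'
  . '<a href="#">button</a></div>'
  . '<p>after</p></body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Detach and destroy each matching div.
$_->delete for $tree->findnodes('//div[@class="tweetmeme_button"]');

my $cleaned = $tree->as_HTML;
$tree->delete;    # free the parse tree

print $cleaned, "\n";
```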