Regex for HTML content - Perl

Date: 2014-06-26 09:58:51

Tags: regex perl

I have HTML content that looks like this:

html code ... </div>content1</div> html code ... 
html code ... </div>content2</div> html code ...

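I want to extract content1, content2, content3 ... from the HTML and print each one on its own line (content1, newline, content2, newline, content3). Any ideas? Thanks in advance.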

1 answer:

Answer 0: (score: 0)

Here is an example that uses Mojo::DOM, inspired by this StackOverflow answer:

#!/usr/bin/env perl

use strict ;
use warnings ;

use Mojo::DOM ;

my $html = <<EOHTML;
<!DOCTYPE html>
<html>
<head>
<title>Sample HTML with 2 divs</title>
</head>
<body>
     <div>
        Four score and seven years ago our fathers brought forth on this
        continent a new nation, conceived in liberty, and dedicated to the
        proposition that all men are created equal.
     </div>
     <div>
        Lorem ipsum dolor sit amet, consectetur adipisicing elit,
        sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
     </div>
</body>
</html>
EOHTML

my $dom = Mojo::DOM->new ;

$dom->parse( $html ) ;

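for my $div ( $dom->find( 'div' )->each ) {

    # Normalize whitespace explicitly: whether all_text trims it
    # for you depends on the installed Mojolicious version.
    my $text = $div->all_text ;
    $text =~ s/^\s+|\s+$//g ;
    $text =~ s/\s+/ /g ;

    print $text . "\n" ;

}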

The output is:

Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
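
Applied to the fragment from the question, the same approach might look like the sketch below. It assumes that the stray opening </div> in the question was meant to be <div> and that each wanted value is the direct text of a div. The sample string and the variable names are made up for illustration.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Mojo::DOM;

# Hypothetical input modelled on the question, assuming well-formed <div> tags.
my $html = '<p>html code</p><div>content1</div><p>html code</p>'
         . '<div>content2</div><p>html code</p><div>content3</div>';

my $dom = Mojo::DOM->new($html);

# Take the text of every div and join the pieces with newlines.
my $out = $dom->find('div')->map( sub { $_->text } )->join("\n");

print "$out\n";
```

With this input the script prints content1, content2 and content3 on separate lines. A DOM parser is used here instead of a regular expression because it copes with attributes, nesting and line breaks inside the tags, which a hand-written pattern tends to get wrong.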