How to do an optional regex match in Perl

Time: 2014-03-18 06:05:12

Tags: regex perl

Given the following string:

$doc=<<'TEXT_END';
<LI>11:20
           </LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 &nbsp; <font color=a81234 size=-1>訂票</font>&nbsp;</a> </LI>

TEXT_END

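How can I capture both 11:20 and 13:55 with a single regular expression?

I don't know how to make part of a match optional, so that the following two tags can be ignored: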

<a href=".....">
<font color="...."> 

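訂票 means "book tickets". (The site adds the link when a showtime can be booked.)

Sorry for my poor English.

Here is my code; it does not work correctly.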

#!/usr/bin/env perl
#use utf8;
use LWP::Simple;

binmode(STDIN, ':encoding(utf8)');
binmode(STDOUT, ':encoding(utf8)');
binmode(STDERR, ':encoding(utf8)'); 

my $doc = get 'http://www.atmovies.com.tw/showtime/theater_t06609_a06.html';

my @movies = ($doc =~ /<a href="\/movie\/([a-z]+\d+)\/">([^><]+)<\/a>.+?<UL>(.+?)<\/UL>/gs);

for($i=1; $i<=$#movies; $i+=3){
    print "$movies[$i]\n";
    print $movies[$i+1]."\n\n";

    # this works fine
    my @times = ($movies[$i+1] =~ /<LI>([^<>]+)\r\n\s+<\/LI>/g);
    for($j=0; $j<=$#times; $j++){
        print "$times[$j]\n";
    }

    # this regex doesn't work correctly; it matches nothing
    @times_available=($movies[$i+1] =~ /<LI><a href="\/showtime\/ticket\/[0-9a-f]{32}\/" class="openbox">([^><\s]+) &nbsp; <font color=a81234 size=-1>☆訂票<\/font>&nbsp;<\/a> <\/LI>/g);
    for($j=0; $j<=$#times_available; $j++){
        print "$times_available[$j]\n";
    }

}

1 answer:

Answer 0 (score: 1)

You can try this:

@times = $doc =~ m/>\s*([\d:]+)/g;

Here is the complete test program:

#!/usr/bin/perl

use warnings;
use strict;

use utf8;

use Data::Dumper;

my $doc=<<'TEXT_END';
<LI>11:20
           </LI>
       <LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 &nbsp; <font color=a81234 size=-1>訂票</font>&nbsp;</a> </LI>

TEXT_END

my @times = $doc =~ m/>\s*([\d:]+)/g;

print Dumper(\@times);

Result:

$ perl t020.pl 
$VAR1 = [
          '11:20',
          '13:55'
        ];
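
The pattern above works on the sample because it grabs any run of digits and colons that follows a `>`. If you want the optional tag to be explicit, one possible approach is to wrap the `<a ...>` tag in a group followed by `?`. The sketch below is only an illustration against the sample string from the question, not a tested solution for the live page; the stricter `\d{1,2}:\d{2}` time format and the `@rows` array are assumptions added here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sample text as in the question.
my $doc = <<'TEXT_END';
<LI>11:20
           </LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 &nbsp; <font color=a81234 size=-1>訂票</font>&nbsp;</a> </LI>

TEXT_END

# (?: ... )? makes the opening <a ...> tag optional, so one pattern
# matches both the plain list item and the linked one.
my @times = $doc =~ m{<LI>\s*(?:<a\b[^>]*>)?\s*(\d{1,2}:\d{2})}g;
print "$_\n" for @times;

# Assumption: a ticket link means the showtime can be booked.
# With a capturing optional group, $1 is undef when the tag is absent.
my @rows;
while ($doc =~ m{<LI>\s*(<a\b[^>]*>)?\s*(\d{1,2}:\d{2})}g) {
    push @rows, $2 . (defined $1 ? ' bookable' : ' not bookable');
}
print "$_\n" for @rows;
```

The `<font>` tag comes after the time, so it does not need to be matched at all; only the tag that can appear between `<LI>` and the time has to be optional.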