Suppose I have the following string:
$doc=<<'TEXT_END';
<LI>11:20
</LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 <font color=a81234 size=-1>訂票</font> </a> </LI>
TEXT_END
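How can I capture both 11:20 and 13:55 with a single regular expression?
I don't know how to do optional matching (so that the following two tags can be ignored):
<a href=".....">
<font color="....">
訂票 means "book tickets". (The site adds the link when a showtime can be booked.)
Sorry for my poor English. Below is my code, which does not work correctly.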
#!/usr/bin/env perl
#use utf8;
use LWP::Simple;
binmode(STDIN, ':encoding(utf8)');
binmode(STDOUT, ':encoding(utf8)');
binmode(STDERR, ':encoding(utf8)');
my $doc = get 'http://www.atmovies.com.tw/showtime/theater_t06609_a06.html';
my @movies = ($doc =~ /<a href="\/movie\/([a-z]+\d+)\/">([^><]+)<\/a>.+?<UL>(.+?)<\/UL>/gs);
for($i=1; $i<=$#movies; $i+=3){
print "$movies[$i]\n";
print $movies[$i+1]."\n\n";
# this works just fine!
my @times = ($movies[$i+1] =~ /<LI>([^<>]+)\r\n\s+<\/LI>/g);
for($j=0; $j<=$#times; $j++){
print "$times[$j]\n";
}
# this regex doesn't work correctly; it captures nothing
@times_available=($movies[$i+1] =~ /<LI><a href="\/showtime\/ticket\/[0-9a-f]{32}\/" class="openbox">([^><\s]+) <font color=a81234 size=-1>☆訂票<\/font> <\/a> <\/LI>/g);
for($j=0; $j<=$#times_available; $j++){
print "$times_available[$j]\n";
}
}
Answer 0: (score: 1)
You can try this:
@times = $doc =~ m/>\s*([\d:]+)/g;
Here is the complete test program:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use Data::Dumper;
my $doc=<<'TEXT_END';
<LI>11:20
</LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 <font color=a81234 size=-1>訂票</font> </a> </LI>
TEXT_END
my @times = $doc =~ m/>\s*([\d:]+)/g;
print Dumper(\@times);
Result:
$ perl t020.pl
$VAR1 = [
'11:20',
'13:55'
];
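The pattern above matches any run of digits and colons that follows a `>`, so it could also pick up unrelated numbers elsewhere in a full page. A stricter variant, shown here only as a sketch and tested against nothing but the sample string from the question, anchors on `<LI>` and wraps the opening `<a ...>` tag in a non-capturing optional group `(?: ... )?`. The `\d{1,2}:\d{2}` time format is an assumption based on the two sample values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $doc = <<'TEXT_END';
<LI>11:20
</LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 <font color=a81234 size=-1>訂票</font> </a> </LI>
TEXT_END

# (?:<a\b[^>]*>)? makes the opening <a ...> tag optional, so one pattern
# covers both the plain list item and the linked (bookable) one.
# The time format \d{1,2}:\d{2} is assumed from the sample data.
my @times = $doc =~ /<LI>\s*(?:<a\b[^>]*>)?\s*(\d{1,2}:\d{2})/g;

print "$_\n" for @times;
```

Because the capture ends right after the time, the trailing `<font>` tag never needs to be matched, which avoids depending on the exact link text.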