How to get link URLs with a regular expression
I have this HTML:
<tr class="ipl-zebra-list__item">
<td class="ipl-zebra-list__label">Official Sites</td>
<td>
<ul class="ipl-inline-list">
<li class="ipl-inline-list__item">
<a href="https://www.indiegogo.com/projects/super-troopers-2">IndieGoGo page</a>
</li>
<li class="ipl-inline-list__item">
<a href="https://www.facebook.com/SuperTroopersMovie/">Official site</a>
</li>
</ul>
</td>
</tr>
I tried this code, but it doesn't match anything:
$arr['sites'] = $this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Official Sites <a href="(.*?)".*?<\/a>(<\/tr>)/ms', $html, 1), 1);
Attempt 2: this only gets the link names, "IndieGoGo page" and "Official site":
$arr['sites1'] = $this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Official Sites(.*?)(<\/tr>)/ms', $html, 1), 1);
Please help me get only the URLs https://www.indiegogo.com/projects/super-troopers-2 and https://www.facebook.com/SuperTroopersMovie/
Here is my IMDb PHP script: http://movie21.top/imdb.txt
Answer (score: 1)
Don't use regular expressions to parse HTML; use a real parser such as DOMDocument instead.
Example:
<?php
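// Parse the HTML fragment; the flags stop libxml from adding implied <html>/<body> wrappers and a default doctype
$dom = new DOMDocument();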
$dom->loadHtml('
<tr class="ipl-zebra-list__item">
<td class="ipl-zebra-list__label">Official Sites</td>
<td>
<ul class="ipl-inline-list">
<li class="ipl-inline-list__item">
<a href="https://www.indiegogo.com/projects/super-troopers-2">IndieGoGo page</a>
</li>
<li class="ipl-inline-list__item">
<a href="https://www.facebook.com/SuperTroopersMovie/">Official site</a>
</li>
</ul>
</td>
</tr>
', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
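// Query the parsed document with XPath
$xpath = new DOMXPath($dom);
$arr['sites1'] = [];
// Select every <a> that is a direct child of <li class="ipl-inline-list__item">
foreach ($xpath->query('//li[@class="ipl-inline-list__item"]/a') as $link) {
    // Keep only the URL (the href attribute), not the link text
    $arr['sites1'][] = $link->getAttribute('href');
}
print_r($arr['sites1']);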
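The XPath above matches every `ipl-inline-list__item` link in the document. On a full IMDb page, other rows may use the same list class, so links that are not official sites could slip in. The sketch below instead anchors the query on the label cell that reads "Official Sites" and takes only the links in the cell next to it. The function name `getOfficialSites` is hypothetical, and the sketch assumes the page keeps the label-cell/value-cell layout shown in the question.

```php
<?php
// Sketch: return only the URLs from the table row labelled "Official Sites".
// getOfficialSites() is a hypothetical helper, not part of the question's script.
function getOfficialSites(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them (real pages are rarely valid HTML)
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Find the <td> whose text is "Official Sites", move to the next <td>,
    // and take every <a> inside it that has an href attribute
    $query = '//td[normalize-space(.)="Official Sites"]'
           . '/following-sibling::td[1]//a[@href]';

    $urls = [];
    foreach ($xpath->query($query) as $link) {
        $urls[] = $link->getAttribute('href');
    }
    return $urls;
}

// Sample markup from the question, wrapped in a <table> so the row is valid HTML
$html = '<table><tr class="ipl-zebra-list__item">
<td class="ipl-zebra-list__label">Official Sites</td>
<td><ul class="ipl-inline-list">
<li class="ipl-inline-list__item"><a href="https://www.indiegogo.com/projects/super-troopers-2">IndieGoGo page</a></li>
<li class="ipl-inline-list__item"><a href="https://www.facebook.com/SuperTroopersMovie/">Official site</a></li>
</ul></td></tr></table>';

print_r(getOfficialSites($html));
```

In the question's script, assuming `$html` holds the fetched page as it does there, the call would be `$arr['sites'] = getOfficialSites($html);`.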