How to get link URLs using a regular expression

Date: 2018-03-25 03:43:54

Tags: php regex

How can I get the link URLs from the following HTML with a regular expression?

    <tr class="ipl-zebra-list__item">
        <td class="ipl-zebra-list__label">Official Sites</td>
        <td>
            <ul class="ipl-inline-list">
                    <li class="ipl-inline-list__item">
                        <a href="https://www.indiegogo.com/projects/super-troopers-2">IndieGoGo page</a>
                    </li>
                    <li class="ipl-inline-list__item">
                        <a href="https://www.facebook.com/SuperTroopersMovie/">Official site</a>
                    </li>
            </ul>
        </td>
    </tr>

I tried this code, but it returns nothing:

        $arr['sites'] = $this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Official Sites <a href="(.*?)".*?<\/a>(<\/tr>)/ms', $html, 1), 1);

With a second attempt I only get the link names (IndieGoGo page, Official site):

        $arr['sites1'] = $this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Official Sites(.*?)(<\/tr>)/ms', $html, 1), 1);

Please help me get only the URLs: https://www.indiegogo.com/projects/super-troopers-2 and https://www.facebook.com/SuperTroopersMovie/

Here is my IMDb PHP script: http://movie21.top/imdb.txt
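A note on the two attempts, as a rough sketch. The first pattern requires the literal text `Official Sites <a href=`, but in the markup `Official Sites` is followed by `</td>` and several other tags, so nothing matches. The second pattern matches the row, but its capture group wraps the text between `<a ...>` and `</a>`, which is why only the names come back. The `match` and `match_all` helpers are not shown in the question, so the sketch below uses plain `preg_match` and `preg_match_all` and assumes the helpers behave similarly. It moves the capture group onto the `href` value. This is still fragile against markup changes.

```php
<?php
// Sample input copied from the question; $html and $urls are illustrative names.
$html = <<<'HTML'
<tr class="ipl-zebra-list__item">
    <td class="ipl-zebra-list__label">Official Sites</td>
    <td>
        <ul class="ipl-inline-list">
            <li class="ipl-inline-list__item">
                <a href="https://www.indiegogo.com/projects/super-troopers-2">IndieGoGo page</a>
            </li>
            <li class="ipl-inline-list__item">
                <a href="https://www.facebook.com/SuperTroopersMovie/">Official site</a>
            </li>
        </ul>
    </td>
</tr>
HTML;

$urls = [];

// Step 1: isolate everything between "Official Sites" and the closing </tr>.
if (preg_match('/Official Sites(.*?)<\/tr>/s', $html, $row)) {
    // Step 2: capture the href value instead of the text between <a> and </a>.
    preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $row[1], $links);
    $urls = $links[1];
}

print_r($urls);
```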

1 answer:

Answer 0 (score: 1)

Don't use regular expressions to parse HTML; use something like DOMDocument instead.


https://3v4l.org/KCfXCC

Code:

<?php
// Parse the HTML fragment with DOMDocument instead of a regular expression.
$dom = new DOMDocument();

$dom->loadHTML('
<tr class="ipl-zebra-list__item">
    <td class="ipl-zebra-list__label">Official Sites</td>
    <td>
        <ul class="ipl-inline-list">
            <li class="ipl-inline-list__item">
                <a href="https://www.indiegogo.com/projects/super-troopers-2">IndieGoGo page</a>
            </li>
            <li class="ipl-inline-list__item">
                <a href="https://www.facebook.com/SuperTroopersMovie/">Official site</a>
            </li>
        </ul>
    </td>
</tr>
', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

// Collect the href attribute of every link inside the inline list items.
$arr['sites1'] = [];
foreach ($xpath->query('//li[@class="ipl-inline-list__item"]/a') as $link) {
    $arr['sites1'][] = $link->getAttribute('href');
}

print_r($arr['sites1']);
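
On a full page, the class `ipl-inline-list__item` may well appear in other rows too, so the query above could pick up unrelated links. One possible refinement, sketched below under that assumption, is to anchor the XPath on the row whose label cell reads "Official Sites". The function name `officialSiteUrls`, the surrounding `<table>`, and the second row in the sample are hypothetical and only there for illustration. `libxml_use_internal_errors` is used because real-world pages often trigger parser warnings.

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): returns the href of
 * every link in the table row whose label cell reads "Official Sites".
 */
function officialSiteUrls(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Select href attributes only inside the row labelled "Official Sites".
    $query = '//tr[td[normalize-space() = "Official Sites"]]//a/@href';

    $urls = [];
    foreach ($xpath->query($query) as $attr) {
        $urls[] = $attr->nodeValue;
    }

    return $urls;
}

// Sample page: the first row comes from the question, the second is made up.
$page = <<<'HTML'
<table>
    <tr class="ipl-zebra-list__item">
        <td class="ipl-zebra-list__label">Official Sites</td>
        <td>
            <ul class="ipl-inline-list">
                <li class="ipl-inline-list__item">
                    <a href="https://www.indiegogo.com/projects/super-troopers-2">IndieGoGo page</a>
                </li>
                <li class="ipl-inline-list__item">
                    <a href="https://www.facebook.com/SuperTroopersMovie/">Official site</a>
                </li>
            </ul>
        </td>
    </tr>
    <tr class="ipl-zebra-list__item">
        <td class="ipl-zebra-list__label">Miscellaneous Sites</td>
        <td>
            <ul class="ipl-inline-list">
                <li class="ipl-inline-list__item">
                    <a href="https://example.com/">Example</a>
                </li>
            </ul>
        </td>
    </tr>
</table>
HTML;

$sites = officialSiteUrls($page);
print_r($sites);
```

With this sample, the link in the made-up second row should be left out, because the predicate on the `td` label restricts the match to the "Official Sites" row.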