Question

I have this HTML data, and I am trying to extract the first href value from the div element below.

<div>blah blah.
    <a href="http://www.example.com">example</a>
    <a href="http://www.example2.com">site</a>
</div>

I tried this regular expression, but I can't figure out where I went wrong:

preg_match('/<div>.*?<a.*"(.*)">/', $html, $match);

Can anyone suggest a better approach?

Answer 1

Don't reinvent the wheel..

Use the right tool for the job, not a regular expression.

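$dom = new DOMDocument();
// loadHTML() is an instance method; calling it statically is an error on PHP 8
$dom->loadHTML('
     <div>blah blah.
         <a href="http://www.example.com">example</a>
         <a href="http://www.example2.com">site</a>
     </div>
');
$xpath = new DOMXPath($dom);
// //div/a selects every <a> that is a direct child of a <div>; item(0) is the first
$link  = $xpath->query("//div/a")->item(0);
echo $link->getAttribute('href'); //=> "http://www.example.com"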

Answer 2

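See hwnd's answer for a more convenient and more precise approach.

If you really want to do this with a regular expression, you could use something like this: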

<div>.*?<a[^>]+href="([^"]*)"
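
One detail worth checking when trying this pattern: in the sample HTML the first `<a>` sits on a different line from `<div>`, and `.` does not match a newline by default, so the pattern likely needs dot-all mode (the `s` modifier in PHP's PCRE, `re.DOTALL` in Python). The same applies to the pattern in the question. A minimal Python sketch, assuming the sample markup from the question is the whole input; the variable names are illustrative only:

```python
import re

# Sample markup from the question (assumed to be the whole input).
html = """<div>blah blah.
    <a href="http://www.example.com">example</a>
    <a href="http://www.example2.com">site</a>
</div>"""

pattern = r'<div>.*?<a[^>]+href="([^"]*)"'

# Without DOTALL, "." stops at the newline after "blah blah.", so nothing matches.
no_flag = re.search(pattern, html)

# With DOTALL, ".*?" can cross the line break and reach the first <a>.
with_flag = re.search(pattern, html, re.DOTALL)

print(no_flag)             # None
print(with_flag.group(1))  # http://www.example.com
```

In PHP the equivalent would presumably be to append `s` after the closing delimiter of the pattern passed to `preg_match`.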


Still, it bears repeating:

Don't reinvent the wheel, as @hwnd said
Avoid parsing HTML/XML & co. with regular expressions

Answer 3

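    import re

    # Single quotes outside so the double quotes inside the HTML need no escaping
    x = '<div>blah blah.\n\t<a href="http://www.example.com">example</a>\n\t<a href="http://www.example2.com">site</a>\n</div>'
    # DOTALL lets "." cross the newline between <div> and the first <a>
    pattern = re.compile(r".*? href=(\S+?)>.*?", re.DOTALL)
    y = pattern.match(x).groups()
    # The captured group still includes the surrounding double quotes
    print(y[0])

Output: "http://www.example.com"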
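
For comparison, the "use a parser" advice from the other answers can also be followed in Python with only the standard library. This is a rough sketch, not a drop-in equivalent of the XPath query in Answer 1: the class name `FirstDivLink` is made up for illustration, and it accepts an `<a href>` at any depth inside a `<div>`, not only a direct child.

```python
from html.parser import HTMLParser


class FirstDivLink(HTMLParser):
    """Records the href of the first <a href=...> seen inside any <div>."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # number of <div> elements currently open
        self.href = None    # first href found, or None if there is none

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_depth += 1
        elif tag == "a" and self.div_depth > 0 and self.href is None:
            # attrs is a list of (name, value) pairs
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


x = '<div>blah blah.\n\t<a href="http://www.example.com">example</a>\n\t<a href="http://www.example2.com">site</a>\n</div>'
parser = FirstDivLink()
parser.feed(x)
print(parser.href)  # http://www.example.com
```

Unlike the regex above, the value comes back without the surrounding quotes, and the order of attributes inside the tag does not matter.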

Answer 4

You can try this: preg_match('/<div>[^<]*?<a[^>]*\"([^>]*?)\"/', $html, $match); var_dump($match);
