Question

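I've been at this for a few days now, trying to find a way to solve my problem. I'm using cURL to fetch the contents of a web page and then preg_match_all to restyle that content, but I run into a problem when it comes to finding certain <a> tags in the document.

I want preg_match_all to find every <a> tag that is followed by a <strong> tag and store the href values of those <a> tags in an array variable.

This is what I came up with: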

preg_match_all("~(<a href=\"(.*)\"><strong>\w+<\/strong>)~iU", $result, $link);

It returns:

Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )

Can someone please help me?

Answer 1

I strongly recommend using DOMDocument.

This code should do the trick:

<?php

/**
* @author Jay Gilford
* @edited KHMKShore:stackoverflow
*/

/**
* get_links()
* 
* @param string $url
* @return array
*/
function get_links($url) {

  // Create a new DOM Document to hold our webpage structure
  $xml = new DOMDocument();

  // Load the url's contents into the DOM (the @ suppresses any warnings from invalid HTML)
  @$xml->loadHTMLFile($url);

  // Empty array to hold all links to return
  $links = array();

  // Loop through each <a> element in the DOM
  foreach($xml->getElementsByTagName('a') as $link) {
    // If it contains a <strong> element, save the href and the link text
    // (->length is used because count() on a DOMNodeList needs PHP 7.2+)
    if ($link->getElementsByTagName('strong')->length > 0) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }
  }

  //Return the links
  return $links;
}
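
The question already has the page HTML in $result from cURL, so a variant that parses a string instead of fetching the URL a second time may be closer to what is needed. Here is a minimal sketch along those lines. The function name get_strong_links_from_html and the sample markup are made up for illustration and are not part of the answer's code:

```php
<?php

/**
 * Hypothetical helper: same idea as get_links(), but for HTML already in memory.
 *
 * @param string $html
 * @return array
 */
function get_strong_links_from_html($html) {
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = array();
    foreach ($dom->getElementsByTagName('a') as $a) {
        // Keep only anchors that contain at least one <strong>
        if ($a->getElementsByTagName('strong')->length > 0) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample markup standing in for the cURL response ($result in the question)
$result = '<p><a href="one.php"><strong>One</strong></a> '
        . '<a href="two.php">Two</a> '
        . '<a class="x" href="three.php"><strong>Three 3</strong></a></p>';

print_r(get_strong_links_from_html($result));
```

The third link in the sample has an extra attribute and a space in its text, both of which would trip up a strict regex. The DOM approach should not care about either.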

Answer 2

First, your regex can easily fail on something like this:

<a alt="cow > moo" href="cow.php"><strong>moo</strong></a>

Second, your regex is slightly off; the following will work:

~(<a href="(.*)"><strong>\w+</strong></a>)~
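
For reference, here is a small sketch of how a pattern like this could be passed to preg_match_all and where the href values end up. The sample HTML is invented. The lazy `.*?` is a substitution for the greedy `.*`, made on the assumption that several links may sit on one line, in which case the greedy form would merge them into a single match:

```php
<?php

// Invented sample input; in the question this would be the cURL response
$result = '<a href="a.php"><strong>first</strong></a> <a href="b.php"><strong>second</strong></a>';

// Single-quoted PHP string, so the double quotes need no escaping.
// With ~ as the delimiter, the / in the closing tags needs no escaping either.
$pattern = '~<a href="(.*?)"><strong>\w+</strong></a>~i';

preg_match_all($pattern, $result, $matches);

// $matches[0] holds the full matches, $matches[1] the captured href values
$hrefs = $matches[1];
print_r($hrefs);
```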

Third, and most importantly, if you want to be sure of extracting exactly what you want without failures, DOMDocument is the best route, as @KHMKShore pointed out.
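
To make the first and third points concrete, the sketch below runs the question's original pattern and a DOM query against the sample anchor from the first point. The XPath expression `//a[strong]` is one possible way to say "an <a> that has a <strong> child". It is an illustration, not something taken from either answer:

```php
<?php

$html = '<a alt="cow > moo" href="cow.php"><strong>moo</strong></a>';

// The pattern from the question: it expects href to be the first attribute
$count = preg_match_all('~(<a href="(.*)"><strong>\w+</strong>)~iU', $html, $m);

// Parse the same markup with DOMDocument, keeping libxml warnings quiet
$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select every <a> element that has a <strong> child and read its href
$xpath = new DOMXPath($dom);
$hrefs = array();
foreach ($xpath->query('//a[strong]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

var_dump($count); // number of regex matches
print_r($hrefs);  // hrefs found through the DOM
```

The regex finds nothing here because the alt attribute comes before href, while the parser reads the quoted `>` as part of the attribute value and still reaches the link.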

Get the HREF of an <a> tag with preg_match_all and cURL
