cURL and preg_match_all error

Time: 2016-12-19 20:25:12

Tags: php regex curl

I want to get this channel's subscriber count with cURL, but I seem to get an empty array. Any help?

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.youtube.com/channel/UCU3i-l-rqTVGQj3Q3LePhJQ");
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1");
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: es-es,en"));
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$result = curl_exec($ch);
// capture any cURL error message
$error = curl_error($ch);
curl_close($ch);

// parse

preg_match_all("(<a class=\"secondary-header-action\" href=\"/subscribers\" role=\"menuitem\">
        <span class=\"nav-text\">
          (.*)
        </span>
      </a>)siU", $result, $matches);

print_r($matches);

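1 answer:

Answer 0 (score: 1)

When parsing HTML, the safest approach is to use an HTML DOM parser. The sample code below takes the $result HTML string and collects the text of every span tag with the nav-text class that sits inside an a tag with the secondary-header-action class: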

$result = <<<DATA
<body>
<a class="secondary-header-action" href="/subscribers" role="menuitem">
<span class="nav-text">Some text here</span>
</a>
</body>
DATA;

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($result, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
$atags = $xpath->query('//a[@class="secondary-header-action"]/span[@class="nav-text"]');
$res = array();

foreach($atags as $a) { 
   array_push($res, $a->nodeValue);
}

print_r($res); // => Array ( [0] => Some text here )

See the PHP demo.

DOMDocument initializes the DOM, and DOMXPath uses an XPath expression to reach the required elements in the DOM tree.
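
A page fetched with cURL is usually less tidy than the sample string above: the markup may not be valid, and a class attribute may list several classes, in which case an exact @class="..." comparison would not match. The following is only a sketch of how the same DOMDocument and DOMXPath approach might be hardened for that case. The function name extractNavText and the extra yt-link class in the sample are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the trimmed text
// of every <span> carrying the nav-text class inside an <a> carrying the
// secondary-header-action class.
function extractNavText(string $html): array
{
    // Real pages are rarely valid HTML; collect libxml warnings internally
    // instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match a class token even when the attribute lists several classes.
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " secondary-header-action ")]'
           . '//span[contains(concat(" ", normalize-space(@class), " "), " nav-text ")]';

    $texts = array();
    foreach ($xpath->query($query) as $span) {
        $texts[] = trim($span->nodeValue);
    }
    return $texts;
}

// Assumed sample markup: the <a> tag has two classes and the text is padded.
$html = '<body><a class="yt-link secondary-header-action" href="/subscribers">'
      . '<span class="nav-text">  Some text here  </span></a></body>';

print_r(extractNavText($html)); // => Array ( [0] => Some text here )
```

With the question's code, the call would be extractNavText($result), ideally after checking that curl_exec() did not return false. An empty array then means the fetched page does not contain the expected markup, which is worth ruling out before adjusting the query.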