I want to get this channel's subscriber count with cURL, but I seem to get an empty array. Any help?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.youtube.com/channel/UCU3i-l-rqTVGQj3Q3LePhJQ");
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1");
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: es-es,en"));
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
// to show possible errors
$error = curl_error($ch);
curl_close($ch);
// parse
preg_match_all("(<a class=\"secondary-header-action\" href=\"/subscribers\" role=\"menuitem\">
<span class=\"nav-text\">
(.*)
</span>
</a>)siU", $result, $matches);
print_r($matches);
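One likely reason for the empty result is that the pattern above hard-codes exact line breaks and no indentation between the tags, so any difference in whitespace in the downloaded HTML makes preg_match_all find nothing. Note also that the code stores curl_error() in $error but never prints it. The sketch below uses \s* between tags so whitespace no longer matters. It runs on a hypothetical sample string, not on the real page, and assumes the fetched page actually contains this markup, which YouTube does not guarantee (it may serve different HTML or build the page with JavaScript).

```php
<?php
// Hypothetical sample markup: the same tags as in the question, but indented,
// which the original pattern (with its literal line breaks) would not match.
$result = <<<HTML
<a class="secondary-header-action" href="/subscribers" role="menuitem">
    <span class="nav-text">12,345</span>
</a>
HTML;

// \s* allows any run of whitespace (including none) between the tags,
// so the match no longer depends on exact line breaks or indentation.
// (.*?) is lazy, so it stops at the first closing </span>.
$pattern = '~<a class="secondary-header-action" href="/subscribers" role="menuitem">\s*'
    . '<span class="nav-text">\s*(.*?)\s*</span>\s*</a>~si';

preg_match_all($pattern, $result, $matches);

// $matches[1] holds the captured group: ['12,345']
print_r($matches[1]);
```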
Answer (score: 1)
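When parsing HTML, the safest approach is to use an HTML DOM parser. The following sample code takes the $result HTML string and gets all the text inside span tags with the nav-text class that are inside a tags with the secondary-header-action class: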
$result = <<<DATA
<body>
<a class="secondary-header-action" href="/subscribers" role="menuitem">
<span class="nav-text">Some text here</span>
</a>
</body>
DATA;
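// Load the HTML string into a DOM tree; LIBXML_HTML_NOIMPLIED stops libxml from
// adding implied <html>/<body> wrappers, LIBXML_HTML_NODEFDTD stops it adding a default doctype
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($result, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// Select every span with class="nav-text" that is a direct child of an a with class="secondary-header-action"
$xpath = new DOMXPath($dom);
$atags = $xpath->query('//a[@class="secondary-header-action"]/span[@class="nav-text"]');
// Collect the text content of each matched span
$res = array();
foreach ($atags as $a) {
    array_push($res, $a->nodeValue);
}
print_r($res); // => Array ( [0] => Some text here )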
See the PHP demo.
DOMDocument is used to initialize the DOM, and DOMXPath helps access the necessary elements in the DOM tree using XPath expressions.
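The predicate @class="secondary-header-action" only matches when the class attribute is exactly that string; if the element carries several classes (for example class="foo secondary-header-action"), it matches nothing. Below is a minimal sketch of a variant that matches individual class names instead and suppresses libxml warnings, which real-world pages tend to trigger. The helper name extractNavText and the sample HTML are hypothetical, and the sketch still assumes the page contains this markup at all.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of every span with the
 * nav-text class inside an a with the secondary-header-action class,
 * matching individual class names rather than the whole attribute value.
 */
function extractNavText(string $html): array
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    // Real-world pages are rarely valid HTML: collect libxml warnings
    // internally instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // contains(concat(" ", normalize-space(@class), " "), " x ") is true when
    // "x" is one of the space-separated class names, e.g. class="a x b".
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " secondary-header-action ")]'
        . '/span[contains(concat(" ", normalize-space(@class), " "), " nav-text ")]';

    $texts = [];
    foreach ($xpath->query($query) as $span) {
        $texts[] = trim($span->textContent);
    }
    return $texts;
}

// Hypothetical sample: both elements carry an extra class.
$html = '<a class="yt-uix secondary-header-action" href="/subscribers">'
    . '<span class="nav-text extra"> 12,345 </span></a>';
print_r(extractNavText($html));
```

The concat/normalize-space idiom is plain XPath 1.0, which is what DOMXPath supports, so no extra library is needed.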