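I am currently trying to parse an HTML page from a website using XPath.
I need to get the result in the following format:
show time:show name
For example:
1.00PM:Ye Hai Mohabbatein
I am using the following code (as shown here) to fetch it, but it does not work.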
<?php
libxml_use_internal_errors(true);
$dom = new DomDocument;
$dom->loadHTMLFile("www.starplus.in/schedule.aspx");
$xpath = new DomXPath($dom);
$nodes = $xpath->query("//table");
foreach ($nodes as $i => $node) {
    echo "hy";
    echo "Node($i): ", $node->nodeValue, "\n";
}
?>
I would be very grateful if anyone could help me with this.
Answer 0 (score: 2)
Basically, just target the div/table that holds the show names and time slots.
A rough example:
// the request appears to fail when no User-Agent header is sent, so set one explicitly
$ch = curl_init('http://www.starplus.in/schedule.aspx');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$page = curl_exec($ch);
if ($page === false) {
    die('Request failed: ' . curl_error($ch));
}
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$shows = array();
$tables = $xpath->query("//div[@class='sech_div_bg']/table"); // target that table
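foreach ($tables as $table) {
    // row 1 holds the time slot, row 3 holds the show name
    $time_node = $xpath->query('./tr[1]/td/span', $table)->item(0);
    $name_node = $xpath->query('./tr[3]/td/span', $table)->item(0);
    if ($time_node === null || $name_node === null) {
        continue; // skip tables that do not match the expected layout
    }
    $time_slot = $time_node->nodeValue;
    $show_name = $name_node->nodeValue;
    $shows[] = array('time_slot' => $time_slot, 'show_name' => $show_name);
    echo "$time_slot - $show_name <br/>";
}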
// echo '<pre>';
// print_r($shows);
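
To try the XPath part without hitting the network, here is a minimal sketch that runs the same queries against an inline HTML string and prints each entry in the "show time:show name" format the question asks for. The sample markup and the helper name parse_schedule are assumptions made up for illustration: the markup is inferred only from the XPath expressions in the answer, and the real page may be structured differently.

```php
<?php
// Hypothetical markup, inferred only from the XPath expressions above;
// the real schedule page may look different.
$html = <<<HTML
<div class="sech_div_bg">
  <table>
    <tr><td><span>1.00PM</span></td></tr>
    <tr><td><img src="show.jpg" alt=""></td></tr>
    <tr><td><span>Ye Hai Mohabbatein</span></td></tr>
  </table>
</div>
HTML;

// Hypothetical helper: returns one "time:name" string per schedule table.
function parse_schedule(string $html): array
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $lines = array();
    foreach ($xpath->query("//div[@class='sech_div_bg']/table") as $table) {
        $time = $xpath->query('./tr[1]/td/span', $table)->item(0);
        $name = $xpath->query('./tr[3]/td/span', $table)->item(0);
        if ($time === null || $name === null) {
            continue; // skip tables that do not match the expected layout
        }
        $lines[] = trim($time->nodeValue) . ':' . trim($name->nodeValue);
    }
    return $lines;
}

echo implode("\n", parse_schedule($html)), "\n";
```

With the sample markup this prints 1.00PM:Ye Hai Mohabbatein. To use it on the live page, pass the $page string fetched with cURL to parse_schedule instead of $html.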