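I need to scrape this HTML page...
http://www1.usl3.toscana.it/default.asp?page=ps&ospedale=3
...using PHP and XPath to get the value 7 next to the string "CODICE GIALLO".
(Note: if you browse the page you may see a different value. That doesn't matter; the value changes over time.)
I'm using this PHP code sample to print the value: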
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
$url = 'http://www1.usl3.toscana.it/default.asp?page=ps&ospedale=3';
$xpath_for_parsing = '/html/body/div/div[2]/table[2]/tbody/tr[1]/td/table/tbody/tr[3]/td[2]/table/tbody/tr[4]/td[2]/table/tbody/tr[2]/td[2]/b';
//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$colorWaitingNumber = $xpath->query($xpath_for_parsing);
$theValue = 'N.D.';
foreach( $colorWaitingNumber as $node )
{
$theValue = $node->nodeValue;
}
print $theValue;
?>
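This way I get "N.D." as output, not "7" as I would like.
Reading the question "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?", I found that the problem is related to the <tbody> tag, which the browser adds to the DOM but which is not in the HTML source. So I removed it from my original XPath and tried the following: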
$xpath_for_parsing = '/html/body/div/div[2]/table[2]/tr[1]/td/table/tr[3]/td[2]/table/tr[4]/td[2]/table/tr[2]/td[2]/b';
But the result is still "N.D." instead of "7".
Using
$xpath_for_parsing = '/html/body/div/div[2]/table[2]/tr[1]/td/table/tr[3]/td[2]/table/tr[4]/td[2]/table';
the result is "Codice GIALLO 7".
How can I get just the "7" value?
Any suggestions or examples?
Answer 0 (score: 1)
This should do the trick:
//td[.="Codice GIALLO"]/following-sibling::td/b
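For reference, here is a minimal sketch of how that expression could be plugged into the DOMXPath code from the question. It runs against a small hypothetical HTML fragment rather than the live page, so the markup, the helper name waitingNumber, and the "Codice ROSSO" row are assumptions for illustration. It also wraps the cell text in normalize-space(), a small variation on the expression above, in case the real cell contains extra whitespace.

```php
<?php
// Hypothetical fragment standing in for the real page; the actual markup may differ.
$html = <<<HTML
<html><body>
<table>
  <tr><td>Codice ROSSO</td><td><b>1</b></td></tr>
  <tr><td> Codice GIALLO </td><td><b>7</b></td></tr>
</table>
</body></html>
HTML;

// Hypothetical helper: returns the bold value in the cell that follows the
// cell whose text equals $label, or 'N.D.' when nothing matches.
// Note: $label is inserted into the query as-is, so it must not contain
// a double quote.
function waitingNumber(string $html, string $label): string
{
    $dom = new DOMDocument();
    // Collect parser warnings about sloppy HTML instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Anchor on the label text rather than on an absolute path, so the
    // query does not depend on <tbody> or on the table nesting depth.
    $query = sprintf(
        '//td[normalize-space(.)="%s"]/following-sibling::td/b',
        $label
    );
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return 'N.D.';
    }
    return trim($nodes->item(0)->nodeValue);
}

print waitingNumber($html, 'Codice GIALLO');
```

With the real page you would pass the $data string returned by curl_exec() instead of the sample fragment. Matching on the label keeps the query working even if the layout tables around it are rearranged, which the long absolute path cannot tolerate.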