Extracting information with XPath in PHP

Time: 2015-04-04 09:17:30

Tags: php html xml xpath

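I'm just trying to extract some information from the AEC website (e.g. http://apps.aec.gov.au/eSearch/LocalitySearchResults.aspx?filter=3977&filterby=Postcode). The XPath query I'm running is "//x:tbody/x:tr/x:td[4]/x:a". I have tested it in XPath Checker (a Firefox extension), where it pulls out the relevant location data.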

I then load the page with PHP, run the query, and iterate over the results.

$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);

# Create a DOM parser object
$dom = new DOMDocument();
libxml_use_internal_errors(true);

$dom->loadHTML($html);

$xpath = new DOMXpath($dom);

$elements = $xpath->query('//tbody/tr/td[4]/a');

foreach ($elements as $element) {
    echo $element;
}

I then get:

Warning: Invalid argument supplied for foreach() in /home/givesh5/public_html/dig/electoratesearch.php on line 41

It seems the query is returning some kind of boolean rather than a list of matches?

The relevant markup is as follows:

<table cellspacing="0" rules="all" border="1" id="ContentPlaceHolderBody_gridViewLocalities" style="border-collapse:collapse;">
        <tr class="headingLink">
            <th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$StateAb&#39;)">State</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$LocalityNm&#39;)">Locality/Suburb</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$Postcode&#39;)">Postcode</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$DivisionNm&#39;)">Electorate</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$DivisionNmRedistributed&#39;)">Redistributed Electorate</a></th><th scope="col">Other Locality(s)</th>
        </tr><tr>
            <td>VIC</td><td>BOTANIC RIDGE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CANNONS CREEK</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE EAST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE EAST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE NORTH</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE SOUTH</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE WEST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>DEVON MEADOWS</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>FIVEWAYS</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td><a href="LocalitySearchResults.aspx?filter=DEVON+MEADOWS&amp;filterby=LocalityorSuburb&amp;state=VIC">DEVON MEADOWS</a></td>
        </tr><tr>
            <td>VIC</td><td>JUNCTION VILLAGE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>SANDHURST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Isaacs&amp;filterby=Electorate&amp;divid=219">Isaacs</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>SKYE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Dunkley&amp;filterby=Electorate&amp;divid=210">Dunkley</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>SKYE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Isaacs&amp;filterby=Electorate&amp;divid=219">Isaacs</a></td><td></td><td>&nbsp;</td>
        </tr>
    </table>

2 answers:

Answer 0 (score: 1)

  

"It seems the query is returning some kind of boolean rather than a list of matches?"

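Yes, it can return a boolean, and in that case it will be FALSE. That means an error occurred while running the xpath query. It can be caused by either of the two parameters passed to DOMXpath::query() (see the PHP Manual): the xpath expression or the context node.

In your case you pass only one parameter, so this would indicate that the xpath expression is wrong. However, the expression you use is not wrong and does not produce a boolean FALSE. Since you do get this error, I assume something else went wrong, for example the xpath object was not fully initialized. But even when I simulate a missing or partial download, I cannot reproduce the error. Perhaps it differs between PHP versions? I don't know.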
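To make the distinction concrete, here is a minimal sketch of the two outcomes. The markup and the deliberately malformed expression are made up for illustration, and the behaviour described is what I would expect from the libxml-based DOMDocument, not something verified against the asker's setup:

```php
<?php
// Illustrative markup only, not the real AEC page.
$html = '<table><tr><td>VIC</td><td><a href="#">Flinders</a></td></tr></table>';

$doc   = new DOMDocument();
$saved = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_use_internal_errors($saved);

$xpath = new DOMXPath($doc);

// A well-formed expression that matches nothing yields an empty
// DOMNodeList, which foreach accepts without complaint.
$noMatch = $xpath->query('//tbody/tr/td[2]/a');

// A malformed expression (unclosed predicate) makes query() return false.
// The @ only hides the accompanying warning.
$broken = @$xpath->query('//table[');

var_dump($noMatch instanceof DOMNodeList, $noMatch->length, $broken);
```

So an expression that merely finds no tbody should give an empty loop, not the foreach warning. The warning points at query() itself failing.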

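As for the actual xpath expression, what adeneo and Gordon have already written applies: Firefox inserts the <tbody> element into the DOM, and the DOMDocument implementation in PHP behaves differently here. You can either mimic Firefox (more work), or simply search for the table elements as they actually appear in the source, and then it works fine. Here is a working example: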

$url = 'http://apps.aec.gov.au/eSearch/LocalitySearchResults.aspx?filter=3977&filterby=Postcode';

# Create a DOMDocument to parse HTML
$doc    = new DOMDocument();
$saved  = libxml_use_internal_errors(true);
$result = $doc->loadHTMLFile($url);
libxml_use_internal_errors($saved);
if (false === $result) {
    throw new UnexpectedValueException(sprintf('Failed to create DOMDocument from url %s', var_export($url, true)));
}

# Create a DOMXPath to get data from HTML document
$xpath = new DOMXpath($doc);

$expression = '//table/tr/td[4]/a';
$elements   = $xpath->query($expression);
if (false === $elements) {
    throw new UnexpectedValueException(sprintf('The xpath expression %s failed', var_export($expression, true)));
}

foreach ($elements as $index => $element) {
    printf("#%02d: %s\n", $index + 1, trim($element->textContent));
}

Example output:

#01: Flinders
#02: Flinders
#03: Holt
#04: Flinders
#05: Holt
#06: Holt
#07: Flinders
#08: Holt
#09: Flinders
#10: Flinders
#11: Flinders
#12: Isaacs
#13: Dunkley
#14: Isaacs
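If the same expression has to work whether or not the markup contains a tbody, one option is a union of both paths. The following is only a sketch under that assumption: the helper name electorates() and the two sample tables are hypothetical, not taken from the AEC page.

```php
<?php
// Hypothetical sample markup: one table without tbody, one with it.
$withoutTbody = '<table><tr><td>VIC</td><td>SKYE</td><td>3977</td>'
              . '<td><a href="#">Dunkley</a></td></tr></table>';
$withTbody    = '<table><tbody><tr><td>VIC</td><td>SKYE</td><td>3977</td>'
              . '<td><a href="#">Isaacs</a></td></tr></tbody></table>';

// Hypothetical helper: returns the link text of the fourth cell of each row.
function electorates(string $html): array
{
    $doc   = new DOMDocument();
    $saved = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_use_internal_errors($saved);

    $xpath = new DOMXPath($doc);

    // Union of the two possible row locations.
    $nodes = $xpath->query('//table/tr/td[4]/a | //table/tbody/tr/td[4]/a');
    if (false === $nodes) {
        throw new UnexpectedValueException('The xpath expression failed');
    }

    $names = [];
    foreach ($nodes as $node) {
        $names[] = trim($node->textContent);
    }
    return $names;
}

print_r(electorates($withoutTbody));
print_r(electorates($withTbody));
```

The union keeps the rows anchored directly under the table (or its tbody), which is stricter than '//table//tr' and avoids picking up rows of nested tables.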

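Answer 1 (score: 0)

There is no tbody in that HTML. Browsers insert tbody elements where needed, but we are not using a browser. We are using DOMDocument, which does not insert tbody elements.

Instead, the tr elements are direct children of the table: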

$elements = $xpath->query( '//table/tr/td[4]/a');

foreach ($elements as $element) {
     echo $dom->saveHTML($element);
}
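A quick way to check this yourself is to count the tbody elements after parsing. This is a small sketch with a cut-down, made-up row. Only the table id is taken from the markup in the question, and using it to narrow the query is my own suggestion:

```php
<?php
// Cut-down sample row; the id matches the table in the question.
$html = '<table id="ContentPlaceHolderBody_gridViewLocalities">'
      . '<tr><td>VIC</td><td>SKYE</td><td>3977</td>'
      . '<td><a href="#">Dunkley</a></td></tr></table>';

$dom   = new DOMDocument();
$saved = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($saved);

$xpath = new DOMXPath($dom);

// Number of tbody elements DOMDocument created while parsing.
$tbodyCount = $xpath->query('//tbody')->length;

// Restrict the search to the results table via its id.
$links = $xpath->query(
    '//table[@id="ContentPlaceHolderBody_gridViewLocalities"]/tr/td[4]/a'
);

echo "tbody elements: $tbodyCount\n";
foreach ($links as $link) {
    echo trim($link->textContent), "\n";
}
```

Selecting by id also avoids matching any other table on the page that happens to have links in its fourth column.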