Retrieving an HTML table with XPath

Time: 2014-07-28 16:39:33

Tags: php html xpath

Starting from this URL, I want to scrape an HTML table, and in particular this element:

<td class="tbl_black_n_1" nowrap="">
<a href="popup.asp?tp=2100&amp;lang=en&amp;idm=553759" target="_blank"><img src="http://www.betonews.com//img/i_betfair.gif" width="12" height="10" border="0" alt=""></a>
<a href="popup.asp?tp=2110&amp;lang=en&amp;idm=553759" target="_blank"><img src="http://www.betonews.com//img/i_history.gif" width="12" height="10" border="0" alt=""></a>
</td>

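More than a hundred <tr> rows are built the same way, each containing many <td> cells. I managed to loop over them with XPath and store all the data in a database, except for one item: the last <td> element. I want the "href" attribute of the first <a> inside it. So, in my example: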

"popup.asp?tp=2100&lang=en&idm=553759"

But when I run my query, the id variable comes back as NULL. Help!

This is my PHP code, but I can't get any further...

<?php
$url = 'http://www.betonews.com/table.asp?tp=2001&lang=en&dd=28&dm=7&dy=2014&df=1&dw=3';

// Download the page.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// Parse the HTML and select every row after the header row.
$document = new DOMDocument();
$document->loadHTML($response);

$xpath = new DOMXPath($document);
$expression = '(//table[@cellpadding="3"])[1]/tr[position() > 1]';
$rows = $xpath->query($expression);

$results = array();

foreach ($rows as $row) {
  $result = array();
  $td = $row->childNodes;
  // This is the lookup that comes back as NULL.
  $id = $td->item(36)->childNodes->item(1)->attributes->getNamedItem("href")->nodeValue;
  $result["id"] = $id;
  $results[] = $result;
}
var_dump($results);

@LarsH I used this PHP code to retrieve what you asked for, and the result is NULL:

$expression = '(//table[@cellpadding="3"])[1]/tr[position() > 1]';
$rows = $xpath->query($expression);
$results = array();
foreach ($rows as $row) {
  $td = $row->childNodes;
  $ok = $td->item(36)->childNodes->item(1)->nodetype;
  echo $ok;
}

This is the value of $row, using the last expression you suggested!

{ [ 0 ] => array(1) { [ "ok" ] => object(DOMAttr)#3 (21) {
  [ "name" ] => string(4) "href"
  [ "specified" ] => bool(true)
  [ "value" ] => string(36) "popup.asp?tp=2100&lang=en&idm=556296"
  [ "ownerElement" ] => string(22) "(object value omitted)"
  [ "schemaTypeInfo" ] => NULL
  [ "nodeName" ] => string(4) "href"
  [ "nodeValue" ] => string(36) "popup.asp?tp=2100&lang=en&idm=556296"
  [ "nodeType" ] => int(2)
  [ "parentNode" ] => string(22) "(object value omitted)"
  [ "childNodes" ] => string(22) "(object value omitted)"
  [ "firstChild" ] => string(22) "(object value omitted)"
  [ "lastChild" ] => string(22) "(object value omitted)"
  [ "previousSibling" ] => NULL
  [ "nextSibling" ] => string(22) "(object value omitted)"
  [ "attributes" ] => NULL
  [ "ownerDocument" ] => string(22) "(object value omitted)"
  [ "namespaceURI" ] => NULL
  [ "prefix" ] => string(0) ""
  [ "localName" ] => string(4) "href"
  [ "baseURI" ] => NULL
  [ "textContent" ] => string(36) "popup.asp?tp=2100&lang=en&idm=556296"
} } }

Wow! We can see the value! So... how do I actually retrieve it? Thanks

EDIT: Yes! I finally got it! Thank you, thank you @LarsH

1 answer:

Answer 0 (score: 1)

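The problem may be that the first child node of the <td> is actually a text node consisting only of whitespace. You can test that hypothesis by checking the node's nodeType (PHP property names are case-sensitive, so a lowercase nodetype would itself evaluate to NULL):

$td->item(36)->childNodes->item(1)->nodeType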
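To make the whitespace hypothesis concrete, here is a minimal sketch that lists the children of one <td> and then picks the first element child instead of relying on a fixed index. The sample markup and image file names are assumptions modeled on the question, not the real page, and whether the parser keeps whitespace-only text nodes can depend on the libxml version behind PHP.

```php
<?php
// Hypothetical sample markup modeled on the <td> from the question;
// the image file names are placeholders.
$html = <<<'HTML'
<table><tr><td class="tbl_black_n_1" nowrap>
<a href="popup.asp?tp=2100&amp;lang=en&amp;idm=553759"><img src="a.gif" alt=""></a>
<a href="popup.asp?tp=2110&amp;lang=en&amp;idm=553759"><img src="b.gif" alt=""></a>
</td></tr></table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$document = new DOMDocument();
$document->loadHTML($html);

$td = $document->getElementsByTagName('td')->item(0);

// List every child of the <td>, including any whitespace-only text nodes,
// and remember the first child that is an element.
$firstElement = null;
$index = 0;
foreach ($td->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        echo $index, ": text node\n";
    } else {
        echo $index, ': node <', $child->nodeName, ">\n";
    }
    if ($firstElement === null && $child->nodeType === XML_ELEMENT_NODE) {
        $firstElement = $child;
    }
    $index++;
}

// Picking the first element child avoids guessing the numeric index.
echo $firstElement->getAttribute('href'), "\n";
```

If text nodes show up in the listing, any hard-coded item(n) index has to account for them, which is why selecting by node type (or by XPath) tends to be less fragile.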

To work around this, you can do more of the navigation in XPath itself, for example:

(//table[@cellpadding="3"])[1]/tr[position() > 1]/td[19]/a[1]/@href

Then loop over those results:

$expression = '(//table[@cellpadding="3"])[1]/tr[position() > 1]/td[19]/a[1]/@href';
$ids = $xpath->query($expression);

$results = array();

foreach ($ids as $idNode) {
  $result = array();
  $result["id"] = $idNode->nodeValue;
  $results[] = $result;
}
var_dump($results);
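Since the question stores several fields per row, a variation that may fit better is to keep the row loop and run a relative XPath against each row. This is only a sketch under assumptions: the two-row table, the cell texts, and the idm values are made up, and td[last()] stands in for "the last <td>" rather than a fixed column number.

```php
<?php
// Hypothetical two-row table; cell texts and idm values are made up.
$html = <<<'HTML'
<table cellpadding="3">
<tr><td>header</td><td>links</td></tr>
<tr><td>Match A</td><td>
<a href="popup.asp?tp=2100&amp;lang=en&amp;idm=1">x</a>
<a href="popup.asp?tp=2110&amp;lang=en&amp;idm=1">y</a>
</td></tr>
<tr><td>Match B</td><td>
<a href="popup.asp?tp=2100&amp;lang=en&amp;idm=2">x</a>
<a href="popup.asp?tp=2110&amp;lang=en&amp;idm=2">y</a>
</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$results = array();
$rows = $xpath->query('(//table[@cellpadding="3"])[1]/tr[position() > 1]');
foreach ($rows as $row) {
    // The second argument makes each expression relative to the current row.
    // string(...) yields a plain string, or '' when nothing matches.
    $results[] = array(
        'name' => $xpath->evaluate('string(td[1])', $row),
        'id'   => $xpath->evaluate('string(td[last()]/a[1]/@href)', $row),
    );
}
var_dump($results);
```

Because XPath's td[n] counts only <td> elements, whitespace text nodes between cells do not shift the index the way they do with childNodes->item(n).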