Starting from this url, I want to understand an HTML table, and in particular this element:
<td class="tbl_black_n_1" nowrap="">
<a href="popup.asp?tp=2100&lang=en&idm=553759" target="_blank"><img src="http://www.betonews.com//img/i_betfair.gif" width="12" height="10" border="0" alt=""></a>
<a href="popup.asp?tp=2110&lang=en&idm=553759" target="_blank"><img src="http://www.betonews.com//img/i_history.gif" width="12" height="10" border="0" alt=""></a>
</td>
There are more than a hundred <tr> rows built the same way, each containing a large number of <td> cells.
I managed to loop over them with XPath and store all the data in a database, except for one item: the last <td> element. From it I want the "href" attribute of the first <a>. So, in my example:
"popup.asp?tp=2100&lang=en&idm=553759"
But when I run my query, the id variable comes back as NULL. Help!
This is my PHP code, but I can't get any further...
<?php
$url = 'http://www.betonews.com/table.asp?tp=2001&lang=en&dd=28&dm=7&dy=2014&df=1&dw=3';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
$document = new DOMDocument();
$document->loadHTML($response);
$xpath = new DOMXPath($document);
$expression = '(//table[@cellpadding="3"])[1]/tr[position() > 1]';
$rows = $xpath->query($expression);
$results = array();
foreach ($rows as $row) {
    $result = array();
    $td = $row->childNodes;
    $id = $td->item(36)->childNodes->item(1)->attributes->getNamedItem("href")->nodeValue;
    $result["id"] = $id;
    $results[] = $result;
}
var_dump($results);
@LarsH I used this PHP code to retrieve what you asked for, and the result is NULL:
$expression = '(//table[@cellpadding="3"])[1]/tr[position() > 1]';
$rows = $xpath->query($expression);
$results = array();
foreach ($rows as $row) {
    $td = $row->childNodes;
    $ok = $td->item(36)->childNodes->item(1)->nodetype;
    echo $ok;
}
This is the value of $row, using the last expression you suggested!
{
  [0] => array(1) {
    ["ok"] => object(DOMAttr)#3 (21) {
      ["name"] => string(4) "href"
      ["specified"] => bool(true)
      ["value"] => string(36) "popup.asp?tp=2100&lang=en&idm=556296"
      ["ownerElement"] => string(22) "(object value omitted)"
      ["schemaTypeInfo"] => NULL
      ["nodeName"] => string(4) "href"
      ["nodeValue"] => string(36) "popup.asp?tp=2100&lang=en&idm=556296"
      ["nodeType"] => int(2)
      ["parentNode"] => string(22) "(object value omitted)"
      ["childNodes"] => string(22) "(object value omitted)"
      ["firstChild"] => string(22) "(object value omitted)"
      ["lastChild"] => string(22) "(object value omitted)"
      ["previousSibling"] => NULL
      ["nextSibling"] => string(22) "(object value omitted)"
      ["attributes"] => NULL
      ["ownerDocument"] => string(22) "(object value omitted)"
      ["namespaceURI"] => NULL
      ["prefix"] => string(0) ""
      ["localName"] => string(4) "href"
      ["baseURI"] => NULL
      ["textContent"] => string(36) "popup.asp?tp=2100&lang=en&idm=556296"
    }
  }
Wow! We can see our value! So... how do I actually retrieve it!? Thanks
EDIT: Yes! I finally got it! Thank you, thank you, @LarsH!
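Answer 0 (score: 1)
The problem may be that the first child node of the <td> is actually a text node consisting only of whitespace. You can check this by looking at the nodeType:
$td->item(36)->childNodes->item(1)->nodeType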
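To make that check concrete, here is a minimal sketch that lists the child nodes of a single cell together with their types. The markup is a made-up sample modelled on the <td> from the question, not the real page, and the variable names ($firstLink, $firstHref) are only for illustration. Note that the property is spelled nodeType; PHP property names are case-sensitive. The exact list of children can vary with the parser version, so the sketch picks the first <a> by type and name rather than by a fixed index.

```php
<?php
// Hypothetical sample markup modelled on the <td> from the question.
$html = <<<HTML
<table><tr><td class="tbl_black_n_1" nowrap="">
<a href="popup.asp?tp=2100&amp;lang=en&amp;idm=553759" target="_blank">betfair</a>
<a href="popup.asp?tp=2110&amp;lang=en&amp;idm=553759" target="_blank">history</a>
</td></tr></table>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$td = $document->getElementsByTagName('td')->item(0);

// Print every child with its type so whitespace-only text nodes become visible.
for ($i = 0; $i < $td->childNodes->length; $i++) {
    $child = $td->childNodes->item($i);
    $kind = ($child->nodeType === XML_TEXT_NODE) ? 'text' : 'not text';
    echo $i, ': ', $child->nodeName, ' (nodeType ', $child->nodeType, ', ', $kind, ")\n";
}

// Pick the first <a> element child, skipping any text nodes around it.
$firstLink = null;
foreach ($td->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'a') {
        $firstLink = $child;
        break;
    }
}
$firstHref = ($firstLink !== null) ? $firstLink->getAttribute('href') : null;
var_dump($firstHref);
```

Filtering on XML_ELEMENT_NODE keeps the lookup independent of how many whitespace text nodes the parser happens to keep.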
To work around this, you can do more of the navigation in XPath itself, for example
(//table[@cellpadding="3"])[1]/tr[position() > 1]/td[36]/a[1]/@href
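Then loop over those results (use the cell's actual position for the td index; XPath counts only <td> elements, whereas childNodes also counts whitespace text nodes):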
$expression = '(//table[@cellpadding="3"])[1]/tr[position() > 1]/td[19]/a[1]/@href';
$ids = $xpath->query($expression);
$results = array();
foreach ($ids as $idNode) {
$result = array();
$result["id"] = $idNode->nodeValue;
$results[] = $result;
}
var_dump($results);
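For anyone who wants to try this without fetching the live page, here is a small self-contained sketch of the same idea. The two-row table is invented for illustration, and two details are my own assumptions rather than part of the answer above: it uses td[last()] because the question says the links sit in the last cell, and it uses //tr instead of /tr so that it does not matter whether the parser wraps the rows in a <tbody>.

```php
<?php
// Hypothetical two-row sample; the real page has many more rows and cells.
$html = <<<HTML
<table cellpadding="3">
<tr><th>Match</th><th>Links</th></tr>
<tr><td>A - B</td><td nowrap="">
<a href="popup.asp?tp=2100&amp;lang=en&amp;idm=553759">betfair</a>
<a href="popup.asp?tp=2110&amp;lang=en&amp;idm=553759">history</a>
</td></tr>
<tr><td>C - D</td><td nowrap="">
<a href="popup.asp?tp=2100&amp;lang=en&amp;idm=556296">betfair</a>
<a href="popup.asp?tp=2110&amp;lang=en&amp;idm=556296">history</a>
</td></tr>
</table>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Skip the header row, go to the last cell of each row,
// and select the href attribute node of its first <a>.
$expression = '(//table[@cellpadding="3"])[1]//tr[position() > 1]/td[last()]/a[1]/@href';

$ids = array();
foreach ($xpath->query($expression) as $hrefAttribute) {
    // Each result is a DOMAttr; nodeValue holds the attribute's string value.
    $ids[] = $hrefAttribute->nodeValue;
}
var_dump($ids);
```

Because the query returns attribute nodes directly, there is no need to index into childNodes at all, which sidesteps the whitespace text nodes entirely.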