I recently ran into a problem: I need to read data from an HTML table and capture a value into a variable named $id. For example, I have this code:
<tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr>
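I also have another variable, $array[$i], which holds the search query. I want my PHP code to search the table until it finds the cell containing that query, in this case "Party Hat". Once the query is found, the code should read the ID, which is the "td" just above the "Party Hat" name; in this case the ID is 413. After that, I want the variable $id to hold that ID. How can I do this? Any help would be HIGHLY appreciated!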
Answer 0 (score: 3)
Using Tidy, DOMDocument and DOMXPath (make sure the PHP extensions are enabled), you can do something like this:
<?php
$url = "http://example.org/test.html";
function get_data_from_table($id, $url)
{
    // retrieve the content of that url
    $content = file_get_contents($url);
    // repair bad HTML
    $tidy = tidy_parse_string($content);
    $tidy->cleanRepair();
    $content = (string)$tidy;
    // load into DOM
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    // make xpath-able
    $xpath = new DOMXPath($dom);
    // search for the first td of each tr, where its content is $id
    $query = "//tr/td[position()=1 and normalize-space(text())='$id']";
    $elements = $xpath->query($query);
    if ($elements->length != 1) {
        // not exactly 1 result as expected? return number of hits
        return $elements->length;
    }
    // our td was found
    $element = $elements->item(0);
    // get its parent element (tr)
    $tr = $element->parentNode;
    $data = array();
    // iterate over its td elements
    foreach ($tr->getElementsByTagName("td") as $td) {
        // retrieve the content as text
        $data[] = $td->textContent;
    }
    // return the array of <td> contents
    return $data;
}
echo '<pre>';
print_r(
    get_data_from_table(
        414,
        $url
    )
);
echo '</pre>';
Your HTML source (http://example.org/test.html):
<table><tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr><tr>
<td>414</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td><a href="http://clubpenguincheatsnow.com/tools/swfviewer/items.swf?id=413">View SWF</a></td>
</tr>
(As you can see, this is not valid HTML, but that doesn't matter.)
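The function above looks a row up by its ID, whereas the question asks for the opposite direction: given a name, get the ID. A minimal sketch of that variant follows. The helper name find_id_by_name and the "Blue Cap" / 500 row are made up for illustration, and it assumes the ID is always the first cell and the name the second. It compares the cell text in PHP rather than inside the XPath string, so a query containing quotes cannot break the expression.

```php
<?php
// Hypothetical helper (not part of the answer above): return the ID
// (first cell) of the first row whose second cell equals $name,
// or null when no row matches.
function find_id_by_name(string $html, string $name): ?string
{
    // Collect parser warnings instead of printing them, since
    // scraped HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');
        if ($cells->length < 2) {
            continue; // row without an ID cell and a name cell
        }
        if (trim($cells->item(1)->textContent) === $name) {
            return trim($cells->item(0)->textContent);
        }
    }
    return null;
}

// Sample data: the first row is from the question, the second is invented.
$html = '<table>
<tr><td>413</td><td>Party Hat</td><td>0</td><td>No</td></tr>
<tr><td>500</td><td>Blue Cap</td><td>0</td><td>No</td></tr>
</table>';

// $array holds the search queries, as in the question.
$array = ['Party Hat', 'Blue Cap'];
for ($i = 0; $i < count($array); $i++) {
    $id = find_id_by_name($html, $array[$i]);
    echo $array[$i], ' => ', var_export($id, true), PHP_EOL;
}
```

If the page has to be fetched first, the $html string could come from file_get_contents($url), optionally passed through Tidy as in the answer above.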
Answer 1 (score: 2)
This works (though it's a bit ugly; maybe someone else can come up with a better XPath solution):
$html = <<<HTML
<html>
<body>
<table>
<thead>
<tr>
<td>id</td>
<td>name</td>
<td>a</td>
<td>b</td>
<td>c</td>
</tr>
</thead>
<tbody>
<tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>a link</td>
</tr>
<tr>
<td>414</td>
<td>Party Hat 2</td>
<td>0</td>
<td>No</td>
<td>a link</td>
</tr>
</tbody>
</table>
</body>
</html>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$domxpath = new DOMXPath($doc);
$res = $domxpath->query("//*[local-name() = 'td'][text() = 'Party Hat']/../td[position() = '1']");
var_dump($res->length, $res->item(0)->textContent);
Output:
int(1)
string(3) "413"
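One possibly tidier way to write the query is to put the condition on the row and then pick its first cell, which avoids the "/.." step back up to the parent. This is only a sketch with the name hard-coded and a trimmed-down copy of the table above. XPath 1.0 has no escape syntax for quotes inside string literals, so a name taken from user input would need extra handling.

```php
<?php
$html = '<table>
<tr><td>413</td><td>Party Hat</td><td>0</td></tr>
<tr><td>414</td><td>Party Hat 2</td><td>0</td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select the first <td> of any row that contains a <td> whose
// whitespace-normalized text equals the name we are looking for.
$query = "//tr[td[normalize-space(.) = 'Party Hat']]/td[1]";
$nodes = $xpath->query($query);

// Keep the ID of the first match, or null when nothing matched.
$id = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
var_dump($id);
```

Because the comparison is an exact match on the normalized text, the "Party Hat 2" row is not selected.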
Answer 2 (score: 0)
Try loading the HTML into a new DOMDocument via loadHTML, then treat it like an XML document and query it with XPath or some other kind of query.
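As a rough illustration of "treat it like an XML document", the parsed tree can be handed to SimpleXML and queried from there. This sketch assumes the simplexml extension is available and uses a one-row table built from the question's data.

```php
<?php
$html = '<table><tr><td>413</td><td>Party Hat</td></tr></table>';

// Collect parser warnings instead of printing them on bad markup.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Hand the parsed tree to SimpleXML and query it like XML.
$xml = simplexml_import_dom($doc);

// Find the cell holding the name, then step to the cell just before it.
$matches = $xml->xpath("//td[. = 'Party Hat']/preceding-sibling::td[1]");

// xpath() yields an empty result when nothing matches.
$id = $matches ? (string) $matches[0] : null;
var_dump($id);
```

The preceding-sibling axis relies on the ID cell sitting directly before the name cell, which matches the table in the question.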