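I recently got a parsing problem solved here very quickly, but this is a new challenge I can't beat.
Here is a (horrible) HTML page containing several tables: mxs link. The table I'm interested in is the second one in the source, right below
<DIV CLASS="main"><H3>funrace.MXSConcept.com</H3><H3>Recent Races</H3>
.
What I need is to collect all the races so I can show something like this in a dropdown box: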
40 minutes ago - 8M+1L at 2013 Motosport World GP Rd 09: Lommel (2 riders)
1 day ago - 8M+1L at 2013 EMF FrenchCup Rd5 : Lacapelle Marival (1 riders)
...
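where, for example, $date is the date,
$race is the second column,
$link is hidden but holds the URL from the first column (to use later in my dropdown).
Note: the dates seem to be generated on the fly, and some rows only announce new track records -> those rows must be removed.
Here is what I tried (hey, stop laughing!):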
require('simple_html_dom.php');
$doc = new DOMDocument;
//$doc->preserveWhiteSpace = false;
$doc->loadHTMLfile('http://mxsimulator.com/servers/mx.MXSConcept.com/');
$xpath = new DOMXPath($doc);
$table = array();
$xpath = new DOMXPath($doc);
$table2 = $doc->getElementsByTagName('table')->item(1);
// collect data
$data = array();
foreach ($table2->query('//tr') as $node) {
    $rowData = array();
    foreach ($table2->query('td', $node) as $cell) {
        $rowData[] = $cell->nodeValue;
    }
}
print_r($data);
Answer 0: (score: 1)
You have to use $doc->load(...) for external files. A similar question was answered here: Xpath and conditionally selecting descendants based on element value of ancestors
Answer 1: (score: 1)
First, just drop require('simple_html_dom.php');
since you are using DOMDocument
and DOMXpath
.
Second, $table2->query('//tr')
will fail because $table2 is not a DOMXpath
object. It is a DOMElement
.
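To illustrate the second point, here is a minimal sketch of how a query can be scoped to one element: the DOMElement is passed as the context node to DOMXPath::query() instead of calling query() on the element itself. The two-table markup is made up for the example and is not the real page. The sketch also stores each row, which the loop in the question never did.

```php
<?php
// Hypothetical two-table document, used only to illustrate the point.
$html = '<div><table><tr><td>a</td></tr></table>'
      . '<table><tr><td>b1</td><td>b2</td></tr>'
      . '<tr><td>c1</td><td>c2</td></tr></table></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// item(1) gives the second <table> as a DOMElement; it has no query() method.
$table2 = $doc->getElementsByTagName('table')->item(1);

// Pass the element as the context node and use a relative path.
// A leading '//' would search the whole document again instead.
$cells = array();
foreach ($xpath->query('./tr', $table2) as $row) {
    $rowData = array();
    foreach ($xpath->query('./td', $row) as $cell) {
        $rowData[] = $cell->nodeValue;
    }
    $cells[] = $rowData; // keep the row
}

print_r($cells);
```

Applied to the actual page, the same idea looks like this: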
$dom = new DOMDocument();
$dom->loadHTMLFile('http://mxsimulator.com/servers/mx.MXSConcept.com/');
$xpath = new DOMXpath($dom);
$data = array();
// target each table row of the first table
$target_table_rows = $xpath->query('//div[@class="main"]/table[1]/tr');
// if there are rows found,
if($target_table_rows->length > 0) {
    // for each row, loop it
    foreach($target_table_rows as $row_key => $row) {
        // if the first td cell of this current row is empty
        if(trim($xpath->query('./td[1]', $row)->item(0)->nodeValue) == '') {
            continue; // then skip it
        }
        $data[] = array(
            'datetime' => $xpath->query('./td[1]', $row)->item(0)->nodeValue,
            'link' => $xpath->query('./td[1]/a', $row)->item(0)->getAttribute('href'),
            'description' => $xpath->query('./td[2]', $row)->item(0)->nodeValue,
        );
    }
}
echo '<pre>';
print_r($data);
The output should look like this:
Array
(
    [0] => Array
        (
            [datetime] => 2014-08-14 15:32 UTC
            [link] => /servers/mx.MXSConcept.com/races/825.html
            [description] => 8M+1L at 2013 Johnson Mine MX (1 riders)
        )
    ... and so on
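Since the goal in the question is a dropdown, here is a rough sketch of how an array shaped like the output above might be turned into option elements. renderRaceOptions() and the select name "race" are hypothetical names chosen for the example, and the single sample entry is copied from the output above.

```php
<?php
// Hypothetical helper: builds one <option> per race, with the link as the
// value and "datetime - description" as the visible label.
function renderRaceOptions(array $data)
{
    $options = '';
    foreach ($data as $race) {
        $label = $race['datetime'] . ' - ' . $race['description'];
        $options .= sprintf(
            "<option value=\"%s\">%s</option>\n",
            htmlspecialchars($race['link'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
        );
    }
    return $options;
}

// Sample entry taken from the output shown above.
$data = array(
    array(
        'datetime' => '2014-08-14 15:32 UTC',
        'link' => '/servers/mx.MXSConcept.com/races/825.html',
        'description' => '8M+1L at 2013 Johnson Mine MX (1 riders)',
    ),
);

echo "<select name=\"race\">\n" . renderRaceOptions($data) . "</select>\n";
```

Escaping both the value and the label with htmlspecialchars() keeps the markup valid even if a race name happens to contain characters such as & or <.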
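Answer 2: (score: 1)
Here is an update in which I needed to get the links, but I'm sure there is a simpler way. The goal is to have the links in the same array; here I have to keep a second one: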
$dom = new DOMDocument();
$dom->loadHTMLFile($selectserv);
$xpath = new DOMXpath($dom);
$data = array();
$links = array();
// target each table row of the first table
$target_table_rows = $xpath->query('//div[@class="main"]/table[1]/tr');
// if there are rows found,
if($target_table_rows->length > 0) {
    // for each row, loop it
    foreach($target_table_rows as $row_key => $row) {
        // if the first td cell of this current row is empty
        if(trim($xpath->query('./td[1]', $row)->item(0)->nodeValue) == '') {
            continue; // then skip it
        }
        // each td of this current row, push it inside the array data
        foreach($row->childNodes as $td) {
            $data[$row_key][] = $td->nodeValue;
        }
    }
    foreach($target_table_rows as $container) {
        $arr = $container->getElementsByTagName("a"); //get href tags
        foreach($arr as $item) {
            $href = $item->getAttribute("href"); //get the href value I think ?
            $links[] = array(
                'href' => $href //put href in the array
            );
        }
    }
}
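One possible way to keep the link in the same array is to read it in the same pass over the rows, as in the sketch below. The inline markup, its hrefs, and its dates are invented stand-ins for the real page, and the sketch assumes that rows without a link in the first cell (such as track-record notices) are the ones to drop; the live page may be structured differently.

```php
<?php
// Hypothetical sample markup; the structure of the real page may differ.
$html = <<<HTML
<div class="main">
<table>
<tr><td><a href="/races/825.html">2014-08-14 15:32 UTC</a></td><td>8M+1L at 2013 Johnson Mine MX (1 riders)</td></tr>
<tr><td></td><td>New track record</td></tr>
<tr><td><a href="/races/824.html">2014-08-13 10:05 UTC</a></td><td>8M+1L at 2013 EMF FrenchCup Rd5 : Lacapelle Marival (1 riders)</td></tr>
</table>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$data = array();
foreach ($xpath->query('//div[@class="main"]/table[1]/tr') as $row) {
    // Look up the link and the second cell relative to the current row.
    $anchor = $xpath->query('./td[1]/a', $row)->item(0);
    $second = $xpath->query('./td[2]', $row)->item(0);

    // Assumption: a row without a link or a second cell is not a race row.
    if ($anchor === null || $second === null) {
        continue;
    }

    // Date, link, and description end up in one entry per race.
    $data[] = array(
        'datetime'    => trim($anchor->nodeValue),
        'link'        => $anchor->getAttribute('href'),
        'description' => trim($second->nodeValue),
    );
}

print_r($data);
```

Checking the query results for null before using them also avoids a fatal error on rows that lack the expected cells.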