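I have the PHP code below, which loads an HTML file, pulls the table out of it, and parses it. The cell data comes back fine, as shown in Current Output. I am trying to get the href attribute into the output as well, as in Desired Output, whenever an href exists. I can't see how to target just the href within a cell; I only seem to be able to get the node value. Any help is much appreciated.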
Current Output
Array
(
[0] => Array
(
[id] => 213
[url] => Website
)
)
Desired Output
Array
(
[0] => Array
(
[id] => 213
[url] => Website
[link] => example.com/page/1/
)
)
HTML
<table>
<tr>
<td>213</td>
<td><a href="example.com/page/1/">Website</a></td>
</tr>
</table>
PHP
$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = null;
foreach($cols AS $node) {
    $row_headers[] = $node->nodeValue;
}
$table = array();
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach($rows AS $row) {
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach($cols AS $node) {
        if ($row_headers != null) {
            $row[$row_headers[$i]] = $node->nodeValue;
        }
        $i++;
    }
    if (!empty($row)) {
        $table[] = $row;
    }
}
I tried $row['link'] = $node->getAttribute('href'); inside the nested foreach($cols AS $node) loop, but that did not seem to work either.
Answer 0 (score: 1)
See the code below and the inline comments.
$html = '<table>
<tr>
<td>213</td>
<td><a href="example.com/page/1/">Website</a></td>
</tr>
<tr>
<td>444</td>
<td><a href="example.org/page/1/">not a website</a></td>
</tr>
</table>';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // configure the document before loading
$dom->loadHTML($html); // returns a bool, so do not assign it back to $html
$result = array();
$rows = $dom->getElementsByTagName("tr");
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    if ($cols->length < 2) {
        continue; // skip rows without two td cells, e.g. a header row
    }
    $id = $cols->item(0)->nodeValue; // get the id, the first td element, index=0
    $anchor = $cols->item(1)->nodeValue; // get the anchor text, the second td element, index=1
    // the href sits on the <a> element inside the second td, not on the td itself
    $link = $cols->item(1)->getElementsByTagName('a')->item(0);
    $url = $link !== null ? $link->getAttribute('href') : null; // null when the cell has no link
    $result[] = array(
        'id' => $id,
        'anchor' => $anchor,
        'url' => $url
    );
}
print_r($result);
This should output:
Array
(
[0] => Array
(
[id] => 213
[anchor] => Website
[url] => example.com/page/1/
)
[1] => Array
(
[id] => 444
[anchor] => not a website
[url] => example.org/page/1/
)
)
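The same idea can be folded back into the header-driven loop from the question. The following is only a minimal sketch under some assumptions: the header row with th cells is inferred from the [id] and [url] keys in the question's output (the posted HTML does not show it), the second data row without a link is made up to show the "if href exists" case, and $record is just an illustrative variable name. The reason $node->getAttribute('href') returns nothing in the original attempt is that $node is the td element, while the href belongs to the a element inside it.

```php
<?php
// Sample markup. The <th> row and the link-less second data row are
// assumptions added for illustration; they are not in the original post.
$html = '<table>
<tr><th>id</th><th>url</th></tr>
<tr><td>213</td><td><a href="example.com/page/1/">Website</a></td></tr>
<tr><td>444</td><td>No link</td></tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$rows = $dom->getElementsByTagName('table')->item(0)->getElementsByTagName('tr');

// Collect the column names from the <th> cells of the first row.
$headers = array();
foreach ($rows->item(0)->getElementsByTagName('th') as $th) {
    $headers[] = trim($th->nodeValue);
}

$table = array();
foreach ($rows as $tr) {
    $record = array();
    $i = 0;
    foreach ($tr->getElementsByTagName('td') as $td) {
        // Use the header name as the key, or the numeric index if none exists.
        $key = isset($headers[$i]) ? $headers[$i] : $i;
        $record[$key] = trim($td->nodeValue);

        // Look for an <a> inside this cell and read href from it, not from the <td>.
        $anchors = $td->getElementsByTagName('a');
        if ($anchors->length > 0 && $anchors->item(0)->hasAttribute('href')) {
            $record['link'] = $anchors->item(0)->getAttribute('href');
        }
        $i++;
    }
    // The header row has no <td> cells, so it produces an empty record and is skipped.
    if (!empty($record)) {
        $table[] = $record;
    }
}

print_r($table);
```

With this sample markup, the first record carries id, url and link keys, while the second record has no link key because its cell contains no a element. If a row could hold links in more than one cell, a per-column key (for example the header name plus a suffix) would be needed instead of the single 'link' key used here.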