Pulling node attributes as well as node values out of td elements

Time: 2015-11-10 14:44:21

Tags: php html

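I have the PHP code below, which takes an HTML file, pulls the table out of it, and then parses the table. The cell data comes back fine, as shown in Current Output. I am trying to get the href attribute into the output as well, as in the Desired Output snippet, whenever an href exists. I can't see how to target just the href inside the cell; I only seem to be able to get the node value. Any help is much appreciated.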

Current Output

Array
(
    [0] => Array
        (
            [id] => 213
            [url] => Website
        )
)

Desired Output

Array
(
    [0] => Array
        (
            [id] => 213
            [url] => Website
            [link] => example.com/page/1/
        )
)

HTML

<table>
    <tr>
        <td>213</td>
        <td><a href="example.com/page/1/">Website</a></td>
    </tr>
</table>

PHP

$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);

$dom->preserveWhiteSpace = false;

$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = null;

foreach($cols AS $node) {
    $row_headers[] = $node->nodeValue;
}

$table = array();
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach($rows AS $row) {
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach($cols AS $node) {
        if ($row_headers != null) {
            $row[$row_headers[$i]] = $node->nodeValue;
        }
        $i++;
    }
    if (!empty($row)) {
        $table[] = $row;
    }
}

I tried $row['link'] = $node->getAttribute('href'); inside the nested foreach($cols AS $node), but that didn't seem to work either.

1 Answer:

Answer 0: (score: 1)

See the code below and the inline comments.

$html = '<table>
    <tr>
        <td>213</td>
        <td><a href="example.com/page/1/">Website</a></td>
    </tr>
    <tr>
        <td>444</td>
        <td><a href="example.org/page/1/">not a website</a></td>
    </tr>
</table>';

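$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // set before loading; changing it afterwards does not alter the parsed document
$dom->loadHTML($html); // returns a bool, so don't assign it back to $html

$rows = $dom->getElementsByTagName("tr");
$result = array();

foreach ($rows as $row) {

    $cols = $row->getElementsByTagName('td');
    if ($cols->length < 2) {
        continue; // skip header rows or rows that lack both cells
    }

    $id     = $cols->item(0)->nodeValue; // get the id, the first td element, index=0
    $anchor = $cols->item(1)->nodeValue; // get the anchor text, the second td element, index=1
    $links  = $cols->item(1)->getElementsByTagName('a'); // the href lives on the <a> inside the td, not on the td itself
    $url    = $links->length > 0 ? $links->item(0)->getAttribute('href') : null; // get the url from the href attribute, or null if the cell has no link

    $result[] = array(
        'id'     => $id,
        'anchor' => $anchor,
        'url'    => $url
    );
}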

print_r($result);

This should output:

Array
(
    [0] => Array
        (
            [id] => 213
            [anchor] => Website
            [url] => example.com/page/1/
        )

    [1] => Array
        (
            [id] => 444
            [anchor] => not a website
            [url] => example.org/page/1/
        )

)
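
The same idea can probably be folded back into the header-keyed loop from the question, so that the result keeps the id and url keys and only gains a link key when the cell actually contains an anchor with an href. The following is a minimal sketch, not a definitive implementation. The sample markup is an assumption: the question's HTML snippet has no th row, so a header row is added here to match the keys shown in Current Output, along with a second row without a link to exercise the missing-href case.

```php
<?php
// Hypothetical sample input: the question's table plus an assumed header row
// and an extra row without a link.
$html = '<table>
    <tr><th>id</th><th>url</th></tr>
    <tr><td>213</td><td><a href="example.com/page/1/">Website</a></td></tr>
    <tr><td>444</td><td>No link</td></tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$rows = $dom->getElementsByTagName('table')->item(0)->getElementsByTagName('tr');

// Collect the column names from the <th> cells of the first row.
$headers = array();
foreach ($rows->item(0)->getElementsByTagName('th') as $th) {
    $headers[] = trim($th->nodeValue);
}

$table = array();
foreach ($rows as $tr) {
    $row = array();
    $i = 0;
    foreach ($tr->getElementsByTagName('td') as $td) {
        if (isset($headers[$i])) {
            $row[$headers[$i]] = trim($td->nodeValue);
        }
        // The href belongs to the <a> inside the cell, so look the anchor up
        // first; calling getAttribute('href') on the <td> returns ''.
        $anchors = $td->getElementsByTagName('a');
        if ($anchors->length > 0 && $anchors->item(0)->hasAttribute('href')) {
            $row['link'] = $anchors->item(0)->getAttribute('href');
        }
        $i++;
    }
    // The header row has no <td> cells, so it yields an empty $row and is skipped.
    if (!empty($row)) {
        $table[] = $row;
    }
}

print_r($table);
```

With this input, the first entry carries id, url and link, while the second has only id and url. If a row can contain several linked cells, a single link key would be overwritten by the last one, so a per-column key may suit better.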