Adding a table's contents to an array using DOMDocument

Asked: 2018-08-21 10:28:12

Tags: php xpath domdocument domxpath

I have the following $html:

$html = '
<table id="myTable">
    <tbody>
        <tr>
            <td>08/20/18</td>
            <td> <a href="https://example.com/1a">Text 1 A</a> </td>
            <td> <a href="https://example.com/1b">Test 1 B</a> </td>
        </tr>
        <tr>
            <td>08/21/18</td>
            <td> <a href="https://example.com/2a">Text 2 A</a> </td>
            <td> <a href="https://example.com/2b">Test 2 B</a> </td>
        </tr>
    </tbody>
</table>
';

Using DOMDocument, I want to add the table's contents to a multidimensional $array:

$array = array(
    // tr 1
    array(
        array(
            'content' => '08/20/18'
        ),
        array(
            'content' => 'Text 1 A',
            'href' => 'https://example.com/1a'
        ),
        array(
            'content' => 'Test 1 B',
            'href' => 'https://example.com/1b'
        )
    ),
    // tr 2
    array(
        array(
            'content' => '08/21/18'
        ),
        array(
            'content' => 'Text 2 A',
            'href' => 'https://example.com/2a'
        ),
        array(
            'content' => 'Test 2 B',
            'href' => 'https://example.com/2b'
        )
    )
);

What I have tried so far

I managed to get the contents of the table using xpath:

// setup DOMDocument
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html); 
$xpath = new DOMXPath($doc);
// target table using xpath
$results = $xpath->query("//*[@id='myTable']");

if ($results->length > 0) {
    var_dump($results->item(0));
    var_dump($results->item(0)->nodeValue);
}

What is the way to put the contents of each tr into $array?

2 answers:

Answer 0 (score: 1)

This looked interesting, so I gave it a try.

$dom = new DOMDocument;
$dom->loadHTML($html);
$multidimensional_array = [];
$i = 0;
$myTable = $dom->getElementById('myTable');
foreach ($myTable->getElementsByTagName('tr') as $tableRow) {
    foreach ($tableRow->getElementsByTagName('td') as $tableData) {
        // Pick up the href of a link inside this cell, if there is one
        $href = null;
        foreach ($tableData->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
        }

        // trim() drops the whitespace that surrounds the link in the markup
        $cell = array('content' => trim($tableData->nodeValue));
        if (!empty($href)) {
            $cell['href'] = $href;
        }
        $multidimensional_array[$i][] = $cell;
    }
    $i++;
}
print_r($multidimensional_array);

I hope this is what you were asking for.

Edit: added a lookup of the specific table before the loop.
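
The table lookup mentioned in the edit matters when the page holds more than one table, or none with the expected id. A minimal sketch of that guard follows; the helper name rowsOfTable and the second table with id "other" are assumptions made for illustration, not part of the answer:

```php
<?php
// Hypothetical helper (not part of the answer above): returns the <tr> nodes
// of the table with the given id, or an empty array when no such table exists.
function rowsOfTable(string $html, string $tableId): array
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // getElementById() returns null when the id is not present in the document
    $table = $dom->getElementById($tableId);
    if ($table === null) {
        return [];
    }

    // iterator_to_array() copies the DOMNodeList into a plain PHP array
    return iterator_to_array($table->getElementsByTagName('tr'));
}

// Illustrative markup with two tables; only "myTable" should be read
$twoTables = '<table id="myTable"><tr><td>a</td></tr><tr><td>b</td></tr></table>'
           . '<table id="other"><tr><td>c</td></tr></table>';

var_dump(count(rowsOfTable($twoTables, 'myTable'))); // rows of the targeted table only
var_dump(count(rowsOfTable($twoTables, 'missing'))); // unknown id, so no rows
```

Without the null check, calling getElementsByTagName() on a missing table would fail with an error instead of yielding an empty result.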

Answer 1 (score: 1)

<?php

$html = '
<table id="myTable">
    <tbody>
        <tr>
            <td>08/20/18</td>
            <td> <a href="https://example.com/1a">Text 1 A</a> </td>
            <td> <a href="https://example.com/1b">Test 1 B</a> </td>
        </tr>
        <tr>
            <td>08/21/18</td>
            <td> <a href="https://example.com/2a">Text 2 A</a> </td>
            <td> <a href="https://example.com/2b">Test 2 B</a> </td>
        </tr>
    </tbody>
</table>
';

$data = [];

$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);
$trs = $xpath->query("//*[@id='myTable']/tbody/tr");
foreach ($trs as $i => $tr) {
    /** @var DOMNode $td */
    foreach ($tr->childNodes as $td) {
        // Skip the whitespace text nodes that sit between the <td> elements
        if ($td instanceof DOMElement) {
            // trim() removes the whitespace around the link text
            $row = ['content' => trim($td->nodeValue)];
            /** @var DOMNode $a */
            foreach ($td->childNodes as $a) {
                // Text nodes have no attributes; an <a> contributes its href
                if ($a->hasAttributes()) {
                    /** @var DOMAttr $attribute */
                    foreach ($a->attributes as $attribute) {
                        $row[$attribute->name] = $attribute->value;
                    }
                }
            }
            $data[$i][] = $row;
        }
    }
}

var_dump($data);
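
The same extraction can also be written with XPath queries evaluated relative to each row and cell, which avoids walking childNodes by hand. This is only a sketch under assumptions: the function name tableToArray is hypothetical, the id is assumed to contain no single quote, and the table is assumed to hold no nested tables.

```php
<?php
// Hypothetical helper: converts the rows of the table with the given id
// into the nested array layout requested in the question.
function tableToArray(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $data = [];
    // "//tr" instead of "/tbody/tr" so tables without an explicit <tbody> match too
    foreach ($xpath->query("//table[@id='$tableId']//tr") as $tr) {
        $row = [];
        // "./td" is evaluated relative to the current row
        foreach ($xpath->query('./td', $tr) as $td) {
            $cell = ['content' => trim($td->textContent)];
            // string(.//a/@href) evaluates to '' when the cell holds no link
            $href = $xpath->evaluate('string(.//a/@href)', $td);
            if ($href !== '') {
                $cell['href'] = $href;
            }
            $row[] = $cell;
        }
        $data[] = $row;
    }
    return $data;
}

// Shortened, illustrative version of the markup from the question
$sample = '<table id="myTable"><tbody>
    <tr><td>08/20/18</td><td> <a href="https://example.com/1a">Text 1 A</a> </td></tr>
</tbody></table>';

print_r(tableToArray($sample, 'myTable'));
```

Passing the current node as the second argument to query() and evaluate() keeps each lookup scoped to that row or cell, so a link in one cell cannot leak into another.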