PHP DOM: parsing a table into an array

Asked: 2017-04-12 14:19:46

Tags: arrays parsing multidimensional-array html-table

I have the following table structure:

<table class="c_order u_list">
    <thead>
        <tr>
        </tr>
    </thead>
    <tbody>
            <tr>
            <td>
                11.04.2017<br/>
                18:20            </td>
            <td><a target="_blank" href="/personal/order/detail/457/">A-457</a></td>
            <td>+7 (917) 119-11-42</td>
            <td>1685.20</td>
            <td>
                <a target="_blank" href="/resn/i/zda_2_1/">УШКА</a><br/>с. холмский, ул. Фрунзе, д. 11<br/>3477740087            </td>
            <td>Принят</td>
        </tr>
                <tr>
            <td>
                11.04.2017<br/>
                16:47            </td>
            <td><a target="_blank" href="/personal/order/detail/47565/">A-47565</a></td>
            <td>+7 (909) 556-77-99</td>
            <td>2574.80</td>
            <td>
                <a target="_blank" href="/kir/a/an_10/">ООО &quot;План&quot;</a><br/>г. Омск, ул. 10-летия Победы, д. 3;<br/>8845701069            </td>
            <td>Доставлен</td>
        </tr>

            </tbody>
</table>

I am trying to convert it into an array with this PHP code:

$page = curl_exec ($ch);
curl_close ($ch);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$data = array();
// get all table rows and rows which are not headers
$table_rows = $xpath->query('//tr');
foreach($table_rows as $row => $tr) {
    foreach($tr->childNodes as $td) {
        echo $td->nodeValue;
        $data[$row][] = preg_replace('~[\r\n]+~', '', trim($td->nodeValue));
    }
    $data[$row] = array_values(array_filter($data[$row]));
}
print_r($data);

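However, the resulting array is wrong: the cell values come back as plain text, without the a tags and their href attributes. I need something like the following, with all the tags inside each td element preserved: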

Array
(
    [0] => Array
    (
        [0] => 11.04.2017 18:20
        [1] => <a target="_blank" href="/personal/order/detail/457/">A-457</a>
        [2] => +7 (917) 119-11-42
        [3] => 1685.20
        [4] => <a target="_blank" href="/resn/i/zda_2_1/">УШКА</a><br/>с. холмский, ул. Фрунзе, д. 11<br/>3477740087
        [5] => Принят
    )

    [1] => Array
    (
        [0] => 11.04.2017 16:47
        [1] => <a target="_blank" href="/personal/order/detail/47565/">A-47565</a>
        [2] => +7 (909) 556-77-99
        [3] => 2574.80
        [4] => <a target="_blank" href="/kir/a/an_10/">ООО &quot;План&quot;</a><br/>г. Омск, ул. 10-летия Победы, д. 3;<br/>8845701069
        [5] => Доставлен
    )
)

Also, how can I name the array keys, so that I get ['time'] instead of [0]?

2 Answers:

Answer 0 (score: 2)

In the DOMDocument constructor, specify the encoding as UTF-8:

$dom = new DOMDocument('1.0', 'UTF-8');
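
For illustration, here is a minimal sketch of loading a UTF-8 fragment. It assumes the fetched markup declares no charset, in which case loadHTML() may still decode the bytes as ISO-8859-1 whatever is passed to the constructor. Prepending a meta charset hint is one commonly used workaround, not something stated in this answer. The $html string is a made-up sample standing in for the real $page.

```php
<?php
// Hypothetical sample input; the real markup would come from curl_exec().
$html = '<table><tbody><tr><td>Принят</td></tr></tbody></table>';

$dom = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
// Assumption: the page carries no charset declaration, so hint UTF-8 explicitly.
$hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$dom->loadHTML($hint . $html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// DOM strings are always handed back to PHP as UTF-8.
$status = $xpath->query('//td')->item(0)->nodeValue;
echo $status, PHP_EOL;
```

If the page already declares its charset in a meta tag or is pure ASCII, the hint should be unnecessary.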

To make preg_replace() work safely on UTF-8 strings, you must use the u modifier:

$data[$row][] = preg_replace('~[\r\n]+~u', '', trim($td->nodeValue));
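
As a small sketch of the u modifier in use, the helper below collapses whitespace runs in a cell's text, which turns the two-line date cell from the question into the single line shown in the desired output. The function name normalize_cell_text and the sample strings are assumptions made for this example.

```php
<?php
// Hypothetical helper: collapse whitespace runs in a UTF-8 string.
function normalize_cell_text(string $text): string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // which also permits code points such as \x{00A0} (no-break space).
    $collapsed = preg_replace('~[\s\x{00A0}]+~u', ' ', $text);
    // preg_replace() returns null on error, e.g. for invalid UTF-8 input.
    return trim($collapsed ?? $text);
}

// Sample text shaped like the first td of the table in the question.
$raw = "\n                11.04.2017\n                18:20            ";
$date = normalize_cell_text($raw);
echo $date, PHP_EOL;

// A Cyrillic string containing a no-break space.
$city = normalize_cell_text("г. Омск,\u{00A0}ул. 10-летия Победы");
echo $city, PHP_EOL;
```

Replacing with a space rather than an empty string keeps the date and time apart; the pattern in the answer would join them into one token.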

Answer 1 (score: 1)

// Select only the body rows, so header rows are skipped.
$table_rows = $xpath->query('//table/tbody/tr');
$results = array();
foreach ($table_rows as $row) {
    $result = array();

    // Date and time: every run of whitespace (line breaks included) becomes "_".
    $result['Name1'] = preg_replace('~\s+~u', '_', trim($xpath->query('./td[1]', $row)->item(0)->nodeValue));

    // Order number and the URL of its link.
    $result['Name2'] = $xpath->query('./td[2]', $row)->item(0)->nodeValue;
    $result['NameURL'] = $xpath->query('./td[2]/a/@href', $row)->item(0)->nodeValue;

    // Phone and price are plain text cells.
    $result['Phone'] = $xpath->query('./td[3]', $row)->item(0)->nodeValue;
    $result['Price'] = $xpath->query('./td[4]', $row)->item(0)->nodeValue;

    // URL of the link in the fifth cell, collected into a list.
    $result['Name10'][] = $xpath->query('./td[5]/a/@href', $row)->item(0)->nodeValue;

    // Order status.
    $result['Name11'] = $xpath->query('./td[6]', $row)->item(0)->nodeValue;

    $results[] = $result;
}

print_r($results);
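
The question also asked for the markup inside each td rather than its text. One possible way to get it, sketched here under assumptions, is to serialize each cell's child nodes with saveHTML() and then pair the cells with a list of key names via array_combine(). The inner_html helper, the $keys list, and the shortened three-column sample row are all invented for this example.

```php
<?php
// Hypothetical sample row, shortened from the table in the question.
$html = '<table><tbody><tr>'
      . '<td>11.04.2017<br/>18:20</td>'
      . '<td><a target="_blank" href="/personal/order/detail/457/">A-457</a></td>'
      . '<td>1685.20</td>'
      . '</tr></tbody></table>';

// Assumed key names, one per column, in document order.
$keys = ['time', 'order', 'price'];

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Serialize the children of a node back to markup ("inner HTML").
function inner_html(DOMNode $node): string
{
    $markup = '';
    foreach ($node->childNodes as $child) {
        $markup .= $node->ownerDocument->saveHTML($child);
    }
    return trim($markup);
}

$results = [];
foreach ($xpath->query('//table/tbody/tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = inner_html($td);
    }
    // Skip rows whose cell count does not match the key list.
    if (count($cells) === count($keys)) {
        $results[] = array_combine($keys, $cells);
    }
}
print_r($results);
```

The serializer emits HTML rather than the original source bytes, so details such as the self-closing br may come back in a slightly different form. With non-ASCII content, the encoding handling described in the first answer still applies.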