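Parsing a "flat" HTML structure with PHP DOM

Time: 2014-07-11 20:30:50

Tags: php json parsing dom html-parsing

I'm trying to use PHP DOM to parse an HTML file that I want to convert to JSON. Unfortunately, the HTML DOM is quite flat, and I can't change it. By flat I mean the structure looks like this: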

<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>

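I need to be able to get each <h2> and treat the <span> elements that follow it as its children. I'm not committed to PHP DOM; it's simply what I found in an answer I came across, so feel free to suggest a better option. What I really need is to turn this HTML string into JSON, and so far PHP DOM looks like my best choice.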

1 answer:

Answer 0 (score: 0)

$XML =<<<XML
    <h2>title</h2>
    <span>child node</span>
    <span>another child</span>
    <h2>title</h2>
    <span>child node</span>
    <span>another child</span>
    <h2>title </h2>
    <span>child node</span>
    <span>another child</span>
XML;

    $dom = new DOMDocument;
    $dom->loadHTML($XML);
    $xp = new DOMXPath($dom);

    $new = new DOMDocument;
    $root = $new->createElement('root');

    // loadHTML() wraps the fragment in <html><body>, so the flat
    // elements are the direct element children of <body>.
    $current = null;
    foreach ($xp->query('//body/*') as $node) {
        if ($node->nodeName == 'h2') {
            // A new heading closes the previous group and opens a new one.
            if ($current !== null)
                $root->appendChild($current);
            $current = $new->createElement('div');
        } elseif ($current === null) {
            // Skip elements that appear before the first <h2>.
            continue;
        }
        $current->appendChild($new->importNode($node, true));
    }

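    // Append the last group, which the loop leaves pending.
    if (isset($current))
        $root->appendChild($current);

    $new->appendChild($root);
    // saveXML() emits well-formed XML, which simplexml_load_string() expects.
    $xml2 = simplexml_load_string($new->saveXML());
    echo json_encode($xml2);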
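
If the intermediate XML document isn't needed, a possible variation is to collect the groups into a plain PHP array and pass that to json_encode() directly. This is only a sketch: the function name groupByHeading and the "title"/"children" keys are made up for illustration, and it assumes every child is only needed as text.

```php
<?php
// Hypothetical helper (not part of the code above): groups each <h2>
// with the elements that follow it, up to the next <h2>.
function groupByHeading(string $html): array
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xp = new DOMXPath($dom);

    $groups = [];
    // loadHTML() wraps the fragment in <html><body>.
    foreach ($xp->query('//body/*') as $node) {
        if ($node->nodeName === 'h2') {
            // Start a new group for every heading.
            $groups[] = ['title' => trim($node->textContent), 'children' => []];
        } elseif ($groups) {
            // Attach the element to the most recent heading.
            $groups[count($groups) - 1]['children'][] = trim($node->textContent);
        }
    }
    return $groups;
}

$html = '<h2>title</h2><span>child node</span><span>another child</span>'
      . '<h2>title</h2><span>child node</span>';

echo json_encode(groupByHeading($html));
```

Elements that appear before the first <h2> are dropped here; whether that is the right behavior depends on the real input.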