php DomXPath - How to strip HTML tags and their contents from nodeValue?

Date: 2014-06-29 12:10:57

Tags: php xpath domxpath

Given this XML:

<root>
    <main>
        <cont>
            <p>hello<a>world</a></p>
            <p>hello</p>
            <p>hello<a>world</a></p>
        </cont>
    </main>
</root>

I only need the text inside the <cont> element, without the <a> elements and their contents.

So the result should be hello hello hello, with no world in it.

2 Answers:

Answer 0: (score: 1)

simplexml_load_string() or simplexml_load_file() should be enough:

$xml_string = '<root> <main> <cont> <p>hello<a>world</a></p> <p>hello</p> <p>hello<a>world</a></p> </cont> </main></root>';
$xml = simplexml_load_string($xml_string);

// Casting each <p> to string keeps only its own text, not the text inside <a>.
$paragraphs = [];
foreach ($xml->main->cont->p as $value) {
    $paragraphs[] = (string) $value;
}

echo '<pre>';
print_r($paragraphs);

This should display something like:

Array
(
    [0] => hello
    [1] => hello
    [2] => hello
)
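
The question asks for a single string rather than an array. As a minimal sketch building on the SimpleXML approach above, the pieces can be joined with implode(). The variable names $xmlString, $texts and $joined are illustrative only and do not come from the original answer:

```php
<?php
// Sample XML from the question, written on one line.
$xmlString = '<root><main><cont><p>hello<a>world</a></p><p>hello</p><p>hello<a>world</a></p></cont></main></root>';

$xml = simplexml_load_string($xmlString);

$texts = [];
foreach ($xml->main->cont->p as $p) {
    // The string cast yields the <p> element's own text only;
    // text inside child elements such as <a> is not included.
    $texts[] = trim((string) $p);
}

// Join the per-paragraph texts with single spaces.
$joined = implode(' ', $texts);
echo $joined, "\n";
```

With the sample XML this should print hello hello hello.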

Answer 1: (score: 1)

You can select the text nodes that are direct children of each <p> element:

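// $xmlData is assumed to hold the XML from the question as a string.
$dom = new DOMDocument;
$dom->loadXML($xmlData);

$xpath = new DOMXPath($dom);

// text() matches only text nodes directly under each <p>, so <a> contents are skipped.
foreach ($xpath->query('//cont/p/text()') as $text) {
    echo $text->textContent, "\n";
}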
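
The snippet above leaves $xmlData undefined. Here is a self-contained sketch of the same XPath approach, assuming $xmlData holds the XML from the question. It collects the text nodes and joins them into one string. The names $parts and $result are illustrative, and the trim() and empty-string check are an added precaution rather than part of the original answer:

```php
<?php
// Assumption: $xmlData contains the XML document from the question.
$xmlData = <<<XML
<root>
    <main>
        <cont>
            <p>hello<a>world</a></p>
            <p>hello</p>
            <p>hello<a>world</a></p>
        </cont>
    </main>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xmlData);

$xpath = new DOMXPath($dom);

$parts = [];
// Select only the text nodes that are direct children of <p> inside <cont>.
foreach ($xpath->query('//cont/p/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') {
        // Skip whitespace-only text nodes, if any.
        $parts[] = $value;
    }
}

$result = implode(' ', $parts);
echo $result, "\n";
```

With the sample document this should print hello hello hello. The indentation whitespace belongs to <cont> and its ancestors, not to the <p> elements, so the query does not match it.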