How do I remove the contents of elements in a string, keeping only the outermost element tags?

Date: 2014-12-14 15:43:48

Tags: php xpath domdocument

I have a string like this:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1">
    This is some text

    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>

<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text

        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>

<blockquote data-id="6">
    This is some text
</blockquote>

I want to keep the outermost blockquote tags but remove their contents. In other words, I want to turn the above into:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1"></blockquote>

<blockquote data-id="3"></blockquote>

<blockquote data-id="6"></blockquote>

What is an efficient way to do this in PHP?

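2 answers:

Answer 0 (score: 1)

There are many ways to skin this cat. I'd give the string a dummy root node, throw away every node matching the XPath expression /root/blockquote/text() | /root/blockquote/*, and then rebuild the string from the root's child nodes.

Example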

$string = <<<'STRING'
<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1">
    This is some text

    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>

<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text

        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>

<blockquote data-id="6">
    This is some text
</blockquote>
STRING;

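$dom = new DOMDocument();
// Wrap the fragment in a dummy <root> so it parses as a single XML document
$dom->loadXML("<root>$string</root>");
$xpath = new DOMXPath($dom);

// Remove every text node and child element directly inside a top-level <blockquote>
foreach ($xpath->query('/root/blockquote/text() | /root/blockquote/*') as $node) {
    $node->parentNode->removeChild($node);
}

// Rebuild the string from the children of <root>
$string = '';
foreach ($dom->documentElement->childNodes as $node) {
    $string .= $dom->saveHTML($node);
}

echo $string;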

Output:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1"></blockquote>

<blockquote data-id="3"></blockquote>

<blockquote data-id="6"></blockquote>
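The same steps can be packaged as a reusable function. This is a minimal sketch rather than part of the original answer: `emptyTopLevelBlockquotes` is a hypothetical name, the query uses `node()` instead of `text() | *` so that comments inside a top-level blockquote are removed too, and, like the code above, it assumes the fragment is well-formed XML once wrapped in `<root>`.

```php
<?php
// Hypothetical helper (not from the original answer): empties every
// top-level <blockquote> in a fragment while keeping its tag and attributes.
// Assumes the fragment is well-formed XML once wrapped in a <root> element.
function emptyTopLevelBlockquotes(string $fragment): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML("<root>$fragment</root>")) {
        throw new InvalidArgumentException('Fragment is not well-formed XML');
    }
    $xpath = new DOMXPath($dom);

    // node() matches text, element, comment and processing-instruction children.
    // DOMXPath::query() returns a snapshot, so removing nodes while iterating is safe.
    foreach ($xpath->query('/root/blockquote/node()') as $node) {
        $node->parentNode->removeChild($node);
    }

    // Rebuild the fragment from the dummy root's children.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo emptyTopLevelBlockquotes(
    '<p>Keep</p><blockquote data-id="1">Text<!-- note --><blockquote data-id="2">Inner</blockquote></blockquote>'
);
```

Because `loadXML()` is a strict XML parser, HTML-only constructs such as an unclosed `<br>` will make it fail; a `loadHTML()`-based approach tolerates them.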

Answer 1 (score: 0)

Shortly after posting my question, I found that DOMDocument handles this nicely (though there may be better solutions).

This is what I came up with:

$html = '<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1">
    This is some text

    <blockquote data-id="2">
        This is some text
    </blockquote>
</blockquote>

<blockquote data-id="3">
    <blockquote data-id="4">
        This is some text

        <blockquote data-id="5">
            This is some text
        </blockquote>
    </blockquote>
    This is some text
</blockquote>

<blockquote data-id="6">
    This is some text
</blockquote>';


libxml_use_internal_errors(true); // MUST INCLUDE THIS LINE! (keeps libxml's HTML parser warnings from being emitted)
$dom = new \DOMDocument();
$dom->loadHTML($html); // pass the HTML string

$xpath = new \DOMXPath($dom); // pass the appropriate DOMDocument object to the constructor

foreach ($xpath->query('//blockquote') as $node) {
    /** @var \DOMElement $node */
    $node->nodeValue = ''; // replaces all of the element's children, nested blockquotes included
}

echo domInnerHtml($xpath->query('//body')->item(0));


/**
 * Returns the inner HTML of a DOMNode
 *
 * @link http://stackoverflow.com/questions/2087103/innerhtml-in-phps-domdocument
 * @param DOMNode $element
 * @return string
 */
function domInnerHtml(DOMNode $element) {
    $innerHtml = '';
    $children  = $element->childNodes;

    foreach ($children as $child) {
        $innerHtml .= $element->ownerDocument->saveHTML($child);
    }

    return $innerHtml;
}

The output is:

<p>
This is some text
</p>

<p>
This is some text
</p>
<p>
This is some text
</p>

<blockquote data-id="1"></blockquote>

<blockquote data-id="3"></blockquote>

<blockquote data-id="6"></blockquote>
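One possible refinement of this answer, sketched here as a hypothetical helper `emptyTopLevelBlockquotesHtml` (not the author's code): querying `/html/body/blockquote` instead of `//blockquote` targets only the outermost blockquotes, which is all the question needs, since emptying an outer element already discards everything nested in it. It also restores the previous `libxml_use_internal_errors()` setting afterwards. Like the answer, it assumes `loadHTML()` places the fragment inside an implied `<body>`.

```php
<?php
// Hypothetical helper (not from the original answer): the loadHTML + nodeValue
// approach, with the XPath restricted to blockquotes that are direct children
// of <body>, so nested blockquotes are never visited.
function emptyTopLevelBlockquotesHtml(string $html): string
{
    // Suppress libxml's HTML parser warnings, remembering the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assigning '' to nodeValue replaces all of the element's children,
    // nested blockquotes included.
    foreach ($xpath->query('/html/body/blockquote') as $node) {
        $node->nodeValue = '';
    }

    // Inner HTML of <body>, built the same way as domInnerHtml() above.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo emptyTopLevelBlockquotesHtml(
    '<p>Keep</p><blockquote data-id="3"><blockquote data-id="4">Inner</blockquote>Text</blockquote>'
);
```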