Preventing whitespace from disappearing from XML elements with mixed content

Time: 2013-09-10 10:26:37

Tags: php whitespace domdocument

When using PHP's DOMDocument with preserveWhiteSpace set to false and formatOutput set to true, whitespace in mixed content is not always preserved, even within a single element.

Source XML:
<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>

Expected output:
<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>

Actual output (space lost between "one" and "two"):
<p><span>one</span><span>two</span> text <span>three</span> <span>four</span></p>

A second example shows that the whitespace is preserved in some cases (here, when the element starts with text):

$examples = array(
    '<p>text <span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
    '<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
);

foreach ($examples as $example) {
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    $doc->loadXML($example);
    $doc->formatOutput = true;

    print $doc->saveXML();
}

// <p>text <span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>
// <p><span>one</span><span>two</span> text <span>three</span> <span>four</span></p>

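My guess is that the heuristic libxml uses to detect mixed content does not look ahead within the element, so it only starts preserving whitespace-only text nodes once it has found a text node containing actual text.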

Is this a) a bug in libxml (even though it warns that automatic formatting can be dangerous), and/or b) something that can be avoided by using a DTD?
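One way to test that guess is to look at the parsed tree instead of the serialized output, since the whitespace-only nodes appear to be dropped while loading rather than while saving. The following is only a diagnostic sketch; describeChildren is a hypothetical helper name, not part of DOMDocument.

```php
<?php

// Hypothetical helper (not part of the DOM extension): list the direct
// children of the root element so that any whitespace-only text nodes
// that survived parsing become visible.
function describeChildren($xml)
{
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false; // same setting as in the question
    $doc->loadXML($xml);

    $out = array();
    foreach ($doc->documentElement->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // json_encode() quotes the value, so a lone space shows up as " "
            $out[] = '#text ' . json_encode($child->nodeValue);
        } else {
            $out[] = $child->nodeName;
        }
    }

    return $out;
}

$examples = array(
    '<p>text <span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
    '<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
);

foreach ($examples as $example) {
    print implode(', ', describeChildren($example)) . "\n";
}
```

Comparing the two printed lists shows exactly which `#text " "` entries are missing from the second example, which is easier to reason about than the formatted output.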

1 answer:

Answer 0: (score: 0)

The whitespace loss can be prevented by supplying a DTD that declares the element as mixed content:

<?php

$xml = '<!DOCTYPE p [
<!ELEMENT p (#PCDATA|span)*>
<!ELEMENT span (#PCDATA)>
]>
<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>';

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
$doc->formatOutput = true;

print $doc->saveXML($doc->documentElement);

// <p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>
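To check that the declaration changes the parsed tree and not just the printed string, one could count the whitespace-only text nodes under `<p>` with and without the internal subset. This is a sketch under the same settings as above; countBlankTextNodes is a hypothetical helper name.

```php
<?php

// Hypothetical helper: count whitespace-only text nodes directly under
// the root element after loading with preserveWhiteSpace = false.
function countBlankTextNodes($xml)
{
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    $doc->loadXML($xml);

    $count = 0;
    foreach ($doc->documentElement->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) === '') {
            $count++;
        }
    }

    return $count;
}

$body = '<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>';
$dtd  = "<!DOCTYPE p [\n<!ELEMENT p (#PCDATA|span)*>\n<!ELEMENT span (#PCDATA)>\n]>\n";

print countBlankTextNodes($body) . "\n";        // without the DTD
print countBlankTextNodes($dtd . $body) . "\n"; // with the mixed-content declaration
```

With the DTD in place, both single-space nodes between the `<span>` elements should be counted, which matches the output shown above. Passing `$doc->documentElement` to `saveXML()`, as the answer does, serializes only the `<p>` element, so the DOCTYPE and XML declaration do not appear in the result.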