When using PHP's DOMDocument with preserveWhiteSpace set to false and formatOutput set to true, whitespace in mixed content is not always preserved, even within a single element.
Source XML:
<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>
Expected output:
<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>
Actual output (space lost between "one" and "two"):
<p><span>one</span><span>two</span> text <span>three</span> <span>four</span></p>
Another example, showing that the whitespace is preserved in some cases:
$examples = array(
    '<p>text <span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
    '<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
);
foreach ($examples as $example) {
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    $doc->loadXML($example);
    $doc->formatOutput = true;
    print $doc->saveXML();
}
// <p>text <span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>
// <p><span>one</span><span>two</span> text <span>three</span> <span>four</span></p>
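My guess is that the heuristic libxml uses to detect mixed content does not look ahead within the element, so it only starts preserving whitespace-only text nodes once it has encountered a text node containing actual text.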
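One way to check that guess is to look at the parsed DOM rather than the serialized output. The sketch below lists the children of the root element right after loading; `describeChildren` is a hypothetical helper name introduced here for illustration, not part of DOMDocument. Which whitespace-only nodes survive may vary with the libxml version, so no expected output is given.

```php
<?php
// Hypothetical helper (not part of the DOM API): returns a readable list of
// the root element's child nodes, so dropped whitespace nodes become visible.
function describeChildren(string $xml): array
{
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false; // same setting as in the question
    $doc->loadXML($xml);

    $out = array();
    foreach ($doc->documentElement->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // json_encode makes whitespace-only values easy to spot
            $out[] = 'text(' . json_encode($child->nodeValue) . ')';
        } else {
            $out[] = $child->nodeName;
        }
    }
    return $out;
}

$examples = array(
    '<p>text <span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
    '<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>',
);
foreach ($examples as $example) {
    print implode(', ', describeChildren($example)) . "\n";
}
```

If the whitespace is already missing from this list, the nodes are being discarded at parse time by preserveWhiteSpace = false, and formatOutput only serializes what is left.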
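Is this a) a bug in libxml (even though it does warn that automatic formatting can be dangerous), and/or b) something that can be avoided by using a DTD?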
Answer 0 (score: 0)
The whitespace loss can be prevented by using a DTD that declares the element as mixed content:
<?php
$xml = '<!DOCTYPE p [
<!ELEMENT p (#PCDATA|span)*>
<!ELEMENT span (#PCDATA)>
]>
<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>';
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
$doc->formatOutput = true;
print $doc->saveXML($doc->documentElement);
// <p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>
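If the XML arrives without a DOCTYPE, the same idea can be applied by prepending an internal DTD subset before parsing. This is only a sketch under a few assumptions: `withMixedContentDtd` and `formatFragment` are hypothetical helper names, the input has no XML declaration or existing DOCTYPE, and every listed child element contains text only.

```php
<?php
// Hypothetical helper: builds an internal DTD subset that declares $root as
// mixed content over the given child element names, and each child as
// text-only. Assumes $xml has no XML declaration or DOCTYPE of its own.
function withMixedContentDtd(string $root, array $children, string $xml): string
{
    $decls = '<!ELEMENT ' . $root . ' (#PCDATA|' . implode('|', $children) . ")*>\n";
    foreach ($children as $child) {
        $decls .= '<!ELEMENT ' . $child . " (#PCDATA)>\n";
    }
    return '<!DOCTYPE ' . $root . " [\n" . $decls . "]>\n" . $xml;
}

// Hypothetical helper: parses with the same settings as the answer above and
// serializes only the root element, so the DOCTYPE is not echoed back.
function formatFragment(string $root, array $children, string $xml): string
{
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    $doc->loadXML(withMixedContentDtd($root, $children, $xml));
    $doc->formatOutput = true;
    return $doc->saveXML($doc->documentElement);
}

print formatFragment(
    'p',
    array('span'),
    '<p><span>one</span> <span>two</span> text <span>three</span> <span>four</span></p>'
) . "\n";
```

Passing the root element to saveXML keeps the injected DOCTYPE out of the output, as in the answer's own example.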