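I just discovered that reading nodeValue from a DOMElement returns the raw (unencoded) value, but when you set it you must encode the value first; otherwise the text is truncated at the first invalid entity.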
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$element_p = $doc->createElement('p'); // DOMElement
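// A bare ampersand is parsed as the start of an entity, so the text is cut off
$element_p->nodeValue = 'This & that.';
print($element_p->nodeValue); // 'This '
// With the ampersand encoded as &amp;, the full text survives
$element_p->nodeValue = 'This &amp; that.';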
print($element_p->nodeValue); // 'This & that.'
// Setting it to itself truncates the text!
$element_p->nodeValue = $element_p->nodeValue;
print($element_p->nodeValue); // 'This '
// Encode before setting, don't decode after getting
$element_p->nodeValue = htmlspecialchars('This & that.');
print($element_p->nodeValue); // 'This & that.'
// Using htmlspecialchars() preserves the original text
$element_p->nodeValue = htmlspecialchars($element_p->nodeValue);
print($element_p->nodeValue); // 'This & that.'
?>
Is this the expected behavior? It certainly surprised me when I ran into it!
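
One way to sidestep the encode-before-setting step is to avoid assigning to nodeValue at all and append a text node instead, since DOMDocument::createTextNode() treats its argument as literal text. The following is only a minimal sketch of that idea: the helper name set_literal_text() is hypothetical (it is not part of the DOM extension), and it assumes you want to replace all existing children of the element.

```php
<?php
// Hypothetical helper (not part of the DOM extension): replace an element's
// content with literal text, without any manual entity encoding.
function set_literal_text(DOMElement $element, string $text): void
{
    // Remove any existing child nodes first.
    while ($element->firstChild !== null) {
        $element->removeChild($element->firstChild);
    }
    // createTextNode() takes the string as-is; '&' is not parsed as an entity.
    $element->appendChild($element->ownerDocument->createTextNode($text));
}

$doc = new DOMDocument('1.0', 'UTF-8');
$element_p = $doc->createElement('p');
$doc->appendChild($element_p);

set_literal_text($element_p, 'This & that.');
print($element_p->nodeValue . "\n");     // This & that.
print($doc->saveXML($element_p) . "\n"); // <p>This &amp; that.</p>
```

With this approach the escaping only happens on serialization, so reading the text back and writing it again through the same helper should leave it unchanged.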