PHP DOMDocument错误实体'nbsp'未定义

时间:2012-03-18 16:51:12

标签: php dom domdocument

我使用DOMDocument编辑一些HTML文件,但是一些主题在其名称空间中。所以DOMDocument自动将空格更改为%20,然后找不到它们。

这就是如何准确查看错误:

Warning: DOMDocument::load() [domdocument.load]: Entity 'nbsp' not defined in file:///C:/Path/To/The/File/01%20c%2040-1964.html, line: 11 in C:/Path/To/class.php on line 51

你知道如何修复这个错误吗?

提前感谢您的答案

4 个答案:

答案 0 :(得分:13)

使用DOMDocument::loadHTMLFile()代替load()。这就是它的目的。 HTML不是XML。

XML不知道命名实体 。但是,如果使用loadHTML,XML解析器将加载HTML命名实体,以便错误消失。

请参阅:XML parser error: entity not defined

答案 1 :(得分:0)

以下是解析包含 之类的HTML实体的XHTML代码段的方法。

该示例来自一段代码,该代码分析了从Confluence API导出的此类XHTML,包括诸如<ac:structured-macro>之类的自定义元素。这就是为什么您需要设置名称空间的原因。更改或删除xmlns属性以适合您的用例。

$snippet = '<p>hello&nbsp;world</p>';

$entitydefs = XML_HTML_DEFS;

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
$entitydefs
]>
<root
  xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/"
  xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/"
  xmlns="http://www.atlassian.com/schema/confluence/4/">
  $snippet
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

XML_HTML_DEFS所在的位置(您可以删除不会发生的内容):

const XML_HTML_DEFS = <<<ENTITIES
<!ENTITY amp "&#38;">
<!ENTITY lt "&#60;">
<!ENTITY gt "&#62;">
<!ENTITY Agrave "&#192;">
<!ENTITY Aacute "&#193;">
<!ENTITY Acirc "&#194;">
<!ENTITY Atilde "&#195;">
<!ENTITY Auml "&#196;">
<!ENTITY Aring "&#197;">
<!ENTITY AElig "&#198;">
<!ENTITY Ccedil "&#199;">
<!ENTITY Egrave "&#200;">
<!ENTITY Eacute "&#201;">
<!ENTITY Ecirc "&#202;">
<!ENTITY Euml "&#203;">
<!ENTITY Igrave "&#204;">
<!ENTITY Iacute "&#205;">
<!ENTITY Icirc "&#206;">
<!ENTITY Iuml "&#207;">
<!ENTITY ETH "&#208;">
<!ENTITY Ntilde "&#209;">
<!ENTITY Ograve "&#210;">
<!ENTITY Oacute "&#211;">
<!ENTITY Ocirc "&#212;">
<!ENTITY Otilde "&#213;">
<!ENTITY Ouml "&#214;">
<!ENTITY Oslash "&#216;">
<!ENTITY Ugrave "&#217;">
<!ENTITY Uacute "&#218;">
<!ENTITY Ucirc "&#219;">
<!ENTITY Uuml "&#220;">
<!ENTITY Yacute "&#221;">
<!ENTITY THORN "&#222;">
<!ENTITY szlig "&#223;">
<!ENTITY agrave "&#224;">
<!ENTITY aacute "&#225;">
<!ENTITY acirc "&#226;">
<!ENTITY atilde "&#227;">
<!ENTITY auml "&#228;">
<!ENTITY aring "&#229;">
<!ENTITY aelig "&#230;">
<!ENTITY ccedil "&#231;">
<!ENTITY egrave "&#232;">
<!ENTITY eacute "&#233;">
<!ENTITY ecirc "&#234;">
<!ENTITY euml "&#235;">
<!ENTITY igrave "&#236;">
<!ENTITY iacute "&#237;">
<!ENTITY icirc "&#238;">
<!ENTITY iuml "&#239;">
<!ENTITY eth "&#240;">
<!ENTITY ntilde "&#241;">
<!ENTITY ograve "&#242;">
<!ENTITY oacute "&#243;">
<!ENTITY ocirc "&#244;">
<!ENTITY otilde "&#245;">
<!ENTITY ouml "&#246;">
<!ENTITY oslash "&#248;">
<!ENTITY ugrave "&#249;">
<!ENTITY uacute "&#250;">
<!ENTITY ucirc "&#251;">
<!ENTITY uuml "&#252;">
<!ENTITY yacute "&#253;">
<!ENTITY thorn "&#254;">
<!ENTITY yuml "&#255;">
<!ENTITY nbsp "&#160;">
<!ENTITY iexcl "&#161;">
<!ENTITY cent "&#162;">
<!ENTITY pound "&#163;">
<!ENTITY curren "&#164;">
<!ENTITY yen "&#165;">
<!ENTITY brvbar "&#166;">
<!ENTITY sect "&#167;">
<!ENTITY uml "&#168;">
<!ENTITY copy "&#169;">
<!ENTITY ordf "&#170;">
<!ENTITY laquo "&#171;">
<!ENTITY not "&#172;">
<!ENTITY shy "&#173;">
<!ENTITY reg "&#174;">
<!ENTITY macr "&#175;">
<!ENTITY deg "&#176;">
<!ENTITY plusmn "&#177;">
<!ENTITY sup2 "&#178;">
<!ENTITY sup3 "&#179;">
<!ENTITY acute "&#180;">
<!ENTITY micro "&#181;">
<!ENTITY para "&#182;">
<!ENTITY cedil "&#184;">
<!ENTITY sup1 "&#185;">
<!ENTITY ordm "&#186;">
<!ENTITY raquo "&#187;">
<!ENTITY frac14 "&#188;">
<!ENTITY frac12 "&#189;">
<!ENTITY frac34 "&#190;">
<!ENTITY iquest "&#191;">
<!ENTITY times "&#215;">
<!ENTITY divide "&#247;">
<!ENTITY forall "&#8704;">
<!ENTITY part "&#8706;">
<!ENTITY exist "&#8707;">
<!ENTITY empty "&#8709;">
<!ENTITY nabla "&#8711;">
<!ENTITY isin "&#8712;">
<!ENTITY notin "&#8713;">
<!ENTITY ni "&#8715;">
<!ENTITY prod "&#8719;">
<!ENTITY sum "&#8721;">
<!ENTITY minus "&#8722;">
<!ENTITY lowast "&#8727;">
<!ENTITY radic "&#8730;">
<!ENTITY prop "&#8733;">
<!ENTITY infin "&#8734;">
<!ENTITY ang "&#8736;">
<!ENTITY and "&#8743;">
<!ENTITY or "&#8744;">
<!ENTITY cap "&#8745;">
<!ENTITY cup "&#8746;">
<!ENTITY int "&#8747;">
<!ENTITY there4 "&#8756;">
<!ENTITY sim "&#8764;">
<!ENTITY cong "&#8773;">
<!ENTITY asymp "&#8776;">
<!ENTITY ne "&#8800;">
<!ENTITY equiv "&#8801;">
<!ENTITY le "&#8804;">
<!ENTITY ge "&#8805;">
<!ENTITY sub "&#8834;">
<!ENTITY sup "&#8835;">
<!ENTITY nsub "&#8836;">
<!ENTITY sube "&#8838;">
<!ENTITY supe "&#8839;">
<!ENTITY oplus "&#8853;">
<!ENTITY otimes "&#8855;">
<!ENTITY perp "&#8869;">
<!ENTITY sdot "&#8901;">
<!ENTITY Alpha "&#913;">
<!ENTITY Beta "&#914;">
<!ENTITY Gamma "&#915;">
<!ENTITY Delta "&#916;">
<!ENTITY Epsilon "&#917;">
<!ENTITY Zeta "&#918;">
<!ENTITY Eta "&#919;">
<!ENTITY Theta "&#920;">
<!ENTITY Iota "&#921;">
<!ENTITY Kappa "&#922;">
<!ENTITY Lambda "&#923;">
<!ENTITY Mu "&#924;">
<!ENTITY Nu "&#925;">
<!ENTITY Xi "&#926;">
<!ENTITY Omicron "&#927;">
<!ENTITY Pi "&#928;">
<!ENTITY Rho "&#929;">
<!ENTITY Sigma "&#931;">
<!ENTITY Tau "&#932;">
<!ENTITY Upsilon "&#933;">
<!ENTITY Phi "&#934;">
<!ENTITY Chi "&#935;">
<!ENTITY Psi "&#936;">
<!ENTITY Omega "&#937;">
<!ENTITY alpha "&#945;">
<!ENTITY beta "&#946;">
<!ENTITY gamma "&#947;">
<!ENTITY delta "&#948;">
<!ENTITY epsilon "&#949;">
<!ENTITY zeta "&#950;">
<!ENTITY eta "&#951;">
<!ENTITY theta "&#952;">
<!ENTITY iota "&#953;">
<!ENTITY kappa "&#954;">
<!ENTITY lambda "&#955;">
<!ENTITY mu "&#956;">
<!ENTITY nu "&#957;">
<!ENTITY xi "&#958;">
<!ENTITY omicron "&#959;">
<!ENTITY pi "&#960;">
<!ENTITY rho "&#961;">
<!ENTITY sigmaf "&#962;">
<!ENTITY sigma "&#963;">
<!ENTITY tau "&#964;">
<!ENTITY upsilon "&#965;">
<!ENTITY phi "&#966;">
<!ENTITY chi "&#967;">
<!ENTITY psi "&#968;">
<!ENTITY omega "&#969;">
<!ENTITY thetasym "&#977;">
<!ENTITY upsih "&#978;">
<!ENTITY piv "&#982;">
<!ENTITY OElig "&#338;">
<!ENTITY oelig "&#339;">
<!ENTITY Scaron "&#352;">
<!ENTITY scaron "&#353;">
<!ENTITY Yuml "&#376;">
<!ENTITY fnof "&#402;">
<!ENTITY circ "&#710;">
<!ENTITY tilde "&#732;">
<!ENTITY ensp "&#8194;">
<!ENTITY emsp "&#8195;">
<!ENTITY thinsp "&#8201;">
<!ENTITY zwnj "&#8204;">
<!ENTITY zwj "&#8205;">
<!ENTITY lrm "&#8206;">
<!ENTITY rlm "&#8207;">
<!ENTITY ndash "&#8211;">
<!ENTITY mdash "&#8212;">
<!ENTITY lsquo "&#8216;">
<!ENTITY rsquo "&#8217;">
<!ENTITY sbquo "&#8218;">
<!ENTITY ldquo "&#8220;">
<!ENTITY rdquo "&#8221;">
<!ENTITY bdquo "&#8222;">
<!ENTITY dagger "&#8224;">
<!ENTITY Dagger "&#8225;">
<!ENTITY bull "&#8226;">
<!ENTITY hellip "&#8230;">
<!ENTITY permil "&#8240;">
<!ENTITY prime "&#8242;">
<!ENTITY Prime "&#8243;">
<!ENTITY lsaquo "&#8249;">
<!ENTITY rsaquo "&#8250;">
<!ENTITY oline "&#8254;">
<!ENTITY euro "&#8364;">
<!ENTITY trade "&#8482;">
<!ENTITY larr "&#8592;">
<!ENTITY uarr "&#8593;">
<!ENTITY rarr "&#8594;">
<!ENTITY darr "&#8595;">
<!ENTITY harr "&#8596;">
<!ENTITY crarr "&#8629;">
<!ENTITY lceil "&#8968;">
<!ENTITY rceil "&#8969;">
<!ENTITY lfloor "&#8970;">
<!ENTITY rfloor "&#8971;">
<!ENTITY loz "&#9674;">
<!ENTITY spades "&#9824;">
<!ENTITY clubs "&#9827;">
<!ENTITY hearts "&#9829;">
<!ENTITY diams "&#9830;">
ENTITIES;

答案 2 :(得分:0)

如果您加载xml-将htmlentities()与ENT_XML1标志一起使用。

$offerXml->addChild('name', htmlentities($name, ENT_XML1));

答案 3 :(得分:-1)

$textHTML = '&lt;ul&gt; &lt;li&gt;Dentro da ordem jur&amp;iacute;dica
brasileira.&lt;/li&gt; &lt;/ul&gt;

保存在XML文件中:

htmlspecialchars($textHTML, ENT_QUOTES);

像这样恢复文件:

$doc->load(file.xml);