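I use DOMDocument to edit some HTML files, but some of them have spaces in their names. DOMDocument automatically changes the spaces to %20 and then cannot find them.
Here is the exact error: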
Warning: DOMDocument::load() [domdocument.load]: Entity 'nbsp' not defined in file:///C:/Path/To/The/File/01%20c%2040-1964.html, line: 11 in C:/Path/To/class.php on line 51
Do you know how to fix this error?
Thanks in advance for your answers.
Answer 0 (score: 13)
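Use DOMDocument::loadHTMLFile() instead of load(). That is what it is there for. HTML is not XML.
XML predefines only five named entities (&lt;, &gt;, &amp;, &quot; and &apos;), so it does not know the named entity &nbsp; unless a DTD declares it. If you use loadHTML() or loadHTMLFile(), the document is read by the HTML parser, which knows the HTML named entities, so the error goes away.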
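As a minimal sketch of the difference, the script below feeds the same markup to both parsers. The markup string and variable names are made up for illustration, and loadHTML() is used on a string instead of loadHTMLFile() on a path so that the example is self-contained.

```php
<?php
// Hypothetical markup containing a named HTML entity.
$html = '<p>hello&nbsp;world</p>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// XML parser: &nbsp; is not declared anywhere, so loading should fail.
$asXml = new DOMDocument();
$xmlOk = $asXml->loadXML($html);

// HTML parser: the named entity is resolved to U+00A0.
$asHtml = new DOMDocument();
$htmlOk = $asHtml->loadHTML($html);
$text = $asHtml->getElementsByTagName('p')->item(0)->textContent;

libxml_clear_errors();

var_dump($xmlOk);
var_dump($htmlOk);
var_dump($text === "hello\u{00A0}world");
```

The same idea should carry over to files by swapping loadXML()/loadHTML() for load()/loadHTMLFile().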
Answer 1 (score: 0)
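Here is how to parse an XHTML snippet that contains HTML entities such as &nbsp;.
The example comes from code that parses this kind of XHTML as exported from the Confluence API, including custom elements such as <ac:structured-macro>. That is why you need to declare the namespaces. Change or remove the xmlns attributes to fit your use case.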
$snippet = '<p>hello&nbsp;world</p>';
$entitydefs = XML_HTML_DEFS;
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
$entitydefs
]>
<root
xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/"
xmlns:ri="http://www.atlassian.com/schema/confluence/4/ri/"
xmlns="http://www.atlassian.com/schema/confluence/4/">
$snippet
</root>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
where XML_HTML_DEFS is defined as follows (you can remove the entities that will never occur in your input):
const XML_HTML_DEFS = <<<ENTITIES
<!ENTITY amp "&#38;#38;">
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY Agrave "À">
<!ENTITY Aacute "Á">
<!ENTITY Acirc "Â">
<!ENTITY Atilde "Ã">
<!ENTITY Auml "Ä">
<!ENTITY Aring "Å">
<!ENTITY AElig "Æ">
<!ENTITY Ccedil "Ç">
<!ENTITY Egrave "È">
<!ENTITY Eacute "É">
<!ENTITY Ecirc "Ê">
<!ENTITY Euml "Ë">
<!ENTITY Igrave "Ì">
<!ENTITY Iacute "Í">
<!ENTITY Icirc "Î">
<!ENTITY Iuml "Ï">
<!ENTITY ETH "Ð">
<!ENTITY Ntilde "Ñ">
<!ENTITY Ograve "Ò">
<!ENTITY Oacute "Ó">
<!ENTITY Ocirc "Ô">
<!ENTITY Otilde "Õ">
<!ENTITY Ouml "Ö">
<!ENTITY Oslash "Ø">
<!ENTITY Ugrave "Ù">
<!ENTITY Uacute "Ú">
<!ENTITY Ucirc "Û">
<!ENTITY Uuml "Ü">
<!ENTITY Yacute "Ý">
<!ENTITY THORN "Þ">
<!ENTITY szlig "ß">
<!ENTITY agrave "à">
<!ENTITY aacute "á">
<!ENTITY acirc "â">
<!ENTITY atilde "ã">
<!ENTITY auml "ä">
<!ENTITY aring "å">
<!ENTITY aelig "æ">
<!ENTITY ccedil "ç">
<!ENTITY egrave "è">
<!ENTITY eacute "é">
<!ENTITY ecirc "ê">
<!ENTITY euml "ë">
<!ENTITY igrave "ì">
<!ENTITY iacute "í">
<!ENTITY icirc "î">
<!ENTITY iuml "ï">
<!ENTITY eth "ð">
<!ENTITY ntilde "ñ">
<!ENTITY ograve "ò">
<!ENTITY oacute "ó">
<!ENTITY ocirc "ô">
<!ENTITY otilde "õ">
<!ENTITY ouml "ö">
<!ENTITY oslash "ø">
<!ENTITY ugrave "ù">
<!ENTITY uacute "ú">
<!ENTITY ucirc "û">
<!ENTITY uuml "ü">
<!ENTITY yacute "ý">
<!ENTITY thorn "þ">
<!ENTITY yuml "ÿ">
<!ENTITY nbsp "&#160;">
<!ENTITY iexcl "¡">
<!ENTITY cent "¢">
<!ENTITY pound "£">
<!ENTITY curren "¤">
<!ENTITY yen "¥">
<!ENTITY brvbar "¦">
<!ENTITY sect "§">
<!ENTITY uml "¨">
<!ENTITY copy "©">
<!ENTITY ordf "ª">
<!ENTITY laquo "«">
<!ENTITY not "¬">
<!ENTITY shy "&#173;">
<!ENTITY reg "®">
<!ENTITY macr "¯">
<!ENTITY deg "°">
<!ENTITY plusmn "±">
<!ENTITY sup2 "²">
<!ENTITY sup3 "³">
<!ENTITY acute "´">
<!ENTITY micro "µ">
<!ENTITY para "¶">
<!ENTITY cedil "¸">
<!ENTITY sup1 "¹">
<!ENTITY ordm "º">
<!ENTITY raquo "»">
<!ENTITY frac14 "¼">
<!ENTITY frac12 "½">
<!ENTITY frac34 "¾">
<!ENTITY iquest "¿">
<!ENTITY times "×">
<!ENTITY divide "÷">
<!ENTITY forall "∀">
<!ENTITY part "∂">
<!ENTITY exist "∃">
<!ENTITY empty "∅">
<!ENTITY nabla "∇">
<!ENTITY isin "∈">
<!ENTITY notin "∉">
<!ENTITY ni "∋">
<!ENTITY prod "∏">
<!ENTITY sum "∑">
<!ENTITY minus "−">
<!ENTITY lowast "∗">
<!ENTITY radic "√">
<!ENTITY prop "∝">
<!ENTITY infin "∞">
<!ENTITY ang "∠">
<!ENTITY and "∧">
<!ENTITY or "∨">
<!ENTITY cap "∩">
<!ENTITY cup "∪">
<!ENTITY int "∫">
<!ENTITY there4 "∴">
<!ENTITY sim "∼">
<!ENTITY cong "≅">
<!ENTITY asymp "≈">
<!ENTITY ne "≠">
<!ENTITY equiv "≡">
<!ENTITY le "≤">
<!ENTITY ge "≥">
<!ENTITY sub "⊂">
<!ENTITY sup "⊃">
<!ENTITY nsub "⊄">
<!ENTITY sube "⊆">
<!ENTITY supe "⊇">
<!ENTITY oplus "⊕">
<!ENTITY otimes "⊗">
<!ENTITY perp "⊥">
<!ENTITY sdot "⋅">
<!ENTITY Alpha "Α">
<!ENTITY Beta "Β">
<!ENTITY Gamma "Γ">
<!ENTITY Delta "Δ">
<!ENTITY Epsilon "Ε">
<!ENTITY Zeta "Ζ">
<!ENTITY Eta "Η">
<!ENTITY Theta "Θ">
<!ENTITY Iota "Ι">
<!ENTITY Kappa "Κ">
<!ENTITY Lambda "Λ">
<!ENTITY Mu "Μ">
<!ENTITY Nu "Ν">
<!ENTITY Xi "Ξ">
<!ENTITY Omicron "Ο">
<!ENTITY Pi "Π">
<!ENTITY Rho "Ρ">
<!ENTITY Sigma "Σ">
<!ENTITY Tau "Τ">
<!ENTITY Upsilon "Υ">
<!ENTITY Phi "Φ">
<!ENTITY Chi "Χ">
<!ENTITY Psi "Ψ">
<!ENTITY Omega "Ω">
<!ENTITY alpha "α">
<!ENTITY beta "β">
<!ENTITY gamma "γ">
<!ENTITY delta "δ">
<!ENTITY epsilon "ε">
<!ENTITY zeta "ζ">
<!ENTITY eta "η">
<!ENTITY theta "θ">
<!ENTITY iota "ι">
<!ENTITY kappa "κ">
<!ENTITY lambda "λ">
<!ENTITY mu "μ">
<!ENTITY nu "ν">
<!ENTITY xi "ξ">
<!ENTITY omicron "ο">
<!ENTITY pi "π">
<!ENTITY rho "ρ">
<!ENTITY sigmaf "ς">
<!ENTITY sigma "σ">
<!ENTITY tau "τ">
<!ENTITY upsilon "υ">
<!ENTITY phi "φ">
<!ENTITY chi "χ">
<!ENTITY psi "ψ">
<!ENTITY omega "ω">
<!ENTITY thetasym "ϑ">
<!ENTITY upsih "ϒ">
<!ENTITY piv "ϖ">
<!ENTITY OElig "Œ">
<!ENTITY oelig "œ">
<!ENTITY Scaron "Š">
<!ENTITY scaron "š">
<!ENTITY Yuml "Ÿ">
<!ENTITY fnof "ƒ">
<!ENTITY circ "ˆ">
<!ENTITY tilde "˜">
<!ENTITY ensp "&#8194;">
<!ENTITY emsp "&#8195;">
<!ENTITY thinsp "&#8201;">
<!ENTITY zwnj "&#8204;">
<!ENTITY zwj "&#8205;">
<!ENTITY lrm "&#8206;">
<!ENTITY rlm "&#8207;">
<!ENTITY ndash "–">
<!ENTITY mdash "—">
<!ENTITY lsquo "‘">
<!ENTITY rsquo "’">
<!ENTITY sbquo "‚">
<!ENTITY ldquo "“">
<!ENTITY rdquo "”">
<!ENTITY bdquo "„">
<!ENTITY dagger "†">
<!ENTITY Dagger "‡">
<!ENTITY bull "•">
<!ENTITY hellip "…">
<!ENTITY permil "‰">
<!ENTITY prime "′">
<!ENTITY Prime "″">
<!ENTITY lsaquo "‹">
<!ENTITY rsaquo "›">
<!ENTITY oline "‾">
<!ENTITY euro "€">
<!ENTITY trade "™">
<!ENTITY larr "←">
<!ENTITY uarr "↑">
<!ENTITY rarr "→">
<!ENTITY darr "↓">
<!ENTITY harr "↔">
<!ENTITY crarr "↵">
<!ENTITY lceil "⌈">
<!ENTITY rceil "⌉">
<!ENTITY lfloor "⌊">
<!ENTITY rfloor "⌋">
<!ENTITY loz "◊">
<!ENTITY spades "♠">
<!ENTITY clubs "♣">
<!ENTITY hearts "♥">
<!ENTITY diams "♦">
ENTITIES;
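To try the approach without the full table, here is a cut-down sketch. It assumes a hypothetical two-entity subset, drops the Confluence namespaces, and passes LIBXML_NOENT so that the parser substitutes the entity references with their values. That flag is my own addition, and it is only reasonable here because the DOCTYPE is written by the script itself rather than taken from the input.

```php
<?php
// Hypothetical subset of the entity table, as numeric character references.
$entityDefs = <<<'ENTITIES'
<!ENTITY nbsp "&#160;">
<!ENTITY eacute "&#233;">
ENTITIES;

// Hypothetical XHTML snippet that uses both entities.
$snippet = '<p>caf&eacute;&nbsp;bar</p>';

// Wrap the snippet in a root element whose DOCTYPE declares the entities.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
$entityDefs
]>
<root>$snippet</root>
XML;

$dom = new DOMDocument();
// LIBXML_NOENT replaces entity references with their declared values.
$loaded = $dom->loadXML($xml, LIBXML_NOENT);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;

var_dump($loaded);
var_dump($text === "caf\u{00E9}\u{00A0}bar");
```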
Answer 2 (score: 0)
If you are loading XML, use htmlentities() with the ENT_XML1 flag.
$offerXml->addChild('name', htmlentities($name, ENT_XML1));
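A minimal sketch of that call in context, assuming $offerXml is a SimpleXMLElement. The root element name and the sample value are hypothetical.

```php
<?php
// Hypothetical root element standing in for the answer's $offerXml.
$offerXml = new SimpleXMLElement('<offer/>');

// A value containing a character that is special in XML.
$name = 'Fish & Chips';

// Escape the value before handing it to addChild().
$offerXml->addChild('name', htmlentities($name, ENT_XML1));

// Serialized form and the value read back from the tree.
$serialized = $offerXml->asXML();
$readBack = (string) $offerXml->name;

echo $serialized, PHP_EOL;
echo $readBack, PHP_EOL;
```

Reading the element back yields the original string, while the serialized document carries the ampersand in escaped form.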
Answer 3 (score: -1)
$textHTML = '<ul> <li>Dentro da ordem jur&iacute;dica
brasileira.</li> </ul>';
Escape it before saving it in the XML file:
$escaped = htmlspecialchars($textHTML, ENT_QUOTES);
Then load the file back like this:
$doc->load('file.xml');
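The round trip can be sketched as follows. To keep it self-contained it uses loadXML() on an in-memory string instead of a file, and the <doc> and <body> element names are invented for illustration. Note that this stores the HTML as escaped text inside the XML, not as markup.

```php
<?php
// HTML fragment with a named entity that XML does not define.
$textHTML = '<ul> <li>Dentro da ordem jur&iacute;dica brasileira.</li> </ul>';

// Escape &, <, > and quotes so the fragment becomes plain XML text.
$escaped = htmlspecialchars($textHTML, ENT_QUOTES);

// Hypothetical wrapper document holding the escaped fragment.
$xml = '<?xml version="1.0" encoding="UTF-8"?><doc><body>' . $escaped . '</body></doc>';

$doc = new DOMDocument();
$loaded = $doc->loadXML($xml);

// The XML parser undoes the escaping, giving back the original HTML string.
$restored = $doc->documentElement->textContent;

var_dump($loaded);
var_dump($restored === $textHTML);
```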