Question

I would like to reference a URL of character entities on an XmlReader instance using C#/.NET, such as this w3c entity set, which defines &nbsp; and other characters.

If I were doing it in plain XML, it would look like this, or some variation of it:
<!ENTITY foo SYSTEM "http://example.org/myent.ent">

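I am actually reading XHTML source fragments (which contain named entities), so I need the W3C-defined entity sets (the HTML 4 named entities for XML 1.0) to be declared and recognized. (I am asking how to reference them dynamically, in code, when setting up the XmlReader and its settings to read a fragment, but I am open to other options.)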

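Either way, if I do not include these named entities, the reader chokes and raises a .NET error, such as the following XmlException for &nbsp; and other non-numeric entities: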

Test 'Xml_Tester.Test_Reading' failed: System.Xml.XmlException: Reference to undeclared entity 'nbsp'. Line 6, position 393.

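Note: I am successfully referencing the XHTML Schema through the XmlReaderSettings.Schemas collection property, and I assume there must be an equally easy way to pull in an external entity reference without modifying the XML source, but it eludes me.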

ETC：

While searching for an answer I came across the following important information, which is useful here:

Entity support
To use entities, authors must use the DTD mechanism. See section 1.5 for using DTDs and XML Schema together. - http://www.w3.org/TR/xhtml1-schema/#diffs

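1.5. Using DTD and XML Schema together
DTD validation and XML Schema validation are not mutually exclusive. Sometimes authors may wish to use certain DTD features (such as entities) while taking advantage of XML Schema validation. - http://www.w3.org/TR/xhtml1-schema/#together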

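Combining XML Documents with XInclude
External entities must be declared in a DTD or in an internal subset. This opens a Pandora's box of implications: for example, the document element must be named in the DOCTYPE declaration, and validating readers may require that the full content model of the document be defined in the DTD.
- http://msdn.microsoft.com/en-us/library/aa302291.aspx#xinc_topic1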
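
To make the "declared in a DTD or in an internal subset" requirement concrete, here is a minimal sketch of an internal subset that pulls in an external entity set through a parameter entity. It assumes the W3C Latin-1 entity file for XHTML 1.0 is the set of interest, and it wraps a hypothetical div fragment purely for illustration; a reader must have DTD processing enabled and be able to resolve the URL for this to work.

```xml
<!DOCTYPE div [
  <!-- Declare the external entity set as a parameter entity... -->
  <!ENTITY % HTMLlat1 PUBLIC
     "-//W3C//ENTITIES Latin 1 for XHTML//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
  <!-- ...then reference it so its declarations (nbsp, copy, ...) take effect. -->
  %HTMLlat1;
]>
<div>Fragment with a named entity: &nbsp;</div>
```

Note that the document element (div here) has to be named in the DOCTYPE declaration, which is one of the implications the quoted article warns about.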

Answer 1

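Found the answer: how to use an XmlReader instance to read XHTML source that contains named entities such as &nbsp; without an XmlException being thrown.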

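First, I copied the following XML example straight from the W3C page XHTML 1.0 in XML Schema, section 1.5. Using DTD and XML Schema together, which supports bringing in named character entities alongside schema-based validation: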

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"[
<!ATTLIST html
    xmlns:xsi CDATA #FIXED "http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation CDATA #IMPLIED
>
]>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/1999/xhtml
                          http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd">
  ...
</html>

Then I substituted an XHTML fragment, e.g. <body><div><b>xhtml stuff</b></div></body>, in place of the ... in the example above.

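This successfully mixes the DTD (which brings in the named entities) with schema validation. The XmlReader no longer throws an XmlException when it encounters a named entity. Success!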

C#/.NET code for processing the above example

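using System;
using System.IO;
using System.Xml;

XmlReaderSettings settingsXRdr = new XmlReaderSettings();
// DtdProcessing.Parse replaces the obsolete ProhibitDtd = false (.NET 4.0 and later).
settingsXRdr.DtdProcessing = DtdProcessing.Parse;
settingsXRdr.CheckCharacters = true;
settingsXRdr.ConformanceLevel = ConformanceLevel.Document;
settingsXRdr.IgnoreProcessingInstructions = false;
settingsXRdr.IgnoreComments = false;
// CustomXmlResolver is not a framework type; its definition is not shown in this answer.
settingsXRdr.XmlResolver = new CustomXmlResolver();
settingsXRdr.ValidationType = ValidationType.DTD;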

// This is a format string; notice the placeholder {0} where the fragment will be injected:

string mixFmtString1 = @"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd""[
<!ATTLIST html
xmlns:xsi CDATA #FIXED ""http://www.w3.org/2001/XMLSchema-instance""
xsi:schemaLocation CDATA #IMPLIED
>
]>
<html xmlns=""http://www.w3.org/1999/xhtml"" lang=""en"" xml:lang=""en""
xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance""
xsi:schemaLocation=""http://www.w3.org/1999/xhtml
              http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd"">
<head><title></title></head>
<body>
<div>{0}</div>
</body>
</html>";


// Inject any well-formed fragment via the second argument
string xhtml = string.Format(mixFmtString1, "<b>Xhtml fragment w/named entity: &nbsp;</b>");

// Creates a validating reader (a derived type) because of the settings above.
XmlReader rdr = XmlReader.Create(new StringReader(xhtml), settingsXRdr);

// Reads the entire XHTML document (validating it along the way).
while (rdr.Read()) {

    // Do whatever you want here for each piece processed.
    var dummy = rdr.NodeType.ToString();  // Access a string value for fun.
    // If you just want validation to occur then leave this an empty code block. 

}
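
The settings above assign a CustomXmlResolver whose definition is not shown. As a minimal sketch of what such a resolver might do, assuming its purpose is to avoid fetching the DTD and entity files from www.w3.org on every read, the hypothetical LocalDtdResolver below serves those files from a local folder and defers to XmlUrlResolver for everything else. The class name, the "dtd-cache" folder, and the file-name-based lookup are all assumptions made for illustration, not the author's implementation.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical stand-in for the answer's CustomXmlResolver (an assumption).
public sealed class LocalDtdResolver : XmlUrlResolver
{
    private readonly string _cacheDir;

    // cacheDir is assumed to hold copies of xhtml1-strict.dtd and the
    // xhtml-lat1.ent, xhtml-symbol.ent, and xhtml-special.ent files.
    public LocalDtdResolver(string cacheDir = "dtd-cache")
    {
        _cacheDir = cacheDir;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri.IsAbsoluteUri && absoluteUri.Host == "www.w3.org")
        {
            // Look the resource up by file name in the local folder.
            string localPath = Path.Combine(_cacheDir, Path.GetFileName(absoluteUri.LocalPath));
            if (File.Exists(localPath))
            {
                return File.OpenRead(localPath);
            }
        }
        // Anything not cached locally is resolved the default way.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

With a class like this in place, the settings line would read settingsXRdr.XmlResolver = new LocalDtdResolver(); and the reader would open the local copies whenever the DOCTYPE or the DTD itself points at www.w3.org.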

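The code above is the core logic. Note: it is copied and pasted verbatim. Some of the settings may be frivolous or redundant, so tweak them as needed; your mileage may vary.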

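Note: this solution uses the XHTML Strict template, so some deprecated tags (such as <center>) will make the reader fail. You may want to reconfigure the referenced items to point to the more forgiving loose XHTML template.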
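
As a variation on the approach above, and not part of the original answer, .NET 4.0 and later ship System.Xml.Resolvers.XmlPreloadedResolver, which carries built-in copies of the XHTML 1.0 DTDs and their entity files. The sketch below assumes that only entity expansion is wanted (no DTD or schema validation); the XhtmlFragmentReader class, its Template string, and the ReadText helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Resolvers;

public static class XhtmlFragmentReader
{
    // Wrapper document; {0} is where the fragment is injected.
    private const string Template =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" " +
        "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">" +
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title></title></head>" +
        "<body><div>{0}</div></body></html>";

    // Reads the wrapped fragment and returns its concatenated text nodes.
    public static string ReadText(string fragment)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            // Serves the XHTML 1.0 DTDs and entity sets from memory, not the network.
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };

        var text = new StringBuilder();
        string xhtml = string.Format(Template, fragment);
        using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                {
                    text.Append(reader.Value);
                }
            }
        }
        return text.ToString();
    }
}
```

Calling XhtmlFragmentReader.ReadText("<b>Xhtml fragment w/named entity: &nbsp;</b>") should then read through the fragment without the undeclared-entity exception. Because no ValidationType is set in this sketch, deprecated tags would not be rejected the way they are under DTD validation.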

Related/useful resources found along the way:

A similar question, solved in a different way: Reference to undeclared entity exception while working with XML
