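How do I retrieve XML entity values in C#?

Asked: 2011-08-11 15:29:07

Tags: c# .net xml entity

I want to be able to display a list of entity names and values in a C#/.NET 4.0 application.

I can easily retrieve the entity names using XmlDocument.DocumentType.Entities, but is there a way to retrieve the values of those entities?

I noticed that I can retrieve the value of plain-text entities using InnerText, but this does not work for entities that contain XML markup.

Is falling back on regular expressions the best approach?

Suppose I have a file like this: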

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ENTITY test "<para>only a test</para>">
  <!ENTITY wwwc "World Wide Web Corporation">
  <!ENTITY copy "&#xA9;">
]>

<document>
  <!-- The following image is the World Wide Web Corporation logo. -->
  <graphics image="logo" alternative="&wwwc; Logo"/>
</document>

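I want to present the user with a list containing the three entity names (test, wwwc, and copy) and their values (the text in quotes after the name). I have not thought about entities nested inside other entities, so I would be interested in a solution that either fully expands the entity values or shows the text as written.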
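For reference, the baseline described above (names from DocumentType.Entities, values from InnerText) can be sketched roughly as follows. The class and method names (`EntityLister`, `ListEntities`) are made up for illustration, and the sketch only reproduces the approach whose limitation is described above: InnerText is a text-only view, so markup inside an entity such as `test` is not returned.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the .NET API.
public static class EntityLister
{
    // Returns entity name -> InnerText for each entity exposed by the DTD.
    public static Dictionary<string, string> ListEntities(XmlDocument doc)
    {
        var result = new Dictionary<string, string>();
        if (doc.DocumentType == null)
        {
            return result; // no DOCTYPE, so nothing to list
        }

        foreach (XmlNode entity in doc.DocumentType.Entities)
        {
            // InnerText carries only the text content, so any tags inside
            // the entity's replacement text are not part of this value.
            result[entity.Name] = entity.InnerText;
        }
        return result;
    }
}
```

Calling `EntityLister.ListEntities(doc)` on an XmlDocument loaded from the file above should yield the keys test, wwwc, and copy.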

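4 Answers:

Answer 0: (score: 2)

While this is unlikely to be the most elegant solution, I came up with something that seems to work for my purposes. First, I parse the original document and retrieve the entity nodes from it. Then I create a small in-memory XML document whose DOCTYPE carries the same entity declarations. Next, I add an entity reference to the temporary document for each of those entities. Finally, I retrieve the InnerXml of each reference.

Here is some sample code: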

        // requires: using System.Collections.Generic; using System.Net; using System.Xml;
        // parse the original document and retrieve its entities
        XmlDocument parsedXmlDocument = new XmlDocument();
        XmlUrlResolver resolver = new XmlUrlResolver();
        resolver.Credentials = CredentialCache.DefaultCredentials;
        parsedXmlDocument.XmlResolver = resolver;
        parsedXmlDocument.Load(path);

        // create a temporary xml document with all the entities and add references to them
        // the references can then be used to retrieve the value for each entity
        XmlDocument entitiesXmlDocument = new XmlDocument();
        XmlDeclaration dec = entitiesXmlDocument.CreateXmlDeclaration("1.0", null, null);
        entitiesXmlDocument.AppendChild(dec);
        XmlDocumentType newDocType = entitiesXmlDocument.CreateDocumentType(parsedXmlDocument.DocumentType.Name, parsedXmlDocument.DocumentType.PublicId, parsedXmlDocument.DocumentType.SystemId, parsedXmlDocument.DocumentType.InternalSubset);
        entitiesXmlDocument.AppendChild(newDocType);
        XmlElement root = entitiesXmlDocument.CreateElement("xmlEntitiesDoc");
        entitiesXmlDocument.AppendChild(root);
        XmlNamedNodeMap entitiesMap = entitiesXmlDocument.DocumentType.Entities;

        // build a dictionary of entity names and values
        Dictionary<string, string> entitiesDictionary = new Dictionary<string, string>();
        for (int i = 0; i < entitiesMap.Count; i++)
        {
            XmlElement entityElement = entitiesXmlDocument.CreateElement(entitiesMap.Item(i).Name);
            XmlEntityReference entityRefElement = entitiesXmlDocument.CreateEntityReference(entitiesMap.Item(i).Name);
            entityElement.AppendChild(entityRefElement);
            root.AppendChild(entityElement);
            if (!string.IsNullOrEmpty(entityElement.ChildNodes[0].InnerXml))
            {
                // do not add parameter entities or invalid entities
                // this can be determined by checking for an empty string
                entitiesDictionary.Add(entitiesMap.Item(i).Name, entityElement.ChildNodes[0].InnerXml);
            }
        }
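If the declared text as written (unexpanded) is enough, a lighter alternative to the temporary document is to scan DocumentType.InternalSubset, which the code above already reads, with a regular expression. The following is only a sketch under some assumptions: `EntityDeclarationScanner` and `Scan` are hypothetical names, only internal general entities with a quoted literal value are matched (parameter entities and SYSTEM/PUBLIC entities are skipped), and declarations that appear inside DTD comments would be picked up by mistake.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET API.
public static class EntityDeclarationScanner
{
    // Matches <!ENTITY name "value"> or <!ENTITY name 'value'>.
    // The name may not start with '%', which excludes parameter entities.
    private static readonly Regex Declaration = new Regex(
        @"<!ENTITY\s+(?<name>[^\s%""']+)\s+(?:""(?<value>[^""]*)""|'(?<value>[^']*)')\s*>");

    // Returns entity name -> literal value exactly as written in the DTD.
    public static Dictionary<string, string> Scan(string internalSubset)
    {
        var result = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(internalSubset))
        {
            return result;
        }

        foreach (Match match in Declaration.Matches(internalSubset))
        {
            string name = match.Groups["name"].Value;
            // In XML the first declaration of an entity is the binding one,
            // so later duplicates are ignored.
            if (!result.ContainsKey(name))
            {
                result.Add(name, match.Groups["value"].Value);
            }
        }
        return result;
    }
}
```

With the document from the question, `EntityDeclarationScanner.Scan(parsedXmlDocument.DocumentType.InternalSubset)` would return the literal strings, including the `<para>` markup and the unexpanded `&#xA9;` reference. Nested entity references are left as written rather than expanded.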

Answer 1: (score: 1)

Here is one way (untested). It uses an XmlReader and that class's ResolveEntity() method:

private Dictionary<string, string> GetEntities(XmlReader xr)
{
    Dictionary<string, string> entityList = new Dictionary<string, string>();

    while (xr.Read())
    {
        HandleNode(xr, entityList);
    }
    return entityList;
}

StringBuilder sbEntityResolver;
int extElementIndex = 0;
int resolveEntityNestLevel = -1;
string dtdCurrentTopEntity = "";

private void HandleNode(XmlReader inReader, Dictionary<string, string> entityList)
{
    if (inReader.NodeType == XmlNodeType.Element)
    {
        if (resolveEntityNestLevel < 0)
        {
            while (inReader.MoveToNextAttribute())
            {
                HandleNode(inReader, entityList); // for namespaces
                while (inReader.ReadAttributeValue())
                {
                    HandleNode(inReader, entityList); // recursive for resolving entity refs in attributes
                }
            }
        }
        else
        {
            extElementIndex++;
            sbEntityResolver.Append(inReader.ReadOuterXml());
            resolveEntityNestLevel--;
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.EntityReference)
    {
        if (inReader.Name[0] != '#' && !entityList.ContainsKey(inReader.Name))
        {
            if (resolveEntityNestLevel < 0)
            {
                sbEntityResolver = new StringBuilder(); // start building entity
                dtdCurrentTopEntity = inReader.Name;
            }
            // entityReference can have contents that contains other
            // entityReferences, so keep track of nest level
            resolveEntityNestLevel++;
            inReader.ResolveEntity();
        }
    }
    else if (inReader.NodeType == XmlNodeType.EndEntity)
    {
        resolveEntityNestLevel--;
        if (resolveEntityNestLevel < 0)
        {
            if (!entityList.ContainsKey(dtdCurrentTopEntity))
            {
                entityList.Add(dtdCurrentTopEntity, sbEntityResolver.ToString());
            }
        }
    }
    else if (inReader.NodeType == XmlNodeType.Text)
    {
        if (resolveEntityNestLevel > -1)
        {
            sbEntityResolver.Append(inReader.Value);
        }
    }
}

Answer 2: (score: 0)

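If you have an XmlDocument object, it may be easier to step recursively through each XmlNode object (starting from XmlDocument.ChildNodes); for each node you can use the Name property to get the node's name. "Getting the value" then depends on what you want: InnerXml gives a string representation, while ChildNodes gives programmatic access to the XmlNode objects, which can be cast to more specific types such as XmlEntity.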
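A minimal sketch of such a recursive walk is shown below. The names `NodeWalker` and `Describe` are hypothetical. One assumption worth calling out: as far as I know, entity declarations are exposed through XmlDocumentType.Entities rather than as child nodes of the DOCTYPE, so the sketch visits that map explicitly instead of relying on ChildNodes alone.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the .NET API.
public static class NodeWalker
{
    // Appends one line per node: indentation, node type, then node name.
    public static void Describe(XmlNode node, StringBuilder output, int depth = 0)
    {
        output.Append(' ', depth * 2)
              .Append(node.NodeType)
              .Append(": ")
              .AppendLine(node.Name);

        // Entities are reached through the Entities map of the DOCTYPE node.
        XmlDocumentType docType = node as XmlDocumentType;
        if (docType != null)
        {
            foreach (XmlNode entity in docType.Entities)
            {
                Describe(entity, output, depth + 1);
            }
        }

        foreach (XmlNode child in node.ChildNodes)
        {
            Describe(child, output, depth + 1);
        }
    }
}
```

Calling `NodeWalker.Describe(doc, new StringBuilder())` on a loaded XmlDocument collects the whole tree. To show values instead of names, `node.InnerXml` could be appended in the same place, keeping in mind the difficulty with entity values described in the question.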

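Answer 3: (score: 0)

You can easily display a representation of an XML document simply by walking the tree recursively.

This small class happens to use the Console, but you can easily modify it as needed.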

public static class XmlPrinter {
   private const Int32 SpacesPerIndent = 3;

   public static void Print(XDocument xDocument) {
      if (xDocument == null) {
         Console.WriteLine("No XML Document Provided");
         return;
      }

      PrintElementRecursive(xDocument.Root);
   }

   private static void PrintElementRecursive(XElement element, Int32 indentationLevel = 0) {
      if(element == null) return;

      PrintIndentation(indentationLevel);
      PrintElement(element);
      PrintNewline();

      foreach (var xAttribute in element.Attributes()) {
         PrintIndentation(indentationLevel + 1);
         PrintAttribute(xAttribute);
         PrintNewline();
      }

      foreach (var xElement in element.Elements()) {
         PrintElementRecursive(xElement, indentationLevel+1);
      }
   }

   private static void PrintAttribute(XAttribute xAttribute) {
      if (xAttribute == null) return;

      Console.Write("[{0}] = \"{1}\"", xAttribute.Name, xAttribute.Value);
   }

   private static void PrintElement(XElement element) {
      if (element == null) return;

      Console.Write("{0}", element.Name);

      if(!String.IsNullOrWhiteSpace(element.Value))
         Console.Write(" : {0}", element.Value);
   }

   private static void PrintIndentation(Int32 level) {
      Console.Write(new String(' ', level * SpacesPerIndent));
   }

   private static void PrintNewline() {
      Console.Write(Environment.NewLine);
   }
}

Using the class is trivial. Here is an example that prints out the current configuration file:

static void Main(string[] args) {
   XmlPrinter.Print(XDocument.Load(
      ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None).FilePath
                        ));

   Console.ReadKey();
}

Try it yourself; you should be able to modify it quickly to get what you want.