Uncommenting an XML fragment in C# using XmlDocument

Time: 2015-02-09 21:58:32

Tags: c# regex xml parsing dom

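What is the simplest way to uncomment the body of a particular node in XML? The elements have unique names, and the document is structured as follows: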

somefile.xml

<?xml version="1.0"?>
<name1>
  <irrelevant1>
    <irrelevant2>
    <!--
      <irrelevant3 />
    -->
    </irrelevant2>
  </irrelevant1>
  <name2>
    <name3>
    <!--
      <name4 field="The" />
      <name4 field="Owls" />
      <name4 field="Are />
      <name4 field="Not" />
      <name4 field="What" />
      <name4 field="They" />
      <name4 field="Seem />
    -->
    </name3>
  </name2>
</name1>

The target should look like this, with the comment markers inside name3 removed:

uncommented.xml

<?xml version="1.0"?>
<name1>
  <irrelevant1>
    <irrelevant2>
    <!--
      <irrelevant3 />
    -->
    </irrelevant2>
  </irrelevant1>
  <name2>
    <name3>
      <name4 field="The" />
      <name4 field="Owls" />
      <name4 field="Are />
      <name4 field="Not" />
      <name4 field="What" />
      <name4 field="They" />
      <name4 field="Seem />
    </name3>
  </name2>
</name1>

My parsing approach:

XmlDocument xdoc = new XmlDocument();
xdoc.Load(@"C:\somefile.xml");

XmlNodeList nl = xdoc.GetElementsByTagName("name2");

XmlNode xn = nl[0];
string xn_content = xn.InnerXml;

xn_content = Regex.Replace(xn_content, "<!--|-->", String.Empty);

XmlDocument doc = new XmlDocument();
doc.LoadXml(xn_content);
XmlNode newNode = doc.DocumentElement;

// this import doesn't really help
xdoc.ImportNode(newNode, true);
xn.RemoveAll();
xn.AppendChild(newNode);

xdoc.Save(@"C:\uncommented.xml");

The result is an ArgumentException:

{"The node to be inserted is from a different document context."}

1 answer:

Answer 0: (score: 1)

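Your immediate problem is that you call XmlDocument.ImportNode() but do not use the node it returns. ImportNode() does not modify its argument; it returns a copy that belongs to the target document. You need newNode = xdoc.ImportNode(newNode, true);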
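As a minimal sketch of that fix, the question's code could be wrapped in a helper that appends the node returned by ImportNode(). The class and method names (ImportNodeSketch, UncommentName2) are hypothetical, and the sketch assumes, as in the question, that name2 has a single child element and that the commented XML is well formed.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class ImportNodeSketch
{
    // Hypothetical helper: strips the comment delimiters inside the first
    // <name2> element, as the question's code does, but appends the node
    // that ImportNode returns instead of the original one.
    public static string UncommentName2(string xml)
    {
        var xdoc = new XmlDocument();
        xdoc.LoadXml(xml);

        XmlNode xn = xdoc.GetElementsByTagName("name2")[0];
        string content = Regex.Replace(xn.InnerXml, "<!--|-->", string.Empty);

        // Parse the uncommented markup in a scratch document.
        var doc = new XmlDocument();
        doc.LoadXml(content);

        // ImportNode returns a copy owned by xdoc; doc.DocumentElement
        // itself still belongs to doc and cannot be appended to xdoc.
        XmlNode imported = xdoc.ImportNode(doc.DocumentElement, true);
        xn.RemoveAll();
        xn.AppendChild(imported);

        return xdoc.OuterXml;
    }
}
```

Note that the regex still removes every comment delimiter found inside name2, so this only addresses the exception, not the fragility of editing XML as text.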

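However, a cleaner approach is to avoid Regex parsing entirely. Instead, descend the XmlNode hierarchy, select the XmlComment nodes you want to uncomment, load each comment's InnerText into an XmlDocumentFragment, and then insert the newly created child nodes into the comment's parent node: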

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class XmlNodeExtensions
{
    public static XmlDocument Document(this XmlNode node)
    {
        for (; node != null; node = node.ParentNode)
        {
            var doc = node as XmlDocument;
            if (doc != null)
                return doc;
        }
        return null;
    }

    public static IEnumerable<XmlNode> AncestorsAndSelf(this XmlNode node)
    {
        for (; node != null; node = node.ParentNode)
            yield return node;
    }

    public static IEnumerable<XmlNode> DescendantsAndSelf(this XmlNode root)
    {
        if (root == null)
            yield break;
        yield return root;
        foreach (var child in root.ChildNodes.Cast<XmlNode>())
            foreach (var subChild in child.DescendantsAndSelf())
                yield return subChild;
    }

    public static void UncommentXmlNodes(IEnumerable<XmlComment> comments)
    {
        foreach (var comment in comments.ToList())
            UncommentXmlNode(comment);
    }

    public static void UncommentXmlNode(XmlComment comment)
    {
        if (comment == null)
            throw new ArgumentNullException(nameof(comment));
        var doc = comment.Document();
        if (doc == null)
            throw new InvalidOperationException("The comment is not attached to an XmlDocument.");
        var parent = comment.ParentNode;
        var innerText = comment.InnerText;
        XmlDocumentFragment docFrag = doc.CreateDocumentFragment();
        //Set the contents of the document fragment.
        docFrag.InnerXml = innerText;
        XmlNode insertAfter = comment;
        foreach (var child in docFrag.ChildNodes.OfType<XmlElement>().ToList())
        {
            insertAfter = parent.InsertAfter(child, insertAfter);
        }
        parent.RemoveChild(comment);
    }
}

Then call it like this:

        string xml = @"<?xml version=""1.0""?>
        <name1>
          <irrelevant1>
            <irrelevant2>
            <!--
              <irrelevant3 />
            -->
            </irrelevant2>
          </irrelevant1>
          <name2>
            <name3>
            <!--
              <name4 field=""The"" />
              <name4 field=""Owls"" />
              <name4 field=""Are"" />
              <name4 field=""Not"" />
              <name4 field=""What"" />
              <name4 field=""They"" />
              <name4 field=""Seem"" />
            -->
            </name3>
          </name2>
        </name1>
        ";
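        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(xml);
        // Print the document before uncommenting.
        Debug.WriteLine(xmlDoc.OuterXml);

        // Uncomment only the comments whose parent element is name3.
        XmlNodeExtensions.UncommentXmlNodes(xmlDoc.DocumentElement.DescendantsAndSelf().OfType<XmlComment>().Where(c => c.ParentNode.Name == "name3"));

        // Print the document after uncommenting.
        Debug.WriteLine(xmlDoc.OuterXml);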
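
As an alternative to the LINQ filter above, the comments could probably also be selected with an XPath expression, since XPath has a comment() node test. This is only a sketch: the class and method names (XPathSelectionSketch, CommentsUnder) are hypothetical, and it assumes elementName is a plain element name without namespaces or special characters.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class XPathSelectionSketch
{
    // Hypothetical helper: returns every comment node that is a direct
    // child of an element with the given name, anywhere in the document.
    public static List<XmlComment> CommentsUnder(XmlDocument doc, string elementName)
    {
        // "//name3/comment()" matches comment children of any <name3> element.
        XmlNodeList nodes = doc.SelectNodes("//" + elementName + "/comment()");
        return nodes.OfType<XmlComment>().ToList();
    }
}
```

The returned list could then be passed to XmlNodeExtensions.UncommentXmlNodes in place of the DescendantsAndSelf() query.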

Note that the XML inside your comment is invalid. <name4 field="Are /> should be <name4 field="Are"/>, and <name4 field="Seem /> should be <name4 field="Seem"/>. I fixed this in the test case for you, since I assume it was a typo.
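
Because assigning malformed markup to XmlDocumentFragment.InnerXml throws an XmlException, it may be worth checking a comment's text before uncommenting it. The following is a minimal sketch of such a check; the class and method names (CommentValidationSketch, IsWellFormedFragment) are hypothetical and not part of the code above.

```csharp
using System.Xml;

public static class CommentValidationSketch
{
    // Hypothetical helper: reports whether the text parses as an XML
    // fragment by loading it into a throwaway XmlDocumentFragment.
    public static bool IsWellFormedFragment(string text)
    {
        var doc = new XmlDocument();
        XmlDocumentFragment frag = doc.CreateDocumentFragment();
        try
        {
            // The setter parses the text and throws on malformed XML.
            frag.InnerXml = text;
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this check, a comment containing <name4 field="Are /> would be reported as not well formed rather than causing UncommentXmlNode to fail partway through.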