Question

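I need to iterate over the nodes between a bookmark start tag and a bookmark end tag. The problem seems to reduce to a tree traversal, but I can't work out the right algorithm. The bookmark start and end elements are non-composite nodes (they have no children), and they can appear at any depth in the tree. The bookmark start is also not guaranteed to be at the same depth as the bookmark end.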

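If you draw out the document's tree structure, I want to examine all the nodes that lie between the start and end bookmarks. I think an algorithm that traverses an unbalanced tree, starting at node x and ending at node y, would work. Does this sound feasible, or am I missing something?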

If it is feasible, could you point me in the right direction for a tree traversal that returns those nodes?
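The traversal described in the question is feasible if "between" is taken to mean document order: a pre-order, depth-first walk visits every element exactly once, so the start and end may sit at different depths. Below is a minimal sketch, assuming the part's XML has already been loaded into LINQ to XML (System.Xml.Linq) and that you already hold references to the two bookmark elements. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: walks the tree in document order between two nodes.
public static class BookmarkWalker
{
    // Returns every element that comes strictly after `start` and strictly
    // before `end` in document order (pre-order, depth-first), regardless
    // of how deep either element sits in the tree.
    public static List<XElement> ElementsBetween(XElement start, XElement end)
    {
        // Climb to the root so the walk covers the whole tree.
        XElement root = start.AncestorsAndSelf().Last();

        return root.DescendantsAndSelf()   // pre-order = document order
            .SkipWhile(e => e != start)    // skip everything before the start
            .Skip(1)                       // skip the start element itself
            .TakeWhile(e => e != end)      // stop when the end is reached
            .ToList();
    }
}
```

Elements that merely contain the start bookmark (such as its paragraph) are not returned, because they open before it; a container that opens after the start is returned even if it closes after the end.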

Answer 1

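It depends on what you want to do, but if you are mainly interested in the text between two bookmarks, this is one of the cases where XmlDocument/XPath semantics are easier to use than LINQ to XML or the strongly typed object model of the Open XML SDK V2. The semantics of XPath's 'following::*' axis are exactly what you want: it selects every element that comes after the context node in document order, at any depth, excluding the context node's ancestors. The following example uses XmlDocument and XPath to print the names of the nodes between a bookmark start and its matching bookmark end.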

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    // Loads the XML of an Open XML package part into an XmlDocument.
    public static XmlDocument GetXmlDocument(OpenXmlPart part)
    {
        XmlDocument xmlDoc = new XmlDocument();
        using (Stream partStream = part.GetStream())
        using (XmlReader partXmlReader = XmlReader.Create(partStream))
            xmlDoc.Load(partXmlReader);
        return xmlDoc;
    }

    static void Main(string[] args)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test.docx", false))
        {
            XmlDocument xmlDoc = GetXmlDocument(doc.MainDocumentPart);
            string wordNamespace =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XmlNamespaceManager nsmgr =
                new XmlNamespaceManager(xmlDoc.NameTable);
            nsmgr.AddNamespace("w", wordNamespace);
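            // Find the bookmark start whose w:id is "0".
            XmlElement bookmarkStart = (XmlElement)xmlDoc.SelectSingleNode(
                "descendant::w:bookmarkStart[@w:id='0']", nsmgr);
            if (bookmarkStart == null)
            {
                Console.WriteLine("No bookmarkStart with w:id='0' was found.");
                return;
            }
            // following::* = every element after bookmarkStart in document order,
            // at any depth, excluding its ancestors.
            XmlNodeList nodesFollowing = bookmarkStart.SelectNodes("following::*", nsmgr);
            // Stop at the bookmarkEnd carrying the same w:id. Compare by namespace
            // URI and local name so the check does not depend on the "w" prefix.
            var nodesBetween = nodesFollowing
                .Cast<XmlElement>()
                .TakeWhile(n => !(n.LocalName == "bookmarkEnd"
                                  && n.NamespaceURI == wordNamespace
                                  && n.GetAttribute("id", wordNamespace) == "0"));
            foreach (XmlElement item in nodesBetween)
            {
                Console.WriteLine(item.Name);
                // Also print the attributes of any other bookmarks nested in the range.
                if (item.NamespaceURI == wordNamespace &&
                    (item.LocalName == "bookmarkStart" || item.LocalName == "bookmarkEnd"))
                    foreach (XmlAttribute att in item.Attributes)
                        Console.WriteLine("{0}:{1}", att.Name, att.Value);
            }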
        }
    }
}
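For comparison, here is a sketch of the same 'following::*' walk in LINQ to XML, assuming the main document part has been loaded into an XDocument (for example with XDocument.Load(doc.MainDocumentPart.GetStream())). Because Descendants() yields elements in document order and a bookmarkStart has no children, skipping up to the start and taking until the matching end selects the same elements as 'following::*' plus TakeWhile does above. The names BookmarkRange, IsBookmark and Between are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: LINQ to XML equivalent of following::* + TakeWhile.
public static class BookmarkRange
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True if `e` is w:<localName> carrying the requested w:id.
    static bool IsBookmark(XElement e, string localName, string id) =>
        e.Name == W + localName && (string)e.Attribute(W + "id") == id;

    // Elements strictly between <w:bookmarkStart w:id="id"/> and the matching
    // <w:bookmarkEnd w:id="id"/>, in document order.
    public static List<XElement> Between(XDocument mainPart, string id) =>
        mainPart.Descendants()
            .SkipWhile(e => !IsBookmark(e, "bookmarkStart", id))
            .Skip(1)   // drop the bookmarkStart itself
            .TakeWhile(e => !IsBookmark(e, "bookmarkEnd", id))
            .ToList();
}
```

As in the XmlDocument version, the paragraph that contains the bookmarkStart is not part of the result, while the paragraph that contains the bookmarkEnd is, because it opens inside the range.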

Answer 2

I have put together an algorithm that makes it easy to retrieve the text of a bookmark.

How to Retrieve the Text of a Bookmark from an OpenXML WordprocessingML Document

I have also written code that replaces the text of a bookmark:

Replacing Text of a Bookmark in an OpenXML WordprocessingML Document

-Eric
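The linked posts describe Eric's own approach, which is not reproduced here. As a rough illustration only, assuming the simple case where the bookmark's text lives in ordinary w:t elements, the text can be gathered by concatenating every w:t that falls between the bookmarkStart and the matching bookmarkEnd. This sketch ignores w:tab and w:br elements, does not insert separators at paragraph boundaries, and does not special-case tracked changes. The names below are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: collects the plain text inside a bookmark.
public static class BookmarkTextReader
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True if `e` is w:<localName> carrying the requested w:id.
    static bool IsBookmark(XElement e, string localName, string id) =>
        e.Name == W + localName && (string)e.Attribute(W + "id") == id;

    // Concatenates the w:t text between the bookmarkStart and the matching
    // bookmarkEnd with the given w:id, in document order.
    public static string GetText(XDocument mainPart, string id) =>
        string.Concat(
            mainPart.Descendants()
                .SkipWhile(e => !IsBookmark(e, "bookmarkStart", id))
                .Skip(1)
                .TakeWhile(e => !IsBookmark(e, "bookmarkEnd", id))
                .Where(e => e.Name == W + "t")
                .Select(e => (string)e));
}
```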

Word OpenXML. Traversing OpenXmlElements between bookmarks
