如何从XmlNode实例获取xpath

时间:2008-10-27 20:12:46

标签: c# .net xml .net-2.0

有人可以提供一些代码来获取System.Xml.XmlNode实例的xpath吗?

谢谢!

14 个答案:

答案 0 :(得分:53)

好吧,我忍不住去了。它只适用于属性和元素,但是嘿......你能在15分钟内得到什么:)同样可能有更简洁的方法。

在每个元素(特别是根元素)上包含索引是多余的!但是它比试图弄清楚是否存在任何歧义更容易。

using System;
using System.Text;
using System.Xml;

class Test
{
    static void Main()
    {
        string xml = @"
<root>
  <foo />
  <foo>
     <bar attr='value'/>
     <bar other='va' />
  </foo>
  <foo><bar /></foo>
</root>";
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("//@attr");
        Console.WriteLine(FindXPath(node));
        Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
    }

    static string FindXPath(XmlNode node)
    {
        StringBuilder builder = new StringBuilder();
        while (node != null)
        {
            switch (node.NodeType)
            {
                case XmlNodeType.Attribute:
                    builder.Insert(0, "/@" + node.Name);
                    node = ((XmlAttribute) node).OwnerElement;
                    break;
                case XmlNodeType.Element:
                    int index = FindElementIndex((XmlElement) node);
                    builder.Insert(0, "/" + node.Name + "[" + index + "]");
                    node = node.ParentNode;
                    break;
                case XmlNodeType.Document:
                    return builder.ToString();
                default:
                    throw new ArgumentException("Only elements and attributes are supported");
            }
        }
        throw new ArgumentException("Node was not in a document");
    }

    static int FindElementIndex(XmlElement element)
    {
        XmlNode parentNode = element.ParentNode;
        if (parentNode is XmlDocument)
        {
            return 1;
        }
        XmlElement parent = (XmlElement) parentNode;
        int index = 1;
        foreach (XmlNode candidate in parent.ChildNodes)
        {
            if (candidate is XmlElement && candidate.Name == element.Name)
            {
                if (candidate == element)
                {
                    return index;
                }
                index++;
            }
        }
        throw new ArgumentException("Couldn't find element within parent");
    }
}

答案 1 :(得分:23)

Jon是正确的,有任意数量的XPath表达式将在实例文档中产生相同的节点。构建明确产生特定节点的表达式的最简单方法是使用谓词中节点位置的节点测试链,例如:

/node()[0]/node()[2]/node()[6]/node()[1]/node()[2]

显然,这个表达式不是使用元素名称,但是如果你要做的就是在文档中找到一个节点,那么你不需要它的名字。它也不能用于查找属性(因为属性不是节点而没有位置;您只能通过名称找到它们),但它会找到所有其他节点类型。

要构建此表达式,您需要编写一个返回节点在其父节点中的位置的方法,因为XmlNode不会将其作为属性公开:

static int GetNodePosition(XmlNode child)
{
   for (int i=0; i<child.ParentNode.ChildNodes.Count; i++)
   {
       if (child.ParentNode.ChildNodes[i] == child)
       {
          // tricksy XPath, not starting its positions at 0 like a normal language
          return i + 1;
       }
   }
   throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
}

(使用LINQ可能有一种更优雅的方式,因为XmlNodeList实现了IEnumerable,但我会按照我所知道的去做。)

然后你可以编写一个这样的递归方法:

static string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have
        // to be matched by name, not found by position
        return String.Format(
            "{0}/@{1}",
            GetXPathToNode(((XmlAttribute)node).OwnerElement),
            node.Name
            );            
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }
    // the path to a node is the path to its parent, plus "/node()[n]", where 
    // n is its position among its siblings.
    return String.Format(
        "{0}/node()[{1}]",
        GetXPathToNode(node.ParentNode),
        GetNodePosition(node)
        );
}

正如你所看到的,我在某种程度上也是为了找到属性。

当我写作我的时候,乔恩插入了他的版本。关于他的代码有一些东西会让我现在有点吵了,如果听起来我对Jon很讨厌,我会提前道歉。 (我不是。我很确定Jon必须向我学习的内容非常简短。)但我认为,对于任何使用XML的人来说,我要说的是非常重要的一点。想想。

我怀疑Jon的解决方案来自我看到很多开发人员所做的事情:将XML文档视为元素和属性的树。我认为这主要来自于主要使用XML作为序列化格式的开发人员,因为他们习惯使用的所有XML都是以这种方式构建的。您可以发现这些开发人员,因为他们可以互换地使用术语“节点”和“元素”。这使他们想出了将所有其他节点类型视为特殊情况的解决方案。 (很长一段时间,我自己就是其中一个人。)

当你正在制作时,这感觉就像是一个简化的假设。但事实并非如此。它使问题更难,代码更复杂。它会引导您绕过专门设计用于处理所有节点类型的XML技术(如XPath中的node()函数)。

Jon的代码中有一个红色标记,即使我不知道要求是什么,也是GetElementsByTagName,我会在代码审查中查询它。每当我看到使用该方法时,跳到脑海中的问题始终是“它为什么必须成为一个元素?”答案经常是“哦,这段代码是否也需要处理文本节点?”

答案 2 :(得分:6)

我知道,老帖子,但我最喜欢的版本(名字的版本)有缺陷: 当父节点具有不同名称的节点时,它会在找到第一个不匹配的节点名后停止对索引进行计数。

这是我的固定版本:

/// <summary>
/// Gets the X-Path to a given Node
/// </summary>
/// <param name="node">The Node to get the X-Path from</param>
/// <returns>The X-Path of the Node</returns>
public string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }

    // Get the Index
    int indexInParent = 1;
    XmlNode siblingNode = node.PreviousSibling;
    // Loop thru all Siblings
    while (siblingNode != null)
    {
        // Increase the Index if the Sibling has the same Name
        if (siblingNode.Name == node.Name)
        {
            indexInParent++;
        }
        siblingNode = siblingNode.PreviousSibling;
    }

    // the path to a node is the path to its parent, plus "/node()[n]", where n is its position among its siblings.         
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
}

答案 3 :(得分:3)

我的10p价值是Robert和Corey的答案的混合体。我只能对额外代码行的实际打字声称有用。

    private static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; }
        // the path to a node is the path to its parent, plus "/node()[n]", where 
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            iIndex
            );
    }

答案 4 :(得分:3)

这是一个我用过的简单方法,为我工作。

    static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode("..")) + "/" +  (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name;
    }

答案 5 :(得分:2)

没有节点的“xpath”这样的东西。对于任何给定节点,可能会有许多xpath表达式与之匹配。

您可以使用树来构建 表达式,该表达式将匹配它,考虑到特定元素的索引等,但它不会是非常好的代码。

你为什么需要这个?可能有更好的解决方案。

答案 6 :(得分:1)

如果你这样做,你将获得一个名为节点名称的路径和位置,如果你有这样的名字的节点: “/服务[1] /系统[1] /组[1] /文件夹[2] /文件[2]”

public string GetXPathToNode(XmlNode node)
{         
    if (node.NodeType == XmlNodeType.Attribute)
    {             
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {             
        // the only node with no parent is the root node, which has no path
        return "";
    }

    //get the index
    int iIndex = 1;
    XmlNode xnIndex = node;
    while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
    {
         iIndex++;
         xnIndex = xnIndex.PreviousSibling; 
    }

    // the path to a node is the path to its parent, plus "/node()[n]", where
    // n is its position among its siblings.         
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
}

答案 7 :(得分:1)

我发现以上都没有使用XDocument,因此我编写了自己的代码来支持XDocument并使用了递归。我认为这个代码比其他代码更好地处理多个相同的节点,因为它首先尝试深入到XML路径,然后备份以仅构建所需的代码。因此,如果您有/home/white/bob/home/white/mike并且想要创建/home/white/bob/garage,则代码将知道如何创建它。但是,我不想搞乱谓词或通配符,所以我明确禁止那些;但是添加对它们的支持会很容易。

Private Sub NodeItterate(XDoc As XElement, XPath As String)
    'get the deepest path
    Dim nodes As IEnumerable(Of XElement)

    nodes = XDoc.XPathSelectElements(XPath)

    'if it doesn't exist, try the next shallow path
    If nodes.Count = 0 Then
        NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
        'by this time all the required parent elements will have been constructed
        Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
        Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
        Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
        ParentNode.Add(New XElement(NewElementName))
    End If

    'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
    If nodes.Count > 1 Then
        Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
    End If

    'if there is just one element, we can proceed
    If nodes.Count = 1 Then
        'just proceed
    End If

End Sub

Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)

    If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
        Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
    End If

    If Regex.IsMatch(XPath, "\[\]()@='<>\|") Then
        Throw New ArgumentException("Can't create a path based on predicates.")
    End If

    'we will process this recursively.
    NodeItterate(XDoc, XPath)

End Sub

答案 8 :(得分:1)

使用类扩展怎么样? ;) 我的版本(基于其他工作)使用语法名称[索引] ...索引省略是元素没有“兄弟”。 获取元素索引的循环在独立例程(也是类扩展)之外。

在任何实用程序类(或主程序类)中快速浏览以下内容

static public int GetRank( this XmlNode node )
{
    // return 0 if unique, else return position 1...n in siblings with same name
    try
    {
        if( node is XmlElement ) 
        {
            int rank = 1;
            bool alone = true, found = false;

            foreach( XmlNode n in node.ParentNode.ChildNodes )
                if( n.Name == node.Name ) // sibling with same name
                {
                    if( n.Equals(node) )
                    {
                        if( ! alone ) return rank; // no need to continue
                        found = true;
                    }
                    else
                    {
                        if( found ) return rank; // no need to continue
                        alone = false;
                        rank++;
                    }
                }

        }
    }
    catch{}
    return 0;
}

static public string GetXPath( this XmlNode node )
{
    try
    {
        if( node is XmlAttribute )
            return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );

        if( node is XmlText || node is XmlCDataSection )
            return node.ParentNode.GetXPath();

        if( node.ParentNode == null )   // the only node with no parent is the root node, which has no path
            return "";

        int rank = node.GetRank();
        if( rank == 0 ) return String.Format( "{0}/{1}",        node.ParentNode.GetXPath(), node.Name );
        else            return String.Format( "{0}/{1}[{2}]",   node.ParentNode.GetXPath(), node.Name, rank );
    }
    catch{}
    return "";
}   

答案 9 :(得分:1)

我为Excel制作了VBA来为工作项目执行此操作。它输出Xpath的元组和元素或属性的相关文本。目的是允许业务分析人员识别和映射一些xml。感谢这是一个C#论坛,但认为这可能是有意义的。

Sub Parse2(oSh As Long, inode As IXMLDOMNode, Optional iXstring As String = "", Optional indexes)


Dim chnode As IXMLDOMNode
Dim attr As IXMLDOMAttribute
Dim oXString As String
Dim chld As Long
Dim idx As Variant
Dim addindex As Boolean
chld = 0
idx = 0
addindex = False


'determine the node type:
Select Case inode.NodeType

    Case NODE_ELEMENT
        If inode.ParentNode.NodeType = NODE_DOCUMENT Then 'This gets the root node name but ignores all the namespace attributes
            oXString = iXstring & "//" & fp(inode.nodename)
        Else

            'Need to deal with indexing. Where an element has siblings with the same nodeName,it needs to be indexed using [index], e.g swapstreams or schedules

            For Each chnode In inode.ParentNode.ChildNodes
                If chnode.NodeType = NODE_ELEMENT And chnode.nodename = inode.nodename Then chld = chld + 1
            Next chnode

            If chld > 1 Then '//inode has siblings of the same nodeName, so needs to be indexed
                'Lookup the index from the indexes array
                idx = getIndex(inode.nodename, indexes)
                addindex = True
            Else
            End If

            'build the XString
            oXString = iXstring & "/" & fp(inode.nodename)
            If addindex Then oXString = oXString & "[" & idx & "]"

            'If type is element then check for attributes
            For Each attr In inode.Attributes
                'If the element has attributes then extract the data pair XString + Element.Name, @Attribute.Name=Attribute.Value
                Call oSheet(oSh, oXString & "/@" & attr.Name, attr.Value)
            Next attr

        End If

    Case NODE_TEXT
        'build the XString
        oXString = iXstring
        Call oSheet(oSh, oXString, inode.NodeValue)

    Case NODE_ATTRIBUTE
    'Do nothing
    Case NODE_CDATA_SECTION
    'Do nothing
    Case NODE_COMMENT
    'Do nothing
    Case NODE_DOCUMENT
    'Do nothing
    Case NODE_DOCUMENT_FRAGMENT
    'Do nothing
    Case NODE_DOCUMENT_TYPE
    'Do nothing
    Case NODE_ENTITY
    'Do nothing
    Case NODE_ENTITY_REFERENCE
    'Do nothing
    Case NODE_INVALID
    'do nothing
    Case NODE_NOTATION
    'do nothing
    Case NODE_PROCESSING_INSTRUCTION
    'do nothing
End Select

'Now call Parser2 on each of inode's children.
If inode.HasChildNodes Then
    For Each chnode In inode.ChildNodes
        Call Parse2(oSh, chnode, oXString, indexes)
    Next chnode
Set chnode = Nothing
Else
End If

End Sub

使用以下方式管理元素的计数:

Function getIndex(tag As Variant, indexes) As Variant
'Function to get the latest index for an xml tag from the indexes array
'indexes array is passed from one parser function to the next up and down the tree

Dim i As Integer
Dim n As Integer

If IsArrayEmpty(indexes) Then
    ReDim indexes(1, 0)
    indexes(0, 0) = "Tag"
    indexes(1, 0) = "Index"
Else
End If
For i = 0 To UBound(indexes, 2)
    If indexes(0, i) = tag Then
        'tag found, increment and return the index then exit
        'also destroy all recorded tag names BELOW that level
        indexes(1, i) = indexes(1, i) + 1
        getIndex = indexes(1, i)
        ReDim Preserve indexes(1, i) 'should keep all tags up to i but remove all below it
        Exit Function
    Else
    End If
Next i

'tag not found so add the tag with index 1 at the end of the array
n = UBound(indexes, 2)
ReDim Preserve indexes(1, n + 1)
indexes(0, n + 1) = tag
indexes(1, n + 1) = 1
getIndex = 1

End Function

答案 10 :(得分:1)

您问题的另一个解决方案可能是标记&#39;您希望稍后使用自定义属性识别的xmlnodes:

var id = _currentNode.OwnerDocument.CreateAttribute("some_id");
id.Value = Guid.NewGuid().ToString();
_currentNode.Attributes.Append(id);

例如,您可以存储在词典中。 然后您可以使用xpath查询识别节点:

newOrOldDocument.SelectSingleNode(string.Format("//*[contains(@some_id,'{0}')]", id));

我知道这不是你问题的直接答案,但是如果你想知道节点的xpath的原因是要有一种“达到”的方法,它会有所帮助。在您在代码中丢失对它的引用之后的节点。

这也克服了文档添加/移动元素时的问题,这可能会弄乱xpath(或其他答案中建议的索引)。

答案 11 :(得分:0)

这更容易

 ''' <summary>
    ''' Gets the full XPath of a single node.
    ''' </summary>
    ''' <param name="node"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String
        Dim temp As String
        Dim sibling As Xml.XmlNode
        Dim previousSiblings As Integer = 1

        'I dont want to know that it was a generic document
        If node.Name = "#document" Then Return ""

        'Prime it
        sibling = node.PreviousSibling
        'Perculate up getting the count of all of this node's sibling before it.
        While sibling IsNot Nothing
            'Only count if the sibling has the same name as this node
            If sibling.Name = node.Name Then
                previousSiblings += 1
            End If
            sibling = sibling.PreviousSibling
        End While

        'Mark this node's index, if it has one
        ' Also mark the index to 1 or the default if it does have a sibling just no previous.
        temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()

        If node.ParentNode IsNot Nothing Then
            Return GetXPath(node.ParentNode) + "/" + temp
        End If

        Return temp
    End Function

答案 12 :(得分:0)

 public static string GetFullPath(this XmlNode node)
        {
            if (node.ParentNode == null)
            {
                return "";
            }
            else
            {
                return $"{GetFullPath(node.ParentNode)}\\{node.ParentNode.Name}";
            }
        }

答案 13 :(得分:0)

我最近不得不这样做。只需要考虑因素。这就是我想出的:

void use(const B& v)
{
    static_assert(B::MAX == 10, "");
               // ^^^
}

template<typename X>
void use2(X&& v)
{
    static_assert(std::remove_reference_t<X>::MAX == 10, "");
               // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
}