在VBA中获取XML的所有节点

时间:2016-05-12 20:58:30

标签: xml vba dom

首先,如果我在XML文件的命名方法中犯了错误 - 对不起!我们假设我在XML文件中有以下语法:

<book id="bk101">
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <genre>Computer</genre>
  <price>44.95</price>
  <publish_date>2000-10-01</publish_date>
  <description>An in-depth look at creating applications 
  with XML.</description>
</book>
<book id="bk102">
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
</book>

等。

然而,有些书籍有额外的节点,例如<author_birth>,<authors_favorite_tvshow>等。

我想把我的XML文件中的所有书籍都转换成列,每行一本。 我试图获取书籍的所有节点值,但是由于缺少一些<author_birth>节点,我无法使用简单的For循环,因为有不同数量的&#34; price&#34;节点和"<author_birth>"的不同数量。

我说最好把所有书籍都循环遍历,然后取出各个节点的值。但是,我不知道这可能是什么功能..

谢谢!

1 个答案:

答案 0 :(得分:0)

我有选择地将publisherpreordercover属性添加到典型的XML代码中,因此测试代码如下:

<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <preorder>2.49</preorder>
        <publish_date>2000-12-16</publish_date>
        <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
    </book>
    <book id="bk103">
        <author>Corets, Eva</author>
        <title>Maeve Ascendant</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <preorder>1.99</preorder>
        <publish_date>2000-11-17</publish_date>
        <cover>case binding</cover>
        <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
    </book>
    <book id="bk104">
        <publisher>Pearson</publisher>
        <author>Corets, Eva</author>
        <title>Oberon's Legacy</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2001-03-10</publish_date>
        <description>In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.</description>
    </book>
</catalog>

这是一个示例,显示了一种可能的解决方案,它允许处理存储为XML的表格式数据并检索表示带有标题的表的2d数组。它根据提供的XPath选择器处理项目,将项目子节点视为属性,提取属性名称和值,在右侧列中找到属性。

Option Explicit

Sub Test()

    Dim strBooksXML As String
    Dim arrBooks() As Variant

    ' get certain XML code
    strBooksXML = MyXMLData
    ' pass XML code and XPath selector to retrieve table-form array
    arrBooks = ConvertXMLToArray(strBooksXML, "//catalog/book")
    ' resulting array output
    Output Sheets(1), arrBooks

End Sub

Function ConvertXMLToArray(strXML As String, strItemSelector As String) As Variant()

    Dim objDOMDocument As Object
    Dim objPrpIdx As Object
    Dim objPrpVal As Object
    Dim lngItemNumber As Long
    Dim colItems As Object
    Dim objItem As Variant
    Dim objItemProperty As Variant
    Dim strPrev As String
    Dim strName As String
    Dim lngIndex As Long
    Dim arrItems() As Variant
    Dim varPrpName As Variant
    Dim varItemIndex As Variant

    Set objDOMDocument = CreateObject("MSXML2.DOMDocument")
    If Not objDOMDocument.LoadXML(strXML) Then
        Err.Raise objDOMDocument.parseError.ErrorCode, , objDOMDocument.parseError.reason
    End If
    Set objPrpIdx = CreateObject("Scripting.Dictionary") ' dictionary of property order indexes
    Set objPrpVal = CreateObject("Scripting.Dictionary") ' dictionary of property values
    lngItemNumber = 1
    Set colItems = objDOMDocument.SelectNodes(strItemSelector)
    For Each objItem In colItems
        strPrev = "" ' previous processed property name
        For Each objItemProperty In objItem.ChildNodes
            strName = objItemProperty.BaseName ' name of the property being processed
            If Not objPrpIdx.Exists(strName) Then ' no such property yet
                If strPrev = "" Then ' the property is the first
                    lngIndex = 0
                Else ' the property placed after another
                    lngIndex = objPrpIdx(strPrev) + 1
                End If
                ' increase all indexes that are greater or equal to processing property assigned index
                ' i. e. shift existing properties to insert new one
                For Each varPrpName In objPrpIdx
                    If objPrpIdx(varPrpName) >= lngIndex Then objPrpIdx(varPrpName) = objPrpIdx(varPrpName) + 1
                Next
                ' add new property name to dictionary of property order indexes with assigned index
                objPrpIdx(strName) = lngIndex
                ' add new property name to dictionary of property values, instantiate subdictionary of values
                Set objPrpVal(strName) = CreateObject("Scripting.Dictionary")
            End If
            objPrpVal(strName)(lngItemNumber) = objItemProperty.Text ' put property value with item index to the subdictionary
            strPrev = strName ' reassign previous property name
        Next
        lngItemNumber = lngItemNumber + 1
    Next
    ' rebuild dictionaries into 2d array for further output to worksheet
    ReDim arrItems(lngItemNumber - 1, objPrpIdx.Count - 1)
    For Each varPrpName In objPrpIdx ' process each
        arrItems(0, objPrpIdx(varPrpName)) = varPrpName ' put property name to header
        For Each varItemIndex In objPrpVal(varPrpName) ' process each item having the property
            arrItems(varItemIndex, objPrpIdx(varPrpName)) = objPrpVal(varPrpName)(varItemIndex)
        Next
    Next
    ConvertXMLToArray = arrItems

End Function

Sub Output(objSheet As Worksheet, arrCells() As Variant)

    With objSheet
        .Select
        .Cells.Delete
        With .Range(.Cells(1, 1), Cells(UBound(arrCells, 1) + 1, UBound(arrCells, 2) + 1))
            .NumberFormat = "@"
            .Value = arrCells
        End With
        .Columns.AutoFit
    End With
    With ActiveWindow
        .SplitColumn = 0
        .SplitRow = 1
        .FreezePanes = True
    End With

End Sub

Function MyXMLData()

    Dim strXML

    strXML = _
    "<catalog>"
        strXML = strXML & _
        "<book id=""bk101"">" & _
            "<author>Gambardella, Matthew</author>" & _
            "<title>XML Developer's Guide</title>" & _
            "<genre>Computer</genre>" & _
            "<price>44.95</price>" & _
            "<publish_date>2000-10-01</publish_date>" & _
            "<description>An in-depth look at creating applications " & _
            "with XML.</description>" & _
        "</book>"
        strXML = strXML & _
        "<book id=""bk102"">" & _
            "<author>Ralls, Kim</author>" & _
            "<title>Midnight Rain</title>" & _
            "<genre>Fantasy</genre>" & _
            "<price>5.95</price>" & _
            "<preorder>2.49</preorder>" & _
            "<publish_date>2000-12-16</publish_date>" & _
            "<description>A former architect battles corporate zombies, " & _
            "an evil sorceress, and her own childhood to become queen " & _
            "of the world.</description>" & _
        "</book>"
        strXML = strXML & _
        "<book id=""bk103"">" & _
            "<author>Corets, Eva</author>" & _
            "<title>Maeve Ascendant</title>" & _
            "<genre>Fantasy</genre>" & _
            "<price>5.95</price>" & _
            "<preorder>1.99</preorder>" & _
            "<publish_date>2000-11-17</publish_date>" & _
            "<cover>case binding</cover>" & _
            "<description>After the collapse of a nanotechnology " & _
            "society in England, the young survivors lay the " & _
            "foundation for a new society.</description>" & _
        "</book>"
        strXML = strXML & _
        "<book id=""bk104"">" & _
            "<publisher>Pearson</publisher>" & _
            "<author>Corets, Eva</author>" & _
            "<title>Oberon's Legacy</title>" & _
            "<genre>Fantasy</genre>" & _
            "<price>5.95</price>" & _
            "<publish_date>2001-03-10</publish_date>" & _
            "<description>In post-apocalypse England, the mysterious " & _
            "agent known only as Oberon helps to create a new life " & _
            "for the inhabitants of London. Sequel to Maeve " & _
            "Ascendant.</description>" & _
        "</book>"
        strXML = strXML & _
    "</catalog>"
    MyXMLData = strXML

End Function

结果输出如下:

output

这种方法使用的字典对于大型XML数据来说可能相当慢,因此更好地重做代码以使用数组而不是字典like here with JSON processing