如何从多个XML中提取值到excel?

时间:2017-06-19 00:56:29

标签: xml excel excel-vba vba

我有大量的XML都是这种形式的松散(有些字段有不同的字段,但都包含我想要的某些信息)

<message>
  <stdHeader> don't want </stdheader>
  <formdata>
     <field1>
        <subfield1>
            <type> don’t want </type>
            <name> want </name>
        </subfield1>
        <subfield2> want </subfield2>
        <subfield3> don't want </subfield4>
    </field1>
    <field2> don't want  </field2>
    <field3>
        <subfield1>
            <givenName> want  </givenName>
            <familyName> want  </familyName>
    </field3>
 </formdata>
 <aaaa>don't want </aaaa>  
 <bbbb>don't want</bbbb>  
 <cccc>don't want</cccc>  
 <dddd>don't want</dddd>  
 <eeee>don't want</eeee>  
 <ffff>don't want</ffff>  
</message>

我想在excel中使用列标题'name','subfield3','givenName','familyName'(来自上面),每行都有来自每个XML的值。我是编程的初学者,所以我不知道如何1.从单个xml中仅提取我想要的那些值,并且2.为文件夹中的每个XML文件编写一些代码。任何人都可以帮助我吗?

编辑:

示例实际的xml

 <?xml version="1.0" encoding="UTF-8"?> <message submitted="y" xmlns="u">
 <s><m>4</m><me>0</me>
 <oc>I</oc>
 <os>E</os><dr>21</dr>
 <tr>1</tr><dc>20/dc>
 <tc>1</tc><ds>2</ds>
 <ts>1</ts></sh><formData><c><identifier edgeitem="ZCO01b">
 <type edgeitem="ZCO01b">C</type><value edgeitem="ZCO01b">172</value>
 </identifier><name edgeitem="ZCO01a">JMTGN</name></c>
 <pb><ch><of><ef edgeitem="ZRP04b">20</ef><ad edgeitem="ZRG03c">
 <adL edgeitem="ZRP04d">2MR1</adL>
 <co edgeitem="ZRP04d">A</co><ov>true</ov>
 </ad></of></ch></pb><of><ch edgeitem="ZSD06a">
 <of><pe><ne edgeitem="ZSD06c"><gi edgeitem="ZSD06c">k</gi>
 <fa edgeitem="ZSD06c">o</fa>
 </ne><bi edgeitem="ZSD06d"><da edgeitem="ZSD06d">196</da>
 <ci edgeitem="ZSD06d">MNE</ci><st edgeitem="ZSD06d">VC</st>
 <co edgeitem="ZSD06d">Aua</co></bi>
 </pe><ad edgeitem="ZSD06h"><ad edgeitem="ZSD06h">24IC86</adL><co edgeitem="ZSD06h">uia</co><ov edgeitem="ZSD06i">true</ov><not><daC edgeitem="ZSD06b">29</daC>
 </not></ad></of><of><pe><name edgeitem="ZSD06c"><gs edgeitem="ZSD06c">jane</gs>
 <fae edgeitem="ZSD06c">ci</fae></name><bi edgeitem="ZSD06d"><da edgeitem="ZSD06d">198</da><ci edgeitem="ZSD06d">MLB</ci><st edgeitem="ZSD06d">VC</st>
 <co edgeitem="ZSD06d">Aul</co></bi></pe><ad edgeitem="ZSD06h"><adL edgeitem="ZSD06h">24IC</adL><co edgeitem="ZSD06h">uia</co><ov edgeitem="ZSD06i">true</ov>
 <not><daC edgeitem="ZSD06b">209</daC>
 </not></ad></of></ch></of><si><name edgeitem="ZDC00a"><givenNames edgeitem="ZDC00a">John </givenNames>
 <familyName edgeitem="ZDC00a">Citizen</familyName></name><ca edgeitem="ZDC00b">DI</ca><daS edgeitem="ZDC00c">200</daS><dec edgeitem="ZDC00d">true</dec></si></formData>
 <mes><asi><ebu><re>746</re>
 </ebu><asc><doc>181</doc></asc></asi>
 <cus><edg><re><type>RE</type>
 <qu>42</qu></re><ac>A08</ac>
 <tra>60</tra>
 <seq>1</se><tr>7046</tr>
 <mailbox>PR</mailbox><mode>PROCESS</mode></edge></customer></messageIdentifier>
 <asc><lo><ag>442</ag></loy></asc>
 <asco><re><dod>
 <dete>true</dete><fe>
 <lod>258</lod><lod>213</lod>
 <tot>0.00</tot></fe></dod></re>
 <prs><m>PRS</m><wa>false</wa><deb>false</deb>
 <maid>DP2</maid>
 <re>false</re></pro></asco>
 <wo><aga><ag>2</ag><agn>ATD</agn><co>LNY</co><pos>
 <adL>PO60</adL><adL>C3145</adL><co>AUA</co><asd>15055</asd>
 </pos><pe><te><nr>077</nr>
 </te></ph><fx><te><nr>057</nr>
 </te></fx></aga></wa></message>

1 个答案:

答案 0 :(得分:0)

此子句遍历文件夹中的所有文件,如果找到任何XML,则调用第二个子

Option Explicit

'in code editor: Tools > References > checkbox in Microsoft Scripting Runtime

Public Sub ProcessXMLs()
    Const FOLDER_NAME   As String = "C:\Tmp"        '<- update this path

    Dim tags As Variant, hdrs As Variant, rowID As Long
    Dim fso As FileSystemObject, f As File

    Set fso = New Scripting.FileSystemObject

    hdrs = Array("FileName", "ItemID", "Name", "GivenName", "FamilyName")
    tags = Array("FileName", "value", "name", "givenNames", "familyName")

    With Sheet1
        .Range(.Cells(1, 1), .Cells(1, UBound(tags) + 1)).Value2 = hdrs
        rowID = 2
        Application.ScreenUpdating = False
        For Each f In fso.GetFolder(FOLDER_NAME).Files   'iterate through files
            If LCase(fso.GetExtensionName(f)) = "xml" Then
                .Cells(rowID, 1).Value2 = fso.GetBaseName(f) & ".xml"
                ReadTags Sheet1, fso.OpenTextFile(f.Path, ForReading), rowID, tags
                rowID = rowID + 1
            End If
        Next
        .UsedRange.Columns.AutoFit
        Application.ScreenUpdating = True
    End With
End Sub

此子提取4个标签中的值,当时只有一个文件:

Private Sub ReadTags(ByVal ws As Worksheet, ByVal fsoFile As TextStream, _
                     ByVal rowID As Long, ByVal tags As Variant)

    Dim ln As String, val As String, i As Long, s1 As Long, s2 As Long

    With fsoFile
        Do While Not .AtEndOfStream       'file stream is open

            ln = Trim(.ReadLine)          'read each line

            If Len(ln) > 0 Then           'if text line is not empty extract tags
                For i = 1 To UBound(tags) 'find each tag - start and closing
                    s1 = InStr(1, ln, "<" & tags(i), 0)
                    s2 = InStr(s1 + 1, ln, "</" & tags(i) & ">", 0)
                    If s1 > 0 And s2 > 0 Then
                        s1 = InStr(s1, ln, """>", 0) + 2
                        ws.Cells(rowID, i + 1).Value2 = Trim(Mid(ln, s1, s2 - s1))
                        Exit For
                    End If
                Next
            End If
        Loop
    End With
End Sub

结果:

enter image description here