我有大量的XML都是这种形式的松散(有些字段有不同的字段,但都包含我想要的某些信息)
<message>
<stdHeader> don't want </stdheader>
<formdata>
<field1>
<subfield1>
<type> don’t want </type>
<name> want </name>
</subfield1>
<subfield2> want </subfield2>
<subfield3> don't want </subfield4>
</field1>
<field2> don't want </field2>
<field3>
<subfield1>
<givenName> want </givenName>
<familyName> want </familyName>
</field3>
</formdata>
<aaaa>don't want </aaaa>
<bbbb>don't want</bbbb>
<cccc>don't want</cccc>
<dddd>don't want</dddd>
<eeee>don't want</eeee>
<ffff>don't want</ffff>
</message>
我想在excel中使用列标题'name','subfield3','givenName','familyName'(来自上面),每行都有来自每个XML的值。我是编程的初学者,所以我不知道如何1.从单个xml中仅提取我想要的那些值,并且2.为文件夹中的每个XML文件编写一些代码。任何人都可以帮助我吗?
编辑:
示例实际的xml
<?xml version="1.0" encoding="UTF-8"?> <message submitted="y" xmlns="u">
<s><m>4</m><me>0</me>
<oc>I</oc>
<os>E</os><dr>21</dr>
<tr>1</tr><dc>20/dc>
<tc>1</tc><ds>2</ds>
<ts>1</ts></sh><formData><c><identifier edgeitem="ZCO01b">
<type edgeitem="ZCO01b">C</type><value edgeitem="ZCO01b">172</value>
</identifier><name edgeitem="ZCO01a">JMTGN</name></c>
<pb><ch><of><ef edgeitem="ZRP04b">20</ef><ad edgeitem="ZRG03c">
<adL edgeitem="ZRP04d">2MR1</adL>
<co edgeitem="ZRP04d">A</co><ov>true</ov>
</ad></of></ch></pb><of><ch edgeitem="ZSD06a">
<of><pe><ne edgeitem="ZSD06c"><gi edgeitem="ZSD06c">k</gi>
<fa edgeitem="ZSD06c">o</fa>
</ne><bi edgeitem="ZSD06d"><da edgeitem="ZSD06d">196</da>
<ci edgeitem="ZSD06d">MNE</ci><st edgeitem="ZSD06d">VC</st>
<co edgeitem="ZSD06d">Aua</co></bi>
</pe><ad edgeitem="ZSD06h"><ad edgeitem="ZSD06h">24IC86</adL><co edgeitem="ZSD06h">uia</co><ov edgeitem="ZSD06i">true</ov><not><daC edgeitem="ZSD06b">29</daC>
</not></ad></of><of><pe><name edgeitem="ZSD06c"><gs edgeitem="ZSD06c">jane</gs>
<fae edgeitem="ZSD06c">ci</fae></name><bi edgeitem="ZSD06d"><da edgeitem="ZSD06d">198</da><ci edgeitem="ZSD06d">MLB</ci><st edgeitem="ZSD06d">VC</st>
<co edgeitem="ZSD06d">Aul</co></bi></pe><ad edgeitem="ZSD06h"><adL edgeitem="ZSD06h">24IC</adL><co edgeitem="ZSD06h">uia</co><ov edgeitem="ZSD06i">true</ov>
<not><daC edgeitem="ZSD06b">209</daC>
</not></ad></of></ch></of><si><name edgeitem="ZDC00a"><givenNames edgeitem="ZDC00a">John </givenNames>
<familyName edgeitem="ZDC00a">Citizen</familyName></name><ca edgeitem="ZDC00b">DI</ca><daS edgeitem="ZDC00c">200</daS><dec edgeitem="ZDC00d">true</dec></si></formData>
<mes><asi><ebu><re>746</re>
</ebu><asc><doc>181</doc></asc></asi>
<cus><edg><re><type>RE</type>
<qu>42</qu></re><ac>A08</ac>
<tra>60</tra>
<seq>1</se><tr>7046</tr>
<mailbox>PR</mailbox><mode>PROCESS</mode></edge></customer></messageIdentifier>
<asc><lo><ag>442</ag></loy></asc>
<asco><re><dod>
<dete>true</dete><fe>
<lod>258</lod><lod>213</lod>
<tot>0.00</tot></fe></dod></re>
<prs><m>PRS</m><wa>false</wa><deb>false</deb>
<maid>DP2</maid>
<re>false</re></pro></asco>
<wo><aga><ag>2</ag><agn>ATD</agn><co>LNY</co><pos>
<adL>PO60</adL><adL>C3145</adL><co>AUA</co><asd>15055</asd>
</pos><pe><te><nr>077</nr>
</te></ph><fx><te><nr>057</nr>
</te></fx></aga></wa></message>
答案 0 :(得分:0)
此子句遍历文件夹中的所有文件,如果找到任何XML,则调用第二个子
Option Explicit
'in code editor: Tools > References > checkbox in Microsoft Scripting Runtime
Public Sub ProcessXMLs()
Const FOLDER_NAME As String = "C:\Tmp" '<- update this path
Dim tags As Variant, hdrs As Variant, rowID As Long
Dim fso As FileSystemObject, f As File
Set fso = New Scripting.FileSystemObject
hdrs = Array("FileName", "ItemID", "Name", "GivenName", "FamilyName")
tags = Array("FileName", "value", "name", "givenNames", "familyName")
With Sheet1
.Range(.Cells(1, 1), .Cells(1, UBound(tags) + 1)).Value2 = hdrs
rowID = 2
Application.ScreenUpdating = False
For Each f In fso.GetFolder(FOLDER_NAME).Files 'iterate through files
If LCase(fso.GetExtensionName(f)) = "xml" Then
.Cells(rowID, 1).Value2 = fso.GetBaseName(f) & ".xml"
ReadTags Sheet1, fso.OpenTextFile(f.Path, ForReading), rowID, tags
rowID = rowID + 1
End If
Next
.UsedRange.Columns.AutoFit
Application.ScreenUpdating = True
End With
End Sub
此子提取4个标签中的值,当时只有一个文件:
Private Sub ReadTags(ByVal ws As Worksheet, ByVal fsoFile As TextStream, _
ByVal rowID As Long, ByVal tags As Variant)
Dim ln As String, val As String, i As Long, s1 As Long, s2 As Long
With fsoFile
Do While Not .AtEndOfStream 'file stream is open
ln = Trim(.ReadLine) 'read each line
If Len(ln) > 0 Then 'if text line is not empty extract tags
For i = 1 To UBound(tags) 'find each tag - start and closing
s1 = InStr(1, ln, "<" & tags(i), 0)
s2 = InStr(s1 + 1, ln, "</" & tags(i) & ">", 0)
If s1 > 0 And s2 > 0 Then
s1 = InStr(s1, ln, """>", 0) + 2
ws.Cells(rowID, i + 1).Value2 = Trim(Mid(ln, s1, s2 - s1))
Exit For
End If
Next
End If
Loop
End With
End Sub
结果: