Question

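I have been learning how to use the xml.dom.minidom functions to extract parts of an XML document, and I can successfully return specific elements and attributes.

I want to parse a large number of big XML files and push all of the results into a database. Is there a function like os.walk that I can use to extract elements from XML in a logical way that preserves the hierarchy?

The XML is very simple and straightforward: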

<InternalSignature ID="9" Specificity="Generic">
 <ByteSequence Reference="BOFoffset">
  <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0" MinFragLength="0">
  <Sequence>49492A00</Sequence> 
  <DefaultShift>5</DefaultShift> 
  <Shift Byte="00">1</Shift> 
  <Shift Byte="2A">2</Shift> 
  <Shift Byte="49">3</Shift> 
  </SubSequence>
 </ByteSequence>
</InternalSignature>
<InternalSignature ID="10" Specificity="Generic">
 <ByteSequence Reference="BOFoffset">
  <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0" MinFragLength="0">
  <Sequence>4D4D002A</Sequence> 
  <DefaultShift>5</DefaultShift> 
  <Shift Byte="2A">1</Shift> 
  <Shift Byte="00">2</Shift> 
  <Shift Byte="4D">3</Shift> 
  </SubSequence>
 </ByteSequence>
</InternalSignature>

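Is there a formal way to crawl the XML and (in this small example) extract the elements that belong to each particular InternalSignature? I can see how to use minidom.parse and the .getElementsByTagName method to pull things out as lists, but I am not sure how to relate the elements back to their place in the hierarchy.

So far I have found a tutorial that shows how to return various values: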

from xml.dom import minidom

xmldoc = minidom.parse("file.xml")

# Attributes of the root FFSignatureFile element
Versionlist = xmldoc.getElementsByTagName('FFSignatureFile')
VersionRef = Versionlist[0]
Version = VersionRef.attributes["Version"]
DateCreated = VersionRef.attributes["DateCreated"]
print(Version.value)
print(DateCreated.value)

# Attributes of the first InternalSignature element
InternalSignatureList = xmldoc.getElementsByTagName('InternalSignature')
InternalSignatureRef = InternalSignatureList[0]
SigID = InternalSignatureRef.attributes["ID"]
SigSpecificity = InternalSignatureRef.attributes["Specificity"]
print(SigID.value)
print(SigSpecificity.value)

# Total number of InternalSignature elements in the file
print(len(InternalSignatureList))

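From the last line (len) I can see that InternalSignatureList contains 134 elements. Essentially, I want to extract everything inside each InternalSignature as a single record and push it into a database.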

Answer 1

(What have you tried?)

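from xml.etree import ElementTree

# xmlstring holds the XML text; to read a file, use ElementTree.parse("file.xml").getroot()
e = ElementTree.fromstring(xmlstring)
# A bare tag name matches direct children only; ".//" searches the whole subtree
e.findall(".//ByteSequence")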
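To make the os.walk comparison concrete, here is a minimal sketch using the standard-library ElementTree. It assumes the InternalSignature elements sit under a single root called FFSignatureFile (the tag used in the question's snippet) and that the file declares no XML namespace; with a namespace, the tag names would need the "{uri}" prefix. The helper names walk and signature_records are made up for illustration.

```python
from xml.etree import ElementTree

# Sample from the question, wrapped in an assumed FFSignatureFile root so
# that it parses as a single document.
SAMPLE = """<FFSignatureFile>
<InternalSignature ID="9" Specificity="Generic">
 <ByteSequence Reference="BOFoffset">
  <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0" MinFragLength="0">
   <Sequence>49492A00</Sequence>
   <DefaultShift>5</DefaultShift>
   <Shift Byte="00">1</Shift>
   <Shift Byte="2A">2</Shift>
   <Shift Byte="49">3</Shift>
  </SubSequence>
 </ByteSequence>
</InternalSignature>
</FFSignatureFile>"""


def walk(element, path=()):
    """Yield (path, element) for an element and all descendants, depth first."""
    path = path + (element.tag,)
    yield path, element
    for child in element:
        yield from walk(child, path)


def signature_records(root):
    """Build one nested dict per InternalSignature (hypothetical record shape)."""
    records = []
    for sig in root.iter("InternalSignature"):
        record = dict(sig.attrib)  # ID and Specificity
        record["byte_sequences"] = [
            {
                "reference": bs.get("Reference"),
                "sub_sequences": [
                    {
                        "position": sub.get("Position"),
                        "sequence": sub.findtext("Sequence"),
                        "default_shift": sub.findtext("DefaultShift"),
                        "shifts": {s.get("Byte"): s.text
                                   for s in sub.findall("Shift")},
                    }
                    for sub in bs.findall("SubSequence")
                ],
            }
            for bs in sig.findall("ByteSequence")
        ]
        records.append(record)
    return records


root = ElementTree.fromstring(SAMPLE)

# Generic walk: every element together with its position in the hierarchy
for path, element in walk(root):
    print("/".join(path), element.attrib, (element.text or "").strip())

# One record per InternalSignature, ready to be written to a database
records = signature_records(root)
print(records[0]["ID"], records[0]["byte_sequences"][0]["sub_sequences"][0]["sequence"])
```

Each record keeps the child values grouped under their parent signature, so it can be flattened into rows for whatever database layer is in use. For very large files, ElementTree.iterparse can process one InternalSignature at a time instead of loading the whole tree.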
