Parsing XML data: a problem with my loop

Time: 2017-03-25 01:56:31

Tags: python xml for-loop xml-parsing

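I am learning Python as my first programming language, and I am currently parsing an XML file as an exercise. I ran into a problem when I went to print the results.

Inside the AWARD_CONTRACT tag there are several contractors that have been awarded the contract. When I print the AWARD_CONTRACT, only the last contractor is printed. Please see my code below. Also, any tips on how to clean up the code or make it more efficient would be greatly appreciated!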

<AWARD_CONTRACT ITEM="1"> 
  <CONTRACT_NO>1</CONTRACT_NO>  
  <LOT_NO>1</LOT_NO>  
  <TITLE> 
    <P>Vállalkozási szerződés</P> 
  </TITLE>  
  <AWARDED_CONTRACT> 
    <DATE_CONCLUSION_CONTRACT>2016-12-28</DATE_CONCLUSION_CONTRACT>  
    <NB_TENDERS_RECEIVED>5</NB_TENDERS_RECEIVED>  
    <NB_TENDERS_RECEIVED_SME>0</NB_TENDERS_RECEIVED_SME>  
    <NB_TENDERS_RECEIVED_OTHER_EU>0</NB_TENDERS_RECEIVED_OTHER_EU>  
    <NB_TENDERS_RECEIVED_NON_EU>0</NB_TENDERS_RECEIVED_NON_EU>  
    <NB_TENDERS_RECEIVED_EMEANS>0</NB_TENDERS_RECEIVED_EMEANS>  
    <AWARDED_TO_GROUP/>  
    <CONTRACTOR> 
      <ADDRESS_CONTRACTOR> 
        <OFFICIALNAME>SWIETELSKY Magyarország Kft.</OFFICIALNAME>  
        <ADDRESS>Irinyi J. u. 4-20. B. épület V. emelet</ADDRESS>  
        <TOWN>Budapest</TOWN>  
        <POSTAL_CODE>1117</POSTAL_CODE>  
        <COUNTRY VALUE="HU"/>  
        <NUTS CODE="HU101"/> 
      </ADDRESS_CONTRACTOR>  
      <NO_SME/> 
    </CONTRACTOR>  
    <CONTRACTOR> 
      <ADDRESS_CONTRACTOR> 
        <OFFICIALNAME>HE-DO Kft.</OFFICIALNAME>  
        <ADDRESS>Váci út 76.</ADDRESS>  
        <TOWN>Budapest</TOWN>  
        <POSTAL_CODE>1133</POSTAL_CODE>  
        <COUNTRY VALUE="HU"/>  
        <NUTS CODE="HU101"/> 
      </ADDRESS_CONTRACTOR>  
      <NO_SME/> 
    </CONTRACTOR>  
    <CONTRACTOR> 
      <ADDRESS_CONTRACTOR> 
        <OFFICIALNAME>KM Építő Kft.</OFFICIALNAME>  
        <ADDRESS>Bánki Donát u. 5.</ADDRESS>  
        <TOWN>Szigetszentmiklós</TOWN>  
        <POSTAL_CODE>2310</POSTAL_CODE>  
        <COUNTRY VALUE="HU"/>  
        <NUTS CODE="HU102"/> 
      </ADDRESS_CONTRACTOR>  
      <NO_SME/> 
    </CONTRACTOR>  
    <VAL_ESTIMATED_TOTAL CURRENCY="HUF">9000000000</VAL_ESTIMATED_TOTAL>  
    <VAL_TOTAL CURRENCY="HUF">9270494617</VAL_TOTAL> 
  </AWARDED_CONTRACT> 
</AWARD_CONTRACT>

My code is as follows:

from xml.dom import minidom

xmldoc = minidom.parse('91414-2017.xml')

award_contract = xmldoc.getElementsByTagName('AWARD_CONTRACT')
for award in award_contract:
    item_no = award.getAttribute("ITEM")
    contract_no = award.getElementsByTagName('CONTRACT_NO')[0]
    lot = award.getElementsByTagName('LOT_NO')[0]
    title = award.getElementsByTagName('TITLE')[0]
    date = award.getElementsByTagName('DATE_CONCLUSION_CONTRACT')[0]

    contractors = xmldoc.getElementsByTagName('CONTRACTOR')
    for contractor in contractors:
        name = contractor.getElementsByTagName('OFFICIALNAME')[0]
        address = contractor.getElementsByTagName('ADDRESS')[0]
        town = contractor.getElementsByTagName('TOWN')[0]
        zip_code = contractor.getElementsByTagName('POSTAL_CODE')[0]
        c = contractor.getElementsByTagName('COUNTRY')[0]
        country = c.getAttribute("VALUE")

    value = award.getElementsByTagName('VAL_TOTAL')[0]
    currency = value.getAttribute("CURRENCY")

    print(item_no, ',', contract_no.firstChild.data,',', lot.firstChild.data,
      ',', title.firstChild.data,',', date.firstChild.data,',',
      name.firstChild.data,',', address.firstChild.data,',', town.firstChild.data,',',
      zip_code.firstChild.data,',', country, ',', value.firstChild.data,',', currency)

1 Answer:

Answer 0 (score: 0)

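This happens because every iteration of the inner for loop replaces the values of the variables name, address, town, and so on, so only the values from the last iteration remain once the loop has finished. Since in this case you only need those values for printing, you can move the print call into the inner loop, together with any variables to be printed that are currently still assigned outside the inner loop. That way each contractor is printed before its values are replaced by those of the next iteration: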

for contractor in contractors:
    ......
    country = c.getAttribute("VALUE")

    value = award.getElementsByTagName('VAL_TOTAL')[0]
    currency = value.getAttribute("CURRENCY")

    print(item_no, ',', contract_no.firstChild.data,',', ...... )
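
For reference, here is a minimal runnable sketch of the same idea. It assumes a shortened inline sample modelled on the XML in the question rather than the real 91414-2017.xml file, and the names SAMPLE, text_of and contractor_rows are made up for illustration. It also looks up CONTRACTOR elements under the current award instead of under xmldoc; the question's code searches the whole document, which would probably repeat every contractor for every award once the file holds more than one AWARD_CONTRACT.

```python
from xml.dom import minidom

# Shortened sample modelled on the XML in the question (assumption: the
# real file has the same shape, possibly with several AWARD_CONTRACT tags).
SAMPLE = """<AWARD_CONTRACT ITEM="1">
  <CONTRACT_NO>1</CONTRACT_NO>
  <AWARDED_CONTRACT>
    <CONTRACTOR>
      <ADDRESS_CONTRACTOR>
        <OFFICIALNAME>SWIETELSKY Magyarország Kft.</OFFICIALNAME>
        <TOWN>Budapest</TOWN>
      </ADDRESS_CONTRACTOR>
    </CONTRACTOR>
    <CONTRACTOR>
      <ADDRESS_CONTRACTOR>
        <OFFICIALNAME>HE-DO Kft.</OFFICIALNAME>
        <TOWN>Budapest</TOWN>
      </ADDRESS_CONTRACTOR>
    </CONTRACTOR>
    <VAL_TOTAL CURRENCY="HUF">9270494617</VAL_TOTAL>
  </AWARDED_CONTRACT>
</AWARD_CONTRACT>"""


def text_of(parent, tag):
    """Hypothetical helper: text of the first <tag> element below parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.data


def contractor_rows(xml_text):
    """Build one row per contractor instead of one row per award."""
    doc = minidom.parseString(xml_text)
    rows = []
    for award in doc.getElementsByTagName('AWARD_CONTRACT'):
        # Values shared by every contractor of this award.
        item_no = award.getAttribute('ITEM')
        contract_no = text_of(award, 'CONTRACT_NO')
        value = award.getElementsByTagName('VAL_TOTAL')[0]
        currency = value.getAttribute('CURRENCY')
        # Search below `award`, not the whole document.
        for contractor in award.getElementsByTagName('CONTRACTOR'):
            # The row is stored inside the inner loop, before the next
            # iteration can overwrite the contractor's values.
            rows.append((item_no, contract_no,
                         text_of(contractor, 'OFFICIALNAME'),
                         text_of(contractor, 'TOWN'),
                         value.firstChild.data, currency))
    return rows


for row in contractor_rows(SAMPLE):
    print(', '.join(row))
```

Collecting the rows in a list rather than printing them directly is just one possible design; it keeps the parsing separate from the output, which may make it easier to write the rows to a CSV file later.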

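Looking ahead, when your parsing task gets more complicated, for example when you need to find elements that match specific criteria instead of just reading every element, using minidom will be painful. I would suggest lxml, or at least xml.etree.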
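
As a rough illustration of what selecting elements by criteria could look like with xml.etree from the standard library, the sketch below picks out the contractors located in a given town. The inline SAMPLE is a trimmed copy of the question's XML and names_in_town is a hypothetical helper, not something from the original post. It relies on the limited XPath support in ElementTree, which accepts predicates of the form [child='text'].

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML in the question (assumption: same structure).
SAMPLE = """<AWARD_CONTRACT ITEM="1">
  <AWARDED_CONTRACT>
    <CONTRACTOR>
      <ADDRESS_CONTRACTOR>
        <OFFICIALNAME>SWIETELSKY Magyarország Kft.</OFFICIALNAME>
        <TOWN>Budapest</TOWN>
      </ADDRESS_CONTRACTOR>
    </CONTRACTOR>
    <CONTRACTOR>
      <ADDRESS_CONTRACTOR>
        <OFFICIALNAME>HE-DO Kft.</OFFICIALNAME>
        <TOWN>Budapest</TOWN>
      </ADDRESS_CONTRACTOR>
    </CONTRACTOR>
    <CONTRACTOR>
      <ADDRESS_CONTRACTOR>
        <OFFICIALNAME>KM Építő Kft.</OFFICIALNAME>
        <TOWN>Szigetszentmiklós</TOWN>
      </ADDRESS_CONTRACTOR>
    </CONTRACTOR>
  </AWARDED_CONTRACT>
</AWARD_CONTRACT>"""


def names_in_town(xml_text, town):
    """Hypothetical helper: official names of contractors in `town`."""
    root = ET.fromstring(xml_text)
    # [TOWN='...'] keeps only ADDRESS_CONTRACTOR elements that have a TOWN
    # child with exactly this text. The town is pasted into the path
    # unescaped, so this sketch assumes it contains no quote characters.
    path = ".//ADDRESS_CONTRACTOR[TOWN='{}']".format(town)
    return [addr.findtext('OFFICIALNAME') for addr in root.findall(path)]


print(names_in_town(SAMPLE, 'Budapest'))
```

With minidom the same filter would need an explicit loop and a manual comparison of each TOWN text node. lxml offers full XPath 1.0 if the predicates supported by ElementTree turn out to be too limited.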