Question

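Desperately need help. I'm a Python beginner and have been trying to do this for days (and nights) without success. I have a large XML file containing elements (i.e. account) with child elements (i.e. attribute) and a variable number of grandchild elements (i.e. attributeValue). Because the number of grandchild elements varies, I don't know how to drill down when I need to pick everything up and put it into a .csv. So, depending on the account, there could be many records. I want one row with the account ID, followed by the attribute name, then the attribute value. If an account has many attributes, it can produce multiple rows.

Any help you can offer is greatly appreciated! :)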

<?xml version="1.0" encoding="UTF-8"?>
<rbacx>
  <namespace namespaceName="ABC RSS : xxxxxxx" namespaceShortName="RSS" />
  <attributeValues />
  <accounts>
    <account id="AAGALY2">
      <name>AAGALY2</name>
      <endPoint>ABCD</endPoint>
      <domain>ABCD</domain>
      <comments />
      <attributes>  ### one account can have many attribute records
        <attribute name="appUserName">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[A, Agglya]]></value>
            </attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="costCentre">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[6734]]></value>
            </attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="App ID">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[AAGALY2]]></value>
            </attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="Last Access Date">
          <attributeValues>
            <attributeValue>
              <value><![CDATA[00000000]]></value>

etc......

I'd like the CSV to look like this:

AcctName   Endpoint     Domain     AttribName     AttribValue
AAGALY2     ABCD        ABCD       appUserName    A, Agalya
AAGALY2     ABCD        ABCD       CostCentre     333333
AAGALY2     ABCD        ABCD       App ID         AAGALY2
AAGALY2     ABCD        ABCD       Jobtemplate    A12-can read
JSMITH1     EFG         ABCD       appUserName    J, Smith
JSMITH1     ABCD        ABCD       CostCentre     12345
JSMITH1     ABCD        ABCD       Jobtemplate    A22-perm to write
ZZMITH3     EFG         GHI        appUserName    Z, Zmith
ZZMITH3     EFG         GHI        CostCentre     3456

Answer 1

If xml etree isn't helping, I've found xmltodict to be a very simple way to handle XML parsing.

Your code could then look something like this:

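import csv
import xmltodict

# 'yourxml.xml' and 'yourcsv.csv' are placeholder file names.
# force_list keeps these elements as lists even when only one is present.
with open('yourxml.xml', 'rb') as xml_file:
    xmldict = xmltodict.parse(xml_file, force_list=('account', 'attribute', 'attributeValue'))

with open('yourcsv.csv', 'w', newline='') as csv_file:
    f = csv.writer(csv_file)

    # write the field names, here the ones outlined in your desired output
    f.writerow(['AcctName', 'Endpoint', 'Domain', 'AttribName', 'AttribValue'])

    # write the contents: one row per attribute value
    # (xmltodict exposes XML attributes with an '@' prefix, e.g. '@name')
    for account in (xmldict['rbacx'].get('accounts') or {}).get('account', []):
        for attribute in (account.get('attributes') or {}).get('attribute', []):
            for value in (attribute.get('attributeValues') or {}).get('attributeValue', []):
                f.writerow([account.get('name'), account.get('endPoint'), account.get('domain'),
                            attribute['@name'], (value or {}).get('value')])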

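You will obviously need to map this onto the nesting of your own XML and pick out the keys you want, but that should be straightforward with the dict structure. Hope this helps!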
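
For comparison, the same flattening can probably be done with just the standard library's xml.etree.ElementTree, which avoids the extra dependency. This is only a minimal sketch: SAMPLE_XML is a trimmed-down copy of the XML in the question, and the names account_rows, xml_to_csv and HEADER are made up for illustration. It assumes every account carries name, endPoint and domain children and that each value element holds plain text or CDATA.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed-down sample in the shape of the question's XML (assumed representative).
SAMPLE_XML = """
<rbacx>
  <accounts>
    <account id="AAGALY2">
      <name>AAGALY2</name>
      <endPoint>ABCD</endPoint>
      <domain>ABCD</domain>
      <attributes>
        <attribute name="appUserName">
          <attributeValues>
            <attributeValue><value><![CDATA[A, Agglya]]></value></attributeValue>
          </attributeValues>
        </attribute>
        <attribute name="costCentre">
          <attributeValues>
            <attributeValue><value><![CDATA[6734]]></value></attributeValue>
          </attributeValues>
        </attribute>
      </attributes>
    </account>
  </accounts>
</rbacx>
"""

# Column names taken from the desired output in the question.
HEADER = ["AcctName", "Endpoint", "Domain", "AttribName", "AttribValue"]


def account_rows(root):
    """Yield one row per attribute value of every account (hypothetical helper)."""
    for account in root.iter("account"):
        base = [
            account.findtext("name", default=""),
            account.findtext("endPoint", default=""),
            account.findtext("domain", default=""),
        ]
        # However many attributes and values an account has,
        # each value becomes its own row.
        for attribute in account.iter("attribute"):
            for value in attribute.iter("value"):
                yield base + [attribute.get("name", ""), value.text or ""]


def xml_to_csv(xml_text):
    """Return the CSV text for an XML string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    writer.writerows(account_rows(root))
    return buffer.getvalue()


print(xml_to_csv(SAMPLE_XML))
```

For a real file you would likely call ET.parse(path).getroot() instead of ET.fromstring, and write to a file opened with newline='' rather than a StringIO. If the file is too large to hold in memory, ET.iterparse may be worth a look.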

Python - parse XML with variable nested elements into CSV
