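Parsing XML with Python when the header and the value are on different lines

Time: 2015-11-02 11:40:44

Tags: python xml parsing

I have the following XML document, which I want to write to a CSV file.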

<items>
  <item>
    <attribute type="set" identifier="naadloos">
      <name locale="nl_NL">Naadloos</name>
      <value locale="nl_NL" identifier="nee">Nee</value>
    </attribute>
    <attribute type="asset" identifier="short_description">
      <value locale="nl_NL">Tom beugel bh</value>
    </attribute>
    <attribute type="text" identifier="name">
      <name locale="nl_NL">Naam</name>
      <value>Marie Jo L'Aventure Tom beugel bh</value>
    </attribute>
    <attribute type="int" identifier="is_backorder">
      <name locale="nl_NL">Backorder</name>
      <value>2</value>
    </attribute>
  </item>
</items>

How can I retrieve the data from this format? I need the following output:

naadloos, short_description, name, is_backorder
Nee, Tom beugel bh, Marie Jo L'Aventure Tom beugel bh, 2

So I need the identifier from each attribute element and the text from its value element.

Any ideas?

Many thanks

1 answer:

Answer 0: (score: 0)

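Here is my attempt: it reads the identifier of each attribute element and the text of its value (via XPath in the code below), then uses csv.DictWriter to write the result to the specified file.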

import lxml.etree as et
import csv

xml = """<items>
  <item>
    <attribute type="set" identifier="naadloos">
      <name locale="nl_NL">Naadloos</name>
      <value locale="nl_NL" identifier="nee">Nee</value>
    </attribute>
    <attribute type="asset" identifier="short_description">
      <value locale="nl_NL">Tom beugel bh</value>
    </attribute>
    <attribute type="text" identifier="name">
      <name locale="nl_NL">Naam</name>
      <value>Marie Jo L'Aventure Tom beugel bh</value>
    </attribute>
    <attribute type="int" identifier="is_backorder">
      <name locale="nl_NL">Backorder</name>
      <value>2</value>
    </attribute>
  </item>
</items>
"""

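# Parse the XML string into an element tree
tree = et.fromstring(xml)

# Column names: the identifier of every <attribute> element, in document order
header = list(tree.xpath("//attribute/@identifier"))

def dicter(x):
    # Return [identifier, text of the <value> under the matching <attribute>]
    exp = "//attribute[@identifier='%s']/value/text()" % x
    tmp = ''.join(tree.xpath(exp))
    return [x, tmp]

data = dict(dicter(i) for i in header)

# Write the header row and the single data row to the file.
# Python 3: open in text mode with newline='' (on Python 2, use 'wb' instead).
with open(r"C:\Users\User_Name\Desktop\output.txt", 'w', newline='') as wrt:
    writer = csv.DictWriter(wrt, header)
    writer.writeheader()
    writer.writerow(data)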

Contents of the written file:

naadloos,short_description,name,is_backorder
Nee,Tom beugel bh,Marie Jo L'Aventure Tom beugel bh,2
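
If the real document contains more than one item, the same idea (identifier as column name, value text as cell) can be applied per item. The following is only a sketch using the standard library's xml.etree.ElementTree instead of lxml. The function name items_to_csv and the second item in the sample are made up for illustration, and it assumes each attribute has at most one value element (it takes the first one rather than joining them as the code above does).

```python
import csv
import io
import xml.etree.ElementTree as ET


def items_to_csv(xml_text):
    """Return CSV text with one row per <item>; columns are attribute identifiers."""
    root = ET.fromstring(xml_text)
    header = []  # identifiers in first-seen order
    rows = []
    for item in root.iter("item"):
        row = {}
        for attr in item.findall("attribute"):
            key = attr.get("identifier")
            # Text of the first <value> child, or "" if there is none
            row[key] = attr.findtext("value", default="")
            if key not in header:
                header.append(key)
        rows.append(row)

    out = io.StringIO()
    # restval fills cells for identifiers an item does not have
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Shortened sample; the second <item> is hypothetical, not from the question
sample = """<items>
  <item>
    <attribute type="set" identifier="naadloos">
      <name locale="nl_NL">Naadloos</name>
      <value locale="nl_NL" identifier="nee">Nee</value>
    </attribute>
    <attribute type="int" identifier="is_backorder">
      <name locale="nl_NL">Backorder</name>
      <value>2</value>
    </attribute>
  </item>
  <item>
    <attribute type="set" identifier="naadloos">
      <value locale="nl_NL" identifier="ja">Ja</value>
    </attribute>
  </item>
</items>"""

result = items_to_csv(sample)
print(result)
```

To write to a file instead of a string, the io.StringIO object could be swapped for open(path, 'w', newline='').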