XML to CSV in Python: extracting a series of child nodes for each node

Time: 2018-01-31 22:29:50

Tags: python xml loops csv nodes

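My goal is to convert an .XML file into a .CSV file. That part of the code already works.

However, I also want to extract the grandchild nodes of one of the "parent" nodes.

Perhaps an example will explain it better.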

Here is my code so far:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("/Users/BE07861/Documents/nedis_catalog_2018-01-23_nl_BE_32191_v1-0_xml")
root = tree.getroot()

f = open('/Users/BE07861/Documents/test2.csv', 'w')

csvwriter = csv.writer(f, delimiter='ç')

count = 0

head = ['Nedis Part Number', 'Nedis Article ID', 'Vendor Part Number', 'Brand', 'EAN', 'Header text', 'Internet Text', 'General Text', 'categories']
prdlist = root[1]
prdct = prdlist[5]
cat = prdct[12]
tree1=cat[0]

csvwriter.writerow(head)

for time in prdlist.findall('product'):
    row = []
    nedis_number = time.find('nedisPartnr').text
    row.append(nedis_number)
    nedis_art_id = time.find('nedisArtlid').text
    row.append(nedis_art_id)
    vendor_part_nbr = time.find('vendorPartnr').text
    row.append(vendor_part_nbr)
    Brand = time.find('brand').text
    row.append(Brand)
    ean = time.find('EAN').text
    row.append(ean)
    header_text = time.find('headerText').text
    row.append(header_text)
    internet_text = time.find('internetText').text
    row.append(internet_text)
    general_text = time.find('generalText').text
    row.append(general_text)
    categ = time.find('categories').find('tree').find('entry').text
    row.append(categ)
    csvwriter.writerow(row)

f.close()

This is the result I am aiming for, with one column per category:

Nedis Part Number   Nedis Article ID         Vendor Part Number   
VS-150/63BA         17005              TONFREQ-ELKOS / BIPOL 150, 5390  


Brand     EAN           Header text               Internet Text 
Visaton   4,00754E+12   Crossover Foil capacitor  Bipolaire elco …



General Text              Category1    Category2     Category3
Dimensions 16 x 35 mm     Audio        Speakers      Accessoires

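If you run the code, you will see that I only retrieve the first "entry" of categories/tree, which is expected. However, I don't know how to write a loop that creates a new column for each "entry" under the "categories" node, such as categories1, categories2 and categories3, each holding the text of one "entry".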


I have tried my best but could not find a solution.

Any help is much appreciated! :)

Thanks a lot,

阿伦

1 answer:

Answer 0: (score: 0)

I think this is what you are looking for:

for child in time.find('categories').find('tree'):
    categ = child.text
    row.append(categ)
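
To see that loop in isolation, here is a minimal sketch run against a made-up XML fragment. The layout of the fragment (a `categories` element holding a `tree` with several `entry` children) is an assumption inferred from the `find` calls in the question, and the sample values are borrowed from the output shown there; the real Nedis file may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical product fragment; the real catalogue may be laid out differently.
SAMPLE = """
<product>
  <nedisPartnr>VS-150/63BA</nedisPartnr>
  <categories>
    <tree>
      <entry>Audio</entry>
      <entry>Speakers</entry>
      <entry>Accessoires</entry>
    </tree>
  </categories>
</product>
"""

product = ET.fromstring(SAMPLE.strip())
row = [product.find('nedisPartnr').text]

# Iterating over an Element yields its direct children, here each <entry>.
for child in product.find('categories').find('tree'):
    row.append(child.text)

print(row)
```

Each `entry` ends up in its own cell of `row`, so the number of cells varies with the number of entries on the product.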

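Here is a solution that first loops over the XML once to work out how many headers to add, adds those headers, and then iterates over each product's list of categories:

Update: this now iterates over images as well as categories. The main difference is: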

for child in time.find('categories').find('tree'):
    categ = child.text
    row.append(categ)
    curcat += 1

while curcat < maxcat:
    row.append('')
    curcat += 1

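It counts the largest number of categories found on any single record and creates that many columns. If a particular record has fewer categories, the code fills the remaining placeholders with empty values so that the column headers always line up with the data.

For example: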

Cat1     Cat2     Cat3     Img1     Img2     Img3
A        B        C        1        2        3
D        E        <blank>  4        <blank>  <blank>
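
The padding idea can also be sketched with a small helper. This is only an illustration: the `padded` function, the `productList` wrapper and the `image` tag name are hypothetical, and the two sample products are chosen to reproduce the table above.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-product catalogue, chosen to reproduce the table above.
SAMPLE = """
<productList>
  <product>
    <categories><tree><entry>A</entry><entry>B</entry><entry>C</entry></tree></categories>
    <images><image>1</image><image>2</image><image>3</image></images>
  </product>
  <product>
    <categories><tree><entry>D</entry><entry>E</entry></tree></categories>
    <images><image>4</image></images>
  </product>
</productList>
"""

def padded(values, width):
    """Return values followed by enough '' entries to reach width items."""
    return values + [''] * (width - len(values))

products = ET.fromstring(SAMPLE.strip()).findall('product')

# Collect the texts per product first, so the maxima can be computed in one go.
cats = [[e.text for e in p.find('categories').find('tree')] for p in products]
imgs = [[i.text for i in p.find('images')] for p in products]

maxcat = max(len(c) for c in cats)
maximg = max(len(i) for i in imgs)

rows = [padded(c, maxcat) + padded(i, maximg) for c, i in zip(cats, imgs)]
for row in rows:
    print(row)
```

Every row comes out with `maxcat + maximg` cells, which is what keeps the data under the right headers.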

Here is the complete solution:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("c:\\python\\xml.xml")
root = tree.getroot()

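# newline='' (Python 3) stops the csv module from writing blank lines between rows on Windows
f = open('c:\\python\\xml.csv', 'w', newline='')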

csvwriter = csv.writer(f, delimiter=',')

count = 0

head = ['Nedis Part Number', 'Nedis Article ID', 'Vendor Part Number', 'Brand', 'EAN', 'Header text', 'Internet Text', 'General Text']
prdlist = root[1]

# First pass: find the largest number of category entries on any one product
maxcat = 0
for time in prdlist.findall('product'):
    cur = 0
    for child in time.find('categories').find('tree'):
        cur += 1
    if cur > maxcat:
        maxcat = cur

# Add one "Category N" header per category column
for cnt in range(maxcat):
    head.append('Category ' + str(cnt + 1))

# Same counting pass for the images of each product
maximg = 0
for time in prdlist.findall('product'):
    cur = 0
    for child in time.find('images'):
        cur += 1
    if cur > maximg:
        maximg = cur

for cnt in range(0, maximg):
    head.append('Image ' + str(cnt + 1))

csvwriter.writerow(head)

for time in prdlist.findall('product'):
    row = []
    nedis_number = time.find('nedisPartnr').text
    row.append(nedis_number)
    nedis_art_id = time.find('nedisArtlid').text
    row.append(nedis_art_id)
    vendor_part_nbr = time.find('vendorPartnr').text
    row.append(vendor_part_nbr)
    Brand = time.find('brand').text
    row.append(Brand)
    ean = time.find('EAN').text
    row.append(ean)
    header_text = time.find('headerText').text
    row.append(header_text)
    internet_text = time.find('internetText').text
    row.append(internet_text)
    general_text = time.find('generalText').text
    row.append(general_text)

    curcat = 0

    for child in time.find('categories').find('tree'):
        categ = child.text
        row.append(categ)
        curcat += 1

    # Pad with empty cells so the image columns stay aligned with their headers
    while curcat < maxcat:
        row.append('')
        curcat += 1

    curimg = 0

    for img in time.find('images'):
        image = img.text
        row.append(image)
        curimg += 1

    while curimg < maximg:
        row.append('')
        curimg += 1

    csvwriter.writerow(row)

f.close()
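
One caveat with the script above: a call such as `time.find('EAN').text` raises `AttributeError` if a product has no such tag. The question does not say whether the catalogue ever omits tags, so the following is only a sketch of one way to guard against it, using `Element.findtext` with a default. The `write_products` helper, the `FIELDS` subset and the sample data are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ['nedisPartnr', 'brand', 'EAN']  # subset of the tags used above

def write_products(products, out):
    """Write one CSV row per product; missing tags become empty cells."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for product in products:
        # findtext returns the default instead of raising when a tag is absent
        writer.writerow([product.findtext(tag, default='') for tag in FIELDS])

# Hypothetical input: the second product has no <EAN> element.
SAMPLE = """
<productList>
  <product><nedisPartnr>P1</nedisPartnr><brand>Visaton</brand><EAN>123</EAN></product>
  <product><nedisPartnr>P2</nedisPartnr><brand>Visaton</brand></product>
</productList>
"""

# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_products(ET.fromstring(SAMPLE.strip()).findall('product'), buffer)
print(buffer.getvalue())
```

With a real file, the same helper could be called inside `with open(path, 'w', newline='') as f:`, which also closes the file if an error occurs partway through.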