Question

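I have an XML file containing book records, each with tags such as author, publication date, and label. I want to parse this file to build three lists: one with the book titles, one with the authors, and one with the labels. Later I will write these lists to Excel columns using openpyxl. The problem is that some book records have no label tag. The usual parsing technique with Beautiful Soup yields the first two lists with the same length, but the label list comes out shorter.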

I have three questions:

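1- How can I create all three lists with equal length (with an empty entry for books that have no label tag)?

2- The label list looks like this: ['Energy;Green Buildings;High Performance Buildings', 'Computing', 'Computing;Design;Green Buildings', ...]. I created 15 more columns whose headers are the label names I have, such as "Computing" and "Design". Can I use openpyxl to put an X mark or a colored cell at each book/label combination when a book has that label? For example, if the book titled "Architecture" in row 5 has the "Design" label, I need an X mark or a colored cell at (row 5, column "Design").

3- Is there an easier way to do all this (parse the XML file and write it to Excel efficiently)?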

Below is a snapshot of the XML file and the code I wrote (the XML file and the Python file can also be downloaded from here: http://www.ranialabib.com/#!python/icfwa):

import xml.etree.ElementTree as ET
fhand = open('My_Collection.xml')
data = fhand.read()
Label_lst=list()
for record in tree.find("records/record") :
    label = record.find("label")

for l in label:    
        if label is not None: label = label_lst.append(label.text)
    else:
        label = label_lst.append(' ') 
print label_lst

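This is the code I wrote based on Charlie's suggestion, and it does not work. I get the error message "TypeError: 'NoneType' object is not iterable". I don't know what the problem is. Also, how can I get the text of all three tags (title, year, label) for each record into a single list, and how easy is it to write that many lists (200 lists for 200 books) to Excel with openpyxl?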


Answer 1

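If you want to preserve the record structure, you should parse record by record rather than only building lists of attributes. You can loop over the records and extract the relevant fields. Use findall for the loop: find returns only the first match, or None when nothing matches, and None cannot be iterated.

for record in parsed_xml.findall("records/record"):
    label = record.find("label")
    if label is not None:
        label = label.text

You can then write the rows directly to Excel without having to zip columns.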
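The record-by-record approach could be sketched roughly as follows. The sample XML, the helper names `field_text` and `records_to_rows`, and the empty-string default are assumptions made for illustration; the real My_Collection.xml may nest its elements differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure implied by the question;
# the second record deliberately has no <label> element.
SAMPLE_XML = """
<xml>
  <records>
    <record>
      <titles><title>Architecture</title></titles>
      <dates><year>2010</year></dates>
      <label>Design;Computing</label>
    </record>
    <record>
      <titles><title>Energy Basics</title></titles>
      <dates><year>2012</year></dates>
    </record>
  </records>
</xml>
"""


def field_text(record, path, default=""):
    """Return the text of the first element matching path, or default."""
    element = record.find(path)
    if element is None or element.text is None:
        return default
    return element.text


def records_to_rows(root):
    """Build a header row plus one (title, year, label) row per <record>."""
    rows = [("Title", "Year", "Label")]
    for record in root.findall("records/record"):
        rows.append((
            field_text(record, ".//title"),
            field_text(record, ".//year"),
            field_text(record, "label"),
        ))
    return rows


root = ET.fromstring(SAMPLE_XML.strip())
rows = records_to_rows(root)
for row in rows:
    print(row)
```

Because every record contributes exactly one row, a missing label simply becomes an empty cell and the columns cannot drift out of alignment. Each row can then be passed to `ws.append(row)` on an openpyxl worksheet.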

Answer 2

I just figured it out. I still used columns.

from openpyxl import Workbook
import xml.etree.ElementTree as ET

# Parse the file once; the root wraps a <records> element that holds each <record>.
tree = ET.parse('My_Collection.xml')
root = tree.getroot()

# Each list starts with its column header.
title_list = ['Title']
year_list = ['Year']
author_list = ['Author']
label_list = ['Label']


def text_or_n(record, path):
    # Return the element's text, or 'N' when the record lacks that tag,
    # so every list gets exactly one entry per record.
    element = record.find(path)
    return 'N' if element is None else element.text


for records in root:
    for record in records:
        title_list.append(text_or_n(record, './/title'))
        year_list.append(text_or_n(record, './/year'))
        author_list.append(text_or_n(record, './/author'))
        label_list.append(text_or_n(record, 'label'))

# All four lengths should match.
print(len(title_list), len(year_list), len(author_list), len(label_list))

wb = Workbook()
ws = wb.active

# zip pairs up the columns so each tuple is one spreadsheet row.
for row in zip(title_list, year_list, author_list, label_list):
    ws.append(row)

wb.save("Test3.xlsx")
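For the second question (one column per label, with an X where a book has that label), one possible approach is to split each label string on ';' and build the rows before writing them. This is only a sketch: the function name `label_matrix`, the sample titles and labels, and the alphabetical column order are assumptions, and 'N' is treated as the "no label" placeholder used above.

```python
def label_matrix(titles, labels, missing="N", sep=";"):
    """Return a header row plus one row per book, with 'X' under each label it has."""
    # Split every label string into its individual label names.
    split_labels = [
        [] if entry == missing
        else [part.strip() for part in entry.split(sep) if part.strip()]
        for entry in labels
    ]
    # One column per distinct label name, in alphabetical order.
    columns = sorted({name for parts in split_labels for name in parts})
    rows = [["Title"] + columns]
    for title, parts in zip(titles, split_labels):
        rows.append([title] + ["X" if name in parts else "" for name in columns])
    return rows


# Hypothetical data in the same shape as the lists built above (headers excluded).
titles = ["Architecture", "Energy Basics", "Untitled"]
labels = ["Computing;Design;Green Buildings", "Energy;Green Buildings", "N"]

matrix = label_matrix(titles, labels)
for row in matrix:
    print(row)
```

With the lists from the code above, this would be called as `label_matrix(title_list[1:], label_list[1:])` to skip the header entries, and each resulting row can be written with `ws.append(row)`. For a colored cell instead of an X, openpyxl's `PatternFill` (from `openpyxl.styles`) can be assigned to a cell's `fill` attribute.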

Parsing XML with an unequal number of tags to make equal-length lists: openpyxl and Beautiful Soup
