Collecting the values of child tags with Python lxml

Time: 2015-11-11 16:15:57

Tags: python xml lxml

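I am using the lxml library in Python 2.6 to extract data from an XML file. The document contains many <Employee> tags. I iterate over each <Employee> tag, create a new instance of my Employee class, and set its member variables from the values of the Employee tag.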

    read_CA_tree = etree.parse(xml_tree, parser)
    all_employees = []
    for employee_tag in read_CA_tree.iter("Employee"):
        employee = Employee(employee_tag)
        all_employees.append(employee)

An <Employee> tag may also contain one or more <EmailAddress> child tags, like this:

    <Employee ID="124" Name="Foo Bar" Title="Baz">
        <EmailAddress ID="124" Address="foobar@fizzbang.com" />
    </Employee>

My Employee object is instantiated by calling the get() method on the lxml Element:

    class Employee(object):

        def __init__(self, employee_tag):
            self.Employee_ID = employee_tag.get("EmployeeID")
            self.First_Name = employee_tag.get("FirstName")
            self.Email_Addresses = self._collect_emails(read_CA_tree, "EmailAddress")

        def _collect_emails(self, tree, tag):
            known_addr = []
            for i in tree.iter(tag):
                known_addr.append(i)
            return known_addr

For each Employee tag, how can I collect the Address value of its child <EmailAddress> tags and add a list of email addresses to my Employee class constructor?

1 Answer:

Answer 0: (score: 2)

From the docs:

Elements carry attributes as a dict
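
As a brief illustration of that dict-like behaviour, here is a minimal sketch using the sample <EmailAddress> element from the question. The fallback import of xml.etree.ElementTree and the "Nickname" attribute name are assumptions added for illustration; the calls shown (fromstring, get, attrib) exist in both libraries.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which offers the same calls used here.
    import xml.etree.ElementTree as etree

email_tag = etree.fromstring('<EmailAddress ID="124" Address="foobar@fizzbang.com" />')

# get() returns the attribute value, or the supplied default when the attribute is absent.
address = email_tag.get("Address")
missing = email_tag.get("Nickname", "")  # "Nickname" is a hypothetical, absent attribute

# attrib exposes all attributes through a dict-like interface.
attributes = dict(email_tag.attrib)
```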

So, you can try:

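    def _collect_emails(self, tree, tag):
        known_addr = []  # the matching EmailAddress elements
        email_addr = []  # the Address attribute value of each element
        for i in tree.iter(tag):
            known_addr.append(i)
            # get() falls back to '' when the Address attribute is missing
            email_addr.append(i.get('Address', ''))
        return email_addr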
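
To tie this back to the question, the sketch below calls iter() on each Employee element rather than on the whole tree, so every Employee receives only its own addresses. The sample document, the second employee, and the use of the ID and Name attributes (taken from the sample XML rather than the EmployeeID and FirstName keys in the question's class) are assumptions for illustration, as is the fallback import.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which offers the same calls used here.
    import xml.etree.ElementTree as etree

# Hypothetical sample data modelled on the snippet in the question.
SAMPLE_XML = """
<Employees>
  <Employee ID="124" Name="Foo Bar" Title="Baz">
    <EmailAddress ID="124" Address="foobar@fizzbang.com" />
  </Employee>
  <Employee ID="125" Name="Jane Doe" Title="Qux">
    <EmailAddress ID="125" Address="jane@example.com" />
    <EmailAddress ID="125" Address="jdoe@example.com" />
  </Employee>
</Employees>
""".strip()


class Employee(object):

    def __init__(self, employee_tag):
        self.Employee_ID = employee_tag.get("ID")
        self.Name = employee_tag.get("Name")
        # Search below this employee's own element, not the whole document.
        self.Email_Addresses = self._collect_emails(employee_tag, "EmailAddress")

    def _collect_emails(self, element, tag):
        # Collect the Address attribute of every matching descendant.
        return [child.get("Address", "") for child in element.iter(tag)]


root = etree.fromstring(SAMPLE_XML)
all_employees = [Employee(tag) for tag in root.iter("Employee")]

for employee in all_employees:
    print("%s: %s" % (employee.Employee_ID, employee.Email_Addresses))
```

Passing the whole tree, as in the question's constructor, would instead hand every Employee the addresses of all employees in the document.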