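I am using the lxml library in Python 2.6 to extract data from an XML file. The document contains many <Employee> tags. I iterate over each <Employee> tag, create a new instance of my Employee class, and set its member variables from the values on that tag.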
read_CA_tree = etree.parse(xml_tree, parser)
all_employees = []
for employee_tag in read_CA_tree.iter("Employee"):
    employee = Employee(employee_tag)
    all_employees.append(employee)

An <Employee> tag may also contain one or more <EmailAddress> child tags, like this:

<Employee ID="124" Name="Foo Bar" Title="Baz">
    <EmailAddress ID="124" Address="foobar@fizzbang.com" />
</Employee>
My Employee objects are instantiated by calling the get() method on the lxml Element:
class Employee(object):
    def __init__(self, employee_tag):
        self.Employee_ID = employee_tag.get("EmployeeID")
        self.First_Name = employee_tag.get("FirstName")
        self.Email_Addresses = self._collect_emails(read_CA_tree, "EmailAddress")

    def _collect_emails(self, tree, tag):
        known_addr = []
        for i in tree.iter(tag):
            known_addr.append(i)
        return known_addr
For each Employee tag, how can I collect the Address values from its <EmailAddress> child tags and add that list of email addresses to my Employee object in the class constructor?
Answer 0 (score: 2)

Elements expose their attributes through a dict-like interface, so you could try:
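def _collect_emails(self, tree, tag):
    known_addr = []  # matching <EmailAddress> elements
    email_addr = []  # their Address attribute values
    for i in tree.iter(tag):
        known_addr.append(i)
        # get() returns the attribute value, or '' if Address is missing
        email_addr.append(i.get('Address', ''))
    # return the address strings rather than the elements
    return email_addr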
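
One thing to watch: if the whole document tree is passed in as `tree`, every Employee ends up with every address in the file. A minimal sketch of scoping the search to one employee follows, assuming the element passed to the constructor is the <Employee> tag itself. The `<Employees>` wrapper, the second employee, and the `SAMPLE_XML` name are made up for illustration, and the attribute names `ID` and `Name` are taken from the sample XML in the question rather than from the question's code. The fallback import is only there so the sketch also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: the stdlib module offers the same get()/iter() calls used here.
    import xml.etree.ElementTree as etree

# Hypothetical sample document; only the first <Employee> comes from the question.
SAMPLE_XML = """
<Employees>
  <Employee ID="124" Name="Foo Bar" Title="Baz">
    <EmailAddress ID="124" Address="foobar@fizzbang.com" />
  </Employee>
  <Employee ID="125" Name="No Mail" Title="Qux" />
</Employees>
"""


class Employee(object):
    def __init__(self, employee_tag):
        self.Employee_ID = employee_tag.get("ID")
        self.Name = employee_tag.get("Name")
        # Search only below this employee's element, not the whole tree.
        self.Email_Addresses = self._collect_emails(employee_tag, "EmailAddress")

    def _collect_emails(self, element, tag):
        # iter() walks the element's own subtree; get() reads one attribute.
        return [child.get("Address", "") for child in element.iter(tag)]


root = etree.fromstring(SAMPLE_XML)
all_employees = [Employee(tag) for tag in root.iter("Employee")]
```

With this layout each Employee carries only its own addresses, and an employee without <EmailAddress> children gets an empty list.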