我正在尝试在XML文件中提取特定标记的内容。
示例XML:
<facts>
<fact>
<name>crash</name>
<full_name>Crash</full_name>
<variables>
<variable>
<name>id</name>
<proper_name>Crash Instance</proper_name>
<type>INT</type>
<interpretation>key</interpretation>
</variable>
<variable>
<name>accident_key</name>
<proper_name>Case Identifier</proper_name>
<interpretation>string</interpretation>
<type>CHAR(9)</type>
</variable>
<variable>
<name>accident_year</name>
<proper_name>Crash Year</proper_name>
<interpretation>dim</interpretation>
<type>INT</type>
</variable>
</variables>
</fact>
<fact>
<name>vehicle</name>
<full_name>Vehicle</full_name>
<variables>
<variable>
<name>id</name>
<proper_name>Vehicle Instance</proper_name>
<type>INT</type>
</variable>
<variable>
<name>crash_id</name>
<proper_name>Crash Instance</proper_name>
<type>INT</type>
</variable>
</variables>
</fact>
</facts>
我想从节点中提取标记的所有内容,但仅限于崩溃事实。
到目前为止,这是我的代码。
def header(filename, fact):
lst = []
tree = ET.parse(filename) #read in the XML
for fact in tree.iter(tag = 'fact'):
factname = fact.find('name').text
if factname == fact: #choose the fact to pull from
for var in fact.iter(tag = 'variable'):
name = var.find('name').text
lst.append(name)
return lst #return a list of all the <name> tags from the Crash fact
newlst = header('schema.xml','crash')
我的输出newlst应该是Crash事实中所有标签的列表。但它一直空着。
奇怪的是,如果我对所有内容进行硬编码(并删除函数),它会返回正确的输出:
lst = []
tree = ET.parse('schema.xml')
for fact in tree.iter(tag = 'fact'):
factname = fact.find('name').text
if factname == 'crash':
for var in fact.iter(tag = 'variable'):
name = var.find('name').text
lst.append(name)
print(lst)
Output: ['id',
'accident_key',
'accident_year']
答案 0 :(得分:4)
在函数中,您将变量fact
用作参数,并将其作为第一个for
循环变量。试试这个版本:
def header(filename, target_factname):
lst = []
tree = ET.parse(filename) #read in the XML
for fact in tree.iter(tag = 'fact'):
factname = fact.find('name').text
if factname == target_factname: #choose the fact to pull from
for var in fact.iter(tag = 'variable'):
name = var.find('name').text
lst.append(name)
return lst #return a list of all the <name> tags from the Crash fact