I want to print the text between <en> tags whenever x='PERS'. I tried the code below, but the output is not what I want.
Sample XML:
<TEXT>
<PHRASE>
<en x='PERS'> John </en>
<V> Went </V>
<prep> to </prep>
<V> meet </V>
<en x='PERS'> Alex </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Mark </en>
<V> lives </V>
<prep> in </prep>
<en x='LOC'> Florida </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Nick </en>
<V> visited</V>
<en x='PERS'> Anna </en>
</PHRASE>
</TEXT>
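The output I want is: John-Alex, Nick-Anna. But I got: Mark-Mark. In other words, I only want to print when a single phrase contains two PERS elements.
Here is the code I wrote, using ElementTree.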
import xml.etree.ElementTree as ET
tree = ET.parse('output.xml')
root = tree.getroot()
print("------------------------PERS-PERS-------------------------------")
PERS_PERScount=0
for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'PERS' in ens and 'PERS' in ens:
        print("PERS is: {}, PERS is: {} /".format(ens["PERS"], ens["PERS"]))
        #print(ens["ORG"])
        #print(ens["PERS"])
        PERS_PERScount = PERS_PERScount + 1
print("Number of PERS-PERS relation", PERS_PERScount)
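I'm not sure whether the problem is the printing, the if condition, or both?!
Answer 0: (score: 1)
You can add a simple if check so that the counter is incremented and the names are printed only when the number of en elements whose x attribute equals "PERS" is 2 (a pair):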
# assumes `root` and `PERS_PERScount = 0` are already defined as in the question
for phrase in root.findall('./PHRASE'):
    # get all inner text of elements where `x` attribute equals `"PERS"`
    names = [p.text.strip() for p in phrase.findall('./en[@x="PERS"]')]
    # if there are 2 of them, increment counter and print
    if len(names) == 2:
        PERS_PERScount += 1
        print('-'.join(names))
print("Number of PERS-PERS relation: ", PERS_PERScount)
eval.in demo
Output:
John-Alex
Nick-Anna
Number of PERS-PERS relation: 2
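The answer above does not spell out why the original code printed the same name twice, so here is a minimal sketch of the likely cause: a dict keyed by the x attribute can hold only one value per key, so the second PERS entry overwrites the first. The SAMPLE string is the question's XML embedded inline (an assumption made so the sketch runs without output.xml), and the names as_dict and as_list are illustrative only.

```python
import xml.etree.ElementTree as ET

# The question's sample data, embedded so the sketch does not need output.xml.
SAMPLE = """<TEXT>
<PHRASE>
<en x='PERS'> John </en>
<V> Went </V>
<prep> to </prep>
<V> meet </V>
<en x='PERS'> Alex </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Mark </en>
<V> lives </V>
<prep> in </prep>
<en x='LOC'> Florida </en>
</PHRASE>
<PHRASE>
<en x='PERS'> Nick </en>
<V> visited</V>
<en x='PERS'> Anna </en>
</PHRASE>
</TEXT>"""

root = ET.fromstring(SAMPLE)
first_phrase = root.find('./PHRASE')

# Dict keyed by the attribute value: the second 'PERS' entry overwrites the first.
as_dict = {en.get('x'): en.text.strip() for en in first_phrase.findall('en')}

# List of names: every element with x="PERS" is kept, duplicates included.
as_list = [en.text.strip() for en in first_phrase.findall('./en[@x="PERS"]')]

print(as_dict)  # {'PERS': 'Alex'}
print(as_list)  # ['John', 'Alex']
```

Collecting the names in a list keeps both entries, which is what makes the len(names) == 2 check possible.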
Answer 1: (score: 0)
This:
#!/usr/bin/env python3
import xml.etree.ElementTree as ET
tree = ET.parse('output.xml')
root = tree.getroot()
print("------------------------PERS-PERS-------------------------------")
for phrase in root:
    if phrase.tag == 'PHRASE':
        # names of the PERS elements found in this phrase
        collected_names = []
        for elt in phrase:
            if elt.tag == 'en':
                if 'x' in elt.attrib and elt.attrib['x'] == 'PERS':
                    # strip the padding spaces around the name
                    collected_names += [elt.text.strip()]
        # print the first two names when the phrase has at least two
        if len(collected_names) >= 2:
            print(collected_names[0] + " - " + collected_names[1])
will output:
$ ./test_script
------------------------PERS-PERS-------------------------------
John - Alex
Nick - Anna
But I'm not sure this is the way you want it.
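One difference between the two answers is the length check: the first uses == 2, this one uses >= 2. The sketch below shows how they diverge on a phrase with three PERS elements. That phrase is hypothetical and not part of the question's data, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical phrase with three PERS elements (not from the question's data).
phrase = ET.fromstring(
    "<PHRASE>"
    "<en x='PERS'> John </en><V> met </V>"
    "<en x='PERS'> Alex </en><prep> and </prep>"
    "<en x='PERS'> Anna </en>"
    "</PHRASE>"
)

names = [en.text.strip() for en in phrase.findall("./en[@x='PERS']")]

# Exactly-two check: a phrase with three names produces nothing.
exact = ['-'.join(names)] if len(names) == 2 else []

# At-least-two check: only the first two names are reported.
at_least = [names[0] + " - " + names[1]] if len(names) >= 2 else []

print(exact)     # []
print(at_least)  # ['John - Alex']
```

Which behaviour is right depends on how phrases with more than two people should be counted, which the question does not specify.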