I have an XML file, sample.xml, that looks like this:
<root>
<keyword_group>
<headword>sell/buy</headword>
</keyword_group>
</root>
I want to split headword.text on '/', wrap each part in a <word> tag, and finally remove the <headword> tag. The output I expect is:
<root>
<keyword_group>
<word>sell</word>
<word>buy</word>
</keyword_group>
</root>
My (ugly) script is:
import lxml.etree as ET
xml = '''\
<root>
<keyword_group>
<headword>sell/buy</headword>
</keyword_group>
</root>
'''
root = ET.fromstring(xml)
headword = root.find('.//headword')
if headword is not None:
    words = headword.text.split('/')
    for word in words:
        ET.SubElement(headword, 'word')
        for wr in headword.iter('word'):
            if not wr.text:
                wr.text = word
    headword.text = ''
print(ET.tostring(root, encoding='unicode'))
But this is far too complicated, and I still couldn't manage to remove the <headword> tag.
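The step the script above is missing is an "unwrap": drop the <headword> element but keep its children where it was. With lxml, `ET.strip_tags(root, 'headword')` merges a stripped element's text and children into its parent. As a dependency-free alternative, here is a minimal sketch using the standard library's `xml.etree.ElementTree` instead of lxml; because ElementTree elements have no `getparent()`, the parent is passed in explicitly. The helper name `unwrap` is hypothetical and not part of either library.

```python
import xml.etree.ElementTree as ET


def unwrap(parent, elem):
    """Hypothetical helper: remove elem from parent, keeping its text, children and tail in place."""
    idx = list(parent).index(elem)
    children = list(elem)
    lead = elem.text or ''    # text before elem's first child
    trail = elem.tail or ''   # text after elem's closing tag
    if children:
        # The tail now follows the last moved child.
        last = children[-1]
        last.tail = (last.tail or '') + trail
    else:
        lead += trail
    if lead:
        # Leading text attaches to the previous sibling's tail, or to parent.text.
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or '') + lead
        else:
            parent.text = (parent.text or '') + lead
    parent.remove(elem)
    for offset, child in enumerate(children):
        parent.insert(idx + offset, child)


# The question's approach: build <word> children inside <headword>, then unwrap it.
root = ET.fromstring(
    '<root><keyword_group><headword>sell/buy</headword></keyword_group></root>')
group = root.find('keyword_group')
headword = group.find('headword')
for word in headword.text.split('/'):
    # SubElement returns the new element, so no inner loop is needed.
    ET.SubElement(headword, 'word').text = word
headword.text = None
unwrap(group, headword)
print(ET.tostring(root, encoding='unicode'))
# → <root><keyword_group><word>sell</word><word>buy</word></keyword_group></root>
```

Setting `.text` on the value returned by `SubElement` replaces the question's inner `iter('word')` loop, which only existed to find the element that had just been created.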
Answer 0 (score: 2)
Using lxml:
import lxml.etree as ET
xml = '''\
<root>
<keyword_group>
<headword>sell/buy</headword>
</keyword_group>
</root>
'''
root = ET.fromstring(xml)
headword = root.find('.//headword')
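if headword is not None:
    words = headword.text.split('/')
    # lxml elements know their parent, so <headword> can be removed directly
    parent = headword.getparent()
    parent.remove(headword)
    # Append one <word> element per part (at the end of the parent)
    for word in words:
        ET.SubElement(parent, 'word').text = word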
print(ET.tostring(root, encoding='unicode'))
Output:
<root>
<keyword_group>
<word>sell</word><word>buy</word></keyword_group>
</root>
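The answer's code appends the new <word> elements at the end of the parent, loses the newline that followed <headword> (visible in the output above), and only handles the first <headword> that `find` returns. A sketch that replaces every <headword> in place might look like this; it again assumes the standard library's `xml.etree.ElementTree` rather than lxml, and `split_headwords` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def split_headwords(root, sep='/'):
    """Replace every <headword>a/b</headword> under root with <word>a</word><word>b</word>.

    Hypothetical helper: the new <word> elements take the <headword>'s place
    among its siblings, and the last one inherits the <headword>'s tail text.
    """
    # Collect (parent, headword) pairs first, so the tree is not modified
    # while it is being iterated.
    targets = [(parent, child)
               for parent in root.iter()
               for child in parent
               if child.tag == 'headword']
    for parent, headword in targets:
        idx = list(parent).index(headword)
        parent.remove(headword)
        words = []
        for offset, text in enumerate((headword.text or '').split(sep)):
            word = ET.Element('word')
            word.text = text
            parent.insert(idx + offset, word)
            words.append(word)
        # str.split always returns at least one item, so words is non-empty.
        words[-1].tail = headword.tail
    return root


# Usage; the <note> sibling is hypothetical, added to show that order is kept.
xml = '''\
<root>
<keyword_group>
<headword>sell/buy</headword>
<note>x</note>
</keyword_group>
</root>'''

root = split_headwords(ET.fromstring(xml))
print(ET.tostring(root, encoding='unicode'))
```

Here the two <word> elements appear where <headword> was, before <note>, and the newline after <headword> is carried over to the last <word>.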