How do I split a node's text and put each piece into its own element?

Time: 2013-01-31 13:05:14

Tags: python lxml elementtree

I have an XML file like this:

sample.xml:

<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>

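I want to split headword.text on '/' and wrap each piece in a <word> tag. Finally, I need to remove the <headword> tag itself. The output I expect is: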

<root>
    <keyword_group>
        <word>sell</word>
        <word>buy</word>
    </keyword_group>
</root>

My ugly script is:

import lxml.etree as ET

xml = '''\
<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>
'''

root = ET.fromstring(xml)
headword = root.find('.//headword')
if headword is not None:
    words = headword.text.split('/')
    for word in words:
        ET.SubElement(headword, 'word')
        for wr in headword.iter('word'):
            if not wr.text:
                wr.text = word
    headword.text = ''

print(ET.tostring(root, encoding='unicode'))

But this is too complicated, and I could not manage to remove the headword tag.

1 answer:

Answer 0 (score: 2)

Using lxml:

import lxml.etree as ET

xml = '''\
<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>
'''

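root = ET.fromstring(xml)
headword = root.find('.//headword')
if headword is not None:
    # Split the text before the element is detached from the tree.
    words = headword.text.split('/')
    # lxml elements know their parent, so headword can be removed directly.
    parent = headword.getparent()
    parent.remove(headword)
    # Append one <word> per piece to the end of the former parent.
    for word in words:
        ET.SubElement(parent, 'word').text = word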

print(ET.tostring(root, encoding='unicode'))

Output:

<root>
    <keyword_group>
        <word>sell</word><word>buy</word></keyword_group>
</root>
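
Both <word> elements end up on one line because the whitespace that followed <headword> is removed together with it, and newly created elements have no tail text. If the indentation matters, or if <headword> is not the last child of its parent, something along these lines may help. It is only a sketch: split_headwords and its parameters are names made up for this example, and it uses the standard library's xml.etree.ElementTree rather than lxml, so the parent is found by walking the tree instead of calling getparent(). The same calls should also work with lxml.etree, since only the shared Element API is used.

```python
import xml.etree.ElementTree as ET


def split_headwords(root, tag='headword', new_tag='word', sep='/'):
    """Replace every <tag> element with one <new_tag> element per piece.

    Hypothetical helper for illustration; not part of lxml or ElementTree.
    """
    # Snapshot the elements first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != tag:
                continue
            # Drop empty pieces such as the one produced by "sell/".
            parts = [p.strip() for p in (child.text or '').split(sep) if p.strip()]
            if not parts:
                continue  # nothing to split, leave the element untouched
            index = list(parent).index(child)
            # Whitespace in front of the element: reused between new siblings.
            lead = parent.text if index == 0 else parent[index - 1].tail
            parent.remove(child)
            for offset, part in enumerate(parts):
                new = ET.Element(new_tag)
                new.text = part
                # The last piece inherits the tail of the removed element.
                new.tail = child.tail if offset == len(parts) - 1 else lead
                parent.insert(index + offset, new)
    return root


xml = '''\
<root>
    <keyword_group>
        <headword>sell/buy</headword>
    </keyword_group>
</root>
'''

root = ET.fromstring(xml)
split_headwords(root)
result = ET.tostring(root, encoding='unicode')
print(result)
```

Because each new element is inserted at the index the old one occupied, sibling order is preserved, and copying the surrounding whitespace into the tails keeps one <word> per line without a separate pretty-printing step.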