How do I split an XML document into strings between certain tags?

Time: 2015-06-14 17:11:53

Tags: python xml xml-parsing

Say I have the following XML:

<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar> 
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
</foo>

spam, bar and bacon are data tags that contain more tags inside them. I want to split the XML into this:

  • <spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
  • <bar taste="eww"> stuff </bar> <bar> stuff </bar>
  • <bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>

so that I can reorder the blocks for parsing.

The basic structure looks like this, and the blocks can come in any order:

<foo>
block of bar tags
block of spam tags
block of bacon tags
</foo>

2 answers:

Answer 0 (score: 1)

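If you don't know the tag names ahead of time and just want to break the elements up into groups, you can try combining itertools.groupby with whichever XML parsing library you like.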
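
The answer's own grouping code did not survive, so here is a minimal sketch of that approach. It assumes Python 3's standard-library xml.etree.ElementTree, imported as et to match the answer's later snippet, with the question's sample XML inlined as a string. It groups the direct children of <foo> by tag name:

```python
import itertools
import xml.etree.ElementTree as et

# Sample input from the question (inlined as an assumption instead of reading a file)
xml = """<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar>
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
</foo>"""

root = et.fromstring(xml)

# Iterating over an Element yields its direct children in document order.
# groupby() merges only *consecutive* children that share the same tag.
groups = [list(group) for _, group in itertools.groupby(root, key=lambda element: element.tag)]

print(groups)
```

Because groupby only compares neighbours, a tag that reappears later in the document starts a new group rather than joining the earlier one.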

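Grouping the elements of <foo> this way gives a list of lists. The sample input in the answer's run evidently ended with one more <spam> element. Because groupby only merges consecutive elements with the same tag, that element ends up in a fourth group:

[[<Element 'spam' at 0x218ecb0>, <Element 'spam' at 0x218ee10>], 
 [<Element 'bar' at 0x218ee90>, <Element 'bar' at 0x218eeb0>], 
 [<Element 'bacon' at 0x218ef30>, <Element 'bacon' at 0x218ef50>, <Element 'bacon' at 0x218ef90>], 
 [<Element 'spam' at 0x218efd0>]]

If you need the actual string values, you can do the following, where groups is the grouped list from the previous step:

print([[et.tostring(element, encoding="unicode") for element in group] for group in groups])

...which gets you:

[['<spam taste="great"> stuff</spam> ', '<spam taste="moldy">stuff</spam>\n'],
 ['<bar taste="eww"> stuff </bar> ', '<bar> stuff </bar> \n'], 
 ['<bacon taste="yum"> stuff </bacon>', '<bacon taste="yum"> stuff </bacon>', '<bacon taste="yum">stuff </bacon>\n'], 
 ['<spam taste="Great">stuff2</spam>\n']]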

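The question asks for one string per block rather than one string per element. A possible sketch, again assuming Python 3 and the question's sample XML, joins each group's serialized elements. It relies on ElementTree.tostring including each element's tail text, as the answer's string output shows with its trailing spaces and newlines:

```python
import itertools
import xml.etree.ElementTree as et

# Sample input from the question (inlined as an assumption instead of reading a file)
xml = """<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar>
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
</foo>"""

root = et.fromstring(xml)
groups = [list(group) for _, group in itertools.groupby(root, key=lambda element: element.tag)]

# tostring() serializes each element *including* its tail text, so joining the
# pieces keeps the whitespace between sibling tags; strip() drops the newline
# that follows the last element of each block.
blocks = [
    "".join(et.tostring(element, encoding="unicode") for element in group).strip()
    for group in groups
]

for block in blocks:
    print(block)
```

With this input, the three printed lines match the three bullets in the question.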

Answer 1 (score: 0)

Have you looked at the ElementTree methods?

import xml.etree.ElementTree as ET

document = ET.parse("file.xml")
# findall() on the tree searches the direct children of the root element (<foo>)
spams = document.findall("spam")
bars = document.findall("bar")
bacons = document.findall("bacon")
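
To get from those lists to the reordered <foo> described in the question, one option is to append the found elements to a fresh root in whatever order you need. This is a sketch only. It assumes Python 3 (for Element.extend), parses the question's sample XML from a string instead of file.xml, and uses bar, spam, bacon purely as an example order:

```python
import xml.etree.ElementTree as ET

# Sample input from the question (inlined as an assumption instead of file.xml)
xml = """<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar>
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
</foo>"""

root = ET.fromstring(xml)

# findall(tag) returns the direct children of <foo> with that tag, in document
# order, regardless of where they sit relative to the other blocks
blocks = {tag: root.findall(tag) for tag in ("spam", "bar", "bacon")}

# Rebuild <foo> with the blocks in the desired order (example order: bar, spam, bacon)
new_root = ET.Element("foo")
for tag in ("bar", "spam", "bacon"):
    new_root.extend(blocks[tag])

result = ET.tostring(new_root, encoding="unicode")
print(result)
```

Each element keeps its tail text when it is moved, so the whitespace between tags in the rebuilt document is less tidy than in the input. The parsed structure is unaffected.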