Question

Say I have the following XML:

<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar> 
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
</foo>

spam, bar and bacon are data tags that contain more tags inside them. I want to split the XML into this:

<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>,
<bar taste="eww"> stuff </bar> <bar> stuff </bar>,
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>

so that I can reorder them for parsing.

The basic structure looks like this, and the blocks can be in any order:

<foo>
block of bar tags
block of spam tags
block of bacon tags
</foo>

Answer 1

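If you don't know the tag names until runtime and just want to break the elements up into groups, you could try combining itertools.groupby with whichever XML parsing library you like.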
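A minimal sketch of that idea, assuming the standard-library xml.etree.ElementTree as the parser and the XML from the question inlined as a string. The names `et` and `groups` match the ones used in the print line of this answer.

```python
import xml.etree.ElementTree as et
from itertools import groupby

# The example document from the question (assumed to be available as a string).
xml_text = """<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar>
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
</foo>"""

root = et.fromstring(xml_text)

# Iterating over an Element yields its direct children. groupby starts a new
# group each time the key (the tag name) changes, so runs of adjacent
# siblings with the same tag end up in the same list.
groups = [list(group) for _tag, group in groupby(root, key=lambda element: element.tag)]

for group in groups:
    print([element.tag for element in group])
```

For the question's document this gives three groups: two spam elements, two bar elements and three bacon elements.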

The output will be:

[[<Element 'spam' at 0x218ecb0>, <Element 'spam' at 0x218ee10>],
 [<Element 'bar' at 0x218ee90>, <Element 'bar' at 0x218eeb0>],
 [<Element 'bacon' at 0x218ef30>, <Element 'bacon' at 0x218ef50>, <Element 'bacon' at 0x218ef90>],
 [<Element 'spam' at 0x218efd0>]]

If you need the actual string values, you can do the following:

print([[et.tostring(element, encoding="unicode") for element in group] for group in groups])

...which gets you:

[['<spam taste="great"> stuff</spam> ', '<spam taste="moldy">stuff</spam>\n'],
 ['<bar taste="eww"> stuff </bar> ', '<bar> stuff </bar> \n'],
 ['<bacon taste="yum"> stuff </bacon>', '<bacon taste="yum"> stuff </bacon>', '<bacon taste="yum">stuff </bacon>\n'],
 ['<spam taste="Great">stuff2</spam>\n']]

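Note that groupby only merges consecutive elements, which is why the trailing spam in the output above ends up in a fourth group of its own. If you want exactly one block per tag name regardless of position, and then want to lay the blocks out in a new order as the question asks, a dictionary keyed on the tag is one option. This is a minimal sketch assuming xml.etree.ElementTree. `rebuild` is a hypothetical helper name, and the extra trailing spam in the input is an assumption taken from that output.

```python
import xml.etree.ElementTree as et
from collections import defaultdict

# Assumption: the question's document plus a trailing <spam>, like the
# fourth group in the output above, so the <spam> elements are not adjacent.
xml_text = """<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar>
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
<spam taste="Great">stuff2</spam>
</foo>"""

root = et.fromstring(xml_text)

# One list per tag name, no matter where the elements appear. Dicts keep
# insertion order (Python 3.7+), so keys follow first appearance.
blocks = defaultdict(list)
for element in root:
    blocks[element.tag].append(element)


def rebuild(root_tag, blocks, order):
    """Hypothetical helper: new root element holding each tag's block, in `order`."""
    new_root = et.Element(root_tag)
    for tag in order:
        # extend() appends the elements as children of the new root;
        # ElementTree has no parent pointers, so the original tree is unchanged.
        new_root.extend(blocks[tag])
    return new_root


reordered = rebuild("foo", blocks, ["bar", "spam", "bacon"])
print([element.tag for element in reordered])
```

Unlike the groupby version, all three spam elements land in a single block here, at the cost of losing the information about where each run originally started.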

Answer 2

Have you looked at the ElementTree methods?

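import xml.etree.ElementTree as ET

document = ET.parse("file.xml")
# On an ElementTree, findall() searches from the root (<foo>),
# so each call matches the direct children with that tag name.
spams = document.findall("spam")
bars = document.findall("bar")
bacons = document.findall("bacon")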
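Going one step further with these findall results, here is a minimal sketch that serializes each tag's matches into one string per block and then writes the blocks back inside a new `<foo>` in a different order. It assumes the question's XML is inlined with `ET.fromstring` instead of read from `"file.xml"`, and `block_string` is a hypothetical helper name. `ET.tostring` also emits each element's tail text (the whitespace after the closing tag), which is why the blocks keep their original spacing.

```python
import xml.etree.ElementTree as ET

# Assumption: the question's XML inlined instead of "file.xml",
# so the sketch runs without a file on disk.
xml_text = """<foo>
<spam taste="great"> stuff</spam> <spam taste="moldy"> stuff</spam>
<bar taste="eww"> stuff </bar> <bar> stuff </bar>
<bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon><bacon taste="yum"> stuff </bacon>
</foo>"""

root = ET.fromstring(xml_text)


def block_string(parent, tag):
    """Hypothetical helper: serialize every direct child named `tag` into one string."""
    # encoding="unicode" makes tostring() return str instead of bytes.
    return "".join(ET.tostring(el, encoding="unicode") for el in parent.findall(tag))


spams = block_string(root, "spam")
bars = block_string(root, "bar")
bacons = block_string(root, "bacon")

# The blocks can now be written out in any order inside a new <foo>.
reordered = "<foo>\n" + bars + spams + bacons + "</foo>"
print(reordered)
```

Working with strings matches the question's wording, but if the next step is parsing anyway, keeping the Element lists and building a new tree avoids a serialize/re-parse round trip.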

How do I split an XML document into strings between certain tags?
