I need to get every tag and value inside a specific tag.
For example:
<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
Python script:
root = et.fromstring('Xml from path')
target_elements = root.findall('.//post')
If I give the tag <post>, I need the output to be:
Expected output:
<text>New Text</text>
<category>New Category</category>
For the tag <specific>:
Output:
<line> Line.... </line>
New Line ends ......!!!!
Answer 0 (score: 0)
Note: the closing </xml> tag is missing at the end of the XML fragment.
content = """\
<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
</xml>"""
There is no real difficulty with lxml:
from lxml import etree

root = etree.XML(content)
# Walk the direct children of every <post> element, at any depth.
for elem in root.findall(".//post"):
    for child in elem:
        print(child.tag + ": " + child.text)
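If you want to output the XML fragments as strings, just use the tostring function:
for elem in root.findall(".//post"):
    for child in elem:
        # with_tail=False drops the whitespace that follows each child.
        print(etree.tostring(child, encoding="unicode", with_tail=False))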
You will get:
<text>New Text</text>
<category>New Category</category>
To learn more, read the online tutorial: http://lxml.de/tutorial.html
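The <specific> case from the question also contains text that follows the <line> child (its "tail"), so serializing the children alone is not quite enough. Below is a minimal sketch of one way to collect the full inner content of an element. It uses the standard library's xml.etree.ElementTree rather than lxml, and the helper name inner_xml is hypothetical, not part of either library.

```python
import xml.etree.ElementTree as ET

content = """\
<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
</xml>"""


def inner_xml(elem):
    """Return everything between elem's opening and closing tags.

    Hypothetical helper: joins the element's leading text with each
    serialized child. ET.tostring() includes a child's tail text, so
    text that follows a child element is preserved.
    """
    parts = [elem.text or ""]
    for child in elem:
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


root = ET.fromstring(content)
for tag in ("post", "specific"):
    for elem in root.findall(".//" + tag):
        print(inner_xml(elem))
```

The same approach should carry over to lxml, since etree.tostring there also keeps the tail by default (with_tail=True).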
Answer 1 (score: 0)
I would go with BeautifulSoup:
from bs4 import BeautifulSoup
xml_doc = '''<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>'''
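# Name the parser explicitly; html.parser ships with Python and tolerates the missing </xml>.
soup = BeautifulSoup(xml_doc, "html.parser")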
print(soup.find_all('post'))
Output:
[<post>
<text>New Text</text>
<category>New Category</category>
</post>]
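The output above still includes the surrounding <post> wrapper, while the question asks only for the tags inside it. As a rough sketch, assuming the built-in html.parser is acceptable for this input, the direct children can be listed with find_all(recursive=False). The variable name children is just illustrative.

```python
from bs4 import BeautifulSoup

xml_doc = '''<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>'''

soup = BeautifulSoup(xml_doc, "html.parser")

# recursive=False limits the search to direct child tags of each <post>,
# so whitespace-only text nodes between the tags are skipped.
children = [
    str(child)
    for post in soup.find_all("post")
    for child in post.find_all(recursive=False)
]

for line in children:
    print(line)
```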