Hello, my XML file looks like this. Can someone help me get a specific tag from the XML file?
<A1>
<A><B>TEST1</B></A>
<A><B>TEST2</B></A>
<A><B>TEST3</B></A>
</A1>
<A1>
<A><B>TEST4</B></A>
<A><B>TEST5</B></A>
<A><B>TEST6</B></A>
</A1>
So far I have been processing it in Python like this:
for A in A1.findall('A'):
    B = A.find('B').text
    print(B)
The print(B) call gives me output like this:
Test1
Test2
Test3
Test4
Test5
Test6
I want output from only the first tag in each <A1> block, like this:
Test1
Test4
What changes should I make to get this to work?
Answer 0 (score: 0)
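OK, let's try this again. After your revision, we want to search the document and, every time the parent tag (A1) occurs, grab the contents of the first child tag in that set.
Let's try a recursive function: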
with open('xml.txt') as f:
    xml = f.read()

def grab(xml):
    """Recursively walk through the XML text until no <A1> tag is left."""
    # base case: if the parent tag (<A1>) isn't there, then return
    open_parent = xml.find('<A1>')
    if open_parent == -1:
        return
    close_parent = open_parent + len('<A1>')
    # find the first child tag (tag names are case-sensitive, so match <A><B>)
    open_child = xml.find('<A><B>', close_parent)
    close_child = xml.find('</B></A>', open_child)
    if open_child == -1 or close_child == -1:
        return
    # grab the data within that tag
    child_data = xml[open_child + len('<A><B>'):close_child]
    print(child_data)
    # recursively call grab() on the rest of the text
    grab(xml[close_child:])

grab(xml)
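As an alternative to string searching, here is a minimal sketch that stays with ElementTree, which the question already appears to use. It assumes the two <A1> blocks are wrapped in a single root element (called <root> here, a made-up name) so that the text parses as one well-formed document. The helper name first_b_texts is also just for illustration. The idea is to call find() instead of findall(), because find() returns only the first match.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a hypothetical <root> element
# so that the text is a single well-formed XML document.
SAMPLE = """<root>
<A1>
<A><B>TEST1</B></A>
<A><B>TEST2</B></A>
<A><B>TEST3</B></A>
</A1>
<A1>
<A><B>TEST4</B></A>
<A><B>TEST5</B></A>
<A><B>TEST6</B></A>
</A1>
</root>"""


def first_b_texts(xml_text):
    """Return the text of the first A/B match inside every <A1> element."""
    root = ET.fromstring(xml_text)
    results = []
    # iter('A1') visits every <A1> element in document order
    for a1 in root.iter('A1'):
        # find() returns only the first match, or None if nothing matches
        b = a1.find('A/B')
        if b is not None:
            results.append(b.text)
    return results


print(first_b_texts(SAMPLE))  # ['TEST1', 'TEST4']
```

If the real file has a different root element or nesting, the 'A1' and 'A/B' paths would need adjusting to match.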
Out of interest, do you already have a solution that you wouldn't mind sharing?