Printing selected parts from multiple same-named tags in an XML file using Python

Date: 2013-03-18 09:39:43

Tags: python xml-parsing

  1. Hello, my XML file looks like this. Can someone help me get a specific tag from the XML file?

               <A1>
                <A><B>TEST1</B></A>
                <A><B>TEST2</B></A>
                <A><B>TEST3</B></A>
               </A1>
    
                <A1>
                <A><B>TEST4</B></A>
                <A><B>TEST5</B></A>
                <A><B>TEST6</B></A>
               </A1>
    
  2. So far I have been processing it in Python like this:

                  for A in A1.findall('A'):
                       B = A.find('B').text
                       print B
    
          print B is giving me output like this:
    
              Test1
              Test2
              Test3
              Test4
              Test5
              Test6
    
    
       I want output from only the first tag in each group, like this:
    
              Test1
              Test4
    
    
       What changes should I make to get this to work?
    

1 answer:

Answer 0: (score: 0)

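OK, let's try again. After your revision, we want to search the document and, each time the parent tag (A1) occurs, grab the content of the first child tag in that group.

Let's try a recursive function: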

    # read the whole file into a single string
    with open('xml.txt') as f:
        xml = f.read()

    def grab(xml):
        """Recursively walk the XML text until no <A1> is left."""

        # base case: if the parent tag (<A1>) isn't there, then return
        open_parent = xml.find('<A1>')
        if open_parent == -1:
            return

        # position just past the opening parent tag
        close_parent = open_parent + len('<A1>')

        # find the first child tag (tag names are case-sensitive)
        open_child = xml.find('<A><B>', close_parent)
        close_child = xml.find('</B></A>', open_child)
        if open_child == -1 or close_child == -1:
            return

        # grab the data within that tag
        child_data = xml[open_child + len('<A><B>'):close_child]
        print(child_data)

        # recurse on the rest of the text; the next <A1> found is the next group
        grab(xml[close_child:])

    grab(xml)
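
For comparison, here is a minimal sketch of the same idea using `xml.etree.ElementTree`, which the question's `findall` call suggests is already in use. It rests on a few assumptions: the data has several top-level `<A1>` elements and no XML declaration, so it is wrapped in a made-up `<root>` element to make it well-formed; `first_b_values` and `sample` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def first_b_values(xml_text):
    """Return the text of the first <A>/<B> under each <A1> element."""
    # Assumption: the input is a fragment with several top-level <A1>
    # elements, so wrap it in a hypothetical <root> to make it parseable.
    root = ET.fromstring('<root>' + xml_text + '</root>')
    values = []
    for a1 in root.iter('A1'):
        # find() returns only the first match in document order (or None)
        first_b = a1.find('A/B')
        if first_b is not None:
            values.append(first_b.text)
    return values


sample = """
<A1>
 <A><B>TEST1</B></A>
 <A><B>TEST2</B></A>
 <A><B>TEST3</B></A>
</A1>
<A1>
 <A><B>TEST4</B></A>
 <A><B>TEST5</B></A>
 <A><B>TEST6</B></A>
</A1>
"""

for value in first_b_values(sample):
    print(value)
```

Using `find` instead of `findall` is the key change here: it stops at the first `<A>` in each `<A1>`, and a real parser also tolerates whitespace or attributes that plain string searching would miss.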

Out of interest, do you already have a solution that you wouldn't mind sharing?