Question

Hello, my XML file looks like this. Can someone help me get a specific tag out of it?

    <A1>
        <A><B>TEST1</B></A>
        <A><B>TEST2</B></A>
        <A><B>TEST3</B></A>
    </A1>

    <A1>
        <A><B>TEST4</B></A>
        <A><B>TEST5</B></A>
        <A><B>TEST6</B></A>
    </A1>

So far I am processing it in Python like this:

    for A in A1.findall('A'):
        B = A.find('B').text
        print B

print B is giving me output like this:

    TEST1
    TEST2
    TEST3
    TEST4
    TEST5
    TEST6

I want output from only the first tag in each group, like this:

    TEST1
    TEST4

What changes should I make to get this to work?

Answer 1

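OK, let's try this again. After the revision, we want to search the document and, each time the parent tag (<A1>) occurs, grab the contents of the first tag in that set.

Let's try a recursive function: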

    # read the whole file into one string
    with open('xml.txt') as f:
        xml = f.read()

    def grab(xml):
        """Recursively walk the XML text until no <A1> is left."""

        # base case: if the parent tag (<A1>) isn't there, then return
        open_parent = xml.find('<A1>')
        if open_parent == -1:
            return

        # position just past the opening parent tag
        close_parent = open_parent + len('<A1>')

        # find the first child tag (tag names are case-sensitive)
        open_child = xml.find('<A><B>', close_parent)
        close_child = xml.find('</B></A>', open_child)
        if open_child == -1 or close_child == -1:
            return

        # grab the data within that tag
        child_data = xml[open_child + len('<A><B>'):close_child]
        print(child_data)

        # recurse on the remaining text to reach the next <A1>
        grab(xml[close_child:])

    grab(xml)

Out of interest, do you already have a solution that you wouldn't mind sharing?
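
For comparison, here is a minimal sketch of the same idea using the standard-library xml.etree.ElementTree parser instead of string searching. It is only an illustration, not the asker's actual code. It assumes the data can be wrapped in a made-up <root> element, because the posted fragment has two top-level <A1> elements and is not well-formed on its own. The names fragment and first_b_texts are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question. The <root> wrapper added below is an
# assumption made only so the two <A1> blocks parse as one document.
fragment = """
<A1>
  <A><B>TEST1</B></A>
  <A><B>TEST2</B></A>
  <A><B>TEST3</B></A>
</A1>
<A1>
  <A><B>TEST4</B></A>
  <A><B>TEST5</B></A>
  <A><B>TEST6</B></A>
</A1>
"""

def first_b_texts(xml_fragment):
    """Return the text of the first <A>/<B> found under each <A1>."""
    root = ET.fromstring("<root>" + xml_fragment + "</root>")
    results = []
    for a1 in root.iter("A1"):
        # find() returns only the first match, or None if there is none
        b = a1.find("A/B")
        if b is not None:
            results.append(b.text)
    return results

print(first_b_texts(fragment))  # ['TEST1', 'TEST4']
```

Because find() stops at the first match, no index or break is needed. Replacing it with findall('A/B') would give every <B> again, which is what the loop in the question does.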
