Parsing XML with Beautiful Soup - looping/flattening data

Time: 2018-09-06 16:00:35

Tags: python beautifulsoup xml-parsing

I have an XML file that looks like this:

<xml>
    <description> this is a description</description>
    <foo>
        <main_title>this is a main title</main_title>
        <listings>
            <a_listing>
                <sub_title>this is a sub title</sub_title>
                <info name="Bob"/>
                <info name="Ann"/>
            </a_listing>

            <a_listing>
                <sub_title>this is a different sub title</sub_title>
                <info name="Peter"/>
                <info name="Steve"/>
            </a_listing>
        </listings>
    </foo>

    <foo>
        <main_title>this is another main title</main_title>
        <listings>
            <a_listing>
                <sub_title>this is another sub title</sub_title>
                <info name="Dave"/>
            </a_listing>
        </listings>
    </foo>
</xml>

I'd like to flatten this structure so that it looks like this:

this is a main title | this is a sub title | Bob
this is a main title | this is a sub title | Ann
this is a main title | this is a different sub title | Peter
this is a main title | this is a different sub title | Steve
this is another main title | this is another sub title | Dave

At the moment I'm using BeautifulSoup.

So far I've got:

parse = soup.foo.children

for i in parse:
    print(i)
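In this case, each i gives me a chunk of the XML, and I'm struggling to drill down into the individual parts to flatten the data the way I need.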
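The loop above stalls for two reasons that a small experiment makes visible: `soup.foo` is shorthand for `soup.find('foo')`, so it returns only the first `<foo>`, and `.children` yields the whitespace strings between tags as well as the tags themselves. A minimal sketch, assuming a trimmed copy of the sample XML and the built-in "html.parser" backend (so only bs4 itself is needed):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's XML, used only for illustration
SAMPLE = """<xml>
  <foo>
    <main_title>this is a main title</main_title>
    <listings>
      <a_listing>
        <sub_title>this is a sub title</sub_title>
        <info name="Bob"/>
        <info name="Ann"/>
      </a_listing>
    </listings>
  </foo>
  <foo>
    <main_title>this is another main title</main_title>
    <listings>
      <a_listing>
        <sub_title>this is another sub title</sub_title>
        <info name="Dave"/>
      </a_listing>
    </listings>
  </foo>
</xml>"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# soup.foo returns only the first <foo>; find_all returns every one
foos = soup.find_all("foo")
print(len(foos))

# .children would also yield the whitespace text between tags;
# find_all(recursive=False) keeps only the direct child elements
first_children = [child.name for child in foos[0].find_all(recursive=False)]
print(first_children)
```

Looping over `foos` instead of `soup.foo.children` gives one element per `<foo>`, from which the title and listings can be looked up by name.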

Any help would be greatly appreciated! Thanks

1 Answer:

Answer 0 (score: 1)

Try this:

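from bs4 import BeautifulSoup

# "with" closes the file afterwards; the "xml" feature (backed by lxml)
# parses the file as XML rather than HTML
with open('./test.xml') as file:
    soup = BeautifulSoup(file, "xml")

titles = soup.find_all('main_title')

for title in titles:
    # title.parent is the enclosing <foo>, so only its listings are searched
    lsts = title.parent.find_all('a_listing')

    for lst in lsts:
        infos = lst.find_all('info')

        # one output row per <info>: main title | sub title | name
        for info in infos:
            print(f"{title.text} | {lst.sub_title.text} | {info['name']}")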
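If a third-party dependency is not wanted, the same three nested loops (title, listing, info) can be written with the standard library's xml.etree.ElementTree. This is a sketch rather than part of the answer above; the `flatten` helper and the inline `SAMPLE` string are hypothetical, and it assumes the XML uses straight quotes so that it is well-formed:

```python
import xml.etree.ElementTree as ET

# Copy of the question's XML, inlined so the sketch is self-contained
SAMPLE = """<xml>
  <description> this is a description</description>
  <foo>
    <main_title>this is a main title</main_title>
    <listings>
      <a_listing>
        <sub_title>this is a sub title</sub_title>
        <info name="Bob"/>
        <info name="Ann"/>
      </a_listing>
      <a_listing>
        <sub_title>this is a different sub title</sub_title>
        <info name="Peter"/>
        <info name="Steve"/>
      </a_listing>
    </listings>
  </foo>
  <foo>
    <main_title>this is another main title</main_title>
    <listings>
      <a_listing>
        <sub_title>this is another sub title</sub_title>
        <info name="Dave"/>
      </a_listing>
    </listings>
  </foo>
</xml>"""


def flatten(xml_text):
    """Return one (main_title, sub_title, name) tuple per <info> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for foo in root.iter("foo"):
        main_title = foo.findtext("main_title")
        # ".//a_listing" matches <a_listing> at any depth below this <foo>
        for listing in foo.iterfind(".//a_listing"):
            sub_title = listing.findtext("sub_title")
            for info in listing.iter("info"):
                rows.append((main_title, sub_title, info.get("name")))
    return rows


for row in flatten(SAMPLE):
    print(" | ".join(row))
```

Returning tuples instead of printing directly keeps the flattened rows reusable, for example to write them to a CSV file. To read from disk, `ET.parse('./test.xml').getroot()` could replace `ET.fromstring`.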