Python Beautifulsoup XML Tags with Same Name

Time: 2018-04-18 17:53:41

Tags: python xml beautifulsoup

Suppose I have the following minimal XML with a nested hierarchy, where the outer and inner elements share the same tag name. How can I isolate the first occurrence and then isolate the subsequent, nested occurrences?

<test name='something'>
<tag max='10' min='20'>
    <tag max='5' min='20'/>
    <tag max='5' min='20'/>
</first>

Ideally, I would be able to parse out the information from the first tag and then parse the information from the nested tags.

I have tried utilizing the contents of the first tag, but I get all nested tags as well.

Expected output would be:

  1. <tag max='10' min='20'>
  2. <tag max='5' min='20'/> <tag max='5' min='20'/>

1 answer:

Answer 0 (score: 1)

I did my best with the XML you provided. I am assuming the XML you posted is incomplete.

I used BeautifulSoup's decompose() method to get the result you are after: it removes a tag and its contents from the tree.

Code:

from bs4 import BeautifulSoup

data = '''
<test name='something'>
<tag max='10' min='20'>
    <tag max='5' min='20'/>
    <tag max='5' min='20'/>
</first>
'''

soup = BeautifulSoup(data, 'html.parser')
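# Print the nested tags first (the ones with max='5').
for i in soup.find_all('tag', max='5'):
    print(i)
print('*********************************')
# Remove the nested tags from the tree so only the outer tag remains.
for i in soup.find_all('tag', max='5'):
    i.decompose()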
print(soup.find('tag', max='10'))

Output:

<tag max="5" min="20"></tag>
<tag max="5" min="20"></tag>
*********************************
<tag max="10" min="20">


</tag>
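
If you would rather not modify the tree, a possible alternative is to take the first tag with find() and then ask only for its direct children with find_all(..., recursive=False). This is a minimal sketch, not a definitive solution: it assumes the closing tags in your XML were meant to be </tag> and </test>, and the names outer and inner are just illustrative.

```python
from bs4 import BeautifulSoup

# Same fragment as in the question, but with the closing tags repaired.
# This is an assumption about what the complete XML looks like.
data = '''
<test name='something'>
<tag max='10' min='20'>
    <tag max='5' min='20'/>
    <tag max='5' min='20'/>
</tag>
</test>
'''

soup = BeautifulSoup(data, 'html.parser')

# find() returns the first match in document order, i.e. the outer tag.
outer = soup.find('tag')

# recursive=False limits the search to direct children of the outer tag.
inner = outer.find_all('tag', recursive=False)

# The attributes belong to the outer tag alone, without its children.
print(outer.attrs)
for child in inner:
    print(child)
```

Reading outer.attrs gives you the information from the first tag without printing its nested content, and the tree is left untouched, so you can still navigate it afterwards.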