Question

Suppose I have the following minimal XML with a nested hierarchy. How can I isolate the first occurrence of a tag and then isolate the subsequent, nested occurrences?

<test name='something'>
<tag max='10' min='20'>
    <tag max='5' min='20'/>
    <tag max='5' min='20'/>
</first>

Ideally, I would be able to parse out the information from the first tag and then parse the information from the nested tags.

I have tried using the contents of the first tag, but that returns all the nested tags as well.

Expected output would be:

<tag max='10' min='20'>
<tag max='5' min='20'/> <tag max='5' min='20'/>

Answer 1

I did my best with the XML you provided. I am assuming the XML you posted is incomplete.

I used BeautifulSoup's decompose() method to get the result you are after.

Code:

from bs4 import BeautifulSoup

data = '''
<test name='something'>
<tag max='10' min='20'>
    <tag max='5' min='20'/>
    <tag max='5' min='20'/>
</first>
'''

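soup = BeautifulSoup(data, 'html.parser')

# Print the nested tags first
for nested in soup.find_all('tag', max='5'):
    print(nested)
print('*********************************')

# Remove the nested tags from the tree, then print the outer tag on its own
for nested in soup.find_all('tag', max='5'):
    nested.decompose()
print(soup.find('tag', max='10'))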

Output:

<tag max="5" min="20"></tag>
<tag max="5" min="20"></tag>
*********************************
<tag max="10" min="20">


</tag>
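
The approach above selects the nested tags by their attribute value (max='5'). If the nested tags cannot be told apart by attributes, a possible alternative is to select by position in the tree instead. The following is only a minimal sketch: it assumes a well-formed version of the XML (the closing </tag> and </test> are my additions, not part of the original post) and uses find() for the outer tag and find_all() with recursive=False for its direct children.

```python
from bs4 import BeautifulSoup

# Assumption: a well-formed version of the XML from the question.
data = """
<test name='something'>
  <tag max='10' min='20'>
    <tag max='5' min='20'/>
    <tag max='5' min='20'/>
  </tag>
</test>
"""

soup = BeautifulSoup(data, 'html.parser')

# find() returns the first <tag> in document order, i.e. the outer one.
outer = soup.find('tag')

# recursive=False restricts the search to direct children of the outer tag.
nested = outer.find_all('tag', recursive=False)

# The attributes of each tag are available as a dict, without the children.
print(outer.attrs)
for child in nested:
    print(child.attrs)
```

This leaves the tree untouched, so the outer tag still contains its children; only the attribute dictionaries are read separately.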

Python Beautifulsoup XML Tags with Same Name
