Suppose I have the following minimal XML with a nested hierarchy. How can I isolate the first occurrence of a tag and then, separately, the nested occurrences inside it?
<test name='something'>
<tag max='10' min='20'>
<tag max='5' min='20'/>
<tag max='5' min='20'/>
</first>
Ideally, I would be able to parse out the information from the first tag and then parse the information from the nested tags.
I have tried using the contents of the first tag, but that returns all of the nested tags as well.
Expected output would be:
<tag max='10' min='20'>
<tag max='5' min='20'/>
<tag max='5' min='20'/>
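Answer 0 (score: 1)
I did my best with the XML you provided; I'm assuming what you posted is incomplete.
I used the decompose() method from BeautifulSoup to help you achieve your goal.
Code: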
from bs4 import BeautifulSoup
data = '''
<test name='something'>
<tag max='10' min='20'>
<tag max='5' min='20'/>
<tag max='5' min='20'/>
</first>
'''
soup = BeautifulSoup(data, 'html.parser')
# Print the nested tags first.
for i in soup.find_all('tag', max='5'):
    print(i)
print('*********************************')
# Remove the nested tags from the tree so that only the outer tag is left.
for i in soup.find_all('tag', max='5'):
    i.decompose()
print(soup.find('tag', max='10'))
Output:
<tag max="5" min="20"></tag>
<tag max="5" min="20"></tag>
*********************************
<tag max="10" min="20">
</tag>
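Note that decompose() permanently removes the nested tags from the tree, and matching on max='5' only works because the attribute values happen to differ. If you would rather leave the tree intact, here is a rough sketch of an alternative, assuming the same incomplete XML and the same html.parser setup as above: take the first tag with find(), then ask only for its direct children with find_all(..., recursive=False). The names outer, outer_info and nested_info are just illustrative.

```python
from bs4 import BeautifulSoup

data = '''
<test name='something'>
<tag max='10' min='20'>
<tag max='5' min='20'/>
<tag max='5' min='20'/>
</first>
'''

soup = BeautifulSoup(data, 'html.parser')

# find() returns the first <tag> in document order, which is the outer one.
outer = soup.find('tag')
outer_info = dict(outer.attrs)

# recursive=False restricts the search to direct children of `outer`,
# so the outer tag's own attributes stay separate from the nested ones.
nested_info = [dict(child.attrs)
               for child in outer.find_all('tag', recursive=False)]

print(outer_info)
print(nested_info)
```

Here outer_info holds the attributes of the first tag only, and nested_info holds one dict per nested tag, so each level can be processed on its own without modifying soup.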