How can I use BeautifulSoup4 to extract the text "Joanne K. Rowling" from between tags that have no (unique) class?
<li class="Feature-item">
<span class="Feature-label"><span>Auteur</span></span>
<span class="Feature-desc"><span >Joanne K. Rowling</span></span>
</li>
<li class="Feature-item">
<span class="Feature-label"><span>Helden</span></span>
<span class="Feature-desc"><span ><a href="url">Harry Potter</a></span></span>
</li>
<li class="Feature-item">
<span class="Feature-label"><span>Uitgeverij</span></span>
<span class="Feature-desc"><span ><a href="url">Bloomsbury Libri</a></span></span>
</li>
Any suggestions?
Answer 0 (score: 2)
from bs4 import BeautifulSoup as bs
html = '''<li class="Feature-item">
<span class="Feature-label"><span>Auteur</span></span>
<span class="Feature-desc"><span >Joanne K. Rowling</span></span>
</li>
<li class="Feature-item">
<span class="Feature-label"><span>Helden</span></span>
<span class="Feature-desc"><span ><a href="url">Harry Potter</a></span></span>
</li>
<li class="Feature-item">
<span class="Feature-label"><span>Uitgeverij</span></span>
<span class="Feature-desc"><span ><a href="url">Bloomsbury Libri</a></span></span>
</li>'''
soup = bs(html, 'lxml')
names = soup.find_all('span', {'class': 'Feature-desc'})
for name in names:
    # The value sits in the inner span; strip surrounding whitespace.
    text = name.find('span').get_text().strip()
    print(text)
# Output:
# Joanne K. Rowling
# Harry Potter
# Bloomsbury Libri
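The loop above prints every description. If only the author is wanted, one option is to match on the label text and return the description from the same list item. The following is a minimal sketch, not part of the original answer: the helper name `feature_value` is hypothetical, and it uses the built-in 'html.parser' instead of 'lxml' only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

html = '''<li class="Feature-item">
<span class="Feature-label"><span>Auteur</span></span>
<span class="Feature-desc"><span >Joanne K. Rowling</span></span>
</li>
<li class="Feature-item">
<span class="Feature-label"><span>Helden</span></span>
<span class="Feature-desc"><span ><a href="url">Harry Potter</a></span></span>
</li>'''


def feature_value(soup, label):
    """Hypothetical helper: return the description for a label, or None."""
    for item in soup.find_all('li', class_='Feature-item'):
        label_tag = item.find('span', class_='Feature-label')
        desc_tag = item.find('span', class_='Feature-desc')
        # Skip items that are missing either span.
        if label_tag is None or desc_tag is None:
            continue
        if label_tag.get_text(strip=True) == label:
            return desc_tag.get_text(strip=True)
    return None


soup = BeautifulSoup(html, 'html.parser')
author = feature_value(soup, 'Auteur')
print(author)
```

Matching on the label rather than on position means the lookup keeps working if the order of the list items changes.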
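Answer 1 (score: 0)
You can also use split and store the result in a dictionary, because in the text of each li every value has a \n before and after it:
['\nAuteur\nJoanne K. Rowling\n', '\nHelden\nHarry Potter\n', '\nUitgeverij\nBloomsbury Libri\n']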
from bs4 import BeautifulSoup
html = '''
<li class="Feature-item">
<span class="Feature-label"><span>Auteur</span></span>
<span class="Feature-desc"><span >Joanne K. Rowling</span></span>
</li>
<li class="Feature-item">
<span class="Feature-label"><span>Helden</span></span>
<span class="Feature-desc"><span ><a href="url">Harry Potter</a></span></span>
</li>
<li class="Feature-item">
<span class="Feature-label"><span>Uitgeverij</span></span>
<span class="Feature-desc"><span ><a href="url">Bloomsbury Libri</a></span></span>
</li>
'''
soup = BeautifulSoup(html, 'lxml')
li_list = soup.find_all('li', {'class':'Feature-item'})
data_dict = {li.span.text:li.text.split("\n")[2] for li in li_list}
print(data_dict)
# {'Auteur': 'Joanne K. Rowling', 'Helden': 'Harry Potter', 'Uitgeverij': 'Bloomsbury Libri'}
# (key order may differ on Python versions before 3.7)
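Splitting on "\n" relies on the newlines between the tags, so it would break if the page were served without them. As a hedged alternative, the sketch below builds the same dictionary from the two classes using CSS selectors. The minified HTML string is a made-up variant of the markup in the question, and 'html.parser' is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical minified variant of the same markup: no newlines between tags,
# so li.text.split("\n")[2] would raise IndexError here.
html = (
    '<li class="Feature-item"><span class="Feature-label"><span>Auteur</span></span>'
    '<span class="Feature-desc"><span>Joanne K. Rowling</span></span></li>'
    '<li class="Feature-item"><span class="Feature-label"><span>Helden</span></span>'
    '<span class="Feature-desc"><span><a href="url">Harry Potter</a></span></span></li>'
)

soup = BeautifulSoup(html, 'html.parser')

data_dict = {}
for li in soup.select('li.Feature-item'):
    label = li.select_one('span.Feature-label')
    desc = li.select_one('span.Feature-desc')
    # Only record items that contain both a label and a description.
    if label is not None and desc is not None:
        data_dict[label.get_text(strip=True)] = desc.get_text(strip=True)

print(data_dict)
```

Because `get_text(strip=True)` trims whitespace itself, the result does not depend on how the HTML is formatted.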