Several divs on the site share the same class name, as shown below:
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="boggart">
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="wand">
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="patronus">
for article in soup.find_all('article'):
    blood_status = article.find('div', class_='pi-item pi-data pi-item-spacing pi-border-color')
So when I run this code, I only get the first div. My question is: how can I get only the third one?
URL: https://harrypotter.fandom.com/wiki/Ronald_Weasley
So I want to select the boggart div and get "Spider" back.
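With BeautifulSoup, `find` returns only the first matching element, while `find_all` returns every match as a list in document order, so a particular one can be picked by index. A minimal sketch, assuming the three divs above stand in for the page; the inline HTML is a hypothetical sample, and on the live page other `pi-data` divs may come first, so a positional index is fragile:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the infobox; the real page has more pi-data divs.
html = """
<article>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="boggart"></div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="wand"></div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="patronus"></div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
for article in soup.find_all("article"):
    # find_all returns every div carrying the pi-data class, not just the first.
    divs = article.find_all("div", class_="pi-data")
    third = divs[2]  # zero-based index: [2] is the third match
    print(third["data-source"])  # prints "patronus"
```

Because the index depends on how many rows precede the one you want, matching on `data-source` is usually the safer choice.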
Answer 0 (score: 0)
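import requests
from bs4 import BeautifulSoup

def main(url):
    r = requests.get(url)
    r.raise_for_status()  # stop early on an HTTP error
    soup = BeautifulSoup(r.content, 'html.parser')
    # Find the text node "Boggart" (the infobox label), then the first link after it (the value).
    target = soup.find(string="Boggart").find_next("a").text
    print(target)

main("https://harrypotter.fandom.com/wiki/Ronald_Weasley")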
Output:
Spiders
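Answer 0 locates the text node whose content is exactly "Boggart" and walks forward to the next `<a>` tag. Since `find` returns `None` when nothing matches, a page without that label would raise `AttributeError`. A sketch with a guard, run against a small hypothetical HTML sample instead of the live page; the label/value markup is an assumption about the infobox layout, and `boggart_of` is a hypothetical helper name:

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical sample mimicking a label followed by a linked value.
html = """
<div class="pi-item pi-data" data-source="boggart">
  <h3 class="pi-data-label">Boggart</h3>
  <div class="pi-data-value"><a href="/wiki/Spider">Spiders</a></div>
</div>
"""


def boggart_of(markup: str) -> Optional[str]:
    soup = BeautifulSoup(markup, "html.parser")
    # string= matches a text node whose whole content is exactly "Boggart".
    label = soup.find(string="Boggart")
    if label is None:
        return None  # label not present in this markup
    link = label.find_next("a")  # next <a> anywhere after the label
    return link.get_text(strip=True) if link is not None else None


print(boggart_of(html))  # prints "Spiders"
```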
Answer 1 (score: 0)
This is just an example for your reference.
from simplified_scrapy import SimplifiedDoc
html = '''
<article>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="boggart"></div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="wand"></div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="patronus"></div>
</article>
'''
doc = SimplifiedDoc(html)
articles = doc.selects('article')
for article in articles:
    # Select the div whose data-source attribute equals "patronus"
    print(article.select('div@data-source=patronus'))
Result:
{'class': 'pi-item pi-data pi-item-spacing pi-border-color', 'data-source': 'patronus', 'tag': 'div', 'html': ''}
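The same attribute-based selection also works in BeautifulSoup, which picks the boggart row regardless of its position. A sketch showing `find` with `attrs` and, equivalently, a CSS attribute selector via `select_one` (backed by the soupsieve package that current BeautifulSoup releases depend on). The inline HTML, the placeholder wand/patronus values, and the `pi-data-value` child are assumptions about the infobox layout, not copied from the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: each infobox row carries its field name in data-source.
html = """
<article>
<div class="pi-item pi-data" data-source="boggart"><div class="pi-data-value"><a href="/wiki/Spider">Spiders</a></div></div>
<div class="pi-item pi-data" data-source="wand"><div class="pi-data-value">wand-value</div></div>
<div class="pi-item pi-data" data-source="patronus"><div class="pi-data-value">patronus-value</div></div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: find() with an attribute filter; data-* names can't be keyword arguments.
boggart = soup.find("div", attrs={"data-source": "boggart"})

# Option 2: a CSS attribute selector.
same = soup.select_one('div[data-source="boggart"]')

# Read the text of the value cell inside the selected row.
value = boggart.find("div", class_="pi-data-value").get_text(strip=True)
print(value)  # prints "Spiders"
```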