Python bs4: divs that share the same class name

Time: 2020-05-13 03:45:38

Tags: python beautifulsoup

Several different divs on the site have the same class name, as shown below:

<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="boggart">
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="wand">
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="patronus">

for article in soup.find_all('article'):
    blood_status = article.find('div', class_='pi-item pi-data pi-item-spacing pi-border-color')

So when I run this code, I only get the first div with that class. My question is: how can I get only the third div?

URL: https://harrypotter.fandom.com/wiki/Ronald_Weasley

So I want to select the boggart div and get "Spider" back.

2 answers:

Answer 0: (score: 0)

import requests
from bs4 import BeautifulSoup


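def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Locate the "Boggart" label text, then read the first link that follows it.
    # `string=` is the current name of the older `text=` argument.
    target = soup.find(string="Boggart").find_next("a").text
    print(target)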


main("https://harrypotter.fandom.com/wiki/Ronald_Weasley")

Output:

Spiders
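
The same lookup can be tried offline. This is only a minimal sketch: the HTML string is a made-up stand-in for one infobox row (the inner `h3` and `div` tags and their class names are assumptions, and the live page's markup may differ), and `value_after_label` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one infobox row; the real page's markup may differ.
SAMPLE_HTML = """
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="boggart">
  <h3 class="pi-data-label">Boggart</h3>
  <div class="pi-data-value"><a href="/wiki/Spider">Spiders</a></div>
</div>
"""


def value_after_label(html, label):
    """Return the text of the first <a> that follows the given label text."""
    soup = BeautifulSoup(html, "html.parser")
    # find(string=...) matches a text node whose content equals `label` exactly.
    label_node = soup.find(string=label)
    if label_node is None:
        return None
    link = label_node.find_next("a")
    return link.get_text(strip=True) if link else None


print(value_after_label(SAMPLE_HTML, "Boggart"))
```

Checking for `None` before calling `find_next` avoids an `AttributeError` when the label is not on the page.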

Answer 1: (score: 0)

This is just an example for your reference.

from simplified_scrapy import SimplifiedDoc

html = '''
<article>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="boggart"></div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="wand"></div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="patronus"></div>
</article>
'''
doc = SimplifiedDoc(html)
articles = doc.selects('article')
for article in articles:
    print(article.select('div@data-source=patronus'))

Result:

{'class': 'pi-item pi-data pi-item-spacing pi-border-color', 'data-source': 'patronus', 'tag': 'div', 'html': ''}
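
The same idea works in BeautifulSoup itself. `find` returns only the first match, and all three divs share one class string, so the class cannot tell them apart, but the `data-source` attribute can. A rough sketch, assuming the divs hold their values directly as text: the `wand` and `patronus` contents are placeholders, and `div_by_source` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the "(placeholder)" texts are made up.
HTML = """
<article>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="boggart">Spiders</div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="wand">(placeholder)</div>
<div class="pi-item pi-data pi-item-spacing pi-border-color" data-source="patronus">(placeholder)</div>
</article>
"""


def div_by_source(html, source):
    """Return the first <div> whose data-source attribute equals `source`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs= is needed because data-source is not a valid Python keyword argument.
    return soup.find("div", attrs={"data-source": source})


boggart = div_by_source(HTML, "boggart")
print(boggart.get_text(strip=True))

# The equivalent CSS attribute selector.
patronus = BeautifulSoup(HTML, "html.parser").select_one('div[data-source="patronus"]')
print(patronus["data-source"])
```

Filtering on the attribute does not depend on the order of the divs, so it should keep working if the page rearranges its rows.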