Finding data on a website with BeautifulSoup and urllib

Date: 2019-01-18 14:09:13

Tags: python beautifulsoup urllib

Background

I am trying to understand how to extract specific data from a website using the beautifulsoup4 and urllib libraries.

How can I get the price of a specific DVD from the website, given that:

  • the div is <div class="productPrice" data-component="productPrice">
  • the p is <p class="productPrice_price" data-product-price="price">£9.99 </p>

Code so far:

from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html")
bsObj = BeautifulSoup(html.read(), features='html.parser')

all_divs = bsObj.find_all('div', {'class':'productPrice'}) # 1. get all divs

What are the remaining steps to find the price?

Website: https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html

1 answer:

Answer 0 (score: 2)

You're almost there; just one more step. You only need to iterate over the elements, find the <p> tag with class="productPrice_price", and grab its text:

from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html")
bsObj = BeautifulSoup(html.read(), features='html.parser')

all_divs = bsObj.find_all('div', {'class':'productPrice'}) # 1. get all divs

for ele in all_divs:
    # 2. inside each div, find the price paragraph and read its text
    price = ele.find('p', {'class':'productPrice_price'}).text
    print(price)

Output:

£9.99
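
The same lookup can be tried offline, without fetching the page. The following is a minimal sketch, assuming the markup looks like the two tags quoted in the question. SAMPLE_HTML and extract_prices are hypothetical names introduced here for illustration, and the live page may be structured differently. It also skips any div that has no price paragraph, since find returns None in that case and calling .text on it would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Assumption: sample markup assembled from the two tags quoted in the question;
# the real page may contain more elements or a different structure.
SAMPLE_HTML = """
<div class="productPrice" data-component="productPrice">
  <p class="productPrice_price" data-product-price="price">£9.99 </p>
</div>
"""


def extract_prices(html):
    """Return the stripped text of each p.productPrice_price found inside a div.productPrice."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for div in soup.find_all("div", class_="productPrice"):
        p = div.find("p", class_="productPrice_price")
        if p is not None:  # find() returns None when the tag is missing
            prices.append(p.get_text(strip=True))  # drops the trailing space
    return prices


print(extract_prices(SAMPLE_HTML))
```

Here get_text(strip=True) is used instead of .text so that the trailing space inside the <p> tag is not carried into the result.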