Question

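Background

I am trying to understand how to extract specific data from a website using the beautifulsoup4 and urllib libraries.

How can I get the price of a specific DVD from the website, given that:

the div is <div class="productPrice" data-component="productPrice">
the p is <p class="productPrice_price" data-product-price="price">£9.99 </p>

Code so far: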

from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html")
bsObj = BeautifulSoup(html.read(), features='html.parser')

all_divs = bsObj.find_all('div', {'class':'productPrice'}) # 1. get all divs

What are the remaining steps to find the price?

Website (https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html)

Answer 1

You're almost there; just one more step. Iterate over the divs, find the <p> tag with class="productPrice_price" inside each one, and grab its text:

from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html")
bsObj = BeautifulSoup(html.read(), features='html.parser')

all_divs = bsObj.find_all('div', {'class':'productPrice'}) # 1. get all divs

for ele in all_divs:
    # 2. inside each div, find the <p> that holds the price and read its text
    price = ele.find('p', {'class':'productPrice_price'}).text
    print(price)

Output:

£9.99
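
If the page layout changes, find() returns None and the loop above raises an AttributeError. As a rough sketch of a more defensive variant, the snippet below runs against the markup quoted in the question rather than the live site, so the real page may behave differently. The names SAMPLE_HTML and extract_price are made up for this illustration, and it assumes the price is always a plain number prefixed with "£".

```python
from decimal import Decimal
from typing import Optional

from bs4 import BeautifulSoup

# Sample markup copied from the question; the live page may differ.
SAMPLE_HTML = """
<div class="productPrice" data-component="productPrice">
  <p class="productPrice_price" data-product-price="price">£9.99 </p>
</div>
"""


def extract_price(html: str) -> Optional[Decimal]:
    """Return the first price found as a Decimal, or None if it is missing."""
    soup = BeautifulSoup(html, features="html.parser")
    # CSS selector: a <p class="productPrice_price"> inside a <div class="productPrice">.
    tag = soup.select_one("div.productPrice p.productPrice_price")
    if tag is None:
        return None
    # get_text(strip=True) drops the trailing space; lstrip removes the pound sign.
    text = tag.get_text(strip=True).lstrip("£")
    return Decimal(text)


print(extract_price(SAMPLE_HTML))    # 9.99
print(extract_price("<div></div>"))  # None
```

Returning a Decimal instead of the raw string makes the value usable for comparisons or arithmetic. To use it on the real page, pass html.read() from urlopen in place of SAMPLE_HTML.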
