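I am trying to understand how to extract specific data from a website using the beautifulsoup4 and urllib libraries.
How can I get the price of a DVD from a page that contains the following markup: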
<div class="productPrice" data-component="productPrice">
<p class="productPrice_price" data-product-price="price">£9.99 </p>
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html")
bsObj = BeautifulSoup(html.read(), features='html.parser')
all_divs = bsObj.find_all('div', {'class':'productPrice'}) # 1. get all divs
What is the remaining step to find the price?
Website: (https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html)
Answer 0 (score: 2)
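You are almost there; only one more step is needed. Iterate over the matched elements, find the <p> tag with class="productPrice_price" inside each one, and grab its text: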
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://www.zavvi.com/dvd/rampage-includes-digital-download/11729469.html")
bsObj = BeautifulSoup(html.read(), features='html.parser')
all_divs = bsObj.find_all('div', {'class':'productPrice'}) # 1. get all divs
for ele in all_divs:
    # 2. inside each div, find the <p> that holds the price and take its text
    price = ele.find('p', {'class':'productPrice_price'}).text
    print(price)
Output:
£9.99
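
If you would like to try the lookup without hitting the network, here is a minimal sketch that runs against the markup from the question. The names `SAMPLE_HTML` and `extract_price` are hypothetical, chosen only for illustration, and the closing `</div>` is an assumption added so the snippet is complete. It swaps the loop for a single CSS selector via `select_one`, and returns `None` instead of raising if the page layout changes and the tag is missing.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; the closing </div> is assumed.
SAMPLE_HTML = """
<div class="productPrice" data-component="productPrice">
<p class="productPrice_price" data-product-price="price">£9.99 </p>
</div>
"""


def extract_price(html):
    """Return the price text from the markup, or None if it is not found."""
    soup = BeautifulSoup(html, features="html.parser")
    # CSS selector: a <p class="productPrice_price"> nested
    # inside a <div class="productPrice">.
    tag = soup.select_one("div.productPrice p.productPrice_price")
    if tag is None:
        return None
    # strip=True removes the trailing space after the price.
    return tag.get_text(strip=True)


print(extract_price(SAMPLE_HTML))
```

With the live page you would pass `urlopen(url).read()` to the same function. Whether the selector still matches depends on the site keeping these class names.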