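How do I scrape the price of one specific product?
The HTML contains several divs with class="pb-current-price",
but I am only interested in the $2,299.99
price. How can I do that?
Thanks.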
<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
<span class="">
$2,299.99
</span>
</div>
</div>
import requests
import bs4 as bs
from lxml import html
url = ""
agent = {"User-Agent":""}
url_get = requests.get(url,headers=agent) #, cookies=cookies)
tree = html.fromstring(url_get.content)
prices = tree.xpath('//div[@class="pb-sale-price "]/span/text()')
print(prices)
Running the code above returns [] for the prices.
Answer 0 (score: 0)
Ciao
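I have been working with your code. A few things before the snippet:
1) You are searching for "pb-sale-price "
rather than "pb-current-price ", which is why the XPath matches nothing
2) As noted in the comments, I could not access your HTML page, so I simulated the response from the HTML snippet you gave us
3) For completeness, I also simulated a second item
Now the code: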
import requests
import bs4 as bs
from lxml import html
# simulating the html answer
string="""
<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
<span class="">
$2,299.99
</span>
</div>
</div>
<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
<span class="">
$799.99
</span>
</div>
</div>
"""
url = "https://www.bestbuy.com/site/lg-65-class-oled-b9-series-2160p-smart-4k-uhd-tv-with-hdr/6360611.p?skuId=6360611"
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'}
# cookies = {"cookie":"COPY_HERE_YOUR_COOKIE_FROM_BROWSER"}
#url_get = requests.get(url,headers=agent) #, cookies=cookies)
#tree = html.fromstring(url_get.content)
tree = html.fromstring(string)
#print(html.tostring(tree).decode("utf-8"))
prices = tree.xpath('//div[@class="pb-current-price "]/span/text()')
# output cleaning
prices = [x.strip(' ,\n') for x in prices]
print(prices)
Output
['$2,299.99', '$799.99']
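The list above holds every matching price, while the question asks for only one of them. Below is a minimal sketch of one way to narrow it down, not necessarily what the real page needs. It assumes the product's own price is the first "pb-current-price" block in the document, which is not confirmed for the live page. The XPath also matches the class as a whitespace-separated token, so the trailing space in the attribute value no longer matters. The markup and the names page, CLASS_TOKEN and main_price are made up for illustration.

```python
from lxml import html

# Simulated markup with the same shape as the snippet in the question.
page = """
<div>
  <div class="pb pb-large-view pb-theme-default">
    <div class="pb-current-price ">
      <span class="">
        $2,299.99
      </span>
    </div>
  </div>
  <div class="pb pb-large-view pb-theme-default">
    <div class="pb-current-price ">
      <span class="">
        $799.99
      </span>
    </div>
  </div>
</div>
"""

tree = html.fromstring(page)

# Match "pb-current-price" as one class token, regardless of
# surrounding whitespace or additional classes on the div.
CLASS_TOKEN = (
    '//div[contains(concat(" ", normalize-space(@class), " "), '
    '" pb-current-price ")]/span/text()'
)
prices = [text.strip() for text in tree.xpath(CLASS_TOKEN)]

# Assumption: the product's own price is the first match on the page.
main_price = prices[0] if prices else None

print(prices)
print(main_price)
```

If the first match turns out not to be the main product on the real page, anchoring the XPath on a container that is unique to that product is likely a safer choice than relying on position.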
Hope this helps,
Antonino
PS - I strongly recommend that you also read this beautiful article
Answer 1 (score: 0)
The price you show is the regular price. You can get it from one of the script tags as follows:
import requests, json, re
headers = {'User-Agent':'Mozilla/5.0'}
r = requests.get('https://www.bestbuy.com/site/lg-65-class-oled-b9-series-2160p-smart-4k-uhd-tv-with-hdr/6360611.p?skuId=6360611&intl=nosplash', headers = headers)
p = re.compile(r'regularPrice\\":([\d.]+),')
price = p.findall(r.text)[0]
print(price)
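Note that findall(...)[0] raises an IndexError when the pattern is not found, for example if the site changes its markup or serves a different page. Here is a small sketch of a guarded version that reuses the same regular expression. The sample string is a hypothetical stand-in for the script-tag text, not a copy of the real page, and extract_regular_price is a helper name made up for this example.

```python
import re

# Hypothetical excerpt of the script-tag text; the real page may differ.
# The JSON is embedded in escaped form, so each quote is preceded by a backslash.
sample = r'{\"regularPrice\":2299.99,\"other\":1}'

# Same pattern as above: a literal backslash, a quote, a colon, then the number.
pattern = re.compile(r'regularPrice\\":([\d.]+),')


def extract_regular_price(text):
    """Return the first regularPrice as a float, or None if it is absent."""
    match = pattern.search(text)
    return float(match.group(1)) if match else None


print(extract_regular_price(sample))
print(extract_regular_price("no price here"))
```

Returning None lets the caller decide how to handle a missing price rather than crashing in the middle of the scrape.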