How do I scrape a product price from a website using BeautifulSoup?

Date: 2020-09-21 20:32:43

Tags: python web-scraping beautifulsoup screen-scraping

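Hi all. I'm trying to extract the latest bid for this sneaker from StockX, but for some reason sneaker_price comes back empty, so I get the error "IndexError: list index out of range". Can anyone help? Here is my code: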

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://stockx.com/air-jordan-6-retro-travis-scott")

soup = BeautifulSoup(driver.page_source,"lxml")
driver.quit()

sneaker_price = soup.select("div.en-us stat-value stat-small")[0]

2 answers:

Answer 0 (score: 0)

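Try the CSS selector div.en-us.stat-value.stat-small. In a CSS selector a space is a descendant combinator, so "div.en-us stat-value stat-small" looks for a <stat-small> element nested inside a <stat-value> element inside div.en-us, and nothing matches. Chaining the class names with dots matches a single div that carries all three classes: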

sneaker_price = soup.select("div.en-us.stat-value.stat-small")[0]
print(sneaker_price.text)

Prints:

€523
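
To compare the two selectors without opening a browser, here is a minimal sketch run against a hand-written snippet. The markup is a hypothetical stand-in for the StockX page, whose real structure may differ, and "html.parser" is used only so the example needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
html = '<div class="en-us stat-value stat-small">€523</div>'
soup = BeautifulSoup(html, "html.parser")

# Space-separated: looks for a <stat-small> tag inside a <stat-value> tag
# inside div.en-us, so the result list is empty.
descendant_matches = soup.select("div.en-us stat-value stat-small")

# Dot-chained: one div that has all three classes.
compound_matches = soup.select("div.en-us.stat-value.stat-small")

print(len(descendant_matches))   # 0
print(compound_matches[0].text)  # €523
```

Indexing the empty first result with [0] is what raises the IndexError in the question.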

Note: if you get a captcha page instead, try specifying more HTTP headers and/or cookies. For example:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept-Language': 'en-US,en;q=0.5'
}

cookies = {
    'stockx_homepage': "sneakers",
}

soup = BeautifulSoup(requests.get("https://stockx.com/air-jordan-6-retro-travis-scott", headers=headers, cookies=cookies).content,"lxml")

sneaker_price = soup.select("div.en-us.stat-value.stat-small")[0]
print(sneaker_price.text)
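
If the site returns a captcha page or changes its markup, select(...)[0] will still raise IndexError. One possible guard, sketched below, is select_one, which returns None when nothing matches. The helper name extract_price and both sample HTML strings are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

def extract_price(html, selector="div.en-us.stat-value.stat-small"):
    """Return the text of the first matching element, or None if none matches."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)  # None instead of an IndexError
    return node.get_text(strip=True) if node is not None else None

# Hypothetical responses: a page with the price, and a captcha page.
price_page = '<div class="en-us stat-value stat-small"> €523 </div>'
captcha_page = "<html><body>Please verify you are human</body></html>"

print(extract_price(price_page))    # €523
print(extract_price(captcha_page))  # None
```

A None result is then a signal to inspect the response body rather than a crash.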

Answer 1 (score: 0)

You can use the selector .ask div.en-us.stat-value.stat-small to get the div that contains the latest ask price. Because several elements match, take the last one, for example:

ask_price = soup.select('.ask div.en-us.stat-value.stat-small')[-1]
print(ask_price.text) # $655
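
As a rough illustration of how the .ask prefix and the [-1] index work together, the sketch below runs the same selector on a hand-written snippet. The markup and the prices other than $655 are assumptions made for the example, not the actual StockX page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one bid block and one ask block with two stat divs.
html = """
<div class="bid">
  <div class="en-us stat-value stat-small">$500</div>
</div>
<div class="ask">
  <div class="en-us stat-value stat-small">$640</div>
  <div class="en-us stat-value stat-small">$655</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# ".ask " restricts the search to descendants of the element with class "ask".
ask_matches = soup.select(".ask div.en-us.stat-value.stat-small")

print(len(ask_matches))      # 2
print(ask_matches[-1].text)  # $655
```

The bid block is skipped because its div is not a descendant of .ask.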