I am scraping some pages from Amazon and ran into this error (the one mentioned in the title).
Here is my code:
import requests
from bs4 import BeautifulSoup
import smtplib
URL = 'https://www.amazon.co.uk/UGREEN-Adapter-Samsung-Oneplus- Blackview/dp/B072V9CNTK/ref=sr_1_2_sspa?keywords=otg+cable&qid=1578610622&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzRzRRUUdaR05RVlRJJmVuY3J5cHRlZElkPUEwNjExNjM4MVI4NVZaTFlYTlhGSCZlbmNyeXB0ZWRBZElkPUEwMjg1MTU0OEhROERWQTBSRFAzJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
headers = {
"User Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64 AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find(id="productTitle").get_text()
price = soup.find(id="priceblock_ourprice").get_text()
converted_price = float(price[0:3])
def check_price():
    print(soup.find(id="priceblock_ourprice").get_text())
    converted_price = float(price[0:3])
    if(converted_price < 7.00):
        send_mail()
Answer 0 (score: 1)
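This happens because the page is loaded dynamically with JavaScript, so the HTML returned by requests may not contain the elements you are looking for. You can use Selenium to get the site's HTML, like this: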
import time
from selenium import webdriver
URL = 'https://www.amazon.co.uk/UGREEN-Adapter-Samsung-Oneplus- Blackview/dp/B072V9CNTK/ref=sr_1_2_sspa?keywords=otg+cable&qid=1578610622&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzRzRRUUdaR05RVlRJJmVuY3J5cHRlZElkPUEwNjExNjM4MVI4NVZaTFlYTlhGSCZlbmNyeXB0ZWRBZElkPUEwMjg1MTU0OEhROERWQTBSRFAzJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
driver = webdriver.Chrome()
driver.get(URL)
time.sleep(5)
page = driver.page_source
driver.close()
So, here is the complete code:
from bs4 import BeautifulSoup
from selenium import webdriver
import time
URL = 'https://www.amazon.co.uk/UGREEN-Adapter-Samsung-Oneplus- Blackview/dp/B072V9CNTK/ref=sr_1_2_sspa?keywords=otg+cable&qid=1578610622&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzRzRRUUdaR05RVlRJJmVuY3J5cHRlZElkPUEwNjExNjM4MVI4NVZaTFlYTlhGSCZlbmNyeXB0ZWRBZElkPUEwMjg1MTU0OEhROERWQTBSRFAzJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
driver = webdriver.Chrome()
driver.get(URL)
time.sleep(5)
page = driver.page_source
driver.close()
soup = BeautifulSoup(page, 'html5lib')
title = soup.find(id="productTitle")
price = soup.find(id="priceblock_ourprice")
print(soup.find(id="priceblock_ourprice").get_text())
Output:
£6.99
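Once the price text is available, the question's float(price[0:3]) would still fail on this output, because "£6.99"[0:3] is "£6." and float() cannot parse the currency symbol. A minimal sketch of a more tolerant conversion follows, assuming prices are formatted UK-style (comma as thousands separator, dot as decimal point); parse_price is a hypothetical helper name, not something from the original code:

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Return the first number found in text as a float, or None.

    Assumes UK-style formatting such as "£1,234.50" (hypothetical helper).
    """
    # Digits, optional ",ddd" thousands groups, optional ".dd" decimals.
    match = re.search(r"\d+(?:,\d{3})*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


print(parse_price("£6.99"))
```

The result can then be compared against the threshold from the question, for example parse_price(price_text) < 7.00, after checking that it is not None.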