When I run the code below, I get the following error, even though elements with the IDs productTitle and priceblock_ourprice exist on the page.
Error:
title= soup.find(id='productTitle').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'
import requests
from bs4 import BeautifulSoup
url = 'https://www.amazon.com/Canon-PowerShot-SX420-Complete-Accessory/dp/B01D0PKF0Q/ref=sr_1_2?crid=H9FUF2YIZOLC&keywords=camera&qid=1578179990&sprefix=cam%2Caps%2C147&sr=8-2'
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
title = soup.find(id='productTitle').get_text(strip=True)
price = soup.find(id='priceblock_ourprice').get_text()
print(title)
print(price)
Answer 0 (score: 0)
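Your code is correct, but html.parser does not parse Amazon's HTML well, so soup.find(id='productTitle') returns None and the call to get_text fails. Switch to the lxml or html5lib parser (each is a separate install: pip install lxml or pip install html5lib) and you will see the output: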
import requests
from bs4 import BeautifulSoup
url = 'https://www.amazon.com/Canon-PowerShot-SX420-Complete-Accessory/dp/B01D0PKF0Q/ref=sr_1_2?crid=H9FUF2YIZOLC&keywords=camera&qid=1578179990&sprefix=cam%2Caps%2C147&sr=8-2'
headers = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763'}
page = requests.get(url,headers=headers)
soup = BeautifulSoup(page.content,'lxml') # <-- change to 'lxml' or 'html5lib'
title = soup.find(id='productTitle').get_text(strip=True)
price = soup.find(id='priceblock_ourprice').get_text()
print(title)
print(price)
Prints:
Canon PowerShot SX420 IS Digital Camera (Black) with 20MP, 42x Optical Zoom, 720p HD Video & Built-In Wi-Fi + 64GB Card + Reader + Grip + Spare Battery and Charger + Tripod + Complete Accessory Bundle
$289.95
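
Whichever parser is used, soup.find returns None when no element matches, and that is what triggers the AttributeError in the question. Below is a minimal sketch of guarding against that case. The helper name text_by_id and the inline sample_html string (a stand-in for the downloaded page, so the example runs without network access) are assumptions for illustration and are not part of the original post.

```python
from bs4 import BeautifulSoup


def text_by_id(soup, element_id):
    """Return the stripped text of the element with this id, or None if it is absent."""
    tag = soup.find(id=element_id)
    # soup.find returns None when nothing matches; check before calling get_text
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical stand-in for page.content; only productTitle is present here
sample_html = """
<html><body>
  <span id="productTitle">  Example Camera  </span>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
title = text_by_id(soup, "productTitle")
price = text_by_id(soup, "priceblock_ourprice")  # missing in the sample, so None

print(title)
print(price)
```

With a guard like this, a missing element shows up as None in the output instead of a traceback, which makes it easier to tell whether the parser or the returned page is the problem.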