Python scraping script returns None

Date: 2019-07-05 12:40:34

Tags: beautifulsoup python-requests python-3.7

I'm trying to scrape data from Amazon, specifically the product title, but running my script only returns None.

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/Dell-Inspiron-5570-Touchscreen-Laptop/dp/B07FKRFTYW/ref=sxbs_sxwds-deals?keywords=laptops&pd_rd_i=B07FKRFTYW&pd_rd_r=38a464f1-5fc2-4e1e-91a3-c209f68e2b8c&pd_rd_w=IbLEX&pd_rd_wg=l5Ewu&pf_rd_p=8ea1b18a-72f9-4e02-9dad-007df8eca556&pf_rd_r=SWJJFWF3WM0ZQZGMN8XA&qid=1562328911&s=computers-intl-ship&smid=A19N59FKNWHX7C'

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/75.0.3770.100 Safari/537.36' }


page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find(id="productTitle")

print(title)

I expected the output to be the div containing the product title, but the output is None.

3 answers:

Answer 0 (score: 0)

Change the parser:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/Dell-Inspiron-5570-Touchscreen-Laptop/dp/B07FKRFTYW/ref=sxbs_sxwds-deals?keywords=laptops&pd_rd_i=B07FKRFTYW&pd_rd_r=38a464f1-5fc2-4e1e-91a3-c209f68e2b8c&pd_rd_w=IbLEX&pd_rd_wg=l5Ewu&pf_rd_p=8ea1b18a-72f9-4e02-9dad-007df8eca556&pf_rd_r=SWJJFWF3WM0ZQZGMN8XA&qid=1562328911&s=computers-intl-ship&smid=A19N59FKNWHX7C'

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/75.0.3770.100 Safari/537.36' }

page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'lxml')

title = soup.find(id="productTitle")

print(title.text)

You can also extract the title from the content attribute of one of the meta tags.
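The meta-tag approach mentioned above can be sketched roughly as follows. This is only an illustration on an inline HTML sample, not Amazon's real markup: the meta tag name `title`, the sample string, and the helper name `meta_content` are all assumptions made for the example, and `html.parser` is used here only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's meta tags may differ.
SAMPLE_HTML = """
<html><head>
<meta name="title" content="Example Laptop 15.6 inch">
</head><body></body></html>
"""


def meta_content(html, name, parser="html.parser"):
    """Return the content attribute of <meta name=...>, or None if absent."""
    soup = BeautifulSoup(html, parser)
    # attrs= is needed because the keyword "name" means the tag name in find()
    tag = soup.find("meta", attrs={"name": name})
    # tag is None when no such meta tag exists; .get avoids a KeyError
    return tag.get("content") if tag is not None else None


print(meta_content(SAMPLE_HTML, "title"))
```

On a real page, inspect the response HTML first to see which meta tag, if any, carries the product title.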

Answer 1 (score: 0)

You should install lxml first if you haven't already. You can install it with the following pip command:

pip install lxml

Once it is installed, replace:

soup = BeautifulSoup(page.content, 'html.parser') 
title = soup.find(id="productTitle")

print(title)

with:

soup = BeautifulSoup(page.content, 'lxml')    
title = soup.find(id = "productTitle")

print(title.getText().strip())
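Because soup.find returns None when no element matches, calling getText() on the result raises an AttributeError if the title is missing from the response. A small guard, sketched here on hypothetical sample markup with a helper name (`extract_title`) that is not part of the original answer, avoids that crash:

```python
from bs4 import BeautifulSoup


def extract_title(html, parser="html.parser"):
    """Return the stripped text of the element with id="productTitle", or None."""
    soup = BeautifulSoup(html, parser)
    title = soup.find(id="productTitle")
    if title is None:
        # Nothing matched: report that instead of raising AttributeError
        return None
    return title.get_text().strip()


# Hypothetical sample markup for illustration only.
sample = '<div><span id="productTitle">  Example Laptop  </span></div>'
print(extract_title(sample))
print(extract_title("<html></html>"))
```

Passing parser="lxml" uses the lxml parser once it is installed; the default here is html.parser only so the sketch has no extra dependency.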

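Hope this helps.

Answer 2 (score: 0)

I can't comment, but I wanted to add to what @Fozoro said in case someone runs into the same problem I did. Running pip install lxml completed successfully, but when I tried to use lxml as the parser in my application, I still got an error saying the requested feature could not be found. However, running python3 -m pip install lxml allowed me to use the lxml parser.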
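One plausible cause of this mismatch is that the plain pip command installed the package for a different Python interpreter than the one running the script. As a quick check using only the standard library, the sketch below prints which interpreter is running and whether it can locate lxml; the helper name `module_available` is made up for illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


print("Interpreter:", sys.executable)
print("lxml available:", module_available("lxml"))
```

If lxml is reported as unavailable, installing it with that same interpreter (for example python3 -m pip install lxml, as described above) targets the right environment.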