AttributeError: 'NoneType' object has no attribute 'get_text' when web scraping with Python

Time: 2020-02-09 17:59:27

Tags: python python-3.x web web-scraping attributeerror

I am following this tutorial and, even though I did everything as shown, I get this error. The tutorial link is https://www.youtube.com/watch?v=Bg9r_yLk7VY&t=241s and my code is below.

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'

headers ={"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find(id="productTitle").get_text()

print(title.strip())

This is the error message I get when I run the code:

Traceback (most recent call last):
  File "scraper.py", line 26, in <module>
    title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

1 answer:

Answer 0 (score: 3)

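To get the product title from that page, just change the parser from html.parser to html5lib or lxml. The latter two can repair certain malformed HTML elements, which in this case are what prevent the title from being parsed. I also added a random user agent to the script to make it more robust.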

Working code:

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

ua = UserAgent()

URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'

page = requests.get(URL, headers={'User-Agent':ua.random})
soup = BeautifulSoup(page.text, 'html5lib')
title = soup.find(id="productTitle").get_text(strip=True)
print(title)
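
The error itself arises because soup.find() returns None when no element matches, so calling .get_text() on the result fails. As a minimal sketch (not part of the original answer), the lookup can be guarded and the parser chosen from whatever is installed. The helper names make_soup and extract_title and the sample markup are hypothetical and only illustrate the idea; html5lib and lxml are separate packages and are tried only if present.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, parsers=("html5lib", "lxml", "html.parser")):
    """Parse html with the first parser in `parsers` that is installed."""
    for name in parsers:
        try:
            return BeautifulSoup(html, name)
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
    raise RuntimeError("none of the requested parsers is available")


def extract_title(soup, element_id="productTitle"):
    """Return the stripped text of the element, or None if it is absent."""
    element = soup.find(id=element_id)
    if element is None:
        # find() found nothing; avoid calling get_text() on None
        return None
    return element.get_text(strip=True)


# Hypothetical sample markup, not taken from the real page.
sample_html = '<html><body><span id="productTitle">  Example product  </span></body></html>'

title = extract_title(make_soup(sample_html))
missing = extract_title(make_soup("<html><body></body></html>"))

print(title)
print(missing)
```

With a real page, the same extract_title call could be applied to make_soup(page.text); a None result then signals that the element was not in the HTML that was actually returned, instead of raising AttributeError.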