Getting an error in BS4 while scraping Amazon: AttributeError: 'NoneType' object has no attribute 'get_text'

Date: 2020-02-18 17:20:08

Tags: python web-scraping beautifulsoup

!pip install requests
!pip install bs4


import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.in/Apple-iPhone-Pro-Max-256GB/dp/B07XVLH744/ref=sr_1_1_sspa?crid=2VCKZNOH3H6SR&keywords=apple+iphone+11+pro+max&qid=1582043410&sprefix=apple+iphone%2Caps%2C388&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyVjdZSE83TzU4UUMmZW5jcnlwdGVkSWQ9QTAyNTI1ODZJUzZOVUwxWDNIUlAmZW5jcnlwdGVkQWRJZD1BMDkxNDg4MzFLMFpVT1M5OFM5Q0smd2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl"

headers = {"User-Agent": "in this section im adding my user agent after typing my user agent in google search"}

page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.content, "html.parser")

print(soup.prettify()) 

title = soup.find(id = "productTitle").get_text()
price = soup.find(id = "priceblock_ourprice").get_text()

converted_price = price[0:8]

print(converted_price)
print(title)

I am running this code in Google Colab, but I get this error:

AttributeError   Traceback (most recent call last)
<ipython-input-15-14696d9dc778> in <module>()
     16 print(soup.prettify())
     17 
---> 18 title = soup.find(id = "productTitle").get_text()
     19 price = soup.find(id = "priceblock_ourprice").get_text()
     20 

AttributeError: 'NoneType' object has no attribute 'get_text'

I searched all over the Internet but could not find an answer to my problem. I am trying to get the price of the iPhone 11 Pro Max. When I run this code, I get the error above.

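4 Answers:

Answer 0 (score: 0)

  • soup.find(id = "productTitle") returns None because it cannot find an element with id = "productTitle". Make sure you are searching for the correct element.

  • For find calls, I recommend always writing an if condition to avoid and handle this kind of error.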

title = soup.find(id = "productTitle")
if title:
    title = title.get_text()
else:
    title = "default_title"

price = soup.find(id = "priceblock_ourprice").get_text()

  • You can do the same for price.
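The same check can be wrapped in a small helper so that title and price are handled the same way. This is only a sketch: the name text_or_default is hypothetical and not part of BeautifulSoup. It assumes the argument is whatever soup.find() returned, that is, either None or an object with a get_text() method.

```python
def text_or_default(element, default=""):
    """Return the stripped text of a find() result, or `default` when it is None.

    `element` is assumed to be the return value of soup.find(...):
    either None (nothing matched) or a tag object exposing get_text().
    """
    if element is None:
        return default
    return element.get_text().strip()
```

With the soup object from the question, this would be called as title = text_or_default(soup.find(id="productTitle"), "default_title"), and likewise for "priceblock_ourprice".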

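Answer 1 (score: 0)

This error occurs when you try to extract data from an object whose value is None. If you see it on line 18, it means that soup.find(id = "productTitle") did not match anything and returned None.

You need to split the processing into steps. Check the return value before accessing it. So...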

title_info = soup.find(id = "productTitle")
if title_info:
    title = title_info.text
else:
    title = None  # handle the missing element here

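Answer 2 (score: 0)

Well, I tested your code here and it works fine. However, when you request the same link repeatedly within a short time, Amazon returns a 503 status code...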

Wait a while and then try again, or at least leave a longer interval between requests...
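One way to leave a longer interval between requests is to retry with an increasing delay whenever a 503 comes back. The following is a minimal sketch, not a guaranteed fix: fetch_with_backoff and its parameters are hypothetical names, and `get` is assumed to behave like requests.get (it accepts a URL and a headers keyword, and returns an object with a status_code attribute).

```python
import time


def fetch_with_backoff(get, url, headers=None, max_attempts=4,
                       base_delay=2.0, sleep=time.sleep):
    """Call get(url, headers=headers) until the status is not 503.

    The wait doubles after each 503: base_delay, 2 * base_delay, and so on.
    After max_attempts calls the last response is returned as-is, so the
    caller should still check response.status_code.
    """
    response = None
    for attempt in range(max_attempts):
        response = get(url, headers=headers)
        if response.status_code != 503:
            return response
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

With the question's variables this would be page = fetch_with_backoff(requests.get, url, headers=headers). Even then, Amazon may still serve a page without the expected ids, so the None check on soup.find() remains necessary.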

Answer 3 (score: -1)

Try this code as well:

title = soup.find(id="productTitle")
if title:
    title = title.get_text()
else:
    title = "default_title"

price = soup.find(id="priceblock_ourprice")
if not price:
    price = "default_title"

# converted_price = price[0:8]
convert = str(price)    # string form of the tag (its HTML), or of the default
con = convert[-18:-11]  # slice offsets as given in this answer

print(con)
print(title)
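Fixed slice offsets such as [-18:-11] or [0:8] break as soon as the price has a different number of digits. As a hedged alternative (not what this answer used), the number can be pulled out of the price text with a regular expression. The helper name parse_price is hypothetical, and the sketch assumes the text contains a number with optional comma separators and an optional decimal part.

```python
import re


def parse_price(text):
    """Return the first number found in `text` as a float, or None if there is none.

    Commas are treated as digit-group separators and removed.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))
```

It would be applied to the text of the price element, for example parse_price(price.get_text()), once price is known not to be None.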

Try using a different IDE.

Use repl.it (https://repl.it): create a new repl and run the code there.