Question

I'm writing a script to generate a JSON file, but I've run into a problem.

import requests
from bs4 import BeautifulSoup

url = requests.get('https://www.perfectimprints.com/custom-promos/20492/Beach-Balls.html')
source = BeautifulSoup(url.text, 'html.parser')

product_feed = source.find('div', id_="pageBody")

products = product_feed.find_all('div', class_="product_wrapper")

single_product = products[0]

product_name = single_product.find('div', class_="product_name")
product_name = product_name.a.text

sku = single_product.find('div', class_="product_sku")
sku = sku.text

def get_product_details(product):
  product_name = product.find('div', class_="product_name").a.text
  sku = product.find('div', class_="product_sku").text
  return {
    "product_name": product_name,
    "sku": sku
  }

all_products = [get_product_details(product) for product in products]
print(all_products)

The error message I get is:

Traceback (most recent call last):
  File "scrape.py", line 9, in <module>
    products = product_feed.find_all('div', class_="product_wrapper")
AttributeError: 'NoneType' object has no attribute 'find_all'

From what I've read, this happens because nothing was found with the product_wrapper class, but that doesn't make any sense.

Answer 1

The problem is that product_feed = source.find('h1', id_="pageBody") returns None. I tried your code, and product_feed = source.find_all('h1') returns only one item, which carries no id attribute.
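
A minimal sketch of why the traceback appears, assuming a small inline HTML string in place of the live page. The markup and the helper name find_or_fail are made up for illustration. find returns None when nothing matches, so the next method call on that result raises AttributeError; checking the result first gives a clearer error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = '<div id="pageBody"><div class="product_wrapper">Ball</div></div>'
soup = BeautifulSoup(html, 'html.parser')

def find_or_fail(node, name, **attrs):
    """Like node.find(), but raise a descriptive error instead of returning None."""
    found = node.find(name, **attrs)
    if found is None:
        raise LookupError(f"no <{name}> matching {attrs!r}")
    return found

# There is no <h1> with this id, so find() returns None.
missing = soup.find('h1', id="pageBody")

# The <div> exists, so the lookup succeeds and find_all() is safe to call.
body = find_or_fail(soup, 'div', id="pageBody")
wrappers = body.find_all('div', class_="product_wrapper")
```

Calling missing.find_all(...) here would reproduce the AttributeError from the question.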

Answer 2

Looking at the site's source, the element with id="pageBody" is a div, not an h1, which is why source.find returns None. Try:

...
product_feed = source.find('div', id="pageBody")
...
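
One detail worth checking in a lookup like this: BeautifulSoup adds a trailing underscore only for class_, because class is a reserved word in Python. An id lookup is spelled id=...; a keyword such as id_ is treated as a filter on an attribute literally named "id_". The sketch below uses a made-up inline HTML string rather than the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = '<div id="pageBody"><p>content</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# Filters on an attribute named "id_", which the div does not have.
with_underscore = soup.find('div', id_="pageBody")

# Filters on the real "id" attribute.
without_underscore = soup.find('div', id="pageBody")

# Equivalent spelling using an explicit attrs dictionary.
by_attrs = soup.find('div', attrs={"id": "pageBody"})
```

Here with_underscore is None, while the other two lookups return the div.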

Answer 3

You don't need product_feed. Remove it and change the next line to:

products = source.find_all('div', class_="product_wrapper")

You can verify this at the end:

print(len(all_products))
48
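
As a rough end-to-end sketch of this approach, the following searches the whole document for product_wrapper elements and serializes the result to JSON, which is what the question is ultimately after. The inline HTML, product names, and SKUs are invented for illustration; the real page's markup may differ.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical markup with two products; the real page has more.
html = """
<div class="product_wrapper">
  <div class="product_name"><a href="#">Beach Ball A</a></div>
  <div class="product_sku">SKU-1</div>
</div>
<div class="product_wrapper">
  <div class="product_name"><a href="#">Beach Ball B</a></div>
  <div class="product_sku">SKU-2</div>
</div>
"""
source = BeautifulSoup(html, 'html.parser')

def get_product_details(product):
    # Read both fields from the product passed in, not from a global.
    return {
        "product_name": product.find('div', class_="product_name").a.text.strip(),
        "sku": product.find('div', class_="product_sku").text.strip(),
    }

# Search the whole document directly; no intermediate product_feed needed.
products = source.find_all('div', class_="product_wrapper")
all_products = [get_product_details(p) for p in products]

# Serialize the list of dicts to a JSON string.
as_json = json.dumps(all_products, indent=2)
```

Writing as_json to a file with open(..., 'w') would then produce the JSON file.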

AttributeError when using BeautifulSoup
