How can I avoid "TypeError: 'NoneType' object is not subscriptable" in BeautifulSoup?

Time: 2021-01-20 00:14:22

Tags: python web-scraping

I am trying to extract the image URL from each product on this page, but I get the following error:

Traceback (most recent call last):
  File "D:\Documentos\ZalandoDiscountGen-main\Zalando discout gen\scrapersnipes.py", line 98, in <module>
    scraper()
  File "D:\Documentos\ZalandoDiscountGen-main\Zalando discout gen\scrapersnipes.py", line 92, in scraper
    imagen = producto.find("img", {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"})['src']
TypeError: 'NoneType' object is not subscriptable

The code I tried:

from bs4 import BeautifulSoup
from dhooks import Webhook, Embed
import requests
import pandas as pd
import time, datetime
import random
import numpy as np
import os


headers = {
    'authority': 'www.snipes.es',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'es-ES,es;q=0.9,en;q=0.8,de;q=0.7,eo;q=0.6',
    'dnt': '1',
}


def scraper():

    response = requests.get("https://www.snipes.es/c/shoes?q=jordan%2B1&openCategory=true&sz=all&srule=New", headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    listadoproductos = soup.find_all('div', {'class': 'b-product-grid-tile js-tile-container'})
    for producto in listadoproductos:
        marca = producto.find("span", {"class":"b-product-tile-brand b-product-tile-text js-product-tile-link"}).text
        titulo = producto.find("span", {"class":"b-product-tile-link js-product-tile-link"}).text
        precio = producto.find("span", {"class":"b-product-tile-price-item"}).text
        imagen = producto.find("img", {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"})['src']
        imagen2 = "https://www.snipes.es" + str(imagen)
        print (marca.strip(), titulo.strip(), precio.strip(), imagen2)
    


scraper()

I can't figure out what is going wrong and would be glad for a hint on where to start.

1 answer:

Answer 0: (score: 0)

What happens?

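You are trying to find an <img> by passing several classes as one string. BeautifulSoup compares such a string against the exact value of the class attribute, so find() returns None unless the tag carries exactly those classes in that order, and indexing None with ['src'] raises the TypeError. Classes such as ls-is-cached and h-lazyloaded are probably added by a lazy-loading script in the browser, so they would be missing from the HTML that requests downloads. Matching on all of them is not necessary anyway.

I think you also don't want src, because it is a blank png; what you probably want is data-src.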
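
To make the failure concrete, here is a minimal sketch that reproduces it without touching the network. The HTML snippet and the /blank.png and /images/shoe.jpg paths are made up for illustration; it assumes the server sends the tile without the ls-is-cached and h-lazyloaded classes, which may or may not match the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical tile markup, as the server might send it before any JavaScript runs.
html = """
<div class="b-product-tile-image-container">
  <img class="b-dynamic_image_content b-product-tile-image"
       src="/blank.png" data-src="/images/shoe.jpg">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is compared against the whole class attribute, so the
# two extra classes make the lookup miss and find() returns None.
missing = soup.find(
    "img",
    {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"},
)
print(missing)  # None

try:
    missing["src"]  # indexing None is what raises the error from the question
except TypeError as exc:
    print(exc)

# Matching on a single class, or using a CSS selector, finds the tag.
by_class = soup.find("img", class_="b-product-tile-image")
by_css = soup.select_one("img.b-dynamic_image_content.b-product-tile-image")
print(by_class["src"], by_css["data-src"])
```

With a CSS selector the order of the classes does not matter, and the tag may carry additional classes.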

How to fix it?

Change the line where you look up the image to the following:

imagen = producto.select_one('div.b-product-tile-image-container img')['data-src']

Also drop imagen2; you don't need it, because data-src already holds an absolute URL:

for producto in listadoproductos:
    marca = producto.find("span", {"class":"b-product-tile-brand b-product-tile-text js-product-tile-link"}).text
    titulo = producto.find("span", {"class":"b-product-tile-link js-product-tile-link"}).text
    precio = producto.find("span", {"class":"b-product-tile-price-item"}).text
    imagen = producto.select_one('div.b-product-tile-image-container img')['data-src']
    print (marca.strip(), titulo.strip(), precio.strip(), imagen)

Output

JORDAN WMNS Zoom '92 149,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dw1986ce4d/1899597_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
JORDAN Air Jordan 1 Mid (PS) 64,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dwe2e88c0b/1930682_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
JORDAN Air Jordan 11 Crib Bootie 59,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dw2dd01aa4/1883653_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
JORDAN Jordan Air Max 200 129,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dw21a7bda8/1829411_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
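
The loop in the answer still assumes that every tile contains each element, so a single tile without an image would raise the same TypeError again. One way to guard against that is sketched below. The helper names text_or_none and image_url, the BASE_URL constant, and the sample HTML are hypothetical and not part of the original code; only the selectors are taken from the question and the answer.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.snipes.es"  # used only to resolve relative image paths


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


def image_url(producto, base_url=BASE_URL):
    """Return an absolute image URL for a product tile, or None if there is none."""
    img = producto.select_one("div.b-product-tile-image-container img")
    if img is None:
        return None
    # Tag.get() returns None for a missing attribute instead of raising.
    raw = img.get("data-src") or img.get("src")
    return urljoin(base_url, raw) if raw else None


# Hypothetical markup: one complete tile and one tile without an image.
html = """
<div class="b-product-grid-tile js-tile-container">
  <span class="b-product-tile-price-item"> 149,99 € </span>
  <div class="b-product-tile-image-container">
    <img src="/blank.png" data-src="/dw/image/shoe.jpg">
  </div>
</div>
<div class="b-product-grid-tile js-tile-container">
  <span class="b-product-tile-price-item">64,99 €</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
for producto in soup.select("div.b-product-grid-tile"):
    precio = text_or_none(producto, "span.b-product-tile-price-item")
    print(precio, image_url(producto))
```

Returning None pushes the decision to the caller, which can skip the tile, log it, or substitute a placeholder. urljoin leaves absolute URLs untouched, so the helper works whether data-src is relative or absolute.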