Spider error processing URL

Time: 2018-02-25 12:38:39

Tags: python python-2.7 scrapy scrapy-spider

I get an error while processing a URL with Scrapy 1.5.0 and Python 2.7.14.

class FootLockerSpider(Spider):

    name = "FootLockerSpider"
    allowded_domains = ["footlocker.it"]
    start_urls = [FootLockerURL]

    def __init__(self):
        logging.critical("FootLockerSpider STARTED.")

    def parse(self, response):
        products = Selector(response).xpath('//div[@class="fl-category--productlist"]')

        for product in products:
            item = FootLockerItem()
            item['name'] = product.xpath('.//a/span[@class="fl-product-tile--name"]/span').extract()[0]
            item['link'] = product.xpath('.//a/@href').extract()[0]
            # item['image'] = product.xpath('.//div/a/div/img/@data-original').extract()[0]
            # item['size'] = '**NOT SUPPORTED YET**'
            yield item

        yield Request(FootLockerURL, callback=self.parse, dont_filter=True, priority=14)

This is my FootLockerSpider class, and this is the error I get:

[scrapy.core.scraper] ERROR: Spider error processing <GET https://www.footlocker.it/it/uomo/scarpe/> (referer: None)
  File "C:\Users\Traian\Downloads\Sneaker-Notify\main\main.py", line 484, in parse
    item['name'] = product.xpath('.//a/span[@class="fl-product-tile--name"]/span').extract()[0]
IndexError: list index out of range

How can I fix this?

1 Answer:

Answer 0: (score: 1)

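You always need to check the page's source HTML. Scrapy does not execute JavaScript, and the product tiles here are lazy-loaded (note the data-lazyloading attributes), so the raw response contains no span with class fl-product-tile--name. The XPath matches nothing, extract() returns an empty list, and [0] raises the IndexError. The name and link are only present in the <noscript> fallback: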

<div class="fl-category--productlist--item" data-category-item><div class="fl-load-animation fl-product-tile--container"
data-lazyloading 
data-lazyloading-success-handler="lazyloadingInit" 
data-lazyloading-context="product-tile" 
data-lazyloading-content-handler="lazyloadingJSONContentHandler"
data-request="https://www.footlocker.it/INTERSHOP/web/WFS/Footlocker-Footlocker_IT-Site/it_IT/-/EUR/ViewProductTile-ProductTileJSON?BaseSKU=314213410104&ShowRating=true&ShowQuickBuy=true&ShowOverlay=true&ShowBadge=true"
data-scroll-to-target="fl-product-tile-314213410104"
>
<noscript> 
 <a href="https://www.footlocker.it/it/p/nike-air-max-97-ultra-17-uomo-scarpe-46994?v=314213410104"><span itemprop="name">Nike Air Max 97 Ultra '17 - Uomo Scarpe</span></a>
</noscript>
</div>
</div>
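
One way to confirm this without a browser is to scan the raw markup for the class names and links it actually contains. The following is a minimal sketch using only the standard library; `TileScanner` and `RAW_HTML` are illustrative names, and `RAW_HTML` is a trimmed copy of the tile above that is assumed to be representative of the page.

```python
from html.parser import HTMLParser

# Trimmed copy of the tile markup shown above (assumed representative).
RAW_HTML = """
<div class="fl-category--productlist--item" data-category-item>
<div class="fl-load-animation fl-product-tile--container" data-lazyloading>
<noscript>
 <a href="https://www.footlocker.it/it/p/nike-air-max-97-ultra-17-uomo-scarpe-46994?v=314213410104"><span itemprop="name">Nike Air Max 97 Ultra '17 - Uomo Scarpe</span></a>
</noscript>
</div>
</div>
"""


class TileScanner(HTMLParser):
    """Collects class names, link targets, and span text from raw HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()
        self.links = []
        self.names = []
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A class attribute may hold several space-separated names.
        self.classes.update((attrs.get("class") or "").split())
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag == "span":
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        # Keep only non-empty text that sits inside a <span>.
        if self._in_span and data.strip():
            self.names.append(data.strip())


scanner = TileScanner()
scanner.feed(RAW_HTML)
scanner.close()

# Is the class the original XPath relied on present in the raw markup?
print("fl-product-tile--name" in scanner.classes)
print(scanner.names)
print(scanner.links)
```

If the class is missing from the raw markup, any XPath that depends on it will come back empty in Scrapy as well.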

This will work:

products = response.xpath('//div[@class="fl-category--productlist--item"]')

for product in products:
    item = FootLockerItem()
    item['name'] = product.xpath('.//a/span/text()').extract_first()
    item['link'] = product.xpath('.//a/@href').extract_first()
    yield item
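
The switch from `.extract()[0]` to `.extract_first()` matters too: `extract()` returns a list, and indexing an empty list raises the IndexError from the question, whereas `extract_first()` returns None when nothing matches. The plain-Python sketch below mimics that difference without requiring Scrapy; `first_or_default` and `build_item` are hypothetical helpers written for illustration, not part of Scrapy.

```python
def first_or_default(values, default=None):
    """Return the first element of `values`, or `default` if it is empty.

    Mirrors how extract_first() behaves for an empty XPath result.
    """
    return values[0] if values else default


def build_item(names, links):
    """Build a dict item, or return None when a required field is missing."""
    name = first_or_default(names)
    link = first_or_default(links)
    if name is None or link is None:
        return None  # the caller can skip this tile instead of crashing
    return {"name": name.strip(), "link": link}


# An XPath that matches nothing yields an empty list.
no_match = []
try:
    no_match[0]
except IndexError as exc:
    print("indexing failed:", exc)

# Missing name: the helper returns None rather than raising.
print(build_item(no_match, ["https://www.footlocker.it/it/uomo/scarpe/"]))

# Both fields present: a normal item is built.
print(build_item([" Nike Air Max 97 Ultra '17 - Uomo Scarpe "],
                 ["https://www.footlocker.it/it/uomo/scarpe/"]))
```

In the spider itself, a comparable guard would be to check whether item['name'] is None after extract_first() and skip that product before yielding.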