I'm getting an error while processing a URL with Scrapy 1.5.0 on Python 2.7.14.
import logging

from scrapy import Spider, Request
from scrapy.selector import Selector

# FootLockerURL and FootLockerItem are defined elsewhere in the project.

class FootLockerSpider(Spider):
    name = "FootLockerSpider"
    allowed_domains = ["footlocker.it"]  # note: was misspelled "allowded_domains"
    start_urls = [FootLockerURL]

    def __init__(self, *args, **kwargs):
        super(FootLockerSpider, self).__init__(*args, **kwargs)
        logging.critical("FootLockerSpider STARTED.")

    def parse(self, response):
        products = Selector(response).xpath('//div[@class="fl-category--productlist"]')
        for product in products:
            item = FootLockerItem()
            item['name'] = product.xpath('.//a/span[@class="fl-product-tile--name"]/span').extract()[0]
            item['link'] = product.xpath('.//a/@href').extract()[0]
            # item['image'] = product.xpath('.//div/a/div/img/@data-original').extract()[0]
            # item['size'] = '**NOT SUPPORTED YET**'
            yield item
        yield Request(FootLockerURL, callback=self.parse, dont_filter=True, priority=14)
This is my FootLockerSpider class, and here is the error I get:
[scrapy.core.scraper] ERROR: Spider error processing <GET https://www.footlocker.it/it/uomo/scarpe/> (referer: None)
  File "C:\Users\Traian\Downloads\Sneaker-Notify\main\main.py", line 484, in parse
    item['name'] = product.xpath('.//a/span[@class="fl-product-tile--name"]/span').extract()[0]
IndexError: list index out of range
How can I fix this?
Answer (score: 1):
You always need to check the source HTML first:
<div class="fl-category--productlist--item" data-category-item>
  <div class="fl-load-animation fl-product-tile--container"
       data-lazyloading
       data-lazyloading-success-handler="lazyloadingInit"
       data-lazyloading-context="product-tile"
       data-lazyloading-content-handler="lazyloadingJSONContentHandler"
       data-request="https://www.footlocker.it/INTERSHOP/web/WFS/Footlocker-Footlocker_IT-Site/it_IT/-/EUR/ViewProductTile-ProductTileJSON?BaseSKU=314213410104&ShowRating=true&ShowQuickBuy=true&ShowOverlay=true&ShowBadge=true"
       data-scroll-to-target="fl-product-tile-314213410104"
  >
    <noscript>
      <a href="https://www.footlocker.it/it/p/nike-air-max-97-ultra-17-uomo-scarpe-46994?v=314213410104"><span itemprop="name">Nike Air Max 97 Ultra '17 - Uomo Scarpe</span></a>
    </noscript>
  </div>
</div>
This will work:
products = response.xpath('//div[@class="fl-category--productlist--item"]')
for product in products:
    item = FootLockerItem()
    item['name'] = product.xpath('.//a/span/text()').extract_first()
    item['link'] = product.xpath('.//a/@href').extract_first()
    yield item
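Two things changed here: the XPath now targets the per-item container (`fl-category--productlist--item`, matching the `<noscript>` fallback markup shown above), and `extract_first()` replaces `extract()[0]`, so an empty match returns None instead of raising IndexError. A minimal pure-Python sketch of that semantics (the helper below is illustrative, not part of Scrapy):

```python
def extract_first(results, default=None):
    """Mimic Selector.extract_first(): first match, or a default, never IndexError."""
    return results[0] if results else default

# A matching XPath yields a non-empty list of strings; a miss yields [].
print(extract_first(["Nike Air Max 97 Ultra '17 - Uomo Scarpe"]))  # first match
print(extract_first([]))  # None, where extract()[0] would raise IndexError
```

In newer Scrapy versions, `get()` is the preferred spelling of `extract_first()`.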