Scrapy IndentationError: expected an indented block

Time: 2018-09-01 07:00:49

Tags: web-scraping scrapy scrapy-spider

I hope you are all well. I would appreciate your help: I am getting the following error and I don't know why:

  File "C:\Users\Luis\Amazon\mercado\spiders\spider.py", line 14
    yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)
    ^
IndentationError: expected an indented block

# -*- coding: utf-8 -*-
import scrapy
import urllib
from mercado.items import MercadoItem


class MercadoSpider(CrawlSpider):
    name = 'mercado'
    item_count = 0
    allowed_domain = ['https://www.amazon.es']
    start_urls = ['https://www.amazon.es/s/ref=sr_pg_2rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1 535314254']

    def start_requests(self):
        yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)

        for i in range(2,400):
            yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page="+str(i)+"&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)


    def parse_item(self, response):
        ml_item = MercadoItem()

        # product info
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="productTitle"])').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="priceblock_ourprice"])').extract()
        self.item_count += 1
        yield ml_item

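Do you know why? I have added the code here to make it easier to check.

1 answer:

Answer 0 (score: 1)

You have an indentation error. The body of start_requests must be indented one level deeper than its def line: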

# -*- coding: utf-8 -*-
import scrapy
import urllib
from scrapy.spiders import CrawlSpider  # needed for the base class used below
from mercado.items import MercadoItem


class MercadoSpider(CrawlSpider):
    name = 'mercado'
    item_count = 0
    # Scrapy reads allowed_domains (plural) and expects bare domain names, not URLs
    allowed_domains = ['amazon.es']
    start_urls = ['https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254']

    def start_requests(self):
        yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)

        for i in range(2,400):
            yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page="+str(i)+"&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)


    def parse_item(self, response):
        ml_item = MercadoItem()

        # product info
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="productTitle"])').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="priceblock_ourprice"])').extract()
        self.item_count += 1
        yield ml_item
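
To see the error in isolation, here is a minimal sketch that compiles two tiny snippets without running any spider. The snippets and the helper name check are made up for illustration; they only mimic the shape of start_requests. The exact wording of the message varies between Python versions, but it starts with "expected an indented block".

```python
# A def whose body is NOT indented: this is what triggers the error.
source_bad = (
    "def start_requests():\n"
    "yield 1\n"
)

# The same def with the body indented by four spaces.
source_good = (
    "def start_requests():\n"
    "    yield 1\n"
)


def check(source):
    """Compile the source text and report an indentation problem, if any."""
    try:
        compile(source, "<spider>", "exec")
    except IndentationError as exc:
        return "line {}: {}".format(exc.lineno, exc.msg)
    return "ok"


print(check(source_bad))
print(check(source_good))
```

The reported line is the first line of the body, not the def line, which is why the traceback in the question points at the yield statement.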

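UPDATE: You now have code (not the best code) to request the paginated results and to parse detail pages. You still need to add code that parses each results page and extracts the detail link of every item. The methods below go inside the MercadoSpider class: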

def start_requests(self):
    yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_search)

    for i in range(2,400):
        yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page="+str(i)+"&keywords=febi&ie=UTF8&qid=1535314254",self.parse_search)

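def parse_search(self, response):
    # Follow the detail-page link of every product listed on this results page
    for item_link in response.xpath('//ul[@id="s-results-list-atf"]//a[contains(@class, "s-access-detail-page")]/@href').extract():
        # urljoin leaves absolute links unchanged and resolves relative ones
        yield scrapy.Request(response.urljoin(item_link), self.parse_item)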

def parse_item(self, response):
    ml_item = MercadoItem()

    # product info
    ml_item['articulo'] = response.xpath('normalize-space(//*[@id="productTitle"])').extract()
    ml_item['precio'] = response.xpath('normalize-space(//*[@id="priceblock_ourprice"])').extract()
    self.item_count += 1
    yield ml_item
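
As a possible tidy-up, the page URLs that start_requests builds by string concatenation can also be generated with the standard library. This is only a sketch under some assumptions: the helper names build_search_url and absolute_link are hypothetical, the query values are copied from the URLs above, the product path in the usage lines is invented, and it uses urllib.parse rather than anything Scrapy-specific.

```python
from urllib.parse import urlencode, urljoin

# Path part of the search URL, copied from the requests above.
BASE = "https://www.amazon.es/s/ref=sr_pg_2"


def build_search_url(page):
    """Return the search URL for one results page (hypothetical helper)."""
    params = {
        "rh": "n:1951051031,n:2424922031,k:febi",  # urlencode escapes ':' and ','
        "page": page,
        "keywords": "febi",
        "ie": "UTF8",
        "qid": "1535314254",
    }
    return BASE + "?" + urlencode(params)


def absolute_link(page_url, href):
    """Resolve a link found on page_url; absolute links pass through unchanged."""
    return urljoin(page_url, href)


first_page = build_search_url(1)
print(first_page)
# '/dp/B000000000' is an invented path, used only to show the resolution.
print(absolute_link(first_page, "/dp/B000000000"))
```

Inside the spider, a single loop such as for page in range(1, 400) that yields scrapy.Request(build_search_url(page), self.parse_search) would then cover the same pages as the two separate yield statements.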