When I use rules in a Scrapy spider, it gives an invalid syntax error for the following function

Asked: 2017-06-23 15:00:53

Tags: python-3.x scrapy scrapy-spider

I am building this spider with Scrapy on Python 3. The problem is that whenever I use rules, I get an "invalid syntax" error at def parse_productPage. When I remove the rules it does not complain and works fine. I cannot find what is wrong with the code. Can you help me? Here is the code:

import scrapy
from quo.items import QuoItem
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class ISpider(CrawlSpider):
    name='iShopE'
    allowed_domains = ['ishopping.pk']
    start_urls = ['https://www.ishopping.pk/electronics/home-theatres.html']
    rules = (
                Rule(LinkExtractor(restrict_xpaths=('//div["category-products-"]'), follow=True),
                Rule(LinkExtractor(restrict_xpaths=('//h2[@class="product-name"]/a/@href'), callback='parse_productPage'),
    )


    def parse_productPage(self,response):
      for rev in response.xpath('//div["product-essential"]'):
        item=QuoItem()
        price=response.xpath('//div[@class="price-box"]/span[@class="regular-price"]/meta[@itemprop="price"]/@content').extract()
        if price:
            item['price']=price
        Availability=response.xpath('//p[@class="availability in-stock"]/span[@class="value"]/text()').extract()
        if Availability:
            item['Availability']=Availability
        Brand=response.xpath('(//div[@class="box-p-attr"]/span)[1]/text()').extract()
        if Brand:
            item['Brand']=Brand
        deliveryTime=response.xpath('(//div[@class="box-p-attr"]/span)[2]/text()').extract()
        if deliveryTime:
            item['deliveryTime']=deliveryTime
        Waranty=response.xpath('(//div[@class="box-p-attr"]/span)[3]/text()').extract()
        if Waranty:
            item['Waranty']=Waranty

        yield item

Here is the output log: Output log

1 Answer:

Answer 0 (score: 0):

Despite what the error message points to, the problem is actually in the preceding lines:

rules = (
    Rule(LinkExtractor(restrict_xpaths=('//div["category-products-"]'), follow=True),
    Rule(LinkExtractor(restrict_xpaths=('//h2[@class="product-name"]/a/@href'), callback='parse_productPage'),
)

If you count the parentheses carefully, you can see that there are three opening parentheses but only two closing ones for each Rule:

Rule(
    LinkExtractor(
        restrict_xpaths=('//div["category-products-"]'),
        follow=True
    )

So to fix this, just add a closing parenthesis for each Rule:

rules = (
    Rule(LinkExtractor(restrict_xpaths=('//div["category-products-"]'), follow=True)),
    Rule(LinkExtractor(restrict_xpaths=('//h2[@class="product-name"]/a/@href'), callback='parse_productPage')),
)