How to execute multiple methods in Scrapy

Time: 2017-08-17 08:08:55

Tags: python python-3.x scrapy scrapy-spider

def parse(self,response):
  print("parse!!!!!!!!!!!!!!!!!!!")
  yield  scrapy.Request("http://xx.com", callback=self.parseHeader,meta={'item': item})
  yield  scrapy.Request("http://xx.com ", callback=self.parseBody,meta={'item': item})
  yield  scrapy.Request("http://xx.com ", callback=self.parseFooter,meta={'item': item})


def parseHeader(self,response):
  print("parseHeader!!!!!!!!!!!!!!!!!!!")
  item = ItemHeader()
  #...
  yield item

def parseBody(self,response):
  print("parseBody!!!!!!!!!!!!!!!!!!!")
  item = ItemBody()
  #...
  yield item

def parseFooter(self,response):
  print("parseFooter!!!!!!!!!!!!!!!!!!!")
  item = ItemFooter()
  #...
  yield item

Running the code above produces the following result. Current result:

parse!!!!!!!!!!!!!!!!!!!
↓
parseHeader!!!!!!!!!!!!!!!!!!!
↓
pipeline
↓
Closing spider (finished)

Only parseHeader is executed; the methods after it are never called. Changing yield to return does not change the result.

I would like the result to look like this instead:

parse!!!!!!!!!!!!!!!!!!!
↓
parseHeader!!!!!!!!!!!!!!!!!!!
↓
pipeline
↓
parseBody!!!!!!!!!!!!!!!!!!!
↓
pipeline
↓
parseFooter!!!!!!!!!!!!!!!!!!!
↓
pipeline
↓
Closing spider (finished)

How can I do this? If you have any hints, please let me know.

1 answer:

Answer 0 (score: 1)

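If you have a single response and want to parse several things from it, you can split the parsing logic into separate methods and simply call them as ordinary Python methods that return items: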

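def parse(self, response):
    # `item` must exist before it is passed along in meta. The original
    # snippet does not define it, so an empty dict is assumed as a placeholder.
    item = {}
    yield scrapy.Request("http://xx.com",
                         callback=self.parse_item,
                         meta={'item': item})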

def parse_item(self, response):
    # Option 1: return everything as one item.
    item = response.meta['item']
    item['header'] = self.parse_header(response)
    item['body'] = self.parse_body(response)
    item['footer'] = self.parse_footer(response)
    yield item
    # Option 2: yield each part as a separate item instead.
    # (Pick one option; as written, both would run.)
    yield self.parse_header(response)
    yield self.parse_body(response)
    yield self.parse_footer(response)

def parse_header(self, response):
    print("parseHeader!!!!!!!!!!!!!!!!!!!")
    item = ItemHeader()
    return item

def parse_body(self, response):
    print("parseBody!!!!!!!!!!!!!!!!!!!")
    item = ItemBody()
    return item

def parse_footer(self, response):
    print("parseFooter!!!!!!!!!!!!!!!!!!!")
    item = ItemFooter()
    return item
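
The three requests in the question all target the same URL, and Scrapy drops duplicate requests by default unless dont_filter=True is passed, which is a likely (though unconfirmed) reason only the first callback ran. Calling the helpers from a single callback sidesteps that. The sketch below shows just the control flow of that pattern so it can be run without a crawl: FakeResponse and SectionParser are hypothetical stand-ins (not Scrapy classes), plain dicts replace ItemHeader, ItemBody and ItemFooter, and the two options are split into separate methods for clarity.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response; holds only what the sketch uses."""
    url: str
    meta: dict


class SectionParser:
    """Mirrors the callback layout from the answer without subclassing scrapy.Spider."""

    def parse_item(self, response):
        # Option 1: merge all sections into the item carried in meta.
        item = response.meta["item"]
        item["header"] = self.parse_header(response)
        item["body"] = self.parse_body(response)
        item["footer"] = self.parse_footer(response)
        yield item

    def parse_sections(self, response):
        # Option 2: yield each section as its own item.
        yield self.parse_header(response)
        yield self.parse_body(response)
        yield self.parse_footer(response)

    # The helpers are ordinary methods that return (not yield) one item each.
    def parse_header(self, response):
        return {"section": "header", "url": response.url}

    def parse_body(self, response):
        return {"section": "body", "url": response.url}

    def parse_footer(self, response):
        return {"section": "footer", "url": response.url}


response = FakeResponse(url="http://xx.com", meta={"item": {}})
parser = SectionParser()

merged = list(parser.parse_item(response))        # one combined item
separate = list(parser.parse_sections(response))  # three separate items
print(len(merged), len(separate))
```

In a real spider the same methods would live on the scrapy.Spider subclass, and every yielded item would go through the pipeline, so the second option gives the header, body, footer order the question asks for.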