Properties assigned to the wrong item in Scrapy

Time: 2018-05-24 09:21:42

Tags: python web-scraping scrapy scrapy-spider


spider.py

...
def parse(self, response):
    for one_item in response.xpath('path1'):
        item = ProjectItem()
        # scrapy.Request expects a URL string, so extract it from the selector
        url = one_item.xpath('path2').extract_first()
        request = scrapy.Request(url, callback=self.parse2)
        request.meta['item'] = item
        yield request

# class attribute: one list shared by every call to parse2
property = []
def parse2(self, response):
    item = response.meta['item']

    for x in response.xpath('path3'):
        self.property.append(x.xpath('path4').extract_first())

    # extract_first() returns None when there is no next page
    next_page = response.xpath('path5').extract_first()
    if next_page is not None:
        request2 = scrapy.Request(next_page, callback=self.parse2)
        request2.meta['item'] = item
        yield request2
    else:
        item['field'] = self.property
        self.property = []
        yield item

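The problem shows up when the spider follows next_page: some of the values in self.property end up assigned to the wrong item. I don't know how to fix it.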

1 Answer:

Answer 0 (score: 1)

self.property is a class attribute, shared across all calls to parse2, and you cannot control the order in which each call to parse2 runs.
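
The mix-up can be reproduced without Scrapy. Below is a minimal sketch, not the asker's real spider: FakeRequest, PAGES, BuggySpider and crawl are hypothetical stand-ins, and the first-in, first-out queue only imitates the fact that requests for different items are processed interleaved.

```python
from collections import deque, namedtuple

# Stand-in for scrapy.Request (illustration only): a URL plus a meta dict.
FakeRequest = namedtuple("FakeRequest", ["url", "meta"])

# Hypothetical site: url -> (values found on the page, next page or None).
PAGES = {
    "a/1": (["a1", "a2"], "a/2"),
    "a/2": (["a3"], None),
    "b/1": (["b1"], "b/2"),
    "b/2": (["b2", "b3"], None),
}


class BuggySpider:
    property = []  # class attribute, as in the question

    def parse2(self, request):
        item = request.meta["item"]
        values, next_page = PAGES[request.url]
        # Every item's page chain appends to the same shared list.
        self.property.extend(values)
        if next_page is not None:
            yield FakeRequest(next_page, {"item": item})
        else:
            item["field"] = self.property
            self.property = []
            yield item


def crawl(spider, start_requests):
    """Run callbacks in FIFO order, so the two page chains interleave."""
    queue = deque(start_requests)
    items = []
    while queue:
        for result in spider.parse2(queue.popleft()):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl(BuggySpider(), [
    FakeRequest("a/1", {"item": {"name": "a"}}),
    FakeRequest("b/1", {"item": {"name": "b"}}),
])
for item in items:
    print(item)
```

In this run, item a picks up b1 because page b/1 was processed between a/1 and a/2, and item b loses it.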

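To fix this, you need to carry the property list along with each request, either in the request meta or as a field on the item itself, instead of storing it on self.property.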
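
A minimal sketch of the meta approach, under stated assumptions: FakeRequest, PAGES and crawl are hypothetical stand-ins so the example runs without Scrapy, and the meta key "collected" is a name chosen here for illustration. The point is only that the list travels with the request, so each item accumulates its own values regardless of the order in which callbacks run.

```python
from collections import deque, namedtuple

# Stand-in for scrapy.Request (illustration only): a URL plus a meta dict.
FakeRequest = namedtuple("FakeRequest", ["url", "meta"])

# Hypothetical site: url -> (values found on the page, next page or None).
PAGES = {
    "a/1": (["a1", "a2"], "a/2"),
    "a/2": (["a3"], None),
    "b/1": (["b1"], "b/2"),
    "b/2": (["b2", "b3"], None),
}


def parse2(request):
    item = request.meta["item"]
    # Values gathered so far for THIS item; empty on the item's first page.
    collected = request.meta.get("collected", [])
    values, next_page = PAGES[request.url]
    # Build a new list instead of mutating shared state.
    collected = collected + values
    if next_page is not None:
        # Pass both the item and its partial list on to the next page.
        yield FakeRequest(next_page, {"item": item, "collected": collected})
    else:
        item["field"] = collected
        yield item


def crawl(start_requests):
    """Run callbacks in FIFO order, so the two page chains interleave."""
    queue = deque(start_requests)
    items = []
    while queue:
        for result in parse2(queue.popleft()):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl([
    FakeRequest("a/1", {"item": {"name": "a"}}),
    FakeRequest("b/1", {"item": {"name": "b"}}),
])
for item in items:
    print(item)
```

In a real spider the same idea would look like scrapy.Request(next_page, callback=self.parse2, meta={'item': item, 'collected': collected}). The other option is to set item['field'] = [] when the item is created in parse and append to item['field'] in parse2, since the item already travels in the meta.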