How do I scrape item names in a Scrapy function?

Asked: 2019-05-08 15:33:48

Tags: python scrapy

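When I run the code, it gives me the item prices, but both under the same item name. That is, it first outputs Transcription_price: 245 and then Transcription_price: 240. It should output Caption_price and Transcription_price. Why does this happen, and how can I fix it?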

import scrapy
from ..items import FetchingItem
import re

class SiteFetching(scrapy.Spider):
    name = 'Site'
    start_urls = ['https://www.rev.com/freelancers/transcription',
                  'https://www.rev.com/freelancers/captions']

    def parse(self, response):
        items = FetchingItem()
        Transcription_price = response.css('#middle-benefit .mt1::text').extract()

        items['Transcription_price'] = Transcription_price

        def next_parse(self, response):
            other_items = FetchingItem()
            Caption_price = response.css('#middle-benefit .mt1::text').extract()

            other_items['Caption_price'] = Caption_price
            yield other_items

        yield items

1 answer:

Answer 0 (score: 1)

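Your code never reaches next_parse. It is defined as a function nested inside parse, but nothing calls it and no request names it as a callback. By default, Scrapy calls self.parse for every URL in self.start_urls, so both pages are handled by parse and both items are stored under Transcription_price. You can use custom callbacks by overriding the start_requests method.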
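
The effect can be reproduced without Scrapy. The following is only a minimal sketch: the function name fake_parse is hypothetical, plain dicts stand in for FetchingItem, and the strings "245" and "240" stand in for the two downloaded pages.

```python
def fake_parse(page_text):
    """Mirror the shape of the original parse(): a nested generator
    function is defined inside it but never called."""
    item = {"Transcription_price": page_text}

    def next_parse(page_text):
        # This body never runs: nothing calls next_parse, and no
        # request registers it as a callback.
        yield {"Caption_price": page_text}

    yield item


# Both "pages" go through the same callback, so both items get the same key.
results = [item for page in ("245", "240") for item in fake_parse(page)]
print(results)
```

Defining a function inside another function only creates it; unless it is called (or handed to something that calls it), its body is never executed.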

Here is how you can override start_requests:

import scrapy
from ..items import FetchingItem

class SiteFetching(scrapy.Spider):
    name = 'Site'

    def start_requests(self):
        # Give each URL its own callback instead of relying on start_urls,
        # which would send every response to self.parse.
        return [
            scrapy.Request('https://www.rev.com/freelancers/transcription', callback=self.parse_transcription),
            scrapy.Request('https://www.rev.com/freelancers/captions', callback=self.parse_caption)
        ]

    def parse_transcription(self, response):
        # Handles only the transcription page.
        items = FetchingItem()
        Transcription_price = response.css('#middle-benefit .mt1::text').extract()

        items['Transcription_price'] = Transcription_price
        yield items

    def parse_caption(self, response):
        # Handles only the captions page.
        other_items = FetchingItem()
        Caption_price = response.css('#middle-benefit .mt1::text').extract()

        other_items['Caption_price'] = Caption_price
        yield other_items
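
Since both callbacks run the same selector and differ only in the field they fill, the choice of field could also be driven by the page URL. This is a sketch of that idea in plain Python, not part of the answer above: FIELD_BY_URL and build_item are hypothetical names, and a dict stands in for FetchingItem.

```python
# Hypothetical mapping from each start URL to the item field it should fill.
FIELD_BY_URL = {
    "https://www.rev.com/freelancers/transcription": "Transcription_price",
    "https://www.rev.com/freelancers/captions": "Caption_price",
}


def build_item(url, prices):
    """Return a dict item whose single key depends on which page was fetched."""
    return {FIELD_BY_URL[url]: prices}


print(build_item("https://www.rev.com/freelancers/captions", ["240"]))
```

Inside a spider, the url argument would presumably come from response.url and prices from the CSS selector. Note that response.url may differ from the requested URL after a redirect, so separate callbacks as shown above are the more robust choice.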

For more information, see the Spider documentation.