Passing different urls in start_urls

Date: 2018-10-17 22:40:22

Tags: python web-scraping scrapy scrapy-spider

I'm very new to Scrapy and have built my first spider step by step. I want to pass different urls to start_urls, so I thought of putting the parameters in a list and iterating over it to build start_urls. The problem is that when I run the spider, it only takes one url from the list and then stops.

The data is returned correctly, but only for one of the urls; the loop never covers all of them. What am I doing wrong? Thanks.

class alquilerVehiculo1(CrawlSpider):

    plantilla = ("https://www.rentalcars.com/SearchResults.do?country=Argentina&doYear={año_devolucion}&doFiltering=true"
            "&fromLocChoose=true&filterTo=49&dropLocationName={localidad}&ftsType=C&ftsLocationSearch={codigoLocalidad}"
            "&dropFtsSearch=L&doDay={dia_devolucion}&searchType=allareasgeosearch&filterFrom=0&puMonth={mes_solicitud}&dropFtsInput={localidad}&dropCountry=Argentina"
            "&puDay={dia_solicitud}&dropFtsLocationSearch={codigoLocalidad}&puHour=10&dropFtsEntry=22776&enabler=&distance=10"
            "&ftsEntry=22776&city={localidad}&driverage=on&filterName=CarCategorisationSupplierFilter&dropCity={localidad}"
            "&dropFtsType=C&ftsAutocomplete={localidad}+Argentina&driversAge=30&dropFtsAutocomplete={localidad}+Argentina"
            "&dropFtsLocationName={localidad}&dropCountryCode=&doMinute=0&countryCode=&puYear={año_solicitud}&locationName=&puMinute=0&ftsInput={localidad}"
            "&coordinates={cordenadas}&dropLocation={codigoLocalidad}&doHour=10&dropCoordinates={cordenadas}" 
            "&ftsLocationName={localidad}&ftsSearch=L&location={codigoLocalidad}&doMonth={mes_devolucion}&reducedCategory=medium&filterAdditionalInfo=&advSearch=&exSuppliers=&ordering=price")

    casos = [{"localidad":"Salta",
            "codigoLocalidad": "161",
            "cordenadas":"-24.7833%2C-65.4167"},
            {"localidad":"Mendoza",
            "codigoLocalidad": "106",
            "cordenadas":"-32.889%2c-68.843"}]

    dias_Semana = date.today() + timedelta(7)
    dt_3 = dias_Semana + timedelta(3)

    for datos in casos:
        datos.update({
            "año_devolucion": dt_3.year,
            "dia_devolucion": dt_3.day,
            "mes_solicitud": dias_Semana.month,
            "dia_solicitud": dias_Semana.day,
            "año_solicitud": dias_Semana.year,
            "mes_devolucion": dt_3.month,
        })
        #print(plantilla.format(**datos))

        name = 'alquilerVehiculoMediano'
        start_urls = [plantilla.format(**datos)]

    def parse(self,response):
        for follow_url in response.css("a.show-cars-link::attr(href)").extract():

            url = response.urljoin(follow_url)
            yield Request(url,callback = self.populate_item)
        # yield self.paginate(response)

    def populate_item(self,response):
        item_loader = ItemLoader(item=ReporteinmobiliarioItem(),response=response)
        item_loader.default_input_processor = MapCompose(remove_tags)

        item_loader.add_css('compania', 'div.carResultRow_OfferInfo_Supplier-wrap>h4::text')
        item_loader.add_css('valor','span[class="carResultRow_Price-now"]::text') #'span.carResultRow_Price-now::text')
        item_loader.add_css('dias', 'span.carResultRow_Price-duration::text')
        item_loader.add_value('tipoVehiculo','Coche Mediano')
        item_loader.add_css('modelo','td.carResultRow_CarSpec>h2::text')
        item_loader.add_css('recogida_devolucion','div.search-summary__location::text')

        yield item_loader.load_item()

1 Answer:

Answer 0: (score: 0)

Your code overwrites the variable on every iteration of the loop:

for datos in casos:
    start_urls = [plantilla.format(**datos)]
    ^^^^^^^^^^

It should be:

start_urls = []
for datos in casos:
    start_urls.append(plantilla.format(**datos))
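
The difference can be shown with a minimal, self-contained sketch (using a simplified stand-in for the spider's `plantilla`; the example URL is made up):

```python
# Simplified template and cases, mirroring the spider's setup.
plantilla = "https://example.com/search?city={localidad}&loc={codigoLocalidad}"
casos = [
    {"localidad": "Salta", "codigoLocalidad": "161"},
    {"localidad": "Mendoza", "codigoLocalidad": "106"},
]

# Buggy pattern: each iteration replaces the previous one-element list,
# so only the last case survives.
for datos in casos:
    start_urls = [plantilla.format(**datos)]
print(len(start_urls))  # 1

# Fixed pattern: create the list once, then append each formatted url.
start_urls = []
for datos in casos:
    start_urls.append(plantilla.format(**datos))
print(len(start_urls))  # 2
```

In a real spider you could also override `start_requests()` and yield one `Request` per formatted url, which avoids building `start_urls` in the class body altogether.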