Scrapy - request URL is not formatted correctly

Date: 2019-04-25 11:08:32

Tags: url scrapy web-crawler

I'm trying to send requests with the following code:

import scrapy
import json

class communes_spider(scrapy.Spider):
    name = "test"
    allowed_domains = ["www.drivy.com"]

    custom_settings = {
        'DOWNLOAD_DELAY' : 1,
    }

    start_urls = ['https://www.drivy.com/search.json?latitude={}&longitude={}&start_date=2019-05-06&start_time=09:00&end_date=2019-05-06&end_time=18:00']

    with open('C:/Users/coppe/drivy/communes.json') as json_file:  
        locations = json.load(json_file)[0:2]

    def start_request(self):
        for start_url in self.start_urls:
            for city in self.locations:
                url = start_url.format(city['lat'],city['long'])
                yield scrapy.Request(
                    url=url, 
                    callback=self.parse,
                )

    def parse(self,response):
        yield {'url':response.url}

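In short, I want to use data from a JSON file to format the URL in start_urls. In the URL I want to fill in latitude={}&longitude={}, replacing each pair of braces with a number. Those numbers are stored under the keys 'lat' and 'long' in the following JSON file: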

[
{"city": "Anvers  ", "region": "R\u00e9gion flamande", "lat": "51.217", "long": "4.4"},
{"city": "Zoersel", "region": "R\u00e9gion flamande", "lat": "51.267", "long": "4.717"}
]
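The formatting step itself can be checked outside Scrapy. This is a minimal sketch, assuming the two records above are inlined instead of being read from communes.json; the helper name `build_urls` is hypothetical:

```python
# Same template as in start_urls; each {} pair is filled positionally.
TEMPLATE = (
    "https://www.drivy.com/search.json?latitude={}&longitude={}"
    "&start_date=2019-05-06&start_time=09:00"
    "&end_date=2019-05-06&end_time=18:00"
)

# The two records from communes.json, inlined for this sketch.
LOCATIONS = [
    {"city": "Anvers  ", "region": "R\u00e9gion flamande", "lat": "51.217", "long": "4.4"},
    {"city": "Zoersel", "region": "R\u00e9gion flamande", "lat": "51.267", "long": "4.717"},
]


def build_urls(template, locations):
    """Hypothetical helper: one URL per location, lat first, then long."""
    return [template.format(loc["lat"], loc["long"]) for loc in locations]


urls = build_urls(TEMPLATE, LOCATIONS)
for url in urls:
    print(url)
```

Since this produces the expected URLs, the str.format logic is sound; what matters is whether Scrapy ever runs the method that contains it.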

However, when I run the code above, the URL is not formatted. The request goes to 'https://www.drivy.com/search.json?latitude=%7B%7D&longitude=%7B%7D&start_date=2019-05-06&start_time=09:00&end_date=2019-05-06&end_time=18:00', whereas for the first city it should be 'https://www.drivy.com/search.json?latitude=51.217&longitude=4.4&start_date=2019-05-06&start_time=09:00&end_date=2019-05-06&end_time=18:00'.
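The %7B%7D in the received URL is the percent-encoded form of {}, which suggests the template in start_urls was requested without ever being formatted. A small check with the standard library's urllib.parse, used here only as a stand-in (Scrapy does its own URL escaping internally):

```python
from urllib.parse import quote, unquote

TEMPLATE = ("https://www.drivy.com/search.json?latitude={}&longitude={}"
            "&start_date=2019-05-06&start_time=09:00"
            "&end_date=2019-05-06&end_time=18:00")
OBSERVED = ("https://www.drivy.com/search.json?latitude=%7B%7D&longitude=%7B%7D"
            "&start_date=2019-05-06&start_time=09:00"
            "&end_date=2019-05-06&end_time=18:00")

# '{' and '}' may not appear unescaped in a URL, so they become %7B and %7D.
print(quote("{}"))  # %7B%7D

# Escaping the raw template while keeping the URL delimiters gives exactly the observed URL...
escaped = quote(TEMPLATE, safe=":/?&=")
print(escaped == OBSERVED)  # True

# ...and decoding the observed URL gives back the unformatted template.
print(unquote(OBSERVED) == TEMPLATE)  # True
```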

Does anyone have any insight into what is going on here?
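A likely cause: Scrapy looks for a method named start_requests (plural). The spider defines start_request, which Scrapy never calls, so the default start_requests runs instead and requests every entry of start_urls verbatim, braces included. Renaming the method to def start_requests(self): should let the formatting loop run. The toy model below does not use Scrapy at all; BaseSpider, TypoSpider and FixedSpider are hypothetical stand-ins that only mimic how a method name does or does not override the default:

```python
class BaseSpider:
    """Toy stand-in for scrapy.Spider: only the start-URL logic is modelled."""

    start_urls = []

    def start_requests(self):
        # Mirrors the idea of Scrapy's default: use each start URL unchanged.
        for url in self.start_urls:
            yield url


TEMPLATE = "https://www.drivy.com/search.json?latitude={}&longitude={}"
LOCATIONS = [{"lat": "51.217", "long": "4.4"}, {"lat": "51.267", "long": "4.717"}]


class TypoSpider(BaseSpider):
    start_urls = [TEMPLATE]

    # Misspelled (singular): this does NOT override start_requests,
    # so nothing ever calls it.
    def start_request(self):
        for city in LOCATIONS:
            yield self.start_urls[0].format(city["lat"], city["long"])


class FixedSpider(BaseSpider):
    start_urls = [TEMPLATE]

    # Correct name: overrides the default, so the formatted URLs are used.
    def start_requests(self):
        for start_url in self.start_urls:
            for city in LOCATIONS:
                yield start_url.format(city["lat"], city["long"])


# The "engine" here only ever calls start_requests(), the name Scrapy looks for.
print(list(TypoSpider().start_requests()))   # the raw template, braces intact
print(list(FixedSpider().start_requests()))  # one formatted URL per city
```

In the real spider the fix is that single rename; the body of the method can stay as written.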

0 answers