Giving names to start_urls in Scrapy

Date: 2015-11-22 22:55:23

Tags: python scrapy

I am scraping URLs from a CSV file, and each URL has a name next to it. How can I download these URLs and save each one under its name?

import csv
import scrapy

urls = []
reader = csv.reader(open("source1.csv"))
for Name, Sources1 in reader:
    urls.append(Sources1)

class Spider(scrapy.Spider):
    name = "test"
    start_urls = urls[1:]

    def parse(self, response):
        filename = Name + '.pdf'  # how can I get the names I read from the CSV file?

1 answer:

Answer 0: (score: 2)

Perhaps you want to override the start_requests() method instead of using start_urls?

Example:

class MySpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # read_csv() is a placeholder for loading the CSV rows;
        # each item is assumed to expose .url and .name attributes
        data = read_csv()
        for d in data:
            yield scrapy.Request(d.url, meta={'name': d.name})

The request's meta dict is carried over into the response, so later you can do:

def parse(self, response):
    name = response.meta.get('name')
    ...
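To make the approach above concrete, here is a small stdlib-only sketch of the two pieces the spider needs before Scrapy gets involved: turning the CSV rows into (name, url) pairs, and turning a name into a safe filename. The `load_sources` and `safe_filename` helpers, the column names `Name`/`Sources1` (taken from the question's CSV), and the sample data are all illustrative assumptions, not part of the original answer.

```python
import csv
import io
import re

def load_sources(csv_text):
    """Yield (name, url) pairs from CSV text with a Name,Sources1 header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield row["Name"], row["Sources1"]

def safe_filename(name, ext=".pdf"):
    """Replace characters that are unsafe in filenames, then append the extension."""
    return re.sub(r'[\\/:*?"<>|]', "_", name.strip()) + ext

# Hypothetical CSV content standing in for source1.csv
sample = "Name,Sources1\nreport one,http://example.com/a.pdf\n"
pairs = list(load_sources(sample))
# pairs == [("report one", "http://example.com/a.pdf")]
```

Inside the spider, each pair would then become `yield scrapy.Request(url, meta={'name': name})` in `start_requests()`, and `parse()` would write `response.body` to `safe_filename(response.meta['name'])`.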