Scrapy: passing extra data from a CSV file to parse

Time: 2017-03-02 14:38:16

Tags: python csv scrapy scrapy-spider

My Scrapy spider reads a CSV file and builds start_urls from the addresses in that file, like this:

from csv import DictReader

with open('addresses.csv') as rows:
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+') for row in DictReader(rows)]

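But the .csv file also contains emails and other information. How can I pass this extra information to parse so that it can be added to a new file?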

import scrapy
from csv import DictReader

with open('addresses.csv') as rows:
    names = [row["Name"].replace(',', '') for row in DictReader(rows)]
    emails = [row["Email"].replace(',', '') for row in DictReader(rows)]
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+') for row in DictReader(rows)]

def parse(self, response):
    yield {
        'name': ...,     # FROM CSV
        'email': ...,    # FROM CSV
        'address': ...,  # FROM SCRAPING
        'city': ...,     # FROM SCRAPING
    }

1 Answer:

Answer 0 (score: 3)

import scrapy
from csv import DictReader

class MySpider(scrapy.Spider):
    name = 'myspider'  # assumed name; Scrapy requires every spider to define one

    def start_requests(self):
        with open('addresses.csv') as rows:
            for row in DictReader(rows):
                name = row["Name"].replace(',', '')
                email = row["Email"].replace(',', '')
                link = 'http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')
                # meta travels with the request and is available again in the callback
                yield scrapy.Request(url=link, callback=self.parse, method="GET", meta={'name': name, 'email': email})

    def parse(self, response):
        yield {
            'name': response.meta['name'],
            'email': response.meta['email'],
            'address': ...,  # FROM SCRAPING
            'city': ...,     # FROM SCRAPING
        }

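  • Open the CSV file.
  • Iterate over its rows inside the start_requests method.
  • Pass the values to the callback through the request's meta argument; meta accepts a Python dictionary.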
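
The steps above can be tried without running a crawl. The following is only a minimal sketch that does not use Scrapy: rows_to_requests is a hypothetical helper (not part of Scrapy) that pairs each URL with the dictionary that would be passed as meta, and the inline CSV sample is made up to stand in for addresses.csv.

```python
import csv
import io

BASE_URL = 'http://www.example.com/search/?where='

def rows_to_requests(csv_file):
    """Yield (url, meta) pairs, one per CSV row (hypothetical helper)."""
    for row in csv.DictReader(csv_file):
        # Same URL cleanup as in the answer: drop commas, turn spaces into '+'.
        address = row["Address"].replace(',', '').replace(' ', '+')
        meta = {
            'name': row["Name"].replace(',', ''),
            'email': row["Email"].replace(',', ''),
        }
        yield BASE_URL + address, meta

# Made-up sample data standing in for addresses.csv.
sample = io.StringIO(
    'Name,Email,Address\n'
    'Ann Lee,ann@example.com,"1 Main St, Springfield"\n'
)

pairs = list(rows_to_requests(sample))
for url, meta in pairs:
    print(url, meta)
```

Inside the spider, each pair would correspond to one scrapy.Request(url, meta=meta), so the name and email stay attached to the request they belong to.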

Note: remember that start_requests is not a custom method but a method defined by Scrapy's Spider class, which the spider overrides here. See https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
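
A related detail may explain why reading the file in a single loop matters. The question's snippet creates three separate DictReader objects on the same open file, so the first list comprehension consumes every row and the later ones start at end of file. The sketch below illustrates this with made-up inline data, assuming an in-memory file behaves like the real addresses.csv in this respect.

```python
import csv
import io

# Made-up sample data standing in for addresses.csv.
rows = io.StringIO(
    'Name,Email,Address\n'
    'Ann Lee,ann@example.com,1 Main St\n'
)

# The first reader consumes the header and every data row.
names = [row["Name"] for row in csv.DictReader(rows)]
# The second reader starts at end of file, so it has nothing to yield.
emails = [row["Email"] for row in csv.DictReader(rows)]

print(names)
print(emails)

# Reading each row once keeps the fields of a row together.
rows.seek(0)
records = [(row["Name"], row["Email"]) for row in csv.DictReader(rows)]
print(records)
```

Here the second list comes out empty, while the single pass at the end returns the name and email of each row together, which is the pattern the start_requests loop in the answer follows.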