Problem extracting data from a CSV

Date: 2017-01-05 19:21:22

Tags: python csv scrapy

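import csv
import re

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "googlemailverif"

    # Runs once, when the class is defined: build one Google search URL per CSV row
    # from the company name in the third column (row[2]).
    with open('input.csv', "r", newline="") as csvfile:
        datareader = csv.reader(csvfile)
        next(datareader, None)  # skip the header row ("SIRET","NIC","L1_NORMALISEE",...)
        start_urls = ['https://www.google.fr/search?q=email' + str(row[2]) for row in datareader]

    # Called once for each downloaded search results page
    def parse(self, response):
        # Join all text inside <body> and pull out anything shaped like an email address
        page_text = ''.join(response.xpath("//body//text()").extract()).strip()
        yield {
            'url': response.url,
            'nom': "nom",
            'emails': re.findall(r"[a-zA-Z0-9_\.+-]+@[a-zA-Z0-9_\.+-]+\.[a-zA-Z]{2,6}", page_text),
            'SIRET': "SIRET",  # placeholder: the real SIRET (row[0]) is what I can't get here
        }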

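This code tries to look up e-mail addresses on Google for each company name in a CSV file (the name is taken from the third column). The first column holds "SIRET", a value I also want to extract from the CSV and output alongside each result. How can I do that?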

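If I extract it into start_urls while reading the CSV, my URLs come out wrong. If I read it inside parse instead, the SIRET won't line up with the page being parsed, and I may run into errors from opening the file twice.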
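A minimal sketch of reading the file only once while keeping each SIRET paired with its URL, using only the standard library. Assumptions: the header row is skipped, the SIRET is in column 0 and the company name in column 2 (as in the question), and the name is URL-encoded with urllib.parse.quote_plus after an added "+" separator. load_targets and SEARCH_PREFIX are hypothetical names, and an in-memory io.StringIO stands in for input.csv.

```python
import csv
import io
from urllib.parse import quote_plus

# Hypothetical prefix; the "+" separating "email" from the name is an assumption
SEARCH_PREFIX = "https://www.google.fr/search?q=email+"


def load_targets(csvfile):
    """Read the CSV once and return (siret, url) pairs that stay together."""
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row ("SIRET","NIC","L1_NORMALISEE",...)
    targets = []
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or truncated lines
        siret, company = row[0], row[2]
        # quote_plus turns spaces into "+" and percent-encodes accents,
        # so the company name cannot break the query string
        targets.append((siret, SEARCH_PREFIX + quote_plus(company)))
    return targets


sample = io.StringIO(
    '"SIRET","NIC","L1_NORMALISEE"\n'
    '"005720164","00028","SA SAINTE ISABELLE"\n'
    '"005720784","00031","ETABLISSEMENTS DECAYEUX"\n'
)
for siret, url in load_targets(sample):
    print(siret, url)
```

With the real file, pass open('input.csv', newline='', encoding=...) using whatever encoding the file actually has; the URLs can then feed the spider while each SIRET travels with its URL.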

How can I pass the SIRET, read the first time through the file, on to the parse function?

I've been struggling with this for hours :(

Best,

2 answers:

Answer 0 (score: 0)

We can use zip:

sirets, start_urls = zip(*[(row[0], 'https://www.google.fr/search?q=email'+str(row[2])) for row in datareader])

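Now you have one tuple containing the SIRET values and another containing the URLs, in the same order, so sirets[i] belongs to start_urls[i].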
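A small stdlib-only check of what the zip expression produces, run on the first three columns of the CSV excerpt posted below. Two details are worth knowing: unless the header row is skipped first, the first element of sirets is the literal string "SIRET", and zip(*[]) on an empty file raises ValueError when unpacked into two names, so the sketch guards against that case.

```python
import csv
import io

sample = io.StringIO(
    '"SIRET","NIC","L1_NORMALISEE"\n'
    '"005720164","00028","SA SAINTE ISABELLE"\n'
    '"005720784","00031","ETABLISSEMENTS DECAYEUX"\n'
)
datareader = csv.reader(sample)
next(datareader, None)  # without this line, sirets[0] would be the header "SIRET"

pairs = [(row[0], "https://www.google.fr/search?q=email" + row[2]) for row in datareader]
# zip(*pairs) transposes [(s1, u1), (s2, u2)] into (s1, s2) and (u1, u2).
# Unpacking zip(*[]) raises ValueError, hence the empty-input guard.
sirets, start_urls = zip(*pairs) if pairs else ((), ())

print(sirets)
print(start_urls)
```

Both results are tuples, matched by position; Scrapy accepts any iterable for start_urls, so the tuple can be used as-is.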

Answer 1 (score: 0)

"SIRET","NIC","L1_NORMALISEE","L2_NORMALISEE","L3_NORMALISEE","L4_NORMALISEE","L5_NORMALISEE","L6_NORMALISEE","L7_NORMALISEE","L1_DECLAREE","L2_DECLAREE","L3_DECLAREE","L4_DECLAREE","L5_DECLAREE","L6_DECLAREE","L7_DECLAREE","NUMVOIE","INDREP","TYPVOIE","LIBVOIE","CODPOS","CEDEX","RPET","LIBREG","DEPET","ARRONET","CTONET","COMET","LIBCOM","DU","TU","UU","EPCI","TCD","ZEMET","SIEGE","ENSEIGNE","IND_PUBLIPO","DIFFCOM","AMINTRET","NATETAB","LIBNATETAB","APET700","LIBAPET","DAPET","TEFET","LIBTEFET","EFETCENT","DEFET","ORIGINE","DCRET","DATE_DEB_ETAT_ADM_ET","ACTIVNAT","LIEUACT","ACTISURF","SAISONAT","MODET","PRODET","PRODPART","AUXILT","NOMEN_LONG","SIGLE","NOM","PRENOM","CIVILITE","RNA","NICSIEGE","RPEN","DEPCOMEN","ADR_MAIL","NJ","LIBNJ","APEN700","LIBAPEN","DAPEN","APRM","ESSEN","DATEESS","TEFEN","LIBTEFEN","EFENCENT","DEFEN","CATEGORIE","DCREN","AMINTREN","MONOACT","MODEN","PRODEN","ESAANN","TCA","ESAAPEN","ESASEC1N","ESASEC2N","ESASEC3N","ESASEC4N","VMAJ","VMAJ1","VMAJ2","VMAJ3","DATEMAJ"
"005720164","00028","SA SAINTE ISABELLE","","","236 ROUTE D AMIENS","","80100 ABBEVILLE","FRANCE","SA SAINTE-ISABELLE","","","236 RTE D AMIENS","","80100 ABBEVILLE","","236","","RTE","D AMIENS","80100","","32","Nord-Pas-de-Calais-Picardie","80","1","98","001","ABBEVILLE","80","4","01","248000556","41","2209","1","","1","O","201209","","","8610Z","Activités hospitalières","2008","22","100 à 199 salariés","100","2015","1","19830928","19830928","NR","99","","P","S","O","","0","SA SAINTE-ISABELLE","","","","","","00028","32","80001","","5599","SA à conseil d'administration (s.a.i.)","8610Z","Activités hospitalières","2008","","","","22","100 à 199 salariés","100","2015","ETI","19570101","201209","1","S","O","","","","","","","","","","","","2014-07-30T00:00:00"
"005720784","00031","ETABLISSEMENTS DECAYEUX","","","ZONE INDUSTRIELLE","","80210 FEUQUIERES EN VIMEU","FRANCE","ETABLISSEMENTS DECAYEUX","","","ZONE INDUSTRIELLE","","80210 FEUQUIERES EN VIMEU","","","","","ZONE INDUSTRIELLE","80210","","32","Nord-Pas-de-Calais-Picardie","80","1","17","308","FEUQUIERES EN VIMEU","80","1","18","248000630","15","0055","0","","1","O","201209","","","2572Z","Fabrication de serrures et de ferrures","2008","22","100 à 199 salariés","100","2015","4","19930401","19930401","NR","99","","P","S","O","","0","ETABLISSEMENTS DECAYEUX","","","","","","00015","32","80308","","5710","SAS/// société par actions simplifiée","2599A","Fabrication d'articles métalliques ménagers","2008","","N","20160915","32","250 à 499 salariés","200","2015","ETI","19570101","201209","3","S","O","2012","6","2599A","2599A","2599B","2572Z","4649Z","","","","","2001-12-13T00:00:00"

This is an excerpt of the CSV.
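Because the excerpt has a header row, csv.DictReader can address columns by name instead of by position, and it consumes the header automatically. The sketch below uses a trimmed excerpt (only four columns kept); treating L1_NORMALISEE as the company name to search is an assumption based on it being the third column, i.e. row[2] in the question.

```python
import csv
import io

# Trimmed excerpt: the real file has about 100 columns, keyed by these header names
sample = io.StringIO(
    '"SIRET","NIC","L1_NORMALISEE","L2_NORMALISEE"\n'
    '"005720164","00028","SA SAINTE ISABELLE",""\n'
    '"005720784","00031","ETABLISSEMENTS DECAYEUX",""\n'
)

# DictReader reads the first line as field names, so the header never appears as data
records = list(csv.DictReader(sample))
for record in records:
    print(record["SIRET"], record["L1_NORMALISEE"])
```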

Each time, I get "SIRET" as the sirets value, but the other variables increment and change each time.
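One possible cause (an assumption, since the spider code that uses sirets isn't shown): parse() only receives the response, so it cannot tell which position in sirets the current page belongs to. A minimal sketch that keys a dict by URL and looks up response.url instead. Assumptions: the URLs are built already encoded with quote_plus so they should match what Scrapy reports as response.url, the search page is not redirected, and each company name is unique (duplicate URLs would overwrite each other in the dict). PREFIX and the sample pairs are hypothetical, and SimpleNamespace stands in for a Scrapy response.

```python
from types import SimpleNamespace
from urllib.parse import quote_plus

PREFIX = "https://www.google.fr/search?q=email+"  # hypothetical separator "+"

# Hypothetical (siret, company name) pairs, as read once from the CSV
pairs = [("005720164", "SA SAINTE ISABELLE"), ("005720784", "ETABLISSEMENTS DECAYEUX")]

# Encode up front so the dict keys should equal the URLs Scrapy reports back
siret_by_url = {PREFIX + quote_plus(name): siret for siret, name in pairs}
start_urls = list(siret_by_url)


def parse(response):
    """Stand-in for the spider's parse(); only response.url is used here."""
    return {
        "url": response.url,
        # .get() returns None when the URL is unknown, e.g. after a redirect
        "SIRET": siret_by_url.get(response.url),
    }


fake_response = SimpleNamespace(url=start_urls[1])  # hypothetical response for the demo
print(parse(fake_response))
```

In the spider itself, siret_by_url would be a class attribute and parse would call self.siret_by_url.get(response.url). A sturdier alternative that avoids URL matching altogether is to override start_requests and yield scrapy.Request(url, callback=self.parse, cb_kwargs={'siret': siret}), so that parse(self, response, siret) receives the value directly; cb_kwargs requires Scrapy 1.7 or later, and older versions can pass the value through meta and read it back from response.meta.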

Thanks a lot!!