Python CSV to JSON: array of objects with each unique combination of multi-valued CSV fields as its own JSON object

Posted: 2017-01-09 13:19:07

Tags: python json csv

Given this CSV:

Domain,IP,Server,PoweredBy,MetaGenerator,Email
http://www.example1.com,1.1.1.1,,,,
http://www.example2.com,2.2.2.2,Apache,PHP/5.5.9-1ubuntu4.20,,
http://www.example3.com,3.3.3.3,Apache,PHP/5.5.9-1ubuntu4.20,Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress.,info@example3.com;sales@example3.com

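I'm trying to build a JSON array of objects in which each object is one unique combination of the CSV values, where some fields hold several values separated by ";".

For example, www.example3.com has two MetaGenerator values and two email addresses.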

In this case the JSON array should look like the following, with each combination as a separate object in the array:

[{'Domain': 'http://www.example1.com',
  'Email': '',
  'IP': '1.1.1.1',
  'MetaGenerator': '',
  'PoweredBy': '',
  'Server': ''},
 {'Domain': 'http://www.example2.com',
  'Email': '',
  'IP': '2.2.2.2',
  'MetaGenerator': '',
  'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
  'Server': 'Apache'},
 {'Domain': 'http://www.example3.com',
  'Email': 'sales@example3.com',
  'IP': '3.3.3.3',
  'MetaGenerator': 'Easy Digital Downloads v2.4.9',
  'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
  'Server': 'Apache'},
 {'Domain': 'http://www.example3.com',
  'Email': 'sales@example3.com',
  'IP': '3.3.3.3',
  'MetaGenerator': 'Powered by Visual Composer - drag and drop page builder for WordPress.',
  'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
  'Server': 'Apache'},
 {'Domain': 'http://www.example3.com',
  'Email': 'info@example3.com',
  'IP': '3.3.3.3',
  'MetaGenerator': 'Easy Digital Downloads v2.4.9',
  'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
  'Server': 'Apache'},
 {'Domain': 'http://www.example3.com',
  'Email': 'info@example3.com',
  'IP': '3.3.3.3',
  'MetaGenerator': 'Powered by Visual Composer - drag and drop page builder for WordPress.',
  'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
  'Server': 'Apache'}]
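The six objects come from splitting each multi-valued field on ";" and taking the Cartesian product of the resulting lists. A minimal sketch using only the two multi-valued example3 fields from the CSV above (fields without ";", including empty ones, split into one-element lists and so multiply the count by 1):

```python
import itertools

# The two multi-valued fields for www.example3.com, copied from the CSV above
meta = "Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress."
emails = "info@example3.com;sales@example3.com"

# Split each field on ";" and take every pairing of the resulting values
combos = list(itertools.product(meta.split(";"), emails.split(";")))

print(len(combos))  # 2 meta generators x 2 emails -> 4
for m, e in combos:
    print(e, "|", m)

# An empty field still yields one value, so it does not wipe out the product
print("".split(";"))  # ['']
```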

I have this Python code:

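import csv
import pprint
import json

with open("results.csv", 'r') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    out = []
    d = dict()  # one dict, reused (and later rebound to row) instead of a fresh dict per combination
    for row in reader:
        # Email is expanded on its own, not combined with MetaGenerator
        if ';' in row['Email']:
            val = row['Email'].split(';')
            for v in val:
                d['Email'] = v
                out.append(d)  # appends the same object again, not a copy
        # MetaGenerator is expanded on its own as well
        if ';' in row['MetaGenerator']:
            val = row['MetaGenerator'].split(';')
            for v in val:
                d['MetaGenerator'] = v
                out.append(d)
        else:
            d = row
            out.append(d)


pprint.pprint(out)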

But it doesn't work.
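One reason the attempt above misbehaves is that it appends the same dict object `d` to `out` repeatedly, so each later assignment also changes what the earlier entries appear to contain (and the Email and MetaGenerator fields are expanded in separate loops rather than combined). A minimal illustration of the aliasing effect alone, not a full fix:

```python
values = ["info@example3.com", "sales@example3.com"]

aliased = []
d = {}
for v in values:
    d["Email"] = v        # mutates the one shared dict
    aliased.append(d)     # appends a reference to d, not a copy

print(aliased)                   # [{'Email': 'sales@example3.com'}, {'Email': 'sales@example3.com'}]
print(aliased[0] is aliased[1])  # True

# Building a fresh dict for each combination avoids the problem
fixed = [{"Email": v} for v in values]
print(fixed)                     # [{'Email': 'info@example3.com'}, {'Email': 'sales@example3.com'}]
```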

How can I achieve this? Pseudocode is fine too, and order doesn't matter. Which modules should I use?

Thanks,

1 Answer

Answer (score: 3)

Try this (see the itertools docs):

import csv
import pprint
import json
import itertools

out=[]
with open("results.csv", 'r') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:

        Domains = row['Domain'].split(";")
        Ips = row['IP'].split(";")
        Servers = row['Server'].split(";")
        Emails = row['Email'].split(";")
        MetaGenerators = row['MetaGenerator'].split(";")
        PoweredBy = row['PoweredBy'].split(";")

        for comb in itertools.product(Domains, Ips, Servers, Emails, MetaGenerators, PoweredBy):
            (cDomain, cIp, cServer, cEmail, cMeta, cPowered) = comb

            out.append({
                    'Domain': cDomain,
                    'IP': cIp,
                    'Server': cServer,
                    'Email': cEmail,
                    'MetaGenerator': cMeta,
                    'PoweredBy': cPowered
                })

pprint.pprint(out)
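The snippet above prints Python dicts with `pprint`, while the question asks for JSON. A minimal sketch of the same product-per-row idea run on the question's sample data, assuming an in-memory `io.StringIO` in place of `results.csv` and a fixed column list `FIELDS` (a name introduced here for illustration), then serialized with `json.dumps`:

```python
import csv
import io
import itertools
import json

# The question's sample CSV; io.StringIO stands in for open("results.csv")
CSV_TEXT = """Domain,IP,Server,PoweredBy,MetaGenerator,Email
http://www.example1.com,1.1.1.1,,,,
http://www.example2.com,2.2.2.2,Apache,PHP/5.5.9-1ubuntu4.20,,
http://www.example3.com,3.3.3.3,Apache,PHP/5.5.9-1ubuntu4.20,Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress.,info@example3.com;sales@example3.com
"""

# Hypothetical fixed column order, matching the CSV header
FIELDS = ["Domain", "IP", "Server", "PoweredBy", "MetaGenerator", "Email"]

out = []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    # One list of candidate values per column
    split_values = [row[name].split(";") for name in FIELDS]
    # Every combination becomes its own dict
    for comb in itertools.product(*split_values):
        out.append(dict(zip(FIELDS, comb)))

# json.dumps emits a real JSON array (double-quoted), unlike pprint's Python repr
json_text = json.dumps(out, indent=2)
print(len(out))  # 1 + 1 + 2*2 = 6
print(json_text)
```

To write a file instead, `json.dump(out, f, indent=2)` on an open text file does the same serialization.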

Or try this less readable but smarter solution, which doesn't hard-code the CSV field names:

import csv
import itertools
import pprint

out = []
with open("results.csv", 'r') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    headers = reader.fieldnames

    for row in reader:
        # one list of ";"-separated values per column, in header order
        fields = [row[h].split(";") for h in headers]
        # one dict per combination, keyed by header name
        out += [dict(zip(headers, comb)) for comb in itertools.product(*fields)]

pprint.pprint(out)
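If a cell can repeat a value (for example `info@example3.com;info@example3.com`), `itertools.product` will emit duplicate objects. One possible variant, assuming duplicates within a cell should be dropped and surrounding whitespace trimmed; `expand_row` is a hypothetical helper name, and the order-preserving dedup relies on `dict` keeping insertion order (Python 3.7+):

```python
import itertools


def expand_row(row, fieldnames, sep=";"):
    """Hypothetical helper: one dict per combination of the sep-separated values in row.

    Strips whitespace around each value and drops repeats within a cell,
    keeping first-seen order via dict.fromkeys.
    """
    columns = []
    for name in fieldnames:
        # `or ""` guards against None, which DictReader uses for missing cells by default
        values = [v.strip() for v in (row.get(name) or "").split(sep)]
        columns.append(list(dict.fromkeys(values)))
    return [dict(zip(fieldnames, comb)) for comb in itertools.product(*columns)]


row = {"Domain": "http://www.example3.com",
       "Email": "info@example3.com; info@example3.com;sales@example3.com"}
result = expand_row(row, ["Domain", "Email"])
for r in result:
    print(r)
```

With a `csv.DictReader`, this would be called as `expand_row(row, reader.fieldnames)` for each row.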