I have this CSV:
Domain,IP,Server,PoweredBy,MetaGenerator,Email
http://www.example1.com,1.1.1.1,,,,
http://www.example2.com,2.2.2.2,Apache,PHP/5.5.9-1ubuntu4.20,,
http://www.example3.com,3.3.3.3,Apache,PHP/5.5.9-1ubuntu4.20,Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress.,info@example3.com;sales@example3.com
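I'm trying to build an array of JSON objects in which each object is a unique combination of the CSV values, for fields that hold several values (separated by ";").
For example, www.example3.com has several MetaGenerators and several emails.
For that case the array of JSON objects should look like this, with each combination as its own JSON object in the array: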
[{'Domain': 'http://www.example1.com',
'Email': '',
'IP': '1.1.1.1',
'MetaGenerator': '',
'PoweredBy': '',
'Server': ''},
{'Domain': 'http://www.example2.com',
'Email': '',
'IP': '2.2.2.2',
'MetaGenerator': '',
'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
'Server': 'Apache'},
{'Domain': 'http://www.example3.com',
'Email': 'sales@example3.com',
'IP': '3.3.3.3',
'MetaGenerator': 'Easy Digital Downloads v2.4.9',
'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
'Server': 'Apache'},
{'Domain': 'http://www.example3.com',
'Email': 'sales@example3.com',
'IP': '3.3.3.3',
'MetaGenerator': 'Powered by Visual Composer - drag and drop page builder for WordPress.',
'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
'Server': 'Apache'},
{'Domain': 'http://www.example3.com',
'Email': 'info@example3.com',
'IP': '3.3.3.3',
'MetaGenerator': 'Easy Digital Downloads v2.4.9',
'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
'Server': 'Apache'},
{'Domain': 'http://www.example3.com',
'Email': 'info@example3.com',
'IP': '3.3.3.3',
'MetaGenerator': 'Powered by Visual Composer - drag and drop page builder for WordPress.',
'PoweredBy': 'PHP/5.5.9-1ubuntu4.20',
'Server': 'Apache'}]
I have this Python code:
import csv
import pprint
import json

with open("results.csv", 'r') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    out=[]
    d=dict()
    for row in reader:
        if ';' in row['Email']:
            val = row['Email'].split(';')
            for v in val:
                d['Email']=v
                out.append(d)
        if ';' in row['MetaGenerator']:
            val = row['MetaGenerator'].split(';')
            for v in val:
                d['MetaGenerator']=v
                out.append(d)
        else:
            d=row
            out.append(d)
pprint.pprint(out)
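But it doesn't work properly.
How can I achieve my goal? Pseudocode is fine too. The order of the objects doesn't matter. Which modules should I use?
Thanks,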
Answer 0 (score: 3)
Try this (take a look at the itertools documentation):
import csv
import pprint
import json
import itertools

out=[]
with open("results.csv", 'r') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        # Split every field on ";"; a field without ";" becomes a one-element list
        Domains = row['Domain'].split(";")
        Ips = row['IP'].split(";")
        Servers = row['Server'].split(";")
        Emails = row['Email'].split(";")
        MetaGenerators = row['MetaGenerator'].split(";")
        PoweredBy = row['PoweredBy'].split(";")
        # One output object per combination of the split values
        for comb in itertools.product(Domains, Ips, Servers, Emails, MetaGenerators, PoweredBy):
            (cDomain, cIp, cServer, cEmail, cMeta, cPowered) = comb
            out.append({
                'Domain': cDomain,
                'IP': cIp,
                'Server': cServer,
                'Email': cEmail,
                'MetaGenerator': cMeta,
                'PoweredBy': cPowered
            })
pprint.pprint(out)
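This works even for rows with empty fields because splitting an empty string yields `['']`, a one-element list, so an empty field still contributes exactly one (empty) value to the product. A minimal sketch of that behaviour, using only the two multi-valued fields from the example3 row (the variable names here are illustrative, not part of the code above):

```python
import itertools

# An empty field splits into [''], so it never shrinks the product to zero rows.
empty_field = ''.split(';')

emails = 'info@example3.com;sales@example3.com'.split(';')
generators = ('Easy Digital Downloads v2.4.9;'
              'Powered by Visual Composer - drag and drop page builder for WordPress.').split(';')

# product() yields every (email, generator) pairing: 2 x 2 = 4 tuples.
pairs = list(itertools.product(emails, generators))
for email, generator in pairs:
    print(email, '|', generator)
```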
Or check this less readable but smarter solution, which does not depend on the CSV field names:
import csv
import itertools
import pprint

out=[]
with open("results.csv", 'r') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    headers = reader.fieldnames
    for row in reader:
        # Split the fields in header order so each value lines up with its column name
        fields = [row[header].split(";") for header in headers]
        out += [dict(zip(headers, comb)) for comb in itertools.product(*fields)]
pprint.pprint(out)
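Since the question asks for JSON, here is a self-contained Python 3 sketch of the same idea that can be run without a results.csv file. It assumes the sample data from the question, read through `io.StringIO`; the names `expand_rows`, `SAMPLE`, and `records` are hypothetical and introduced only for this illustration.

```python
import csv
import io
import itertools
import json


def expand_rows(csvfile, sep=';'):
    """Return one dict per combination of the sep-separated values in each row."""
    reader = csv.DictReader(csvfile)
    out = []
    for row in reader:
        # `or ''` guards against None, which DictReader uses for short rows.
        fields = [(row[name] or '').split(sep) for name in reader.fieldnames]
        for comb in itertools.product(*fields):
            out.append(dict(zip(reader.fieldnames, comb)))
    return out


# Sample data copied from the question.
SAMPLE = """Domain,IP,Server,PoweredBy,MetaGenerator,Email
http://www.example1.com,1.1.1.1,,,,
http://www.example2.com,2.2.2.2,Apache,PHP/5.5.9-1ubuntu4.20,,
http://www.example3.com,3.3.3.3,Apache,PHP/5.5.9-1ubuntu4.20,Easy Digital Downloads v2.4.9;Powered by Visual Composer - drag and drop page builder for WordPress.,info@example3.com;sales@example3.com
"""

records = expand_rows(io.StringIO(SAMPLE))
# json.dumps turns the list of dicts into a JSON array of objects.
print(json.dumps(records, indent=2))
```

With a real file, the same function could be called as `expand_rows(open("results.csv", newline=""))`. Note that every field is split on ";", so a value that legitimately contains a semicolon would also be expanded.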