I've stripped out some of the data. Because of how the website is structured, I put the data into two dictionaries.
>>>pprint(dict(data))
{u'Additional compensation': [u'$32,241'],
u'Agency': [u'Chesterfield County Schools', u'City of Richmond Schools'],
u'Bonuses or other allowances': [u'$12,500'],
u'COMMENTS': [u'$28,088 - Board Paid Annuity; $4,153 - Excess Health Benefit Contribution;',
u''],
u'Full Name': [u'Marcus J. Newsome', u'Dana T. Bedden'],
u'Total Compensation': [u'$282,258', u'']}
>>>pprint(dict(data2))
{u'Base Salary': [u'$229,758', u'$234,068'],
u'COMMENTS': [u'12,500 CAR ALLOWANCE, 40,000 DEFFERRED COMPENSATION'],
u'Deferred compensation': [u'$40,000'],
u'Job Title': [u'SUPERINTENDENT', u'SUPERINTENDENT'],
u'Total Compensation': [u'$266,309'],
u'Work location': [u'Office Of Superintendent']}
I've merged the data into one master dictionary and I'm trying to write it to a CSV file.
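import csv
from collections import defaultdict
master_data = defaultdict(list)  # assumption: the question does not show how master_data is created
for d in data2, data:
    for k, v in d.iteritems():
        master_data[k].append(v)
with open('test2.csv', 'wb') as f:
    writer = csv.writer(f)
    # zip() stops at the shortest column, so keys that only one person has cut the output short
    writer.writerows(zip(*([k] + master_data[k] for k in sorted(master_data))))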
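The problem is that only the first person's information (Marcus J. Newsome) is exported to the CSV. I think this is because some keys, such as Additional compensation, hold data that belongs to Dana T. Bedden but not to Marcus J Newsome.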
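A minimal Python 3 sketch of why rows go missing, using made-up columns shaped like the ones the code above builds (a header followed by one value per person): zip() stops at the shortest column, while itertools.zip_longest pads short columns with a fill value. Padding only fixes the truncation; it always puts the gap at the end, so which person a lone value belongs to is still an assumption.

```python
from itertools import zip_longest

# Hypothetical columns: header first, then one value per person (pairing assumed)
columns = {
    "Agency": ["Agency", "Chesterfield County Schools", "City of Richmond Schools"],
    "Bonuses or other allowances": ["Bonuses or other allowances", "$12,500"],  # only one value
}

def rows_with_zip(cols):
    """zip() stops at the shortest column, silently dropping later people."""
    return list(zip(*(cols[k] for k in sorted(cols))))

def rows_with_zip_longest(cols):
    """zip_longest() pads short columns with fillvalue instead of truncating."""
    return list(zip_longest(*(cols[k] for k in sorted(cols)), fillvalue=""))

print(rows_with_zip(columns))          # header row plus only one data row
print(rows_with_zip_longest(columns))  # header row plus both data rows
```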
To work around this, I tried inserting None in the positions where a value is missing:
for d in data2, data:
    master_data.update((k, [None, master_data[k]]) for k in master_data if k not in d)
>>>pprint(dict(master_data))
{u'Additional compensation': [None, [[u'$32,241']]],
u'Agency': [None,
[[u'Chesterfield County Schools', u'City of Richmond Schools']]],
u'Base Salary': [None, [[u'$229,758', u'$234,068']]],
u'Bonuses or other allowances': [None, [[u'$12,500']]],
u'COMMENTS': [[u'12,500 CAR ALLOWANCE, 40,000 DEFFERRED COMPENSATION'],
[u'$28,088 - Board Paid Annuity; $4,153 - Excess Health Benefit Contribution;',
u'']],
u'Deferred compensation': [None, [[u'$40,000']]],
u'Full Name': [None, [[u'Marcus J. Newsome', u'Dana T. Bedden']]],
u'Job Title': [None, [[u'SUPERINTENDENT', u'SUPERINTENDENT']]],
u'Total Compensation': [[u'$266,309'], [u'$282,258', u'']],
u'Work location': [None, [[u'Office Of Superintendent']]]}
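The doubled brackets in this output (for example [[u'$32,241']]) come from master_data[k].append(v): append adds the whole list v as a single element, whereas extend adds its items one by one. A quick Python 3 check using one entry from the question's data (items() replaces the Python 2 iteritems()):

```python
from collections import defaultdict

data = {"Base Salary": ["$229,758", "$234,068"]}

appended = defaultdict(list)
extended = defaultdict(list)
for k, v in data.items():
    appended[k].append(v)  # adds the list v as one nested element
    extended[k].extend(v)  # adds each item of v individually

print(dict(appended))  # {'Base Salary': [['$229,758', '$234,068']]}
print(dict(extended))  # {'Base Salary': ['$229,758', '$234,068']}
```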
Unfortunately, this doesn't work the way I want. Ultimately, I'd like my output to look like this:
Desired output:
{u'Additional compensation': [None, [u'$32,241']],
u'Agency': [[u'Chesterfield County Schools'], [u'City of Richmond Schools']],
u'Base Salary': [[u'$229,758'], [u'$234,068']],
u'Bonuses or other allowances': [[u'$12,500'], None],
u'COMMENTS': [[u'12,500 CAR ALLOWANCE, 40,000 DEFFERRED COMPENSATION'],
[u'$28,088 - Board Paid Annuity; $4,153 - Excess Health Benefit Contribution;',
u'']],
u'Deferred compensation': [[u'$40,000'], None],
u'Full Name': [[u'Marcus J. Newsome'], [u'Dana T. Bedden']],
u'Job Title': [[u'SUPERINTENDENT'], [u'SUPERINTENDENT']],
u'Total Compensation': [[u'$266,309'], [u'$282,258', u'']],
u'Work location': [None, [u'Office Of Superintendent']]}
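If each person's fields were held in their own dictionary, this layout (one list entry per person, None where a field is missing) could be built directly. A sketch in Python 3; the per-person dicts and the name people_to_columns are hypothetical, the person a value belongs to follows the desired output above, and values are plain strings rather than one-element lists:

```python
def people_to_columns(people):
    """Build {field: [value for person 0, value for person 1, ...]}, using None when a person lacks a field."""
    fields = sorted({field for person in people for field in person})
    return {field: [person.get(field) for person in people] for field in fields}

# Hypothetical per-person records
people = [
    {"Full Name": "Marcus J. Newsome", "Agency": "Chesterfield County Schools"},
    {"Full Name": "Dana T. Bedden", "Agency": "City of Richmond Schools",
     "Work location": "Office Of Superintendent"},
]

for field, values in people_to_columns(people).items():
    print(field, values)
```

Because every person contributes exactly one slot per field, a missing value can never shift into another person's position.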
Does anyone have any ideas?
Answer 0 (score: 1)
It would be much better to change how you store the data.
Pseudocode:
data = []
for row in table:
    person = get_data_from_row(row)
    person.update(get_data_from_person_page(row))
    data.append(person)
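A runnable Python 3 version of that pseudocode, under loud assumptions: table_rows and detail_pages are hypothetical stand-ins for whatever the scraper actually returns, each detail page is looked up by the person's full name, and the sample values are taken from the question with the person they belong to following the desired output.

```python
def build_people(table_rows, detail_pages):
    """Return one dict per person: listing-table fields merged with detail-page fields."""
    people = []
    for row in table_rows:
        person = dict(row)  # copy so the input rows are not modified
        # Hypothetical lookup: detail pages keyed by full name; a missing page adds nothing
        person.update(detail_pages.get(row["Full Name"], {}))
        people.append(person)
    return people

# Hypothetical scraper output
table_rows = [
    {"Full Name": "Marcus J. Newsome", "Agency": "Chesterfield County Schools"},
    {"Full Name": "Dana T. Bedden", "Agency": "City of Richmond Schools"},
]
detail_pages = {
    "Marcus J. Newsome": {"Job Title": "SUPERINTENDENT", "Bonuses or other allowances": "$12,500"},
    "Dana T. Bedden": {"Job Title": "SUPERINTENDENT", "Additional compensation": "$32,241"},
}

for person in build_people(table_rows, detail_pages):
    print(person)
```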
Then you can use csv.DictWriter without any complicated data manipulation:
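import csv

# Build the header from every person's keys: data[0].keys() would miss fields
# the first person lacks, and DictWriter raises ValueError on unknown keys
fieldnames = sorted({key for person in data for key in person})
with open('data.csv', 'w', newline='') as f:  # Python 3; on Python 2 use open('data.csv', 'wb')
    writer = csv.DictWriter(f, fieldnames)  # missing fields are written as empty cells
    writer.writeheader()
    for row in data:
        writer.writerow(row)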
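A small in-memory check (Python 3, io.StringIO instead of a file) of how csv.DictWriter treats people with different fields: a key missing from a row is filled with restval, while a row key absent from fieldnames raises ValueError under the default extrasaction='raise'. The two records are hypothetical.

```python
import csv
import io

people = [
    {"Full Name": "Marcus J. Newsome"},
    {"Full Name": "Dana T. Bedden", "Additional compensation": "$32,241"},
]

def write_people(people, fieldnames):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames, restval="")  # missing fields become empty cells
    writer.writeheader()
    writer.writerows(people)
    return buf.getvalue()

# Header taken from the first person only: the second row has an unknown key
try:
    write_people(people, list(people[0].keys()))
except ValueError as exc:
    print("ValueError:", exc)

# Header built from the union of all keys: every row fits, gaps are left blank
print(write_people(people, sorted({k for p in people for k in p})))
```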
Answer 1 (score: 0)
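I would restructure the data so that you have one dictionary per person containing all of that person's fields. You can then easily export it with the csv module's DictWriter class.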
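A Python 3 sketch of that restructuring for the case where the scraped lists are already aligned, one entry per person in the same order (as Full Name, Agency and Job Title are in the question). The function name columns_to_people is hypothetical. A shorter list does not say which person lacks the value, so this version rejects it instead of guessing:

```python
def columns_to_people(columns):
    """Turn {field: [value for person 0, value for person 1, ...]} into one dict per person.

    Only safe when every list has exactly one entry per person; a shorter list
    does not say *which* person is missing the value, so it is rejected.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("columns have different lengths; cannot tell which person owns each value")
    return [dict(zip(columns, row)) for row in zip(*columns.values())]

columns = {
    "Full Name": ["Marcus J. Newsome", "Dana T. Bedden"],
    "Agency": ["Chesterfield County Schools", "City of Richmond Schools"],
    "Job Title": ["SUPERINTENDENT", "SUPERINTENDENT"],
}

for person in columns_to_people(columns):
    print(person)
```

Each resulting dict can be passed straight to DictWriter.writerow.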