Writing a CSV with DictWriter when the fields are not known in advance

Time: 2014-11-06 04:56:29

Tags: python python-2.7 csv dictionary export-to-csv

I am parsing a large block of text into dictionaries, with the end goal of creating a CSV file in which the keys are the column headers.

csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

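The problem arises because the dictionary for any row 'n' can contain a new, never-before-used key, and I want the CSV to contain a column for that new key as well. In short, not all of my fields are known in advance, so I cannot compile the complete fieldnames at the start.

Is there a recommended way to make csv.DictWriter add a missing field to fieldnames instead of ignoring it? Simply changing fieldnames at that point would leave the earlier rows with the wrong number of fields.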
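To make the problem concrete, here is a minimal sketch of what happens when fieldnames is built from the first row only. It assumes Python 3 and writes to an in-memory buffer; the row data is made up for illustration. With extrasaction='raise', csv.DictWriter raises a ValueError as soon as a row carries a key that is not in fieldnames.

```python
import csv
import io

# Hypothetical sample rows; the second one introduces a key the first lacks.
rows = [
    {"Header_1": "data_1", "Header_2": "data_2"},
    {"Header_1": "data_3", "Header_2": "data_4", "Header_3": "data_5"},
]

buffer = io.StringIO()
# fieldnames come from the first row only, so "Header_3" is unknown to the writer.
writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]), extrasaction='raise')
writer.writeheader()

error = None
try:
    for row in rows:
        writer.writerow(row)
except ValueError as exc:
    # Raised for the row carrying the unseen key "Header_3".
    error = exc

print(error)
print(buffer.getvalue())
```

The header and the first row have already been written by the time the error occurs, which is why adding a column afterwards would leave the earlier lines one field short.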

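2 Answers:

Answer 0: (score: 3)

Instead of using DictWriter, which may be confusing here because dictionaries are not ordered, I used the writerow method of csv.writer. Here is what I did: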

"""
a) Collect all the keys of the dictionaries and sort them (sorting is optional).
b) Build a result list: for each input dict, append the value for every header
   (the headers are the keys of the input dicts). If a key is missing,
   .get() returns None. The result list therefore holds one list per row.
c) Write the header and then each row of the result list to the CSV file.
"""

data_dict = [{ "Header_1":"data_1", "Header_2":"data_2", "Header_3":"data_3"},
             { "Header_1":"data_4", "Header_2":"data_5", "Header_3":"data_6"},
             { "Header_1":"data_7", "Header_2":"data_8", "Header_3":"data_9", "Header_4":"data_10"},
             { "Header_1":"data_11", "Header_3":"data_12"},
             { "Header_1":"data_13", "Header_2":"data_14", "Header_3":"data_15"}]

"""
   The third dict has an extra key/value pair.
   The fourth dict has no Header_2, so we expect a blank value there in the CSV file.
"""
# Collect every key that appears in any dict; sorting only fixes the column order.
headers = sorted(set(k for _dict in data_dict for k in _dict))

result = []
for _dict in data_dict:
    row = []
    for header in headers:
        row.append(_dict.get(header, None))
    result.append(row)


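import csv
# 'wb' is the Python 2 idiom; on Python 3 use open('demo.csv', 'w', newline='').
with open('demo.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=';', dialect='excel', 
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(headers)    # header line first
    for r in result:
        spamwriter.writerow(r)      # None is written as an empty field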


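Answer 1: (score: 0)

I did the following: collect all the unique header values and build a list of them. With that list as fieldnames, you can rely on the default value (restval='') to fill in any value that is missing from a row.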
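The approach described above can be sketched as a two-pass write. This is only an illustration, assuming Python 3 (3.7+ for insertion-ordered dicts) and that all rows fit in memory as a list; the helper name write_dicts and the sample rows are hypothetical, not taken from the answer.

```python
import csv
import io


def write_dicts(rows, out):
    """Write dict rows to `out`, using the union of all keys as the header.

    `rows` must be a list (or other re-iterable), since it is traversed twice.
    """
    # Pass 1: collect every key, keeping first-seen order.
    fieldnames = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)

    # Pass 2: restval='' fills in any key that a given row lacks.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Hypothetical sample data: the second row adds a key, the third omits one.
rows = [
    {"Header_1": "data_1", "Header_2": "data_2", "Header_3": "data_3"},
    {"Header_1": "data_7", "Header_2": "data_8", "Header_3": "data_9", "Header_4": "data_10"},
    {"Header_1": "data_11", "Header_3": "data_12"},
]

buffer = io.StringIO()
write_dicts(rows, buffer)
print(buffer.getvalue())
```

Because the full header is known before anything is written, every row ends up with the same number of fields. The cost is that the data must be held in memory (or re-read) for the second pass.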