csv DictWriter not writing all cells

Time: 2014-05-15 19:25:29

Tags: python csv file-io

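I'm having trouble using DictWriter to write dicts to a CSV file. I specify the header and insert the data into the CSV, but some columns are not populated: specifically productID, userID and helpfulness. Another problem is that each row is repeated several times before the output moves on to the next entry.

I can confirm that the missing data exists by simply printing it, but it is lost in the write (and other data is duplicated).

My code is below. I am using the dataset from here: http://snap.stanford.edu/data/web-FineFoods.html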

import csv
list_of_dicts = []
dict_of_data = {}

filename = open('file.txt')
lines = filename.readlines()

cleanlines = [ line.strip() for line in lines ]

list_of_lists = []
group = []


print "cleaning the spaces"
for line in cleanlines:
    if line != '':       
        group.append(line)
    else:     
        list_of_lists.append(group)
        group = []

list_of_dicts = []

print "done cleaning spaces...making a dict for each group"
print "Also splitting each entry by ':' and '/'"
for group in list_of_lists:
    try:
        # Create a new dict for each group.
        group_dict = {}
        for line in group:
            # Split by ': ' then by '/'
            longkey, value = line.split(': ', 1)
            # get second half
            shortkey = longkey.split('/')[1]
            group_dict[shortkey] = value
            list_of_dicts.append(group_dict)
           #print list_of_dicts
    except ValueError:
        #There could be inconsistent data
        pass
print "Finished! Setting the header for the CSV"
writer = csv.DictWriter(open('parsed.csv', 'w'),
                        ['productID','userID', 'profileName', 'helpfulness', 'review', 'time', 'summary', 'text'],
                        delimiter=',',
                        extrasaction='ignore')

writer.writeheader()
for review in list_of_dicts:
    writer.writerow(review)

This is what I get (sample); the data is also repeated:

  

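productID,userID,profileName,helpfulness,review,time,summary,text
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."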

1 Answer:

Answer 0 (score: 0)

The duplicate rows in the CSV are caused by an indentation error:

for group in list_of_lists:
    group_dict = {}
    for line in group:
        ...
        group_dict[shortkey] = value            
        list_of_dicts.append(group_dict)   #1

It should be:

for group in list_of_lists:
    group_dict = {}
    for line in group:
        ...
        group_dict[shortkey] = value            
    list_of_dicts.append(group_dict)  #2

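  1. Appends an item to list_of_dicts once for every line in the group, so each group's dict ends up in the list as many times as the group has lines.
  2. Appends an item to list_of_dicts once for every group in list_of_lists.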
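
As a minimal sketch of the fix, the snippet below appends each dict once per group and writes the result with DictWriter. It is written for Python 3 (the question uses Python 2 print statements). The SAMPLE text, its values, the parse_groups helper and the key spellings are hypothetical, made up for illustration; only the "prefix/key: value" layout is taken from the question's parsing code. The snippet also illustrates a possible, unconfirmed cause of the empty columns: DictWriter matches header names to dict keys exactly (case-sensitive), and with extrasaction='ignore' a non-matching key is dropped silently while its column is left empty. Whether that applies here depends on how the keys are spelled in the dataset, which the question does not show.

```python
import csv
import io

# Hypothetical sample in the "prefix/key: value" layout the question parses;
# the keys and values are made up for illustration.
SAMPLE = """\
product/productId: P1
review/userId: U1
review/summary: First

product/productId: P2
review/userId: U2
review/summary: Second
"""


def parse_groups(text):
    """Return one dict per blank-line-separated group of 'a/b: value' lines."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the group: append once per group.
            if current:
                records.append(current)
                current = {}
            continue
        try:
            longkey, value = line.split(": ", 1)
            current[longkey.split("/")[1]] = value
        except (ValueError, IndexError):
            continue  # skip malformed lines (a simplification of the original)
    if current:
        records.append(current)  # the last group may lack a trailing blank line
    return records


records = parse_groups(SAMPLE)

# "productID" is deliberately mismatched with the parsed key "productId".
fieldnames = ["productID", "userId", "summary"]
unmatched = [name for name in fieldnames if all(name not in r for r in records)]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames, extrasaction="ignore",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

print(unmatched)  # header names that no record provides
print(csv_text)
```

Printing a list like unmatched before writing is a cheap way to spot header names that no record can fill. Dropping extrasaction='ignore' makes DictWriter raise ValueError for keys that are not in the header, which surfaces the same mismatch from the other direction.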