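I'm having trouble writing dicts to a CSV file with DictWriter. I specify the header and the data is inserted into the CSV, but some columns are not populated: specifically productID, userID and helpfulness. The other problem is that each row is repeated several times before the output moves on to the next entry.
I can confirm that the missing data exists by simply printing it, but it gets lost in the write (and the other data is duplicated).
My code is below. I'm using the dataset from here: http://snap.stanford.edu/data/web-FineFoods.html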
import csv
list_of_dicts = []
dict_of_data = {}
filename = open('file.txt')
lines = filename.readlines()
cleanlines = [ line.strip() for line in lines ]
list_of_lists = []
group = []
print "cleaning the spaces"
for line in cleanlines:
    if line != '':
        group.append(line)
    else:
        list_of_lists.append(group)
        group = []
list_of_dicts = []
print "done cleaning spaces...making a dict for each group"
print "Also splitting each entry by ':' and '/'"
for group in list_of_lists:
    try:
        # Create a new dict for each group.
        group_dict = {}
        for line in group:
            #Split my ':' then by '/'
            longkey, value = line.split(': ', 1)
            # get second half
            shortkey = longkey.split('/')[1]
            group_dict[shortkey] = value
            list_of_dicts.append(group_dict)
            #print list_of_dicts
    except ValueError:
        #There could be inconsistent data
        pass
print "Finished! Setting the header for the CSV"
writer = csv.DictWriter(open('parsed.csv', 'w'),
                        ['productID','userID', 'profileName', 'helpfulness', 'review', 'time', 'summary', 'text'],
                        delimiter=',',
                        extrasaction='ignore')
writer.writeheader()
for review in list_of_dicts:
    writer.writerow(review)
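This is what I get (sample); note that the data is also repeated:
productID,userID,profileName,helpfulness,review,time,summary,text
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."
,,dll pa,0/0,,1182627213,Not as Advertised,"Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as ""Jumbo""."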
Answer 0 (score: 0)
The duplicate rows in the CSV are caused by an indentation error:
for group in list_of_lists:
    group_dict = {}
    for line in group:
        ...
        group_dict[shortkey] = value
        list_of_dicts.append(group_dict) #1
It should be:
for group in list_of_lists:
    group_dict = {}
    for line in group:
        ...
        group_dict[shortkey] = value
    list_of_dicts.append(group_dict) #2
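#1 appends an item to list_of_dicts once for every line in a group, so the same dict is written several times. #2 appends an item to list_of_dicts once for each group in list_of_lists.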
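To see both symptoms together, here is a minimal sketch in Python 3 (the question's code is Python 2). It appends once per group, and it also illustrates a likely cause of the empty columns: DictWriter matches keys to field names case-sensitively, and with extrasaction='ignore' any key that is not in the header is silently dropped while the missing field is written as an empty string. The sample records are made up, and the key spellings productId, userId and score are an assumption about how the dataset names its fields, so check them against your file. parse_records and to_csv are hypothetical helper names.

```python
import csv
import io

# Made-up records in the "prefix/key: value" layout. The key spellings
# (productId, userId, score) are an assumption about the real file.
SAMPLE = """\
product/productId: P1
review/userId: U1
review/helpfulness: 0/0
review/score: 4.0
review/summary: first

product/productId: P2
review/userId: U2
review/helpfulness: 1/2
review/score: 2.0
review/summary: second
"""


def parse_records(text):
    """Return one dict per blank-line-separated group of lines."""
    records = []
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current group: append exactly once.
            if record:
                records.append(record)
                record = {}
            continue
        longkey, sep, value = line.partition(': ')
        if not sep or '/' not in longkey:
            continue  # skip lines that do not look like "prefix/key: value"
        record[longkey.split('/', 1)[1]] = value
    if record:
        # Keep the last group even if the text has no trailing blank line.
        records.append(record)
    return records


def to_csv(records, fieldnames):
    """Write the records to an in-memory CSV and return it as a string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = parse_records(SAMPLE)

# Header spelled as in the question: productID/userID match no key.
mismatched = to_csv(records, ['productID', 'userID', 'helpfulness', 'summary'])
# Header spelled like the (assumed) keys in the data.
matched = to_csv(records, ['productId', 'userId', 'helpfulness', 'score', 'summary'])

print(mismatched)
print(matched)
```

With the mismatched header the first data row starts with two empty fields, which looks like the sample output in the question. With the matching header each record appears once and the ID columns are filled. Printing sorted(records[0]) is a quick way to see which key names your data actually contains before choosing the header.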