I have a CSV file with repeated values in the first column, for example:
mg,known,127
mg,unknown,142
pnt,known,37
pnt,unknown,0
lmo,known,75
lmo,unknown,3
sl,known,197
sl,unknown,21
oc,unknown,32
oc,known,163
sv,known,368
sv,unknown,308
az,unknown,6
az,known,241
bug,unknown,1
bug,known,167
li,unknown,15
li,known,174
lg,known,3
What I want to do is build a new CSV file like this:
header1, known, unknown
mg, 127, 142
pnt, 37, 0
I am trying to work out how to actually build the rows:
import csv

def read_stats(path):
    has_seen = set()
    with open(writepath, 'wb') as write_csv:
        with open(path, 'r') as csv_file:
            data_reader = csv.reader(csv_file, delimiter=',')
            for line in data_reader:
                if line[0] in has_seen:
                    # (stuck here)
This is where I am currently stuck. Do I have to keep a pointer to the next line?
Answer 0 (score: 3)
Here is one way to accumulate the results in an OrderedDict:
>>> import csv
>>> import collections
>>> d = collections.OrderedDict()
>>> for header1, category, value in csv.reader(datafile):
...     d.setdefault(header1, {})[category] = value
...
>>> for header1, m in d.items():
...     print(', '.join([header1, m['known'], m['unknown']]))
...
mg, 127, 142
pnt, 37, 0
lmo, 75, 3
sl, 197, 21
oc, 163, 32
sv, 368, 308
az, 241, 6
bug, 167, 1
li, 174, 15
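The accumulation above can also be written as a self-contained Python 3 sketch that writes the result out as CSV. Note that with the sample exactly as posted, the lookup `m['unknown']` would fail for `lg`, which has no unknown row. The function name `pivot_known_unknown`, the `missing` parameter, the header labels, and the choice to leave an absent category blank are all assumptions made for illustration, not something the question specifies.

```python
import csv
import io


def pivot_known_unknown(rows, missing=""):
    """Group (name, category, value) rows into one row per name.

    Hypothetical helper: returns [name, known, unknown] rows in
    first-seen order, using `missing` where a category is absent.
    """
    grouped = {}  # plain dicts keep insertion order in Python 3.7+
    for name, category, value in rows:
        grouped.setdefault(name, {})[category] = value
    return [
        [name, cats.get("known", missing), cats.get("unknown", missing)]
        for name, cats in grouped.items()
    ]


# Small in-memory sample so the sketch runs without any files on disk.
sample = io.StringIO(
    "mg,known,127\n"
    "mg,unknown,142\n"
    "oc,unknown,32\n"
    "oc,known,163\n"
    "lg,known,3\n"
)
pivoted = pivot_known_unknown(csv.reader(sample))

# Write the header and the pivoted rows with csv.writer.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["header1", "known", "unknown"])
writer.writerows(pivoted)
print(out.getvalue(), end="")
```

To write to a real file instead of `io.StringIO`, the output could be opened with `open(some_path, "w", newline="")`, where `some_path` is whatever destination you choose.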
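If you can assume that the rows always come in consecutive pairs with the known row first, you can build an intermediate result for the known row and emit a complete row when the unknown row arrives. The sample as posted does not satisfy this everywhere (for example, oc lists unknown first), so the output here assumes the input has been ordered that way: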
>>> for header1, category, value in csv.reader(data):
...     if category == 'known':
...         result = [header1, value]
...     else:
...         result += [value]
...         print(', '.join(result))
...
mg, 127, 142
pnt, 37, 0
lmo, 75, 3
sl, 197, 21
oc, 163, 32
sv, 368, 308
az, 241, 6
bug, 167, 1
li, 174, 15
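The pairing idea can be sketched as a small Python 3 generator that also guards against rows that break the assumption. The helper name `paired_rows` and the policy of dropping unmatched rows are assumptions for illustration; you may prefer to raise an error instead.

```python
import csv
import io


def paired_rows(rows):
    """Yield [name, known, unknown] for consecutive known/unknown pairs.

    Hypothetical helper. Assumes each 'known' row is immediately followed
    by the 'unknown' row for the same name. A 'known' row with no partner
    is dropped, and an 'unknown' row that does not match the pending
    'known' row is skipped.
    """
    pending = None  # the most recent unmatched 'known' row
    for name, category, value in rows:
        if category == "known":
            pending = [name, value]
        elif pending is not None and pending[0] == name:
            yield pending + [value]
            pending = None


# Small in-memory sample; the final 'lg' row has no 'unknown' partner.
sample = io.StringIO(
    "mg,known,127\n"
    "mg,unknown,142\n"
    "pnt,known,37\n"
    "pnt,unknown,0\n"
    "lg,known,3\n"
)
result = list(paired_rows(csv.reader(sample)))
for row in result:
    print(", ".join(row))
```

Because only one pending row is held at a time, this variant does not need to keep the whole file in memory, at the cost of requiring the input to be ordered.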