Python: Add columns to a CSV file based on the first and second columns

Asked: 2013-04-17 03:07:03

Tags: python csv

I have a CSV file with repeated values in its first column, for example:

mg,known,127
mg,unknown,142
pnt,known,37
pnt,unknown,0
lmo,known,75
lmo,unknown,3
sl,known,197
sl,unknown,21
oc,unknown,32
oc,known,163
sv,known,368
sv,unknown,308
az,unknown,6
az,known,241
bug,unknown,1
bug,known,167
li,unknown,15
li,known,174
lg,known,3

What I want to do is build a new CSV file, for example:

header1, known, unknown
mg, 127, 142
pnt, 37, 0

I'm trying to figure out how to actually build the rows:

import csv

def read_stats(path):
    has_seen = set()
    # writepath is assumed to be defined elsewhere
    with open(writepath, 'wb') as write_csv:
        with open(path, 'r') as csv_file:
            data_reader = csv.reader(csv_file, delimiter=',')
            for line in data_reader:
                if line[0] in has_seen:
                    pass  # not sure what to do here

This is where I'm currently stuck. Do I have to keep a pointer to the next line?

1 Answer:

Answer 0: (score: 3)

Here is one way to accumulate the results in an OrderedDict:

>>> import csv
>>> import collections

>>> # datafile is an open file object for the input CSV
>>> d = collections.OrderedDict()
>>> for header1, category, value in csv.reader(datafile):
...     d.setdefault(header1, {})[category] = value
...
>>> for header1, m in d.items():
...     print(', '.join([header1, m['known'], m['unknown']]))
...

mg, 127, 142
pnt, 37, 0
lmo, 75, 3
sl, 197, 21
oc, 163, 32
sv, 368, 308
az, 241, 6
bug, 167, 1
li, 174, 15
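
To get from the printed lines to the CSV file the question asks for, the same accumulation can feed a csv.writer. What follows is only a minimal sketch in Python 3, not part of the original answer: the names pivot_rows and write_pivot, the categories parameter, and the empty-string filler are all assumptions made for illustration. The filler is there because a key such as lg in the question's sample has no unknown row, so indexing m['unknown'] directly would raise a KeyError for it.

```python
import csv
from collections import OrderedDict


def pivot_rows(rows, categories=("known", "unknown"), missing=""):
    """Collect (key, category, value) rows into one output row per key.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    grouped = OrderedDict()  # keeps keys in the order they are first seen
    for key, category, value in rows:
        grouped.setdefault(key, {})[category] = value
    # A key that lacks one of the categories gets the `missing` filler.
    return [[key] + [seen.get(c, missing) for c in categories]
            for key, seen in grouped.items()]


def write_pivot(in_path, out_path):
    """Read the three-column CSV at in_path and write the pivoted CSV."""
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["header1", "known", "unknown"])
        writer.writerows(pivot_rows(csv.reader(src)))
```

Because the whole input is accumulated before anything is written, the order of the known and unknown rows within a key does not matter here.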

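If you can assume the rows always come in consecutive pairs, with the known row first, you can build an intermediate result for the known row and emit a complete row when the unknown row arrives: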

>>> for header1, category, value in csv.reader(datafile):
...     if category == 'known':
...         result = [header1, value]
...     else:
...         result += [value]
...         print(', '.join(result))
...

mg, 127, 142
pnt, 37, 0
lmo, 75, 3
sl, 197, 21
oc, 163, 32
sv, 368, 308
az, 241, 6
bug, 167, 1
li, 174, 15
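
The pairing assumption is stricter than the sample in the question allows: some keys there (oc, az, bug, li) list the unknown row first, and lg has no unknown row at all. If the rows for each key are at least adjacent, a streaming variant built on itertools.groupby might be more forgiving. This is a different technique from the loop above, sketched in Python 3 under that adjacency assumption. The function name pivot_consecutive and the empty-string filler are hypothetical.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def pivot_consecutive(rows, categories=("known", "unknown"), missing=""):
    """Yield one output row per run of consecutive rows sharing a key.

    Hypothetical helper: assumes all rows for a key are adjacent.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        # Within a run, the order of known/unknown does not matter.
        seen = {category: value for _, category, value in group}
        yield [key] + [seen.get(c, missing) for c in categories]


# Small in-memory sample taken from the question's data.
sample = io.StringIO("oc,unknown,32\noc,known,163\nlg,known,3\n")
for row in pivot_consecutive(csv.reader(sample)):
    print(", ".join(row))
```

Only one group is held in memory at a time, which matters for large files. The trade-off is that a key whose rows are not adjacent would be emitted more than once.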