Merging CSV rows with the same value in Python

Time: 2019-11-12 04:41:58

Tags: python dataframe

I need to take all rows that share the same value in the first column and merge them into a single row. The input CSV looks like this:

Source:

20191111,test7,10,0,0,0
20191111,test6,0,9,0,0
20191111,test5,0,0,8,0
20191111,test3,0,0,0,7
20191111,test2,0,0,0,0
20191111,test1,0,0,0,0
20191110,test7,0,0,0,0
20191110,test6,0,0,0,0
20191110,test5,0,0,0,0
20191110,test3,0,0,0,0
20191110,test2,0,0,0,0
20191110,test1,0,0,0,0


Target:

20191111,test7,10,0,0,0,test6,0,9,0,0,test5,0,0,8,0, .....
20191110,test7,0,0,0,0,test6,0,0,0,0,test5,0,0,0,0, .....

1 answer:

Answer 0 (score: 0)

Something like this should work. No pandas were harmed in the writing of this code.

import collections
import csv
import io
import itertools
import sys

file = io.StringIO(
    """
20191111,test7,10,0,0,0
20191111,test6,0,9,0,0
20191111,test5,0,0,8,0
20191111,test3,0,0,0,7
20191111,test2,0,0,0,0
20191111,test1,0,0,0,0
20191110,test7,0,0,0,0
20191110,test6,0,0,0,0
20191110,test5,0,0,0,0
20191110,test3,0,0,0,0
20191110,test2,0,0,0,0
20191110,test1,0,0,0,0
""".strip()
)

groups = collections.defaultdict(list)
for row in csv.reader(file):
    groups[row[0]].append(row)  # storing the full row here, for greater reusability

out = csv.writer(sys.stdout)

# NB: `groups` is not sorted; on Python 3.7+ a dict iterates in insertion order,
#     i.e. the order in which each key first appears in the input.
#     Could use e.g. `sorted(groups.items())` here to sort by the key
for group_key, rows in groups.items():
    # Build the transposed row from the group key, then the rows sans the first column of each
    transposed_row = [group_key] + list(itertools.chain.from_iterable(row[1:] for row in rows))

    # Write to the CSV writer; you could append to a dataframe or anything else here.
    out.writerow(transposed_row)
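
If rows with the same first column are always adjacent in the input, as they are in the sample data, a streaming variant with `itertools.groupby` may be enough and avoids holding every group in memory. This is only a sketch under that assumption; the helper name `merge_adjacent_rows` and the shortened sample are made up for illustration. If equal keys can be scattered through the file, the `defaultdict` approach above is the safer choice.

```python
import csv
import io
import itertools
import operator


def merge_adjacent_rows(rows):
    """Yield one merged row per run of consecutive rows sharing the same first column.

    Assumes rows with equal keys are adjacent; non-adjacent duplicates of a key
    would produce separate output rows.
    """
    for key, group in itertools.groupby(rows, key=operator.itemgetter(0)):
        merged = [key]
        for row in group:
            merged.extend(row[1:])  # drop the key column, keep the rest
        yield merged


# Shortened, hypothetical sample in the same shape as the question's input.
sample = io.StringIO(
    "20191111,test7,10,0,0,0\n"
    "20191111,test6,0,9,0,0\n"
    "20191110,test7,0,0,0,0\n"
    "20191110,test6,0,0,0,0\n"
)

merged_rows = list(merge_adjacent_rows(csv.reader(sample)))
for merged in merged_rows:
    print(",".join(merged))
```

Because `merge_adjacent_rows` is a generator, its output can be passed straight to `csv.writer(...).writerows(...)` when reading from a real file instead of the in-memory sample.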