Question

I'm having trouble writing the following program.

I have a CSV file:

"SNo","Column1","Column2"
"A1","X","Y"
"A2","A","B"
"A1","X","Z"
"A3","M","N"
"A1","D","E"

I want to shorten this CSV according to these rules:

a.) If the SNo occurs more than once in the file, 
    combine all column1 and column2 entries of that serial number
b.) If same column1 entries and column2 entries occur more than once, 
    then do not combine them twice.

So the output for the file above should be:

"SNo","Column1","Column2"
"A1","X,D","Y,Z,E"
"A2","A","B"
"A3","M","N"

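So far, I am reading the CSV file and iterating over the rows, checking whether the next row's SNo is the same as the previous row's. What is the best way to combine them?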

import csv
temp = "A1"
col1=""
col2=""
col3=""
with open("C:\\file\\file1.csv","rb") as f:
    reader = csv.reader(f)
    for row in reader:
        if row[0] == temp:
            continue
        col1 = col1+row[1]
        col2=col2+row[2]
        col3=col3+row[3]
        temp = row[0]
        print row[0]+";"+col1+";"+col2+";"+col3
    col1=""
    col2=""
    col3=""

Please let me know a good way to do this.

Thanks

Answer 1

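The simplest approach is to maintain a dictionary keyed by serial number, whose values are sets holding the column entries. Then you can do something like the following: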

my_dict = {}

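for row in reader:
    if row[0] not in my_dict:
        # first time this serial number is seen: one set per column
        my_dict[row[0]] = [set(), set()]

    # sets ignore repeated values, which covers rule (b)
    my_dict[row[0]][0].add(row[1])
    my_dict[row[0]][1].add(row[2])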
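
As a hedged illustration of this grouping step, here is a minimal Python 3 sketch. It makes two assumptions that go beyond the snippet above: the first row is a header and should be skipped (otherwise "SNo" ends up as a key), and the sets are swapped for dict keys so the values keep their first-seen order (a plain set does not guarantee `X,D` rather than `D,X`). The function name `combine_rows` and the inline sample data are hypothetical.

```python
import csv
import io

def combine_rows(rows):
    """Group rows by serial number, keeping each column's unique values in first-seen order."""
    grouped = {}
    for sno, col1, col2 in rows:
        # One ordered "set" (dict keys) per column; dicts keep insertion order in Python 3.7+.
        cols = grouped.setdefault(sno, ({}, {}))
        cols[0][col1] = None
        cols[1][col2] = None
    # Convert the dict-key "sets" to plain lists for the caller.
    return {sno: (list(c1), list(c2)) for sno, (c1, c2) in grouped.items()}

# Sample data copied from the question; io.StringIO stands in for an opened file.
SAMPLE = '''"SNo","Column1","Column2"
"A1","X","Y"
"A2","A","B"
"A1","X","Z"
"A3","M","N"
"A1","D","E"
'''

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # skip the header so "SNo" is not treated as a serial number
result = combine_rows(reader)
```

With the sample data, `result["A1"]` holds `["X", "D"]` and `["Y", "Z", "E"]`, which matches the order in the desired output.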

Writing the result out (to a file opened as file_out) is as simple as iterating over the dictionary and using join:

for k, (col1, col2) in my_dict.items():
    file_out.write("{0},\"{1}\",\"{2}\"\n".format(
        k,
        ','.join(col1),
        ','.join(col2)
    ))
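
The `format`-based write above leaves the serial number unquoted and does no escaping. If the output should match the fully quoted sample in the question, one option is `csv.writer` with `quoting=csv.QUOTE_ALL`. The sketch below assumes Python 3 and a dictionary shaped like `my_dict` but holding lists, so the value order is fixed; `write_combined` and the sample data are hypothetical names used only for illustration.

```python
import csv
import io

def write_combined(grouped, out, header=("SNo", "Column1", "Column2")):
    """Write one fully quoted row per serial number, joining each column's values with commas."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    for sno, (col1, col2) in grouped.items():
        writer.writerow([sno, ",".join(col1), ",".join(col2)])

# Hypothetical grouped data, shaped like the result of the dictionary-building step.
grouped = {
    "A1": (["X", "D"], ["Y", "Z", "E"]),
    "A2": (["A"], ["B"]),
    "A3": (["M"], ["N"]),
}

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_combined(grouped, buffer)
output = buffer.getvalue()
```

Letting the `csv` module handle quoting also keeps the file valid if a value ever contains a quote character, which manual string formatting would not escape.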

Shorten a CSV file according to rules in Python
