How do I remove duplicate permutations from a csv file?

Time: 2014-12-01 14:52:55

Tags: python csv permutation

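I am trying to take 3 columns from a large csv file and detect permutations, so that only unique triplets are kept and written to another csv.

For example, if I have: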

[8,9,15]
[78,35,98]
[90,35,56]
[64,89,98]
[15,8,9]...etc

it should detect that the first triplet is the same as the fifth and keep only one of them. I wrote the following, but it does not work.

import csv
reader = csv.reader(open('file1.csv', 'r'), delimiter=',')
writer = csv.writer(open('mynew.csv', 'w'), delimiter=',')
myset = set()
for row in reader:
    if row[0] not in myset:
        writer.writerow(row)
    if row[1] not in myset:
        writer.writerow(row)
    if row[2] not in myset:
        writer.writerow(row)

1 answer:

Answer 0 (score: 0)

Try this:

#!/usr/bin/env python
import csv

reader = csv.reader(open('file1.csv', 'r'), delimiter=',')
writer = csv.writer(open('mynew.csv', 'w'), delimiter=',')
myset = set()
for row in reader:
    print("adding %s" % row)
    # a frozenset is hashable, so it can be inserted into a set
    # this assumes no duplicates exist within the row like 1,1,2,3,4 (two 1's)
    # (otherwise you'll have to hash the row yourself)
    myset.add(frozenset(row))
    print("set size: %d" % len(myset))

print(myset)
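
The snippet in this answer collects the unique triplets in a set but never writes them to the output file, and a frozenset loses both the column order and any value repeated within a row. Below is a minimal sketch of one way to write the unique rows out, assuming Python 3 and an input that contains only the three columns of interest. The helper names `unique_triplets` and `dedupe_csv` are hypothetical, and the sorted-tuple key is a swapped-in alternative to the frozenset, not part of the original answer.

```python
import csv
import io


def unique_triplets(rows):
    """Yield each row whose values, ignoring order, have not been seen before."""
    seen = set()
    for row in rows:
        # A sorted tuple is hashable and, unlike a frozenset, keeps repeated
        # values, so ['1', '1', '2'] and ['1', '2', '2'] stay distinct.
        key = tuple(sorted(row))
        if key not in seen:
            seen.add(key)
            yield row  # the first occurrence is kept in its original order


def dedupe_csv(infile, outfile):
    """Copy rows from infile to outfile, skipping permutations of earlier rows."""
    writer = csv.writer(outfile)
    for row in unique_triplets(csv.reader(infile)):
        writer.writerow(row)


# Demonstration with in-memory files holding the sample data from the question.
# With real files, open 'file1.csv' and 'mynew.csv' with newline='' instead.
source = io.StringIO("8,9,15\n78,35,98\n90,35,56\n64,89,98\n15,8,9\n")
target = io.StringIO()
dedupe_csv(source, target)
print(target.getvalue())
```

The values are compared as the strings that `csv.reader` returns, so `8` and ` 8` (with a leading space) would count as different. If the file can contain stray whitespace, strip or convert the values before building the key.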