Python csv: print apple types without duplicates

Time: 2014-06-17 22:30:29

Tags: python-2.7 csv duplicates

We have many CSV files like the following:

Name,Type
1,Fuji
2,Fuji
3,Fuji
4,Fuji
5,Washington
6,Washington
7,Washington
8,Washington
9,Washington

We want to print out the apple types without printing duplicates, either with counts:

Fuji:6  Washington:4 Gaza:1

or just the names:

Fuji  Washington Gaza  

Below is our attempt. It does not work, and we cannot tell why.

# Python 2.7
import csv
import glob
import collections
from collections import Counter

files = glob.glob('C:Apple*.csv')

for file in files:
    infile = open(file, "rb")
    reader = csv.reader(infile)
    # each item the reader yields is a whole row, e.g. ['1', 'Fuji']
    for column in reader:
        Discipline = column[1]
        print collections.Counter(Discipline)
    infile.close()
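The attempt above has two problems. `collections.Counter(Discipline)` receives a single string such as `'Fuji'`, so it counts that string's letters (`F`, `u`, `j`, `i`), and a new `Counter` is built for every row, so nothing accumulates across rows or files. The `Name,Type` header row is also treated as data. A minimal sketch of a fix, assuming every file starts with the `Name,Type` header shown above, keeps one `Counter` for all files and adds each row's `Type`. The function names `count_apple_types` and `format_counts` are hypothetical, and the code assumes Python 3 (`newline=''`); on 2.7 the files would be opened with `'rb'` instead.

```python
import csv
import glob
from collections import Counter


def count_apple_types(paths):
    """Tally the values of the 'Type' column across every CSV file in paths."""
    counts = Counter()  # one counter shared by all files and rows
    for path in paths:
        # Python 3: the csv docs recommend newline='' (on 2.7: open(path, 'rb'))
        with open(path, newline='') as f:
            # DictReader uses the "Name,Type" header as keys, so it is not counted
            for row in csv.DictReader(f):
                if row.get('Type'):  # skip rows with a missing or empty Type
                    counts[row['Type']] += 1
    return counts


def format_counts(counts):
    """Render as 'Fuji:6  Washington:4  ...', most frequent type first."""
    return '  '.join('{}:{}'.format(name, n) for name, n in counts.most_common())


if __name__ == '__main__':
    # same glob pattern as the question; adjust to wherever the files live
    print(format_counts(count_apple_types(glob.glob('C:Apple*.csv'))))
```

For the nine sample rows in the question, `count_apple_types` returns `Counter({'Washington': 5, 'Fuji': 4})`. Printing just the names instead of counts is `'  '.join(counts)`.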

1 answer:

Answer 0 (score: 0)

I haven't used the csv module before, but here is a quick attempt at what I think you may be trying to achieve.

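import csv

src = r'C:\apples_before.csv'
dst = r'C:\apples_after.csv'

apples = set()  # a set keeps each apple type only once

# Read file.
# (On Python 2.7 the csv docs ask for binary mode, 'rb'/'wb';
#  on Python 3 use 'r'/'w' with newline='' instead.)
with open(src, 'r') as srcfile:
    reader = csv.reader(srcfile, delimiter=',')
    for index, row in enumerate(reader):
        if index == 0:
            continue  # skip the "Name,Type" header row

        apples.add(row[1])

# Write file.
# @warning: Please note that I am making an assumption in terms of the number
# component. I am assuming it is a row number.
with open(dst, 'w') as dstfile:
    writer = csv.writer(dstfile, delimiter=',')
    writer.writerow(['Name', 'Type'])  # header is written once, even if no apples were found
    # sorted() gives a stable order; iterating the bare set gives an arbitrary one
    for index, apple in enumerate(sorted(apples)):
        writer.writerow([index + 1, apple])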
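A `set` does not remember the order in which the types first appeared in the file. If that order matters, as in the `Fuji  Washington Gaza` output the question asks for, one option is `dict.fromkeys`, which drops duplicates while keeping insertion order. This is a minimal sketch, assuming Python 3.7+ (where plain `dict` preserves insertion order; on 2.7 `collections.OrderedDict.fromkeys` plays the same role). `unique_types` is a hypothetical helper name, and it accepts any iterable of lines, such as an open file.

```python
import csv
import io


def unique_types(lines):
    """Return the distinct values of column 1, in order of first appearance."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the "Name,Type" header
    # dict keys are unique and (Python 3.7+) keep insertion order
    return list(dict.fromkeys(row[1] for row in reader if len(row) > 1))


sample = io.StringIO("Name,Type\n1,Fuji\n2,Fuji\n5,Washington\n6,Washington\n")
print('  '.join(unique_types(sample)))  # Fuji  Washington
```

With a real file, `unique_types(open(path, newline=''))` works the same way.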