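Find the total number of duplicates in a CSV file

Time: 2016-11-02 17:40:34

Tags: python csv

I am parsing a CSV file and need your help. My CSV file contains duplicates. I want Python to give me the total number of duplicate addresses and the total number of unique addresses, and then list them. I have got as far as printing each address along with whether it is unique or a duplicate, but now I want Python to give me the respective counts.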

import csv

csv_data = csv.reader(open(r'T:\DataDump\Book1.csv'))

next(csv_data)

already_seen = set()

for row in csv_data:
    Address = row[6]
    if Address in already_seen:
        print('{} is a duplicate Address'.format(Address))
    else:
        print('{} is a unique Address'.format(Address))
        already_seen.add(Address)

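3 answers:

Answer 0: (score: 3)

You can detect duplicates in a single pass, but you have to read the whole file before you know whether a given address is duplicated and how many duplicates there are.

So two passes are needed here. Use collections.Counter like this: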

import csv
import collections

with open(r"T:\DataDump\Book1.csv") as f:
    csv_data = csv.reader(f,delimiter=",")

    next(csv_data)  # skip title line

    count = collections.Counter()

    # first pass: read the file
    for row in csv_data:
        address = row[6]
        count[address] += 1

    # second pass: display duplicate info & compute total
    total_dups = 0
    for address,nb in count.items():
        if nb>1:
            total_dups += nb
            print('{} is a duplicate address, seen {} times'.format(address,nb))
        else:
            print('{} is a unique address'.format(address))
    print("Total duplicate addresses {}".format(total_dups))

To print only the total number of duplicate addresses, you can do it directly:

    print("Total duplicate addresses {}".format(sum(x for x in count.values() if x > 1)))
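The question also asks for the total number of unique addresses and for both groups to be listed. A minimal sketch of one way to get that from the same Counter is below. The sample data and the function name summarize_addresses are made up for illustration, and the sketch assumes the address sits in column 6 as in the question.

```python
import collections
import csv
import io

# Hypothetical sample data standing in for Book1.csv; the address is in column 6.
SAMPLE = """c0,c1,c2,c3,c4,c5,Address
a,b,c,d,e,f,1 Main St
a,b,c,d,e,f,2 Oak Ave
a,b,c,d,e,f,1 Main St
a,b,c,d,e,f,3 Elm Rd
a,b,c,d,e,f,1 Main St
"""

def summarize_addresses(lines, column=6):
    """Return (unique, duplicated) lists of addresses read from CSV lines."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    count = collections.Counter(row[column] for row in reader)
    unique = [a for a, n in count.items() if n == 1]      # seen exactly once
    duplicated = [a for a, n in count.items() if n > 1]   # seen more than once
    return unique, duplicated

unique, duplicated = summarize_addresses(io.StringIO(SAMPLE))
print("Unique addresses ({}): {}".format(len(unique), unique))
print("Duplicated addresses ({}): {}".format(len(duplicated), duplicated))
```

For the real file, passing the object returned by open(r"T:\DataDump\Book1.csv") in place of the io.StringIO object should work the same way, since csv.reader accepts any iterable of lines.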

Answer 1: (score: 0)

Use this:

my_dict = { i:My_List.count(i) for i in My_List}

It returns the count of every item, duplicates included.
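One thing to keep in mind: each call to list.count() scans the whole list again, so the comprehension gets slow on large files. As a rough sketch (my_list here is a made-up stand-in for My_List), collections.Counter builds the same mapping in a single pass:

```python
import collections

# Hypothetical sample list standing in for My_List
my_list = ["1 Main St", "2 Oak Ave", "1 Main St"]

# The comprehension from above: one full scan of the list per element
by_count_method = {i: my_list.count(i) for i in my_list}

# The same mapping built in a single pass over the list
by_counter = dict(collections.Counter(my_list))

print(by_count_method == by_counter)
```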

Answer 2: (score: 0)

This should be as simple as using a dictionary to store the count for each address:

import csv

# open() replaces the Python 2-only file(); the raw string keeps the backslashes literal
csv_data = csv.reader(open(r'T:\DataDump\Book1.csv'))
next(csv_data)  # skip the header row

address_count = {}

for row in csv_data:
    Address = row[6]
    if Address in address_count:
        print('{} is a duplicate Address'.format(Address))
        address_count[Address] = address_count[Address] + 1
    else:
        print('{} is a unique Address'.format(Address))
        address_count[Address] = 1

print(address_count)
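Once the dictionary is filled, the totals can be derived from it. What counts as a "duplicate" is open to interpretation, so the sketch below computes a few variants. The counts are made-up values shaped like address_count, and the variable names are only illustrative.

```python
# Hypothetical counts, shaped like address_count above
address_count = {"1 Main St": 3, "2 Oak Ave": 1, "3 Elm Rd": 2}

distinct_addresses = len(address_count)        # number of different addresses
total_rows = sum(address_count.values())       # number of data rows read
# Rows flagged as duplicates by the loop above: every occurrence after the first
repeated_rows = total_rows - distinct_addresses
# Addresses that appear exactly once in the whole file
addresses_seen_once = sum(1 for n in address_count.values() if n == 1)

print("Distinct addresses:", distinct_addresses)
print("Repeated rows:", repeated_rows)
print("Addresses seen only once:", addresses_seen_once)
```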