Get the number of unique custId values for each contract

Time: 2019-07-09 08:56:55

Tags: python

I have the following data.csv:

custId,contract,zone,teamcode,projectcode,time
2,2345,us_east,Red,A,5s
1,2345,us_west,Blue,B,1s
2,2346,eu_west,Yellow,C,2s
1,2345,us_west,Blue,D,1s
3,2346,eu_west,Yellow,E,2s

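I don't want to use pandas here.

I'm new to Python and I don't know how to solve this. I managed to read the data with the csv module, but I don't know how to proceed from there.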

import csv
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Edit: I need to find the number of unique custId values for each contract.

2 answers:

Answer 0: (score: 3)

It looks like you need collections.defaultdict with set.

For example:

import csv
from collections import defaultdict

# Map each contract to the set of custId values seen for it
unique_C = defaultdict(set)
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)   # Skip header
    for row in reader:
        unique_C[row[1]].add(row[0])   # row[1] = contract, row[0] = custId
print(unique_C)

Output (Python 2 repr; the order of set elements may vary):

defaultdict(<type 'set'>, {'2345': set(['1', '2']), '2346': set(['3', '2'])})
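The output above holds the sets of customers, while the question asks for their number. A minimal sketch of that last step follows, assuming Python 3 and using the question's rows embedded as a string so it runs without a file. The names SAMPLE and unique_cust_counts are hypothetical and not part of the original answer.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question's data.csv
SAMPLE = """custId,contract,zone,teamcode,projectcode,time
2,2345,us_east,Red,A,5s
1,2345,us_west,Blue,B,1s
2,2346,eu_west,Yellow,C,2s
1,2345,us_west,Blue,D,1s
3,2346,eu_west,Yellow,E,2s
"""

def unique_cust_counts(lines):
    """Return {contract: number of distinct custId values} (hypothetical helper)."""
    customers = defaultdict(set)
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        customers[row[1]].add(row[0])  # row[1] = contract, row[0] = custId
    # The size of each set is the number of unique customers
    return {contract: len(ids) for contract, ids in customers.items()}

counts = unique_cust_counts(io.StringIO(SAMPLE))
print(counts)  # {'2345': 2, '2346': 2}
```

To run it against the real file, an open file object can be passed instead of the io.StringIO wrapper, for example unique_cust_counts(open('data.csv', newline='')).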

Answer 1: (score: 2)

I guess you want to count how many customers are involved in each contract.

If that is the case, you can do it without pandas:

import csv

# Read all rows into a list so they can be looped over more than once
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # Skip header
    rows = list(reader)

# We create a set of all unique contracts
contracts = set(row[1] for row in rows)

# We create an array that will contain how many customers in each contract
array = []

# For each contract (sorted so the output order is stable)
for contract in sorted(contracts):

    # We initialize the number of customers
    count = 0

    # We loop through the lines
    for row in rows:

        row_contract = row[1]

        # If we find a line containing the contract
        if row_contract == contract:

            # We increment the count for the current contract
            # (this counts rows, so a repeated custId is counted again)
            count += 1

    array.append([int(contract), count])

print(array)

Output:

[[2345, 3], [2346, 2]]
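Note that the numbers in this output are row counts: contract 2345 appears on three rows but has only two distinct custId values (1 and 2). The sketch below, assuming Python 3, computes both figures side by side so the difference is visible. SAMPLE and rows_and_unique are hypothetical names, and csv.DictReader with collections.Counter is one possible approach rather than the answer's own method.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the question's data.csv
SAMPLE = """custId,contract,zone,teamcode,projectcode,time
2,2345,us_east,Red,A,5s
1,2345,us_west,Blue,B,1s
2,2346,eu_west,Yellow,C,2s
1,2345,us_west,Blue,D,1s
3,2346,eu_west,Yellow,E,2s
"""

def rows_and_unique(lines):
    """Return (rows per contract, unique custIds per contract) as two dicts."""
    row_counts = Counter()
    seen_pairs = set()  # distinct (contract, custId) combinations
    for row in csv.DictReader(lines):  # the header row supplies the keys
        row_counts[row['contract']] += 1
        seen_pairs.add((row['contract'], row['custId']))
    # Each distinct pair contributes one unique customer to its contract
    unique_counts = Counter(contract for contract, _ in seen_pairs)
    return dict(row_counts), dict(unique_counts)

row_counts, unique_counts = rows_and_unique(io.StringIO(SAMPLE))
print(row_counts)
print(unique_counts)
```

With the question's data, row_counts gives 3 for contract 2345 and 2 for contract 2346, while unique_counts gives 2 for both contracts.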