I have the following data.csv:
custId,contract,zone,teamcode,projectcode,time
2,2345,us_east,Red,A,5s
1,2345,us_west,Blue,B,1s
2,2346,eu_west,Yellow,C,2s
1,2345,us_west,Blue,D,1s
3,2346,eu_west,Yellow,E,2s
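I don't want to use pandas here.
I'm new to Python and I don't know how to approach this problem. I managed to read the data with csv, but I don't know how to proceed from there.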
import csv
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Edit: I need to find the number of unique custId values for each contract.
Answer 0: (score: 3)
It looks like you need collections.defaultdict and set.
For example:
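import csv
from collections import defaultdict

filename = 'data.csv'
# Map each contract to the set of custId values seen with it
unique_C = defaultdict(set)
with open(filename, 'r') as f:
    reader = csv.reader(f)
    next(reader)  # Skip header
    for row in reader:
        # row[1] is the contract, row[0] is the custId
        unique_C[row[1]].add(row[0])
print(unique_C)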
Output (as printed under Python 2):
defaultdict(<type 'set'>, {'2345': set(['1', '2']), '2346': set(['3', '2'])})
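Since the question asks for the number of unique custId values rather than the sets themselves, each set can be reduced with len(). The following is only a minimal sketch under a few assumptions: it targets Python 3, it inlines the sample rows from the question through io.StringIO so it runs without a file on disk, and the function name count_unique_customers is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; a real script would open data.csv instead.
SAMPLE = """custId,contract,zone,teamcode,projectcode,time
2,2345,us_east,Red,A,5s
1,2345,us_west,Blue,B,1s
2,2346,eu_west,Yellow,C,2s
1,2345,us_west,Blue,D,1s
3,2346,eu_west,Yellow,E,2s
"""


def count_unique_customers(lines):
    """Return {contract: number of distinct custId values} (hypothetical helper)."""
    unique_c = defaultdict(set)
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        # row[1] is the contract, row[0] is the custId
        unique_c[row[1]].add(row[0])
    # Collapse each set of customer ids to its size
    return {contract: len(ids) for contract, ids in unique_c.items()}


counts = count_unique_customers(io.StringIO(SAMPLE))
print(counts)  # {'2345': 2, '2346': 2}
```

To read from disk instead, pass an open file object, for example open('data.csv', newline=''), in place of the io.StringIO wrapper.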
Answer 1: (score: 2)
I assume you want to count how many customers are involved in each contract.
If that is the case, you can do it without pandas:
import csv

# Read all data rows into a list so they can be traversed more than once
# (a csv.reader can only be iterated a single time)
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # Skip the header row
    rows = list(reader)

# We create a sorted list of all unique contracts
contracts = sorted(set(row[1] for row in rows))
# We create an array that will contain the number of rows for each contract
array = []
# For each contract
for contract in contracts:
    # We initialize the counter
    count = 0
    # We loop through the lines
    for row in rows:
        row_contract = row[1]
        # If we find a line containing the contract
        if row_contract == contract:
            # We increment the counter (a repeated custId is counted again)
            count += 1
    array.append([int(contract), count])
print(array)
Output:
[[2345, 3], [2346, 2]]
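Note that the result above counts rows per contract, so custId 1 is counted twice for contract 2345. If distinct customers are what is wanted, one possible variation is to collect the custId values in a set per contract. This is only a sketch, assuming Python 3 and the sample rows from the question inlined via io.StringIO; the function name unique_customers_per_contract is hypothetical, and csv.DictReader is used here so columns can be referenced by header name.

```python
import csv
import io

# Sample rows copied from the question; a real script would open data.csv instead.
SAMPLE = """custId,contract,zone,teamcode,projectcode,time
2,2345,us_east,Red,A,5s
1,2345,us_west,Blue,B,1s
2,2346,eu_west,Yellow,C,2s
1,2345,us_west,Blue,D,1s
3,2346,eu_west,Yellow,E,2s
"""


def unique_customers_per_contract(lines):
    """Return [[contract, distinct custId count], ...] (hypothetical helper)."""
    seen = {}
    # DictReader consumes the header row and yields one dict per data row
    for row in csv.DictReader(lines):
        seen.setdefault(row["contract"], set()).add(row["custId"])
    # Sort by contract so the result order is stable
    return [[int(contract), len(ids)] for contract, ids in sorted(seen.items())]


result = unique_customers_per_contract(io.StringIO(SAMPLE))
print(result)  # [[2345, 2], [2346, 2]]
```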