For each unique value in one column, get the count of each unique value in another column

Time: 2020-01-29 06:53:22

Tags: python pyodbc

I have pyodbc row objects with two fields each, as shown below:

('Emp1', 'Absent')

('Emp1', 'Absent')

('Emp1', 'Present')

('Emp2', 'Present')

('Emp2', 'Present')

('Emp2', 'Absent')

('Emp2', 'Present')

('Emp2', 'Absent')

I want to count the number of "Present" and "Absent" entries for each unique employee, like this:

Emp1: Absent = 2, Present = 1

Emp2: Absent = 2, Present = 3

I tried:

new = []
for row in cursor.fetchall():
    if row[0] not in new:
        new.append(row[0])
for x in new:
    print(x, row[1].count("Present"))
    print(x, row[1].count("Absent"))

But it returns rows of zeros (000000).

Thanks in advance for your help.

1 Answer:

Answer 0 (score: 0)

It should look something like this:

import collections
import itertools

data = [
    ('Emp1', 'Absent'),
    ('Emp1', 'Absent'),
    ('Emp1', 'Present'),
    ('Emp2', 'Present'),
    ('Emp2', 'Present'),
    ('Emp2', 'Absent'),
    ('Emp2', 'Present'),
    ('Emp2', 'Absent'),
]
sorted_data = sorted(data, key = lambda x: (x[0], x[1])) # sort our data
employees = collections.defaultdict(dict)
# group by employee
for employee, employee_group in itertools.groupby(sorted_data, lambda item: item[0]):
    # group by category
    for category, category_group in itertools.groupby(employee_group, lambda item: item[1]):
        employees[employee][category] = sum(1 for _ in category_group)

print('employees', employees) # employees defaultdict(<class 'dict'>, {'Emp1': {'Absent': 2, 'Present': 1}, 'Emp2': {'Absent': 2, 'Present': 3}})
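
As an alternative sketch (not part of the original answer), the same tally can be built without sorting by using a `collections.Counter` per employee. The `rows` list below is a stand-in for `cursor.fetchall()`, and `count_statuses` is a hypothetical helper name chosen for illustration; it assumes each row can be indexed by position, as the tuples in the question can.

```python
from collections import Counter, defaultdict

# Stand-in for cursor.fetchall(); assumed to yield (employee, status) pairs.
rows = [
    ('Emp1', 'Absent'),
    ('Emp1', 'Absent'),
    ('Emp1', 'Present'),
    ('Emp2', 'Present'),
    ('Emp2', 'Present'),
    ('Emp2', 'Absent'),
    ('Emp2', 'Present'),
    ('Emp2', 'Absent'),
]


def count_statuses(rows):
    """Hypothetical helper: map each employee to a Counter of their statuses."""
    counts = defaultdict(Counter)
    for row in rows:
        # row[0] is the employee, row[1] is the status
        counts[row[0]][row[1]] += 1
    return counts


counts = count_statuses(rows)
for employee in sorted(counts):
    # Sort statuses alphabetically so the output order is stable
    summary = ", ".join(
        f"{status} = {n}" for status, n in sorted(counts[employee].items())
    )
    print(f"{employee}: {summary}")
```

Because a `Counter` returns 0 for missing keys, looking up a status an employee never had (for example `counts['Emp1']['Late']`) gives 0 instead of raising `KeyError`.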