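I have a pandas DataFrame in which one column stores the name of a specific task and another column reports the ID of the employee who performed that task. Something like: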
EMPLOYEE_ID TASK_NAME
Employee1 Inspection
Employee2 Inspection
Employee3 Inspection
Employee4 Inspection
Employee5 Inspection
Employee1 Change
Employee2 Inspection
Employee3 Change
Employee1 Change
Employee2 Change
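I would like to know what kind of command/analysis I need in order to group employees by the tasks they performed. In other words, I want a result such as: "Employee_Group_1" (consisting of Employee1, Employee2 and Employee3) performed 75% of all "Inspection" and "Change" tasks.
Any help would be appreciated! Thanks in advance.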
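To experiment with this, the sample table can be rebuilt as a DataFrame. This is a sketch: the values are transcribed from the table above, and `group_1` is a hypothetical name for the "Employee_Group_1" of the example. For a single, already known group, the share of all tasks it performed is simply the mean of a boolean `isin` mask. On this sample it comes out as 0.8, not the 75% quoted as an illustration.

```python
import pandas as pd

# Sample data transcribed from the question, in the same row order
df = pd.DataFrame({
    'EMPLOYEE_ID': ['Employee1', 'Employee2', 'Employee3', 'Employee4', 'Employee5',
                    'Employee1', 'Employee2', 'Employee3', 'Employee1', 'Employee2'],
    'TASK_NAME': ['Inspection', 'Inspection', 'Inspection', 'Inspection', 'Inspection',
                  'Change', 'Inspection', 'Change', 'Change', 'Change'],
})

# Hypothetical group definition, taken from the example in the question
group_1 = ['Employee1', 'Employee2', 'Employee3']

# True for rows done by a group member; the mean is the fraction of all tasks
share = df['EMPLOYEE_ID'].isin(group_1).mean()
print(share)
```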
Answer (score: 1)
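I think you need to flatten the groups dictionary into d1 (employee -> group), pass it to Series.map, and then call Series.value_counts with normalize=True: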
# group -> list of employees
d = {'g1':['Employee1', 'Employee2', 'Employee3'],
     'g2':['Employee4', 'Employee5', 'Employee6']}
# invert (flatten) to employee -> group
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'Employee1': 'g1', 'Employee2': 'g1', 'Employee3': 'g1',
'Employee4': 'g2', 'Employee5': 'g2', 'Employee6': 'g2'}
s = df['EMPLOYEE_ID'].map(d1).value_counts(normalize=True)
print (s)
g1 0.8
g2 0.2
Name: EMPLOYEE_ID, dtype: float64
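A self-contained version of the steps above, assuming `df` holds the sample data from the question (rebuilt here so the snippet runs on its own). On pandas 2.x the printed Series is labelled `proportion` instead of `EMPLOYEE_ID`, but the values are the same.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'EMPLOYEE_ID': ['Employee1', 'Employee2', 'Employee3', 'Employee4', 'Employee5',
                    'Employee1', 'Employee2', 'Employee3', 'Employee1', 'Employee2'],
    'TASK_NAME': ['Inspection', 'Inspection', 'Inspection', 'Inspection', 'Inspection',
                  'Change', 'Inspection', 'Change', 'Change', 'Change'],
})

# Group -> members, as in the answer
d = {'g1': ['Employee1', 'Employee2', 'Employee3'],
     'g2': ['Employee4', 'Employee5', 'Employee6']}

# Invert to employee -> group so the dict can be passed to Series.map
d1 = {emp: grp for grp, emps in d.items() for emp in emps}

# Replace each employee ID by its group, then take relative frequencies
s = df['EMPLOYEE_ID'].map(d1).value_counts(normalize=True)
print(s)
```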
If you also want to analyze another column (here TASK_NAME), group by the mapped Series and use SeriesGroupBy.value_counts:
df2 = (df.groupby(df['EMPLOYEE_ID'].map(d1))['TASK_NAME']
.value_counts(normalize=True)
.reset_index(name='norm'))
print (df2)
EMPLOYEE_ID TASK_NAME norm
0 g1 Change 0.5
1 g1 Inspection 0.5
2 g2 Inspection 1.0
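The same numbers can also be laid out as a group x task table with `pd.crosstab`; this is an alternative not used in the answer. With `normalize='index'` each row sums to 1 (the task mix within a group, matching the `norm` column above), while `normalize='columns'` gives the share of each task type performed by each group, which is closer to the "X% of all Inspection tasks" wording in the question. The data and `d1` are repeated so the sketch is self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    'EMPLOYEE_ID': ['Employee1', 'Employee2', 'Employee3', 'Employee4', 'Employee5',
                    'Employee1', 'Employee2', 'Employee3', 'Employee1', 'Employee2'],
    'TASK_NAME': ['Inspection', 'Inspection', 'Inspection', 'Inspection', 'Inspection',
                  'Change', 'Inspection', 'Change', 'Change', 'Change'],
})
d1 = {'Employee1': 'g1', 'Employee2': 'g1', 'Employee3': 'g1',
      'Employee4': 'g2', 'Employee5': 'g2', 'Employee6': 'g2'}

groups = df['EMPLOYEE_ID'].map(d1)

# Rows sum to 1: how each group's work is split between task types
mix = pd.crosstab(groups, df['TASK_NAME'], normalize='index')
print(mix)

# Columns sum to 1: which share of each task type each group performed
by_task = pd.crosstab(groups, df['TASK_NAME'], normalize='columns')
print(by_task)
```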
Details (the intermediate Series of groups produced by map):
print (df['EMPLOYEE_ID'].map(d1))
0 g1
1 g1
2 g1
3 g2
4 g2
5 g1
6 g1
7 g1
8 g1
9 g1
Name: EMPLOYEE_ID, dtype: object
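One thing to watch with this approach: `Series.map` with a dict returns NaN for any employee that is not a key of `d1`, and `value_counts` drops NaN by default, so such rows silently vanish from the percentages. A small sketch, using a hypothetical `Employee7` that belongs to no group, shows how `dropna=False` keeps them visible.

```python
import pandas as pd

d1 = {'Employee1': 'g1', 'Employee2': 'g1', 'Employee3': 'g1',
      'Employee4': 'g2', 'Employee5': 'g2', 'Employee6': 'g2'}

# Hypothetical IDs: Employee7 is not assigned to any group
ids = pd.Series(['Employee1', 'Employee4', 'Employee7'], name='EMPLOYEE_ID')

mapped = ids.map(d1)  # missing keys become NaN
print(mapped)

# Default: NaN is dropped, so only 2 of the 3 rows are counted
print(mapped.value_counts())

# dropna=False keeps the unmapped row in the proportions
print(mapped.value_counts(normalize=True, dropna=False))
```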