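I want to take the df below and group the unique combinations of 'USER', 'TASK' and 'STATIC_VALUE'. I can do that with groupby(), but I'm having trouble adding the 'TASK_COUNT' and 'TOTAL' columns. The 'TOTAL' column should be 'STATIC_VALUE' * 'TASK_COUNT'. I've tried several variations of groupby(), transform() and size(), but I can't get it to work. Any suggestions? Thanks!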
DataFrame:
     USER   TASK  STATIC_VALUE
1   USER1  TASK2            30
2   USER2  TASK7            12
3   USER5  TASK4             9
4  USER12  TASK2            30
5   USER2  TASK3            10
6   USER1  TASK2            30
7   USER5  TASK7            12
8   USER1  TASK3            10
9   USER2  TASK3            10
This post got me close:
>>> df.groupby(['USER','TASK','STATIC_VALUE']).size()
USER    TASK   STATIC_VALUE
USER1   TASK2  30              2
        TASK3  10              1
USER2   TASK7  12              1
        TASK3  10              2
USER5   TASK4  9               1
        TASK7  12              1
USER12  TASK2  30              1
Expected result:
USER    TASK   STATIC_VALUE  TASK_COUNT  TOTAL
USER1   TASK2            30           2     60
        TASK3            10           1     10
USER2   TASK7            12           1     12
        TASK3            10           2     20
USER5   TASK4             9           1      9
        TASK7            12           1     12
USER12  TASK2            30           1     30
Answer 0 (score: 2)
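Use GroupBy.size to count the rows in each group, turn the result back into a DataFrame with reset_index(name='TASK_COUNT'), then multiply the two columns: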
df1 = df.groupby(['USER','TASK', 'STATIC_VALUE']).size().reset_index(name='TASK_COUNT')
df1['TOTAL'] = df1['TASK_COUNT'] * df1['STATIC_VALUE']
print(df1)
     USER   TASK  STATIC_VALUE  TASK_COUNT  TOTAL
0   USER1  TASK2            30           2     60
1   USER1  TASK3            10           1     10
2  USER12  TASK2            30           1     30
3   USER2  TASK3            10           2     20
4   USER2  TASK7            12           1     12
5   USER5  TASK4             9           1      9
6   USER5  TASK7            12           1     12
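For anyone who wants to reproduce this, here is a self-contained sketch of the same approach. The DataFrame construction is an assumption on my part: the values are copied from the question, but the default 0-based index is used instead of the 1-based one shown there, which does not affect the grouping.

```python
import pandas as pd

# Sample data copied from the question (index differs; it plays no role here).
df = pd.DataFrame({
    "USER": ["USER1", "USER2", "USER5", "USER12", "USER2",
             "USER1", "USER5", "USER1", "USER2"],
    "TASK": ["TASK2", "TASK7", "TASK4", "TASK2", "TASK3",
             "TASK2", "TASK7", "TASK3", "TASK3"],
    "STATIC_VALUE": [30, 12, 9, 30, 10, 30, 12, 10, 10],
})

# size() counts the rows in each (USER, TASK, STATIC_VALUE) group;
# reset_index(name=...) turns the resulting Series into a DataFrame
# and names the count column.
df1 = (
    df.groupby(["USER", "TASK", "STATIC_VALUE"])
      .size()
      .reset_index(name="TASK_COUNT")
)

# Element-wise product of the per-group count and the static value.
df1["TOTAL"] = df1["TASK_COUNT"] * df1["STATIC_VALUE"]

print(df1)
```

Note that groupby sorts the group keys by default, and the USER values are strings, so USER12 sorts before USER2. That is why the row order here differs from the order in the expected result.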