I have a pivot table created with the following code:
AdminPivot = pd.pivot_table(admindata, index=['Function Name', 'Manager'], values=['Paid Hours'])
+---------------+-----------+------------+
| Function Name | Manager | Paid Hours |
+---------------+-----------+------------+
| Function 1 | Manager 1 | 0.21 |
| Function 2 | Manager 2 | 0.73 |
| Function 3 | Manager 1 | 2.335 |
| | Manager 3 | 0.51 |
| | Manager 4 | 1.4 |
| | Manager 5 | 0.796 |
| | Manager 6 | 0.48 |
| | Manager 7 | 12 |
| Function 4 | Manager2 | 0.15 |
| Function 6 | Manager 1 | 0.87 |
| | Manager 3 | 0.31 |
+---------------+-----------+------------+
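The question does not show admindata itself. To make the pivot reproducible, here is a minimal sketch that assumes admindata has exactly one row per (function, manager) pair, holding the values from the table above. Under that assumption, pivot_table's default aggfunc ("mean") leaves each value unchanged, and the result has a two-level MultiIndex of Function Name and Manager.

```python
import pandas as pd

# Hypothetical stand-in for admindata: one row per (function, manager) pair,
# using the values shown in the pivot table above.
admindata = pd.DataFrame({
    "Function Name": ["Function 1", "Function 2", "Function 3", "Function 3",
                      "Function 3", "Function 3", "Function 3", "Function 3",
                      "Function 4", "Function 6", "Function 6"],
    "Manager": ["Manager 1", "Manager 2", "Manager 1", "Manager 3",
                "Manager 4", "Manager 5", "Manager 6", "Manager 7",
                "Manager2", "Manager 1", "Manager 3"],
    "Paid Hours": [0.21, 0.73, 2.335, 0.51, 1.4, 0.796, 0.48, 12,
                   0.15, 0.87, 0.31],
})

# pivot_table averages by default (aggfunc="mean"); with one row per pair
# the mean of each group is just the original value.
AdminPivot = pd.pivot_table(admindata, index=["Function Name", "Manager"],
                            values=["Paid Hours"])
print(AdminPivot)
```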
I want to add up the total paid hours for a subset of managers within each function. That is, I'm interested in getting this:
Sum of Function 1 total paid hours for managers in (Manager 5, 6, 7)
Sum of Function 2 total paid hours for managers in (Manager 2, 6, 7)
Sum of Function 3 total paid hours for managers in (Manager 1, 3, 6, 7)
I can easily index into the pivot table to get the value for any specific manager:
AdminPivot.loc[('Function 1', 'Manager 1'), 'Paid Hours']
I could then repeat this for every value and essentially hard-code the managers with if statements, but there must be a more elegant way.
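Specifically, how can I iterate over the given conditions? I'm trying to find a good way to do this without re-creating a loc statement for each function and each manager, adding them up, and using an if statement to check whether each one exists. Any help is appreciated!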
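For comparison, the loop-and-if approach described in the question might look like the sketch below. It assumes a hand-built subset of AdminPivot (Functions 1 to 3 only) with the same MultiIndex layout; the names wanted and totals are hypothetical. It works, but it performs one membership check and one .loc lookup per (function, manager) pair.

```python
import pandas as pd

# Hypothetical subset of AdminPivot (Functions 1-3 only), built directly
# with a two-level MultiIndex instead of via pivot_table.
idx = pd.MultiIndex.from_tuples(
    [("Function 1", "Manager 1"), ("Function 2", "Manager 2"),
     ("Function 3", "Manager 1"), ("Function 3", "Manager 3"),
     ("Function 3", "Manager 4"), ("Function 3", "Manager 5"),
     ("Function 3", "Manager 6"), ("Function 3", "Manager 7")],
    names=["Function Name", "Manager"])
AdminPivot = pd.DataFrame(
    {"Paid Hours": [0.21, 0.73, 2.335, 0.51, 1.4, 0.796, 0.48, 12.0]},
    index=idx)

# Hypothetical mapping of each function to the managers to include.
wanted = {
    "Function 1": ["Manager 5", "Manager 6", "Manager 7"],
    "Function 2": ["Manager 2", "Manager 6", "Manager 7"],
    "Function 3": ["Manager 1", "Manager 3", "Manager 6", "Manager 7"],
}

totals = {}
for function, managers in wanted.items():
    total = 0.0
    for manager in managers:
        key = (function, manager)
        # The "check whether it exists" step: skip pairs missing from the pivot.
        if key in AdminPivot.index:
            total += AdminPivot.loc[key, "Paid Hours"]
    totals[function] = total
print(totals)
```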
Answer (score: 1)
Build a list of all the required combinations, which lets you create all the IDs:
l = [('Function 1', ['Manager 5', 'Manager 6', 'Manager 7']),
('Function 2', ['Manager 2', 'Manager 6', 'Manager 7']),
('Function 3', ['Manager 1', 'Manager 3', 'Manager 6', 'Manager 7'])]
ids = [(x, z) for x,y in l for z in y]
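To make the nested comprehension easier to read, here is the same flattening step with descriptive names (requested, function, managers, and manager are illustrative renamings of l, x, y, and z). Each output tuple is one full key into the two-level (Function Name, Manager) index.

```python
# Same (function, [managers]) pairs as l above, with descriptive names.
requested = [
    ("Function 1", ["Manager 5", "Manager 6", "Manager 7"]),
    ("Function 2", ["Manager 2", "Manager 6", "Manager 7"]),
    ("Function 3", ["Manager 1", "Manager 3", "Manager 6", "Manager 7"]),
]

# The outer loop runs over functions, the inner loop over that function's
# managers, producing one (function, manager) key per requested row.
ids = [(function, manager)
       for function, managers in requested
       for manager in managers]

print(len(ids))  # 10
print(ids[:4])
```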
Then you can use .reindex + .sum:
AdminPivot.reindex(ids).groupby(level=0).sum()  # older pandas: .sum(level=0); the level keyword was removed in pandas 2.0
Paid Hours
Function Name
Function 1 0.000
Function 2 0.730
Function 3 15.325
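A self-contained version of the answer might look like the sketch below. It assumes a hand-built subset of AdminPivot (Functions 1 to 3) in place of the real pivot, and spells the per-function sum as groupby(level=0).sum(), since the level argument to DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0.

```python
import pandas as pd

# Hypothetical subset of AdminPivot (Functions 1-3 only).
idx = pd.MultiIndex.from_tuples(
    [("Function 1", "Manager 1"), ("Function 2", "Manager 2"),
     ("Function 3", "Manager 1"), ("Function 3", "Manager 3"),
     ("Function 3", "Manager 4"), ("Function 3", "Manager 5"),
     ("Function 3", "Manager 6"), ("Function 3", "Manager 7")],
    names=["Function Name", "Manager"])
AdminPivot = pd.DataFrame(
    {"Paid Hours": [0.21, 0.73, 2.335, 0.51, 1.4, 0.796, 0.48, 12.0]},
    index=idx)

l = [("Function 1", ["Manager 5", "Manager 6", "Manager 7"]),
     ("Function 2", ["Manager 2", "Manager 6", "Manager 7"]),
     ("Function 3", ["Manager 1", "Manager 3", "Manager 6", "Manager 7"])]
ids = [(x, z) for x, y in l for z in y]

# reindex keeps only the requested (function, manager) pairs; pairs that are
# absent from AdminPivot become rows filled with NaN.
subset = AdminPivot.reindex(ids)

# Sum within each value of the outer level ("Function Name"). NaN values are
# skipped, so a group with no matching managers sums to 0.0.
totals = subset.groupby(level=0).sum()
print(totals)
```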
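Below is the output of .reindex. It contains only the required rows, with NaN filled in wherever there is no data; the subsequent .sum skips those NaN values: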
Paid Hours
Function Name Manager
Function 1 Manager 5 NaN
Manager 6 NaN
Manager 7 NaN
Function 2 Manager 2 0.730
Manager 6 NaN
Manager 7 NaN
Function 3 Manager 1 2.335
Manager 3 0.510
Manager 6 0.480
Manager 7 12.000
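Because the NaN rows are skipped, Function 1 comes out as 0.000 even though none of its requested managers exist in the pivot. If you would rather tell "no matching managers" apart from "zero hours", GroupBy.sum accepts a min_count argument: with min_count=1, a group with no non-NaN values yields NaN instead of 0. A minimal sketch, assuming a smaller hand-built frame than the question's:

```python
import pandas as pd

# Hypothetical, trimmed-down AdminPivot.
idx = pd.MultiIndex.from_tuples(
    [("Function 1", "Manager 1"), ("Function 2", "Manager 2"),
     ("Function 3", "Manager 1"), ("Function 3", "Manager 7")],
    names=["Function Name", "Manager"])
AdminPivot = pd.DataFrame({"Paid Hours": [0.21, 0.73, 2.335, 12.0]}, index=idx)

# Function 1 only asks for a manager that is not in the pivot.
ids = [("Function 1", "Manager 5"), ("Function 2", "Manager 2"),
       ("Function 3", "Manager 1"), ("Function 3", "Manager 7")]
subset = AdminPivot.reindex(ids)

# Default min_count=0: an all-NaN group sums to 0.0.
default_totals = subset.groupby(level=0).sum()
# min_count=1: at least one real value is required, otherwise the result is NaN.
strict_totals = subset.groupby(level=0).sum(min_count=1)
print(default_totals)
print(strict_totals)
```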