How to compare the output of two pandas groupby operations

Time: 2018-04-06 05:19:18

Tags: python pandas dataframe

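I have two summary outputs that I created by running a groupby on each of two DataFrames. I need to join these two outputs to check whether the hours claimed differ between them.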

import pandas as pd

df1 = pd.DataFrame({"Week": ["3/30/2018", "3/30/2018", "3/30/2018", "3/23/2018",
                             "3/23/2018","3/16/2018", "3/16/2018", "3/9/2018",
                             "3/9/2018"],
                    "Empl": ["Sam", "John", "Mike", "Sam", "Mike","Sam",
                             "John", "Mike", "Sam"],
                    "Hrs": [11, 12, 2, 13, 5, 14, 15, 16, 7]})
df2 = pd.DataFrame({"Week": ["3/30/2018", "3/30/2018", "3/30/2018", "3/23/2018",
                             "3/16/2018", "3/16/2018", "3/9/2018", "3/9/2018"],
                    "Empl": ["Sam", "John", "Mike", "Sam", "Mike","Sam",  "Mike", "Sam"],
                    "Hrs": [16, 12, 2, 13, 5, 15, 21, 7]})

gdF1 = df1.groupby(["Week","Empl"])["Hrs"].sum()
gdF2 = df2.groupby(["Week","Empl"])["Hrs"].sum()

# Need to join gdF1 and gdF2 on Week and Empl for further comparison.

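1 answer:

Answer 0 (score: 0)

Add as_index=False to the groupby so that Week and Empl stay as regular columns; you can then join the two results for further analysis: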

gdF1 = df1.groupby(["Week","Empl"],as_index=False)["Hrs"].sum()
gdF2 = df2.groupby(["Week","Empl"],as_index=False)["Hrs"].sum()
# or, equivalently (note: no drop=True, which would discard Week and Empl):
gdF1 = df1.groupby(["Week","Empl"])["Hrs"].sum().reset_index()
gdF2 = df2.groupby(["Week","Empl"])["Hrs"].sum().reset_index()
In [6]: gdF1
Out[6]: 
        Week  Empl  Hrs
0  3/16/2018  John   15
1  3/16/2018   Sam   14
2  3/23/2018  Mike    5
3  3/23/2018   Sam   13
4  3/30/2018  John   12
5  3/30/2018  Mike    2
6  3/30/2018   Sam   11
7   3/9/2018  Mike   16
8   3/9/2018   Sam    7

In [7]: gdF2
Out[7]: 
        Week  Empl  Hrs
0  3/16/2018  Mike    5
1  3/16/2018   Sam   15
2  3/23/2018   Sam   13
3  3/30/2018  John   12
4  3/30/2018  Mike    2
5  3/30/2018   Sam   16
6   3/9/2018  Mike   21
7   3/9/2018   Sam    7
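
With Week and Empl available as columns, the join itself could look something like the sketch below. This is one possible approach rather than the only one: it assumes an outer merge is wanted so that rows present in only one frame are kept, and the names merged, mismatches, Diff and the suffixes _1 / _2 are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Week": ["3/30/2018", "3/30/2018", "3/30/2018", "3/23/2018",
                             "3/23/2018", "3/16/2018", "3/16/2018", "3/9/2018",
                             "3/9/2018"],
                    "Empl": ["Sam", "John", "Mike", "Sam", "Mike", "Sam",
                             "John", "Mike", "Sam"],
                    "Hrs": [11, 12, 2, 13, 5, 14, 15, 16, 7]})
df2 = pd.DataFrame({"Week": ["3/30/2018", "3/30/2018", "3/30/2018", "3/23/2018",
                             "3/16/2018", "3/16/2018", "3/9/2018", "3/9/2018"],
                    "Empl": ["Sam", "John", "Mike", "Sam", "Mike", "Sam",
                             "Mike", "Sam"],
                    "Hrs": [16, 12, 2, 13, 5, 15, 21, 7]})

# Keep the grouping keys as ordinary columns so they can be used as merge keys.
gdF1 = df1.groupby(["Week", "Empl"], as_index=False)["Hrs"].sum()
gdF2 = df2.groupby(["Week", "Empl"], as_index=False)["Hrs"].sum()

# Outer merge keeps (Week, Empl) pairs that appear in only one frame;
# indicator=True adds a "_merge" column saying which side each row came from.
merged = gdF1.merge(gdF2, on=["Week", "Empl"], how="outer",
                    suffixes=("_1", "_2"), indicator=True)

# Difference in hours; NaN where the pair is missing on one side.
merged["Diff"] = merged["Hrs_2"] - merged["Hrs_1"]

# Rows where the hours differ, including rows missing from either frame
# (a comparison against NaN evaluates to "not equal").
mismatches = merged[merged["Hrs_1"] != merged["Hrs_2"]]
print(mismatches)
```

For the sample data this should flag six of the ten (Week, Empl) pairs: three where both frames have an entry but the hours differ, and three that appear in only one frame. If only pairs present in both frames matter, how="inner" would be the simpler choice.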