Question

I ran a very simple aggregation by quarter in Pandas and, out of curiosity, tested the result.

    dfQtr = df.groupby([pd.TimeGrouper(key= 'Date', freq='Q'),'JourneyType','OriginCode','DestinationCode']).agg(np.sum).reset_index()

    print sum(dfQtr.TotalFlights) , sum(df.TotalFlights)              
                       941899              967205

@IanS My apologies, here is a subset of a fairly large dataset:

Date            JourneyType             OriginCode            DestinationCode Total_Flights
01/08/2015  T_A-M-R-A-S_M_R_M_S D_P         FLL                     SDQ                 1
01/08/2015  T_A-M-R-A-S_M_R_M_S D_P         PAP                     FLL                 1
01/08/2015  T_A-M-R-A-S_M_R_M_S D_P         TPA                     BDL                 1
01/08/2015  T_A-M-R-A-S_M_R_M_S D_P         HPN                     MCO                 1
01/08/2015  T_A-L-O-C-G_L_P_D_S D_P         FLL                     PAP                 1
01/08/2015  T_A-L-O-C-G_L_P_D_S D_P         FLL                     PAP                 1
01/08/2015  T_A-L-O-C-G_L_P_D_S D_P         FLL                     PIT                 1

The results show a difference between the totals before and after aggregation, and I am wondering why that is the case.

Many thanks! Will

Answer 1

"NA groups in GroupBy are automatically excluded"

http://pandas.pydata.org/pandas-docs/stable/missing_data.html#na-values-in-groupby

My guess is that you have missing values somewhere in one of the grouping columns.
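A minimal sketch of how this could be checked, assuming the gap comes from rows with a missing value in one of the grouping keys. The tiny frame below is made up for illustration and is not the asker's data; only the column names are borrowed from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the four rows have a missing grouping key.
df = pd.DataFrame({
    "OriginCode": ["FLL", "PAP", np.nan, "FLL"],
    "DestinationCode": ["SDQ", "FLL", "BDL", np.nan],
    "TotalFlights": [1, 1, 1, 1],
})
keys = ["OriginCode", "DestinationCode"]

# groupby silently drops any row whose key contains NaN.
grouped_total = df.groupby(keys)["TotalFlights"].sum().sum()
raw_total = df["TotalFlights"].sum()

# Find the rows with at least one missing key and sum what they carry.
missing_mask = df[keys].isna().any(axis=1)
dropped_total = df.loc[missing_mask, "TotalFlights"].sum()

print(grouped_total, raw_total, dropped_total)
```

If the dropped total matches the gap between the two sums, the missing keys explain it. Filling the keys with a placeholder before grouping (for example `df[keys].fillna("UNKNOWN")`), or passing `dropna=False` to `groupby` in pandas 1.1 and later, should keep those rows in the result.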

Pandas: difference after group by
