I'm trying to tune the performance of pivot_table versus groupby.
On the one hand, I have:
%time pd.pivot_table(df, index='INDEX', columns='COLUMN', values='VALUE', aggfunc=[len, np.sum], fill_value=0)
CPU times: user 1min 51s, sys: 1.57 s, total: 1min 53s
Wall time: 1min 54s
On the other hand, I get:
In [97]: df["GN"] = df.groupby(["A","B"]).grouper.group_info[0]
In [98]: df["G"] = "G" + (df["GN"] + 1).astype(str)
In [99]: df
Out[99]:
     A      B         C         D  GN   G
0  foo    one -1.245506  0.307395   3  G4
1  bar    one  0.072989 -0.402182   0  G1
2  foo    two  0.399269  0.794413   5  G6
3  bar  three  0.475859 -0.685398   1  G2
4  foo    two -0.463065 -0.222632   5  G6
5  bar    two  0.696606 -0.999691   2  G3
6  foo    one -1.211876 -0.368574   3  G4
7  foo  three -0.936385 -1.067160   4  G5
These are essentially the same operation, yet I see a 60x difference in performance. Why is that?
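
For reference, here is a minimal sketch of how the same table could be built both ways, so the two can be timed on equal footing. The toy data is hypothetical; only the column names INDEX, COLUMN and VALUE come from the call above. It also assumes the string aggregations "count" and "sum" in place of len and np.sum, which only match len when VALUE has no missing values. It shows that the two spellings agree; it does not by itself explain the timing gap.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
df = pd.DataFrame({
    "INDEX": ["x", "x", "y", "y", "z"],
    "COLUMN": ["a", "b", "a", "a", "a"],
    "VALUE": [1, 2, 3, 4, 5],
})

# pivot_table spelling, with string aggregation names (an assumption of this sketch).
via_pivot = pd.pivot_table(
    df, index="INDEX", columns="COLUMN", values="VALUE",
    aggfunc=["count", "sum"], fill_value=0,
)

# groupby spelling: aggregate per (INDEX, COLUMN) pair, then move COLUMN
# into the column axis, filling missing pairs with 0.
via_groupby = (
    df.groupby(["INDEX", "COLUMN"])["VALUE"]
      .agg(["count", "sum"])
      .unstack(fill_value=0)
)

# Compare the cell values of the two results.
same = np.allclose(
    via_pivot.to_numpy(dtype=float),
    via_groupby.to_numpy(dtype=float),
)
print(same)
```

Wrapping each of the two expressions in %time on the real frame would show whether the gap comes from pivot_table itself or from the aggregation functions passed to it.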