I am trying to select subsets of a pandas DataFrame and compute some statistics on each one, but it is slow. Is there a faster way to do this?
column_name1_set = df.column_name1.unique()
column_name2_set = df.column_name2.unique()
for i, name1 in enumerate(column_name1_set):
    for name2 in column_name2_set:
        df_t = df[(df['column_name1']==int(name1)) & (df['column_name2']==name2)]
        s = df_t.sum(axis=0)
        s['amount_min'] = df_t['amount'].min()
        s['amount_max'] = df_t['amount'].max()
        s['amount_mean'] = df_t['amount'].mean()
        s['amount_median'] = df_t['amount'].median()
        #store s ...
Answer 0 (score: 1)
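It looks like you need groupby with agg. Select the 'amount' column from the grouped object and aggregate it directly. Calling sum() first would collapse each group to a single value, so the statistics would be computed across the group totals instead of within each group:
df.groupby(['column_name1','column_name2'])['amount'].agg(['min','max','mean','median'])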
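To make the comparison with the loop concrete, here is a minimal sketch on made-up data. The sample values and the extra `qty` column are assumptions for illustration only; the column names `column_name1`, `column_name2` and `amount` come from the question. It computes the per-group sums and the `amount` statistics in one pass each and joins them, which roughly corresponds to the `s` built in the loop.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "column_name1": [1, 1, 1, 2, 2],
    "column_name2": ["a", "a", "b", "a", "a"],
    "amount": [10.0, 20.0, 5.0, 7.0, 9.0],
    "qty": [1, 2, 3, 4, 5],  # assumed extra numeric column
})

grouped = df.groupby(["column_name1", "column_name2"])

# Per-group sums of the non-key columns (the loop's df_t.sum(axis=0)).
sums = grouped.sum()

# Per-group statistics of 'amount', renamed to match the question's keys
# (amount_min, amount_max, amount_mean, amount_median).
stats = grouped["amount"].agg(["min", "max", "mean", "median"]).add_prefix("amount_")

# One row per (column_name1, column_name2) combination present in the data.
result = sums.join(stats)
print(result)
```

Two differences from the loop are worth keeping in mind: groupby only returns combinations that actually occur in the data, whereas the nested loop also visits empty combinations, and the key columns become the index of the result instead of being summed.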