Speeding up subset selection

Time: 2018-05-03 21:54:08

Tags: python pandas dataframe subset

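I am trying to select subsets of a pandas DataFrame (one for each combination of column_name1 and column_name2) and compute some statistics on each, but it is very slow. Can this be done faster?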

column_name1_set = df.column_name1.unique()
column_name2_set = df.column_name2.unique()

for i, name1 in enumerate(column_name1_set):
    for name2 in column_name2_set:
        df_t = df[(df['column_name1']==int(name1)) & (df['column_name2']==name2)]
        s = df_t.sum(axis=0)
        s['amount_min'] = df_t['amount'].min()
        s['amount_max'] = df_t['amount'].max()
        s['amount_mean'] = df_t['amount'].mean()
        s['amount_median'] = df_t['amount'].median()

        #store s ...
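A minimal sketch, assuming the per-group loop body has to stay (for example because of the "#store s ..." step): iterating over df.groupby([...]) visits each existing (column_name1, column_name2) pair once, instead of re-filtering the whole frame for every combination of unique values, and it skips combinations that never occur. The toy df and the results dict are hypothetical stand-ins for the asker's data and storage.

```python
import pandas as pd

# Hypothetical toy data; the real df is not shown in the question.
df = pd.DataFrame({
    'column_name1': [1, 1, 2, 2, 2],
    'column_name2': ['a', 'b', 'a', 'a', 'b'],
    'amount': [10, 20, 30, 50, 40],
})

results = {}  # hypothetical stand-in for "#store s ..."

# groupby yields each existing (column_name1, column_name2) pair exactly once,
# together with the matching rows, in a single pass over the data.
for (name1, name2), df_t in df.groupby(['column_name1', 'column_name2']):
    s = df_t.sum(axis=0)
    s['amount_min'] = df_t['amount'].min()
    s['amount_max'] = df_t['amount'].max()
    s['amount_mean'] = df_t['amount'].mean()
    s['amount_median'] = df_t['amount'].median()
    results[(name1, name2)] = s
```

This keeps the original loop structure, so it is still a Python-level loop over groups; it only removes the repeated boolean filtering of the full frame.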

1 answer:

Answer 0 (score: 1)

It seems you need groupby together with agg:
df.groupby(['column_name1','column_name2'])['amount'].agg(['min','max','mean','median'])
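If the goal is the same per-pair output as the loop in the question (column sums plus the amount_min, amount_max, amount_mean and amount_median statistics), a sketch that builds it in one vectorized pass, assuming a hypothetical toy df in place of the real data:

```python
import pandas as pd

# Hypothetical toy data; the real df is not shown in the question.
df = pd.DataFrame({
    'column_name1': [1, 1, 2, 2, 2],
    'column_name2': ['a', 'b', 'a', 'a', 'b'],
    'amount': [10, 20, 30, 50, 40],
})

g = df.groupby(['column_name1', 'column_name2'])

# Per-group column sums (what df_t.sum(axis=0) produced for each pair).
sums = g.sum()

# Per-group statistics of 'amount', renamed to the question's amount_* labels.
stats = g['amount'].agg(['min', 'max', 'mean', 'median']).add_prefix('amount_')

# Both frames share the (column_name1, column_name2) MultiIndex, so join aligns them.
out = sums.join(stats)
print(out)
```

One difference from the loop: g.sum() does not sum the grouping columns themselves; column_name1 and column_name2 become the MultiIndex of out instead.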