Question

My DataFrame has many columns; some contain prices and the rest contain volumes, as shown below:

year_month   0_fx_price_gy 0_fx_volume_gy 1_fx_price_yuy 1_fx_volume_yuy
1990-01      2             10             3              30
1990-01      2             20             2              40
1990-02      2             30             3              50

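I need to group by year_month, taking the mean of the price columns and the sum of the volume columns.

Is there a quick way to do this in a single statement, for example: take the mean if the column name contains price, and the sum if it contains volume?

df.groupby('year_month').?

Note: this is only sample data with fewer columns, but the real data has a similar format.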

Expected output:

year_month   0_fx_price_gy 0_fx_volume_gy 1_fx_price_yuy 1_fx_volume_yuy
1990-01      2             30             2.5            70
1990-02      2             30             3              50

Answer 1

Build a dictionary from the matching column names and pass it to DataFrameGroupBy.agg; if the order of the output columns changes, add reindex at the end:

d1 = dict.fromkeys(df.columns[df.columns.str.contains('price')], 'mean')
d2 = dict.fromkeys(df.columns[df.columns.str.contains('volume')], 'sum')

# merge the two dicts into one column -> aggregation mapping
d = {**d1, **d2}
print(d)
{'0_fx_price_gy': 'mean', '1_fx_price_yuy': 'mean',
 '0_fx_volume_gy': 'sum', '1_fx_volume_yuy': 'sum'}
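
For reference, here is a minimal end-to-end sketch of this approach. The DataFrame is rebuilt by hand from the table in the question, and the names price_cols, volume_cols and result are only illustrative; real data with more columns should behave the same way as long as every non-key column name contains either "price" or "volume".

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question
df = pd.DataFrame({
    "year_month": ["1990-01", "1990-01", "1990-02"],
    "0_fx_price_gy": [2, 2, 2],
    "0_fx_volume_gy": [10, 20, 30],
    "1_fx_price_yuy": [3, 2, 3],
    "1_fx_volume_yuy": [30, 40, 50],
})

# Select column names by substring, then map each group to its aggregation
price_cols = df.columns[df.columns.str.contains("price")]
volume_cols = df.columns[df.columns.str.contains("volume")]
d = {**dict.fromkeys(price_cols, "mean"), **dict.fromkeys(volume_cols, "sum")}

# agg(d) returns the columns in dict order; reindex restores the original order
result = (
    df.groupby("year_month", as_index=False)
      .agg(d)
      .reindex(columns=df.columns)
)
print(result)
```

Without the reindex step the price columns would come first and the volume columns last, because that is the order in which the dictionary was assembled.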

Another way to build the dictionary:

d = {}
for c in df.columns:
    if 'price' in c:
        d[c] = 'mean'
    if 'volume' in c:
        d[c] = 'sum'
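
If there are more than two kinds of columns, the same loop can be wrapped in a small helper. The sketch below is only one possible shape: build_agg_map, its rules argument and its skip argument are hypothetical names, not part of pandas. It skips the grouping key and raises an error for any column that no rule covers, so that nothing is silently left out of the aggregation.

```python
def build_agg_map(columns, rules, skip=()):
    """Map each column to the aggregation of the first rule whose substring it contains."""
    mapping = {}
    for col in columns:
        if col in skip:
            continue  # e.g. the grouping key, which must not be aggregated
        for needle, func in rules.items():
            if needle in col:
                mapping[col] = func
                break
        else:
            # No rule matched: fail loudly instead of dropping the column
            raise ValueError(f"no aggregation rule matches column {col!r}")
    return mapping


# Column names taken from the question's sample data
cols = ["year_month", "0_fx_price_gy", "0_fx_volume_gy",
        "1_fx_price_yuy", "1_fx_volume_yuy"]
agg_map = build_agg_map(cols, {"price": "mean", "volume": "sum"}, skip=("year_month",))
print(agg_map)
```

The resulting dictionary can be passed to agg exactly like d above.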

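If every column in df.columns[1:] (that is, all columns except the first) is either a price column or a volume column, the solution can be simplified: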

d = {x:'mean' if 'price' in x else 'sum' for x in df.columns[1:]}

df1 = df.groupby('year_month', as_index=False).agg(d).reindex(columns=df.columns)
print(df1)
  year_month  0_fx_price_gy  0_fx_volume_gy  1_fx_price_yuy  1_fx_volume_yuy
0    1990-01            2.0              30             2.5               70
1    1990-02            2.0              30             3.0               50
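
Note that df.columns[1:] relies on year_month being the first column. A slightly more defensive variant, sketched below, drops the key by name instead, so it does not depend on column position. The sample frame and the expected frame are typed in by hand from the question, and key, value_cols and expected are illustrative names only.

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question
df = pd.DataFrame({
    "year_month": ["1990-01", "1990-01", "1990-02"],
    "0_fx_price_gy": [2, 2, 2],
    "0_fx_volume_gy": [10, 20, 30],
    "1_fx_price_yuy": [3, 2, 3],
    "1_fx_volume_yuy": [30, 40, 50],
})

key = "year_month"
value_cols = df.columns.drop(key)  # every column except the grouping key
d = {c: "mean" if "price" in c else "sum" for c in value_cols}

df1 = df.groupby(key, as_index=False).agg(d).reindex(columns=df.columns)

# Expected result from the question (means come back as floats)
expected = pd.DataFrame({
    "year_month": ["1990-01", "1990-02"],
    "0_fx_price_gy": [2.0, 2.0],
    "0_fx_volume_gy": [30, 30],
    "1_fx_price_yuy": [2.5, 3.0],
    "1_fx_volume_yuy": [70, 50],
})
pd.testing.assert_frame_equal(df1, expected)
```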
