优化groupby agg函数以返回多个结果列

时间:2020-04-27 16:03:42

标签: python pandas dataframe aggregate pandas-groupby

我有这个数据框;

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Client':np.random.choice(['Customer_A', 'Customer_B'], 1000),
    'Product':np.random.choice( ['Guns', 'Ammo', 'Armour'], 1000),
    'Value':(np.random.randn(1000))
})
Categoricals = ['Client', 'Product']
df[Categoricals] = df[Categoricals].astype('category')
df = df.drop_duplicates()
df

我想要这个结果;

# Non-anonymous function for Anomaly limit
def Anomaly (x):
    Q3 = np.nanpercentile(x, q = 75)
    Q1 = np.nanpercentile(x, q = 25)
    IQR = (Q3 - Q1)
    return (Q3 + (IQR * 2.0))

# Non-anonymous function for CriticalAnomaly limit
def CriticalAnomaly (x):
    Q3 = np.nanpercentile(x, q = 75)
    Q1 = np.nanpercentile(x, q = 25)
    IQR = (Q3 - Q1)
    return (Q3 + (IQR * 3.0))


# Define metrics
Metrics = {'Value':['count', Anomaly, CriticalAnomaly]}

# Groupby has more than 1 grouping column, so agg can only accept non-anonymous functions
Limits = df.groupby(['Client', 'Product']).agg(Metrics)
Limits

但是在大​​型数据集上速度较慢,因为函数“ Anomaly”和“ CriticalAnomaly”必须重新计算Q1,Q3和IQR两次,而不是一次。通过将两个功能组合在一起,可以使其更快。但是结果输出到1列而不是2列。

# Combined anomaly functions
def CombinedAnom (x):
    Q3 = np.nanpercentile(x, q = 75)
    Q1 = np.nanpercentile(x, q = 25)
    IQR = (Q3 - Q1)
    Anomaly = (Q3 + (IQR * 2.0))
    CriticalAnomaly = (Q3 + (IQR * 3.0))
    return (Anomaly, CriticalAnomaly)

# Define metrics
Metrics = {'Value':['count', CombinedAnom]}

# Groupby has more than 1 grouping column, so agg can only accept non-anonymous functions
Limits = df.groupby(['Client', 'Product']).agg(Metrics)
Limits

如何创建组合函数,以便将结果分成两列?

1 个答案:

答案 0 :(得分:0)

如果您使用apply而不是agg,则可以返回一个Series,将其分解成列:

def f(g):
    return pd.Series({
        'c1': np.sum(g.b),
        'c2': np.prod(g.b)
    })

df = pd.DataFrame({'a': list('aabbcc'), 'b': [1,2,3,4,5,6]})
df.groupby('a').apply(f)

来源:

   a  b
0  a  1
1  a  2
2  b  3
3  b  4
4  c  5
5  c  6

   c1  c2
a        
a   3   2
b   7  12
c  11  30
相关问题