Question

I have a DataFrame like this:

    col1 col2
0    a   100
1    a   200
2    a   150
3    b   1000
4    c   400
5    c   200

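I want to group by col1 and count the occurrences. If a group's count is 2 or greater, compute the mean of col2 for those rows; otherwise, apply another function. The output should be: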

    col1 mean
0    a   150
1    b   whatever aggregator function returns 
2    c   300

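I followed @ansev's solution from "pandas groupby count and then conditional mean", but I don't want to replace the value with NaN. I actually want to replace it with the value returned by another function, like this: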

def aggregator(col1, col2):
    return col1+col2

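Keep in mind that the actual aggregation function is more complex and depends on other tables; this is only a simplified version of the problem.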
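For reproducibility, the sample frame and a NaN-producing baseline can be sketched as follows. This is only an assumption about roughly what the linked approach does, not a copy of it; the names `stats` and `baseline` are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})

# Count and mean of col2 per group, computed in one pass.
stats = df.groupby("col1")["col2"].agg(["count", "mean"])

# Keep the mean only where the group has at least 2 rows;
# groups with a single row become NaN (the behaviour I want to avoid).
baseline = stats["mean"].where(stats["count"] >= 2)
```

Here group `b` has a single row, so its entry in `baseline` is NaN; that is the slot I would like to fill from `aggregator` instead.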

Answer 1

I'm not sure this is exactly what you need, but you can solve it with apply:

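import pandas as pd

def aggregator(x):
    # Single-row group: return the custom value (here col1 + col2 as a string).
    if len(x) == 1:
        return pd.Series((x['col1'] + x['col2'].astype(str)).values)
    # Two or more rows: return the mean of col2.
    return pd.Series(x['col2'].mean())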

df.groupby('col1').apply(aggregator)

Output:

          0
col1       
a       150
b     b1000
c       300
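If you would rather not read the grouping column from inside `apply` (newer pandas versions have been changing whether it is passed to the function), a plain loop over the groups is one possible alternative. This is a minimal sketch, not a definitive implementation: `fallback` is a hypothetical stand-in for your real aggregator, and it casts col2 to a string only so the toy `col1 + col2` works.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "c", "c"],
    "col2": [100, 200, 150, 1000, 400, 200],
})

def fallback(col1, col2):
    # Hypothetical placeholder for the real, more complex aggregator.
    return col1 + str(col2)

rows = []
# Iterating a grouped column yields (group key, Series of col2 values).
for key, values in df.groupby("col1")["col2"]:
    if len(values) >= 2:
        # Two or more rows: use the mean of col2.
        rows.append((key, values.mean()))
    else:
        # Single row: delegate to the custom function.
        rows.append((key, fallback(key, values.iloc[0])))

result = pd.DataFrame(rows, columns=["col1", "mean"])
```

The result has the `col1` / `mean` layout from the question. Because the column mixes numbers and strings, it ends up with object dtype.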
