Get counts of positive and negative values with pandas groupby on multiple columns

Posted: 2020-01-29 21:35:21

Tags: python-3.x pandas count pandas-groupby

I have a pandas DataFrame that looks like this:

Person    Year    Weight Lost/Gained
Joe       2015          -5.7
Bryan     2015           7.8
Kelly     2015          -16.2
Frank     2016           10.3
Bill      2016          -22.1
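
To reproduce the examples below, the sample data can be built roughly like this. This is a sketch that assumes the column is literally named 'Weight Lost/Gained', as in the table above.

```python
import pandas as pd

# Sample data from the question; column names copied from the table above
df = pd.DataFrame({
    'Person': ['Joe', 'Bryan', 'Kelly', 'Frank', 'Bill'],
    'Year': [2015, 2015, 2015, 2016, 2016],
    'Weight Lost/Gained': [-5.7, 7.8, -16.2, 10.3, -22.1],
})
print(df)
```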

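For each year, I want the count of negative values and the count of positive values, as well as the mean of the positive values and the mean of the negative values. The result can go in a new DataFrame or in the same one. If it is in the same DataFrame, I would like the result to look like this: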

Person    Year    Weight Lost/Gained    Pos Count    Neg Count      Pos Avg.     Neg Avg.
Joe       2015          -5.7                1           2             7.8         -10.95
Bryan     2015           7.8                1           2             7.8         -10.95
Kelly     2015          -16.2               1           2             7.8         -10.95
Frank     2016           10.3               1           1            10.3         -22.1
Bill      2016          -22.1               1           1            10.3         -22.1

The closest answer I could find and tried to adapt is this one: How to sum negative and positive values separately when using groupby in pandas?

However, I would rather not rearrange the whole DataFrame, because my real DataFrame is much larger.

2 answers:

Answer 0 (score: 3)

Here is one way, using groupby with a custom function passed to apply:

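import pandas as pd

# custom function applied to each Year group
def func(f):
    col = 'Weight Lost/Gained'   # column name from the question
    pos = f[col].gt(0)           # boolean mask of positive values
    neg = f[col].lt(0)           # boolean mask of negative values
    pos_avg = f.loc[pos, col].mean()
    neg_avg = f.loc[neg, col].mean()
    # summing a boolean mask counts the True values
    return pd.Series([pos.sum(), neg.sum(), pos_avg, neg_avg],
                     index=['Pos Count', 'Neg Count', 'Pos Avg', 'Neg Avg'])

f = df.groupby('Year').apply(func).reset_index()

print(f)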

  Year  Pos Count  Neg Count  Pos Avg  Neg Avg
0  2015        1.0        2.0      7.8   -10.95
1  2016        1.0        1.0     10.3   -22.10
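
The counts above come out as floats because the returned Series mixes integer counts with float means. A possible alternative that avoids a Python-level apply is sketched below, assuming the question's column name: Series.where masks out non-matching values as NaN, and named aggregation then counts and averages the remaining values per year. Zero values are excluded from both sides, as in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['Joe', 'Bryan', 'Kelly', 'Frank', 'Bill'],
    'Year': [2015, 2015, 2015, 2016, 2016],
    'Weight Lost/Gained': [-5.7, 7.8, -16.2, 10.3, -22.1],
})

s = df['Weight Lost/Gained']

# where() keeps values that satisfy the condition and sets the rest to NaN,
# so 'count' (which skips NaN) and 'mean' only see positive or negative values.
summary = (
    df.assign(pos=s.where(s > 0), neg=s.where(s < 0))
      .groupby('Year')
      .agg(**{
          'Pos Count': ('pos', 'count'),
          'Neg Count': ('neg', 'count'),
          'Pos Avg': ('pos', 'mean'),
          'Neg Avg': ('neg', 'mean'),
      })
      .reset_index()
)
print(summary)
```

Passing the named aggregations through `**{...}` allows output column names that contain spaces.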

Answer 1 (score: 1)

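Since you want the results in the original DataFrame, we can compute the per-year statistics and then use map to broadcast them back onto each row.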

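def map_year_stats(df):
    # column name as used in this answer's DataFrame
    # (the question's table calls it 'Weight Lost/Gained')
    col = 'Weight_Lost/Gained'

    # boolean masks selecting the positive and the negative rows
    rule_pos = df[col] > 0
    rule_neg = df[col] < 0

    # per-year statistics, each a Series indexed by Year
    pos_count = df[rule_pos].groupby('Year')[col].count()
    neg_count = df[rule_neg].groupby('Year')[col].count()

    pos_avg = df[rule_pos].groupby('Year')[col].mean()
    neg_avg = df[rule_neg].groupby('Year')[col].mean()

    # map each row's Year to its group statistic; a year with no positive
    # (or no negative) values is absent from the Series and maps to NaN
    df['pos_count'] = df['Year'].map(pos_count)
    df['neg_count'] = df['Year'].map(neg_count)
    df['pos_avg'] = df['Year'].map(pos_avg)
    df['neg_avg'] = df['Year'].map(neg_avg)
    # the new columns are added to the DataFrame that was passed in
    return df

df_new = map_year_stats(df)
print(df_new)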

  Person  Year  Weight_Lost/Gained  pos_count  neg_count  pos_avg  neg_avg
0    Joe  2015                -5.7          1          2      7.8   -10.95
1  Bryan  2015                 7.8          1          2      7.8   -10.95
2  Kelly  2015               -16.2          1          2      7.8   -10.95
3  Frank  2016                10.3          1          1     10.3   -22.10
4   Bill  2016               -22.1          1          1     10.3   -22.10
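
Another way to get the same row-aligned result without building separate lookup Series is groupby(...).transform, which returns values aligned to the original rows. This is a sketch assuming the question's column name 'Weight Lost/Gained'; as in the answers above, zero values count as neither positive nor negative.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['Joe', 'Bryan', 'Kelly', 'Frank', 'Bill'],
    'Year': [2015, 2015, 2015, 2016, 2016],
    'Weight Lost/Gained': [-5.7, 7.8, -16.2, 10.3, -22.1],
})

s = df['Weight Lost/Gained']
pos = s.where(s > 0)   # positive values, NaN elsewhere
neg = s.where(s < 0)   # negative values, NaN elsewhere

# Group each masked Series by the Year column; transform broadcasts the
# group statistic back to every row of that group.
g_pos = pos.groupby(df['Year'])
g_neg = neg.groupby(df['Year'])
df['Pos Count'] = g_pos.transform('count')
df['Neg Count'] = g_neg.transform('count')
df['Pos Avg'] = g_pos.transform('mean')
df['Neg Avg'] = g_neg.transform('mean')
print(df)
```

With this approach a year that has no negative values should get a count of 0 (count skips NaN) and an average of NaN, rather than NaN for both as with map.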