I have a pandas DataFrame that looks like this:
Person Year Weight Lost/Gained
Joe 2015 -5.7
Bryan 2015 7.8
Kelly 2015 -16.2
Frank 2016 10.3
Bill 2016 -22.1
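I want to get the counts of negative and positive values per year, as well as the averages of the positive and the negative values. The result can go in a new DataFrame or in the same one. If it goes in the same one, I would like the result to look like this: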
Person Year Weight Lost/Gained Pos Count Neg Count Pos Avg. Neg Avg.
Joe 2015 -5.7 1 2 7.8 -10.95
Bryan 2015 7.8 1 2 7.8 -10.95
Kelly 2015 -16.2 1 2 7.8 -10.95
Frank 2016 10.3 1 1 10.3 -22.1
Bill 2016 -22.1 1 1 10.3 -22.1
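The closest answer I could find and try to implement is here: How to sum negative and positive values separately when using groupby in pandas?
However, I really don't want to rearrange the whole DataFrame, because my actual DataFrame is much larger.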
Answer 0 (score: 3)
Here is one approach:
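import pandas as pd

# custom function: per-year counts and means of positive and negative values
# (assumes the weight column has been renamed to 'WeightLost')
def func(f):
    pos = f['WeightLost'].gt(0)  # boolean mask of positive values
    neg = f['WeightLost'].lt(0)  # boolean mask of negative values
    pos_avg = f.loc[pos, 'WeightLost'].mean()
    neg_avg = f.loc[neg, 'WeightLost'].mean()
    return pd.Series([pos.sum(), neg.sum(), pos_avg, neg_avg],
                     index=['Pos Count', 'Neg Count', 'Pos Avg', 'Neg Avg'])

f = df.groupby('Year').apply(func).reset_index()
print(f)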
Year Pos Count Neg Count Pos Avg Neg Avg
0 2015 1.0 2.0 7.8 -10.95
1 2016 1.0 1.0 10.3 -22.10
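The summary above has one row per year. If the statistics are needed on every original row, as in the question, one option is to merge the summary back on Year. This is a minimal sketch, assuming the weight column is named WeightLost as in this answer. The snake_case output names and the use of named aggregation in place of apply are choices made for the sketch, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question; 'WeightLost' is the assumed column name.
df = pd.DataFrame({
    "Person": ["Joe", "Bryan", "Kelly", "Frank", "Bill"],
    "Year": [2015, 2015, 2015, 2016, 2016],
    "WeightLost": [-5.7, 7.8, -16.2, 10.3, -22.1],
})

# One row per year: counts and means of positive and negative values.
summary = (
    df.groupby("Year")["WeightLost"]
      .agg(
          pos_count=lambda s: (s > 0).sum(),
          neg_count=lambda s: (s < 0).sum(),
          pos_avg=lambda s: s[s > 0].mean(),
          neg_avg=lambda s: s[s < 0].mean(),
      )
      .reset_index()
)

# A left merge keeps every original row, in its original order,
# and attaches the matching year's statistics to it.
merged = df.merge(summary, on="Year", how="left")
print(merged)
```

Only the small per-year summary is computed and joined, so the original rows are not rearranged.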
Answer 1 (score: 1)
Since you want to keep the original df, we can make use of map.
def map_year_stats(df):
    col = 'Weight_Lost/Gained'

    # boolean masks for positive and negative values
    rule_pos = df[col] > 0
    rule_neg = df[col] < 0

    # per-year statistics, each a Series indexed by Year
    pos_count = df[rule_pos].groupby('Year')[col].count()
    neg_count = df[rule_neg].groupby('Year')[col].count()
    pos_avg = df[rule_pos].groupby('Year')[col].mean()
    neg_avg = df[rule_neg].groupby('Year')[col].mean()

    # look up each row's Year in the per-year Series
    # (note: this adds the columns to the df passed in)
    df['pos_count'] = df['Year'].map(pos_count)
    df['neg_count'] = df['Year'].map(neg_count)
    df['pos_avg'] = df['Year'].map(pos_avg)
    df['neg_avg'] = df['Year'].map(neg_avg)
    return df

df_new = map_year_stats(df)
Person Year Weight_Lost/Gained pos_count neg_count pos_avg neg_avg
0 Joe 2015 -5.7 1 2 7.8 -10.95
1 Bryan 2015 7.8 1 2 7.8 -10.95
2 Kelly 2015 -16.2 1 2 7.8 -10.95
3 Frank 2016 10.3 1 1 10.3 -22.10
4 Bill 2016 -22.1 1 1 10.3 -22.10
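A possible variation, not part of the answer above: groupby(...).transform broadcasts each group's statistic straight back onto the original rows, so no separate lookup step is needed. One difference to be aware of: with map, a year that has no positive (or no negative) values is missing from the corresponding count Series, so its rows get NaN, whereas the sketch below yields a count of 0. The sample frame is rebuilt from the question's data, and the column name follows this answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Person": ["Joe", "Bryan", "Kelly", "Frank", "Bill"],
    "Year": [2015, 2015, 2015, 2016, 2016],
    "Weight_Lost/Gained": [-5.7, 7.8, -16.2, 10.3, -22.1],
})

col = "Weight_Lost/Gained"
years = df["Year"]
is_pos = df[col] > 0
is_neg = df[col] < 0

# Summing 0/1 flags within each year gives the count, repeated on every row.
df["pos_count"] = is_pos.astype(int).groupby(years).transform("sum")
df["neg_count"] = is_neg.astype(int).groupby(years).transform("sum")

# where() blanks out the non-matching values with NaN, and mean skips NaN,
# so each mean is taken over the positive (or negative) values only.
df["pos_avg"] = df[col].where(is_pos).groupby(years).transform("mean")
df["neg_avg"] = df[col].where(is_neg).groupby(years).transform("mean")

print(df)
```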