I have a DataFrame with a set of values that share duplicate index labels:
value
CDE 2.318620
CDE -3.097715
LXU -3.791043
LXU 4.818995
SWN 3.059964
SWN -4.349304
OAS -3.336539
LPI -3.037097
LPI -5.701044
LPI -3.519923
CZR -3.270018
CZR -3.056712
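The desired result keeps, for each index label, only the value with the largest absolute value, and returns the group mean in a new column: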
value average
CDE -3.097715 -0.389547
LXU 4.818995 0.513976
SWN -4.349304 -0.644670
OAS -3.336539 -3.336539
LPI -5.701044 -4.086021
CZR -3.270018 -3.163365
I tried applying a lambda to the duplicated rows, but I get an "axis" error:
max_absolute = lambda x: max(x.min(), x.max(), key=abs)
df_duplicate_absmax = df.groupby(df.index).apply(max_absolute, axis=1)
PS: Abhi's solution, modified to work with NaN:
df1 = df.groupby(df.index)['value'].agg([lambda x: max(x[~np.isnan(x)], key=abs), 'mean'])
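One caveat with the NaN-aware one-liner above: if every value in a group is NaN, the filtered sequence is empty and `max` raises `ValueError`. Below is a minimal sketch of one way around that. The helper name `max_abs_skipna` and the small sample frame are made up for illustration, and returning NaN for an all-NaN group is an assumption about the desired behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: LXU has one NaN, OAS is entirely NaN
df = pd.DataFrame(
    {"value": [2.318620, -3.097715, np.nan, 4.818995, np.nan]},
    index=["CDE", "CDE", "LXU", "LXU", "OAS"],
)

def max_abs_skipna(s):
    """Signed value with the largest magnitude, ignoring NaN."""
    s = s.dropna()
    if s.empty:
        # Assumption: an all-NaN group should yield NaN rather than an error
        return np.nan
    return max(s, key=abs)

df1 = df.groupby(df.index)["value"].agg([max_abs_skipna, "mean"])
df1.columns = ["value", "average"]
print(df1)
```

The `'mean'` aggregation already skips NaN by default, so only the custom function needs the extra guard.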
Answer 0 (score: 2)
Use:
df1 = df.groupby(df.index)['value'].agg([lambda x: max(x,key=abs), 'mean'])
df1.columns = ['value', 'average']
print (df1)
value average
CDE -3.097715 -0.389547
CZR -3.270018 -3.163365
LPI -5.701044 -4.086021
LXU 4.818995 0.513976
OAS -3.336539 -3.336539
SWN -4.349304 -0.644670
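For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. The frame is rebuilt by hand from the question's data. The helper name `max_abs`, the keyword-style (named) aggregation, and `sort=False` (which keeps the groups in order of first appearance, as in the desired output) are my own choices rather than part of the answer above, and named aggregation assumes pandas 0.25 or later. The question's attempt most likely failed because `apply` forwards extra keyword arguments such as `axis=1` to the function, and the lambda accepts only one argument.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [2.318620, -3.097715, -3.791043, 4.818995, 3.059964, -4.349304,
               -3.336539, -3.037097, -5.701044, -3.519923, -3.270018, -3.056712]},
    index=["CDE", "CDE", "LXU", "LXU", "SWN", "SWN",
           "OAS", "LPI", "LPI", "LPI", "CZR", "CZR"],
)

def max_abs(s):
    # Signed element with the largest absolute value
    return max(s, key=abs)

# Group on the (duplicated) index; sort=False preserves first-appearance order
df1 = df.groupby(level=0, sort=False)["value"].agg(value=max_abs, average="mean")
print(df1)
```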
Answer 1 (score: 1)
Here is a solution using groupby + agg with two functions: one computes the maximum by absolute value, the other computes the mean:
def max_abs(x):
    # Position of the largest absolute value, then return the signed element
    return x.iloc[x.abs().values.argmax()]
res = df.groupby(level=0).agg([max_abs, 'mean'])\
.xs('value', axis=1, drop_level=True)
print(res)
max_abs mean
CDE -3.097715 -0.389547
CZR -3.270018 -3.163365
LPI -5.701044 -4.086021
LXU 4.818995 0.513976
OAS -3.336539 -3.336539
SWN -4.349304 -0.644670
Answer 2 (score: 1)
from io import StringIO
import pandas as pd
df = pd.read_fwf(StringIO("""
cod value
CDE 2.318620
CDE -3.097715
LXU -3.791043
LXU 4.818995
SWN 3.059964
SWN -4.349304
OAS -3.336539
LPI -3.037097
LPI -5.701044
LPI -3.519923
CZR -3.270018
CZR -3.056712
"""), header=1)
# Create a new column with the absolute value
df['abs_value'] = df['value'].abs()
# Calculate the mean in a new DataFrame, grouped by cod, using
# pandas grouped aggregation and naming the column average
df_avg = df.groupby("cod").value.agg([('average', 'mean')])
# Choose the row within group with largest abs value
df_abs = df.sort_values("abs_value").groupby("cod").tail(1)[["cod", "value"]]
# Join the average and the max
df_abs.join(df_avg, on="cod")
Result:
cod value average
1 CDE -3.097715 -0.389547
10 CZR -3.270018 -3.163365
6 OAS -3.336539 -3.336539
5 SWN -4.349304 -0.644670
3 LXU 4.818995 0.513976
8 LPI -5.701044 -4.086021
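As a variation on the same approach, the helper column and the join can probably be avoided by combining `transform` with `idxmax`. This is only a sketch: it assumes the frame has a unique row index (here the default RangeIndex) with `cod` as an ordinary column, and the frame is rebuilt by hand instead of being parsed with `read_fwf`.

```python
import pandas as pd

df = pd.DataFrame({
    "cod": ["CDE", "CDE", "LXU", "LXU", "SWN", "SWN",
            "OAS", "LPI", "LPI", "LPI", "CZR", "CZR"],
    "value": [2.318620, -3.097715, -3.791043, 4.818995, 3.059964, -4.349304,
              -3.336539, -3.037097, -5.701044, -3.519923, -3.270018, -3.056712],
})

# Broadcast each group's mean back onto every row of that group
df["average"] = df.groupby("cod")["value"].transform("mean")

# Row label of the largest absolute value within each group
idx = df["value"].abs().groupby(df["cod"]).idxmax()

# Select those rows; value keeps its sign, average comes along for free
res = df.loc[idx]
print(res)
```

The rows come back ordered by `cod` because `groupby` sorts its keys by default; pass `sort=False` to keep the order of first appearance instead.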