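What is the right way to find the maximum subgroup value within each group and assign each row a value depending on whether it is that max? Here is an example df: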
group subgroup
A 1
B 1
A 2
A 3
A 4
B 2
C 2
C 1
The rule is:
if subgroup = max of its group then result = 1
else result = 2
The expected result is:
group subgroup result
A 1 2
B 1 2
A 2 2
A 3 2
A 4 1
B 2 1
C 2 1
C 1 2
This is what I'm doing now:
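# broadcast each group's max subgroup back to every row (transform keeps df's index)
df['subgroup_max'] = df.groupby(['group'])['subgroup'].transform('max')
df['result'] = 2
# rows whose subgroup equals their group's max get 1
df.loc[df['subgroup'] == df['subgroup_max'], 'result'] = 1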
It doesn't seem very efficient. Is there a better way?
Answer 0 (score: 4)
You can use DataFrameGroupBy.idxmax to get the index of the max value in each group:
df['result'] = 2
idx = df.groupby(['group'])['subgroup'].idxmax()
df.loc[idx, 'result'] = 1
print (df)
group subgroup result
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
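Note that idxmax returns exactly one index label per group, the first row that holds the max, so with ties only that first row gets 1. A minimal self-contained sketch of the idxmax approach; the data is made up (the first A row is set to 4) to show the tie behaviour:

```python
import pandas as pd

# Made-up data with the question's column names; group A has two rows equal to 4
df = pd.DataFrame({'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'],
                   'subgroup': [4, 1, 2, 3, 4, 2, 2, 1]})

df['result'] = 2
# one label per group: the first row holding that group's max
idx = df.groupby('group')['subgroup'].idxmax()
df.loc[idx, 'result'] = 1
print(idx.tolist())           # [0, 5, 6]: row 4 (also 4) is not picked
print(df['result'].tolist())  # [1, 2, 2, 2, 2, 1, 1, 2]
```

This is why the multiple-max case at the end of this answer compares values against the group max instead.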
Another solution with numpy.where and Index.isin:
import numpy as np

idx = df.groupby(['group'])['subgroup'].idxmax()
df['result'] = np.where(df.index.isin(idx), 1, 2)
print (df)
group subgroup result
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
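One hedged caveat: idxmax returns index labels, and both df.index.isin(idx) and df.loc[idx, ...] match by label, so these solutions assume the index is unique. With duplicated labels (for example after pd.concat without ignore_index=True), rows from other groups that share a label are marked as well. The small frame below is made up to show this; resetting to a fresh RangeIndex first avoids it:

```python
import numpy as np
import pandas as pd

# Made-up frame with a non-unique index: labels 0 and 1 each appear twice
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'subgroup': [2, 1, 1, 3]},
                  index=[0, 1, 0, 1])

idx = df.groupby('group')['subgroup'].idxmax()  # labels [0, 1]
df['result'] = np.where(df.index.isin(idx), 1, 2)
print(df['result'].tolist())   # [1, 1, 1, 1]: every row matches label 0 or 1

# With a unique RangeIndex the labels identify single rows again
df2 = df.drop(columns='result').reset_index(drop=True)
idx2 = df2.groupby('group')['subgroup'].idxmax()  # labels [0, 3]
df2['result'] = np.where(df2.index.isin(idx2), 1, 2)
print(df2['result'].tolist())  # [1, 2, 2, 1]
```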
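Or invert the mask and convert the booleans to integers (False becomes 0, True becomes 1), then add 1:
idx = df.groupby(['group'])['subgroup'].idxmax()
df['result'] = (~df.index.isin(idx)).astype(int) + 1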
print (df)
group subgroup result
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
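The arithmetic behind (~mask).astype(int) + 1, spelled out on a plain NumPy boolean array. The mask values are taken from the question's expected result (rows 4, 5 and 6 hold their group's max); 2 - mask.astype(int) is an equivalent spelling:

```python
import numpy as np

# True where the row holds its group's max (rows 4, 5, 6 in the question's example)
is_max = np.array([False, False, False, False, True, True, True, False])

# non-max rows: ~False -> True -> 1 -> 2; max rows: ~True -> False -> 0 -> 1
result = (~is_max).astype(int) + 1
alt = 2 - is_max.astype(int)
print(result.tolist())  # [2, 2, 2, 2, 1, 1, 1, 2]
print(alt.tolist())     # same values
```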
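But if a group can have more than one max value and every one of them should get 1, compare each value with its group's max instead of using idxmax. For example, with the first row of A changed to 4: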
print (df)
group subgroup
0 A 4
1 B 1
2 A 2
3 A 3
4 A 4
5 B 2
6 C 2
7 C 1
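# transform('max') broadcasts each group's max back to every row, so the mask stays aligned with df
mask = df['subgroup'] == df.groupby(['group'])['subgroup'].transform('max')
df['result'] = np.where(mask, 1, 2)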
print (df)
group subgroup result
0 A 4 1
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
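If you prefer an apply-based mask (lambda x: x == x.max()), be careful with row order. As I understand the pandas 2.0 change, groupby(...).apply now respects group_keys (default True) even when the function returns a like-indexed Series, so the mask gets the group label prepended to its index and its rows come out ordered by group; np.where then uses those values positionally and can misalign them with df. Passing group_keys=False keeps the original index and row order. A sketch comparing it with a transform('max') mask on the tied data above:

```python
import numpy as np
import pandas as pd

# The tied example data used in this part of the answer
df = pd.DataFrame({'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'],
                   'subgroup': [4, 1, 2, 3, 4, 2, 2, 1]})

# group_keys=False: the like-indexed result is returned in df's original row order
mask_apply = df.groupby('group', group_keys=False)['subgroup'].apply(lambda x: x == x.max())
# transform broadcasts each group's max back to every row
mask_transform = df['subgroup'] == df.groupby('group')['subgroup'].transform('max')

df['result'] = np.where(mask_apply, 1, 2)
print(mask_apply.tolist() == mask_transform.tolist())  # True
print(df['result'].tolist())  # [1, 2, 2, 2, 1, 1, 1, 2]
```

transform is usually the simpler choice here, since it never reorders rows and avoids calling a Python lambda per group for a built-in aggregation.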
Answer 1 (score: 2)
You can also use a lambda function, which makes it easy to add more conditions.
import pandas as pd

df = pd.DataFrame({'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'], 'subgroup': [1, 1, 2, 3, 4, 2, 2, 1]})
group subgroup
0 A 1
1 B 1
2 A 2
3 A 3
4 A 4
5 B 2
6 C 2
7 C 1
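# The lambda must see one group at a time: comparing against df['subgroup'].max()
# would use the global max (4) and mark only row 4, so apply it per group with transform.
df['results'] = df.groupby('group')['subgroup'].transform(lambda s: (s == s.max()).map({True: 1, False: 2}))
print(df)
group subgroup results
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2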
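To illustrate the "more conditions" point, here is a sketch with a hypothetical three-way rule that is not part of the question: 1 for the group max, 3 for the group min, 2 otherwise. np.select picks the first matching condition, so in a single-row group (where max equals min) the row gets 1. The per-group lambda returns a NumPy array, which transform aligns back to the group's rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'],
                   'subgroup': [1, 1, 2, 3, 4, 2, 2, 1]})

# Hypothetical rule for illustration: max -> 1, min -> 3, anything else -> 2
df['results'] = df.groupby('group')['subgroup'].transform(
    lambda s: np.select([s == s.max(), s == s.min()], [1, 3], default=2)
)
print(df['results'].tolist())  # [3, 3, 2, 2, 1, 1, 1, 3]
```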