Best way to assign values conditionally in Pandas

Time: 2017-06-21 09:01:39

Tags: python pandas pandas-groupby

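What is the right way to identify the maximum subgroup value within each group and assign a value to each row depending on whether it is that max? Here is an example df: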

group            subgroup
  A                 1
  B                 1
  A                 2
  A                 3
  A                 4
  B                 2
  C                 2
  C                 1

The rule is:

if subgroup = max (within its group) then result = 1
else result = 2

The result would be:

group            subgroup      result
  A                 1            2
  B                 1            2
  A                 2            2
  A                 3            2
  A                 4            1
  B                 2            1
  C                 2            1
  C                 1            2

This is what I currently do:

# Broadcast each group's max subgroup back onto its rows
df['subgroup_max'] = df.groupby(['group'])['subgroup'].transform('max')
df['result'] = 2
df.loc[df['subgroup'] == df['subgroup_max'], 'result'] = 1

This does not seem very efficient. Is there a better way?

2 answers:

Answer 0 (score: 4)

You can use DataFrameGroupBy.idxmax to get the index of the max value in each group:

df['result'] = 2
idx = df.groupby(['group'])['subgroup'].idxmax()
df.loc[idx, 'result'] = 1
print (df)
  group  subgroup  result
0     A         1       2
1     B         1       2
2     A         2       2
3     A         3       2
4     A         4       1
5     B         2       1
6     C         2       1
7     C         1       2

Another solution, with numpy.where and Index.isin (assumes import numpy as np):

idx = df.groupby(['group'])['subgroup'].idxmax()
df['result'] = np.where(df.index.isin(idx), 1, 2)
print (df)
  group  subgroup  result
0     A         1       2
1     B         1       2
2     A         2       2
3     A         3       2
4     A         4       1
5     B         2       1
6     C         2       1
7     C         1       2

Or convert the inverted boolean mask to integers and add 1:

idx = df.groupby(['group'])['subgroup'].idxmax()
df['result'] = (~df.index.isin(idx)).astype(int) + 1
print (df)
  group  subgroup  result
0     A         1       2
1     B         1       2
2     A         2       2
3     A         3       2
4     A         4       1
5     B         2       1
6     C         2       1
7     C         1       2

However, if a group has multiple max values and all of them need to be assigned, use apply:

print (df)
  group  subgroup
0     A         4
1     B         1
2     A         2
3     A         3
4     A         4
5     B         2
6     C         2
7     C         1

mask = df.groupby(['group'])['subgroup'].apply(lambda x: x == x.max())
df['result'] = np.where(mask, 1, 2)
print (df)
  group  subgroup  result
0     A         4       1
1     B         1       2
2     A         2       2
3     A         3       2
4     A         4       1
5     B         2       1
6     C         2       1
7     C         1       2
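
A transform-based variant of the tie-handling mask can be sketched as follows. This is an illustrative alternative, not part of the original answer: groupby(...).transform('max') broadcasts each group's maximum back onto the original rows, so the comparison stays aligned with df's index. That may matter because, depending on the pandas version, the Series returned by groupby(...).apply can carry the group keys in its index and come back in group order. The name group_max is introduced here only for the example, and the frame repeats the tie data shown above.

```python
import numpy as np
import pandas as pd

# Same data as the tie example above: group A has two rows with the max value 4
df = pd.DataFrame({
    'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'],
    'subgroup': [4, 1, 2, 3, 4, 2, 2, 1],
})

# Per-group maximum, broadcast to every row of that group (same index as df)
group_max = df.groupby('group')['subgroup'].transform('max')

# 1 where the row equals its group's maximum (ties included), otherwise 2
df['result'] = np.where(df['subgroup'] == group_max, 1, 2)
print(df)
```

Because transform returns one value per original row, no index juggling is needed before calling np.where.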

Answer 1 (score: 2)

You can also use a lambda function, which lets you specify more conditions.

df = pd.DataFrame({'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'],
                   'subgroup': [1, 1, 2, 3, 4, 2, 2, 1]})
print (df)
  group  subgroup
0     A         1
1     B         1
2     A         2
3     A         3
4     A         4
5     B         2
6     C         2
7     C         1

# df['subgroup'].max() is the max of the whole column, not of each group
df['results'] = df['subgroup'].apply(lambda x: 1 if df['subgroup'].max() == x else 2)
print (df)
  group  subgroup  results
0     A         1        2
1     B         1        2
2     A         2        2
3     A         3        2
4     A         4        1
5     B         2        2
6     C         2        2
7     C         1        2
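
If the lambda-style rule is meant to apply within each group, as in the question's expected result, one possible sketch is to pass the function to groupby(...).transform so that it sees one group's values at a time. The helper name label_max is hypothetical, introduced here only for illustration, and this is not the original answer's method.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'],
    'subgroup': [1, 1, 2, 3, 4, 2, 2, 1],
})

def label_max(s):
    # s holds the 'subgroup' values of a single group;
    # rows equal to that group's max map to 1, all others to 2
    return s.eq(s.max()).map({True: 1, False: 2})

# transform calls label_max once per group and keeps df's row order
df['results'] = df.groupby('group')['subgroup'].transform(label_max)
print(df)
```

Further conditions can be added inside label_max without changing the surrounding call.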