in pandas

Time: 2017-06-21 07:30:19

Tags: python pandas pandas-groupby

Is there a simpler or more correct way to assign dynamic groups? Consider the following df:

group    days(int, >0)
  A        1
  B        12
  A        14
  A        16
  A        19
  B        23
  C        92
  C        12

I want to assign subgroups according to the following rules:

if days > 20 then subgroup = 4
if 10 < days <= 20 then subgroup = 3
if 0 < days <= 10 then subgroup = 2
if days == 0 then subgroup = 1
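Read as non-overlapping ranges (the narrowest matching condition wins), the rules above can be sketched as a plain function. This is only an illustration: the name `subgroup_for` is hypothetical and not part of the question, and the sample values are the `days` column from the example frame.

```python
def subgroup_for(days: int) -> int:
    """Return the subgroup (1-4) for a non-negative day count."""
    if days == 0:
        return 1
    if days <= 10:
        return 2
    if days <= 20:
        return 3
    return 4


# The days column from the example frame
days_column = [1, 12, 14, 16, 19, 23, 92, 12]
subgroups = [subgroup_for(d) for d in days_column]
print(subgroups)
```

Checking the conditions from narrowest to broadest avoids depending on the order in which overlapping assignments are applied.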

Here is what I currently do:

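# Start with the broadest rule; each later line overwrites a narrower range
df['subgroup'] = 4
df.loc[df['days'] > 20, 'subgroup'] = 4   # redundant with the default above
df.loc[df['days'] <= 20, 'subgroup'] = 3
df.loc[df['days'] <= 10, 'subgroup'] = 2
df.loc[df['days'] == 0, 'subgroup'] = 1
df = df.reset_index()   # moves the current index into a column
# Dense rank of subgroup within each group
df['dynamic_subgroup'] = df.groupby(['group'])['subgroup'].rank(method='dense')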

This is the resulting table:

group    days(int, >0)     dynamic_subgroup
  A        1                    1
  B        12                   1
  A        14                   2
  A        16                   3
  A        19                   4
  B        23                   2
  C        92                   2
  C        12                   1

Is there an easier or better way to get the same result in pandas? More generally, any corrections to the code are appreciated.

2 answers:

Answer 0: (score: 3)

You can use cut for binning:

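import numpy as np
import pandas as pd
# Right-closed bins: (-1, 0], (0, 10], (10, 20], (20, inf]
bins = [-1, 0, 10, 20, np.inf]
labels = [1, 2, 3, 4]
df['subgroup'] = pd.cut(df['days'], bins=bins, labels=labels)
print(df)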
  group  days subgroup
0     A     1        2
1     B    12        3
2     A    14        3
3     A    16        3
4     A    19        3
5     B    23        4
6     C    92        4
7     C    12        3
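Two details of `pd.cut` matter here: bins are closed on the right by default, and passing `labels` yields a categorical column, not an integer one. The sketch below rebuilds the example frame (assumed from the question's table) to show both. The `.astype(int)` cast is an optional addition, not part of the answer, and the name `edge_cases` is illustrative.

```python
import numpy as np
import pandas as pd

# Example frame, rebuilt from the question's table
df = pd.DataFrame({
    "group": list("ABAAABCC"),
    "days": [1, 12, 14, 16, 19, 23, 92, 12],
})

# Right-closed bins: (-1, 0], (0, 10], (10, 20], (20, inf]
bins = [-1, 0, 10, 20, np.inf]
labels = [1, 2, 3, 4]

# With labels, pd.cut returns a categorical Series
as_category = pd.cut(df["days"], bins=bins, labels=labels)
print(as_category.dtype)

# Cast to plain integers if numeric operations are needed later;
# this works here because every value falls inside some bin (no NaN)
df["subgroup"] = as_category.astype(int)
print(df)

# Values sitting exactly on an edge belong to the lower bin
edge_cases = pd.cut(pd.Series([0, 10, 20, 21]), bins=bins, labels=labels).astype(int)
print(edge_cases.tolist())
```

A value outside all bins (for example a negative number at or below -1) would become NaN, and the integer cast would then fail, so the cast is only safe when the bins cover the data.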

Answer 1: (score: 2)

Use np.searchsorted:

df.assign(subgroup=np.searchsorted([0, 10, 20], df.days.values) + 1)

  group  days  subgroup
0     A     1         2
1     B    12         3
2     A    14         3
3     A    16         3
4     A    19         3
5     B    23         4
6     C    92         4
7     C    12         3
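Here is a self-contained sketch of why this works, with the example frame rebuilt from the question's table. With the default `side="left"`, `np.searchsorted` returns the first index whose edge is greater than or equal to the value, so a value equal to an edge stays in the lower subgroup. The names `edges`, `idx`, `result`, and `boundary` are illustrative only.

```python
import numpy as np
import pandas as pd

# Example frame, rebuilt from the question's table
df = pd.DataFrame({
    "group": list("ABAAABCC"),
    "days": [1, 12, 14, 16, 19, 23, 92, 12],
})

# Upper edges of subgroups 1, 2 and 3; anything above 20 lands past the end
edges = [0, 10, 20]

# 0-based insertion index into the sorted edges (side="left" is the default)
idx = np.searchsorted(edges, df["days"].to_numpy(), side="left")

# +1 turns the index into a subgroup label; assign returns a new frame
result = df.assign(subgroup=idx + 1)
print(result)

# Behaviour at and around the edges
boundary = np.searchsorted(edges, [0, 10, 11, 20, 21]) + 1
print(boundary.tolist())
```

Unlike `pd.cut`, this produces a plain integer column directly. It also maps any negative value to subgroup 1 rather than to NaN, which is harmless only if `days` is known to be non-negative.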