Pandas conditional groupby min

Date: 2018-03-28 20:01:22

Tags: python pandas dataframe pandas-groupby

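In the DataFrame below, I need to fill df['output'] with a date for each df['SubGroup'], chosen conditionally:

If df['HardDate'] exists for the df['SubGroup'], use df['HardDate'];
otherwise use the minimum (earliest) BookDate among the rows where df['Values'] is not null.
Then perhaps a df.apply function could produce the desired output.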

| MainGroup | SubGroup | BookDate  |Values | HardDate   | **Output** |
|-----------|----------|-----------|-------|------------|------------|
| Group1    | SubG1    | 1/1/2000  | Null  | 10/10/2010 | 10/10/2010 |
| Group1    | SubG1    | 2/1/2000  | Null  | 10/10/2010 | 10/10/2010 |
| Group1    | SubG1    | 3/1/2000  | 350   | 10/10/2010 | 10/10/2010 |
| Group1    | SubG1    | 4/1/2000  | 400   | 10/10/2010 | 10/10/2010 |
| Group1    | DiffG2   | 9/1/2012  | 6000  | Null       | 9/1/2012   |
| Group1    | DiffG2   | 10/1/2012 | 7000  | Null       | 9/1/2012   |
| Group1    | DiffG2   | 11/1/2012 | 8000  | Null       | 9/1/2012   |
| Group1    | DiffG2   | 12/1/2012 | 9000  | Null       | 9/1/2012   |
| Group2    | AltG1    | 5/1/1999  | Null  | Null       | 6/1/1999   |
| Group2    | AltG1    | 6/1/1999  | 190   | Null       | 6/1/1999   |
| Group2    | AltG1    | 7/1/1999  | 290   | Null       | 6/1/1999   |
| Group2    | AltG1    | 8/1/1999  | 390   | Null       | 6/1/1999   |
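
For anyone who wants to reproduce this, the table can be rebuilt roughly as follows. This is only a sketch: it assumes that "Null" means a missing value (NaN/NaT), that the dates are month/day/year, and it leaves out the expected Output column.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the table above; "Null" is represented as NaN/None.
df = pd.DataFrame({
    "MainGroup": ["Group1"] * 8 + ["Group2"] * 4,
    "SubGroup": ["SubG1"] * 4 + ["DiffG2"] * 4 + ["AltG1"] * 4,
    "BookDate": ["1/1/2000", "2/1/2000", "3/1/2000", "4/1/2000",
                 "9/1/2012", "10/1/2012", "11/1/2012", "12/1/2012",
                 "5/1/1999", "6/1/1999", "7/1/1999", "8/1/1999"],
    "Values": [np.nan, np.nan, 350, 400,
               6000, 7000, 8000, 9000,
               np.nan, 190, 290, 390],
    "HardDate": ["10/10/2010"] * 4 + [None] * 8,
})

# Assumption: month/day/year. Parsing matters because min() on raw strings
# would compare them alphabetically rather than chronologically.
df["BookDate"] = pd.to_datetime(df["BookDate"], format="%m/%d/%Y")
df["HardDate"] = pd.to_datetime(df["HardDate"], format="%m/%d/%Y")
```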

I tried the following, which pulls the minimum date but does not apply any filter:

    df['BookDate'].loc[df.groupby(by=['MainGroup', 'SubGroup'])['BookDate'].idxmin()]

Trying to add df['BookDate'].loc[date slice] results in an error.
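
One way the missing filter might be added to this attempt is to drop the rows with null Values before calling idxmin. The snippet below is a minimal sketch on a small hypothetical frame (not the full data), assuming BookDate is already a datetime column:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with a default RangeIndex (0, 1, 2).
df = pd.DataFrame({
    "MainGroup": ["Group2"] * 3,
    "SubGroup": ["AltG1"] * 3,
    "BookDate": pd.to_datetime(["1999-05-01", "1999-06-01", "1999-07-01"]),
    "Values": [np.nan, 190, 290],
})

# Drop rows with null Values first, so idxmin only sees eligible rows.
valid = df[df["Values"].notna()]
first_idx = valid.groupby(["MainGroup", "SubGroup"])["BookDate"].idxmin()

# idxmin returns index labels, so look them up with .loc rather than .iloc.
earliest = df.loc[first_idx.to_numpy(), ["MainGroup", "SubGroup", "BookDate"]]
```

Here `earliest` holds one row per group, and the row with the null Values entry is skipped even though its BookDate is earlier.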

1 Answer:

Answer 0 (score: 0)

Here is what I would do:

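1. Group by SubGroup and find the earliest BookDate among the rows where Values is not null:

        grp_df = df[df['Values'].notnull()].groupby('SubGroup')['BookDate'].min().rename('MinBookDate')

2. Join the result back to the original df:

        df = df.merge(grp_df, right_index=True, left_on='SubGroup', how='left')

3. Fill the output column:

        df['output'] = df['HardDate'].fillna(df['MinBookDate'])  # HardDate takes priority; otherwise the earliest BookDate

This assumes BookDate and HardDate are already datetimes (for example via pd.to_datetime) and that "Null" means a missing value.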
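
The same rule can probably be written without the merge by using `groupby(...).transform`. This is a sketch rather than a definitive solution: the frame is retyped from the question with ISO dates, MainGroup is left out for brevity, and the names `eligible` and `earliest` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SubGroup": ["SubG1"] * 4 + ["DiffG2"] * 4 + ["AltG1"] * 4,
    "BookDate": pd.to_datetime([
        "2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01",
        "2012-09-01", "2012-10-01", "2012-11-01", "2012-12-01",
        "1999-05-01", "1999-06-01", "1999-07-01", "1999-08-01",
    ]),
    "Values": [np.nan, np.nan, 350, 400,
               6000, 7000, 8000, 9000,
               np.nan, 190, 290, 390],
    "HardDate": pd.to_datetime(["2010-10-10"] * 4 + [None] * 8),
})

# Blank out BookDate where Values is null, then broadcast the per-group
# minimum back to every row (min skips the blanked-out NaT entries).
eligible = df["BookDate"].where(df["Values"].notna())
earliest = eligible.groupby(df["SubGroup"]).transform("min")

# HardDate wins when present; otherwise fall back to the earliest eligible BookDate.
df["output"] = df["HardDate"].fillna(earliest)
```

On this data the result matches the Output column in the question: 10/10/2010 for SubG1, 9/1/2012 for DiffG2, and 6/1/1999 for AltG1. Using `fillna` rather than a row-wise `max` keeps HardDate as the preferred value even when it is earlier than the BookDate.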