In the data frame below, I need df['output'] to be conditionally filled with a date for each df['SubGroup'].
If df['HardDate'] exists for the df['SubGroup'], use df['HardDate'];
otherwise, use the minimum (earliest) df['BookDate'] among rows where df['Values'] is not null.
Then perhaps a df.apply function could produce the desired output.
| MainGroup | SubGroup | BookDate |Values | HardDate | **Output** |
|-----------|----------|-----------|-------|------------|------------|
| Group1 | SubG1 | 1/1/2000 | Null | 10/10/2010 | 10/10/2010 |
| Group1 | SubG1 | 2/1/2000 | Null | 10/10/2010 | 10/10/2010 |
| Group1 | SubG1 | 3/1/2000 | 350 | 10/10/2010 | 10/10/2010 |
| Group1 | SubG1 | 4/1/2000 | 400 | 10/10/2010 | 10/10/2010 |
| Group1 | DiffG2 | 9/1/2012 | 6000 | Null | 9/1/2012 |
| Group1 | DiffG2 | 10/1/2012 | 7000 | Null | 9/1/2012 |
| Group1 | DiffG2 | 11/1/2012 | 8000 | Null | 9/1/2012 |
| Group1 | DiffG2 | 12/1/2012 | 9000 | Null | 9/1/2012 |
| Group2 | AltG1 | 5/1/1999 | Null | Null | 6/1/1999 |
| Group2 | AltG1 | 6/1/1999 | 190 | Null | 6/1/1999 |
| Group2 | AltG1 | 7/1/1999 | 290 | Null | 6/1/1999 |
| Group2 | AltG1 | 8/1/1999 | 390 | Null | 6/1/1999 |
I tried the following to pull the minimum date, but it does not include any filter.
df['BookDate'].loc[df.groupby(by=['MainGroup', 'SubGroup'])['BookDate'].idxmin()]
Trying to add df['BookDate'].loc[date slice] results in an error.
Answer 0 (score: 0)
This is what I would do:
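grp_df = df[df['Values'].notna()].groupby('SubGroup')['BookDate'].min().rename('MinBookDate')  # earliest BookDate with a non-null Values (assumes datetime columns)
df = df.merge(grp_df, right_index=True, left_on='SubGroup', how='left')
df['output'] = df['HardDate'].fillna(df['MinBookDate'])  # HardDate wins; otherwise fall back to MinBookDate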
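For reference, here is a self-contained sketch of the rule from the question that avoids the merge by using groupby with transform. It rebuilds the sample table, so the DataFrame construction, the month/day/year date format, and the helper names `keys`, `hard` and `earliest_valued` are assumptions for illustration, not part of the original post. It also assumes each SubGroup has at most one distinct HardDate.

```python
import pandas as pd

# Sample data mirroring the table in the question (dates assumed month/day/year).
df = pd.DataFrame({
    "MainGroup": ["Group1"] * 8 + ["Group2"] * 4,
    "SubGroup": ["SubG1"] * 4 + ["DiffG2"] * 4 + ["AltG1"] * 4,
    "BookDate": ["1/1/2000", "2/1/2000", "3/1/2000", "4/1/2000",
                 "9/1/2012", "10/1/2012", "11/1/2012", "12/1/2012",
                 "5/1/1999", "6/1/1999", "7/1/1999", "8/1/1999"],
    "Values": [None, None, 350, 400, 6000, 7000, 8000, 9000,
               None, 190, 290, 390],
    "HardDate": ["10/10/2010"] * 4 + [None] * 8,
})

# Real datetimes are needed so that min() compares dates, not strings.
df["BookDate"] = pd.to_datetime(df["BookDate"], format="%m/%d/%Y")
df["HardDate"] = pd.to_datetime(df["HardDate"], format="%m/%d/%Y")

keys = [df["MainGroup"], df["SubGroup"]]

# First non-null HardDate in each group, broadcast to every row of the group.
hard = df["HardDate"].groupby(keys).transform("first")

# Blank out BookDate where Values is null, then take the earliest per group.
earliest_valued = (
    df["BookDate"].where(df["Values"].notna())
    .groupby(keys)
    .transform("min")
)

# HardDate takes priority; otherwise use the earliest BookDate that has a value.
df["output"] = hard.fillna(earliest_valued)
```

On the sample table this gives 2010-10-10 for SubG1, 2012-09-01 for DiffG2 and 1999-06-01 for AltG1, which matches the Output column. Because transform returns a result aligned to the original index, no merge or apply is needed.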