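I have a dataset in which the same categories repeat across rows. I want to compare two date columns within the same category.
I want to check whether DATE1 is less than the values in DATE2 for the same CATEGORY, and find the earliest date that it is greater than.
I am trying the following approach, but I am not getting the desired result: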
df['test'] = np.where(m['DATE1'] < df['DATE2'], Y, N)
CATEGORY DATE1 DATE2 GREATERTHAN GREATERDATE
0 23 2015-01-18 2015-01-15 Y 2015-01-10
1 11 2015-02-18 2015-02-19 N 0
2 23 2015-03-18 2015-01-10 Y 2015-01-10
3 11 2015-04-18 2015-08-18 Y 2015-02-19
4 23 2015-05-18 2015-02-21 Y 2015-01-10
5 11 2015-06-18 2015-08-18 Y 2015-02-19
6 15 2015-07-18 2015-02-18 0 0
Answer 0 (score: 1)
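import numpy as np
import pandas as pd

df['DATE1'] = pd.to_datetime(df['DATE1'])
df['DATE2'] = pd.to_datetime(df['DATE2'])
df['GREATERTHAN'] = np.where(df['DATE1'] > df['DATE2'], 'Y', 'N')
## Getting the earliest date (across DATE1 and DATE2) for which data is available, per category
## (Series.append was removed in pandas 2.0, so pd.concat is used; the column name EARLIEST_DATE is just a label chosen here)
earliest_dates = df.groupby(['CATEGORY']).apply(lambda x: pd.concat([x['DATE1'], x['DATE2']]).min()).to_frame('EARLIEST_DATE')
## Merging to get the earliest date column per category (assign the result, since merge does not modify df in place)
df = df.merge(earliest_dates, left_on='CATEGORY', right_index=True, how='left')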
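If the goal is instead read as "for each row, is there any DATE2 in the same CATEGORY that is earlier than this row's DATE1, and if so which is the earliest one", a rough sketch could look like the following. This is only one interpretation of the question, not necessarily what the asker intended. It relies on the fact that if any DATE2 in the category is earlier than DATE1, then the category's minimum DATE2 is too, so comparing against the per-category minimum is enough. The frame below is rebuilt from the sample rows in the question. Note that the sample table shows 0 for the single-row category 15, which this sketch does not reproduce, since the rule behind that row is not stated.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "CATEGORY": [23, 11, 23, 11, 23, 11, 15],
    "DATE1": pd.to_datetime([
        "2015-01-18", "2015-02-18", "2015-03-18", "2015-04-18",
        "2015-05-18", "2015-06-18", "2015-07-18",
    ]),
    "DATE2": pd.to_datetime([
        "2015-01-15", "2015-02-19", "2015-01-10", "2015-08-18",
        "2015-02-21", "2015-08-18", "2015-02-18",
    ]),
})

# Earliest DATE2 within each category, broadcast back onto every row
earliest_date2 = df.groupby("CATEGORY")["DATE2"].transform("min")

# A row qualifies when its DATE1 is later than that earliest DATE2
mask = df["DATE1"] > earliest_date2

df["GREATERTHAN"] = np.where(mask, "Y", "N")
# Keep the date only for qualifying rows; other rows get NaT rather than 0
df["GREATERDATE"] = earliest_date2.where(mask)

print(df)
```

Using `transform("min")` keeps the result aligned with the original rows, so no merge is needed. Non-qualifying rows hold `NaT` here; replacing that with 0 would turn the column into a mixed object dtype, so it is left as is.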