How to compare two date columns within a common category?

Date: 2018-09-22 22:08:16

Tags: python pandas conditional-statements

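I have a dataset in which several rows share the same category. I want to compare two date columns within the same category.

Specifically, I want to check whether DATE1 is less than the DATE2 values of the same CATEGORY, and find the earliest DATE that is greater.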

I tried the following, but it doesn't give the result I want:

df['test'] = np.where(df['DATE1'] < df['DATE2'], 'Y', 'N')
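A minimal sketch of why this attempt cannot produce the expected table, assuming `m` was meant to be `df` and the labels were meant to be the strings 'Y' and 'N'. It uses three rows of the sample data below: `np.where` evaluates the condition element by element, so each DATE1 is compared only with the DATE2 on its own row, never with the other rows of its CATEGORY.

```python
import numpy as np
import pandas as pd

# Three rows taken from the sample data (CATEGORY 23 and 11)
df = pd.DataFrame({
    "CATEGORY": [23, 11, 23],
    "DATE1": pd.to_datetime(["2015-01-18", "2015-02-18", "2015-03-18"]),
    "DATE2": pd.to_datetime(["2015-01-15", "2015-02-19", "2015-01-10"]),
})

# Element-wise: row i of DATE1 is compared with row i of DATE2 only,
# so rows 0 and 2 (both CATEGORY 23) never "see" each other's DATE2.
df["test"] = np.where(df["DATE1"] < df["DATE2"], "Y", "N")
print(df["test"].tolist())  # values: N, Y, N
```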

Sample data, with the result I expect in the GREATERTHAN and GREATERDATE columns:

CATEGORY    DATE1           DATE2           GREATERTHAN      GREATERDATE
0   23          2015-01-18      2015-01-15      Y                2015-01-10
1   11          2015-02-18      2015-02-19      N                0
2   23          2015-03-18      2015-01-10      Y                2015-01-10
3   11          2015-04-18      2015-08-18      Y                2015-02-19
4   23          2015-05-18      2015-02-21      Y                2015-01-10
5   11          2015-06-18      2015-08-18      Y                2015-02-19
6   15          2015-07-18      2015-02-18      0                0
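Reading the expected table rather than the wording above, GREATERTHAN appears to be 'Y' when some DATE2 in the same CATEGORY is earlier than DATE1, and GREATERDATE appears to be the earliest such DATE2. A minimal sketch under that assumption, using a self-join on CATEGORY (a technique chosen here for illustration, not taken from the question or the answer); the intermediate names `pairs`, `earlier` and `greater_date` are hypothetical:

```python
import numpy as np
import pandas as pd

# Sample data from the question (expected-output columns omitted)
df = pd.DataFrame({
    "CATEGORY": [23, 11, 23, 11, 23, 11, 15],
    "DATE1": pd.to_datetime(["2015-01-18", "2015-02-18", "2015-03-18", "2015-04-18",
                             "2015-05-18", "2015-06-18", "2015-07-18"]),
    "DATE2": pd.to_datetime(["2015-01-15", "2015-02-19", "2015-01-10", "2015-08-18",
                             "2015-02-21", "2015-08-18", "2015-02-18"]),
})

# Self-join: pair each row (identified by its original index, kept in the
# "index" column) with every DATE2 value from the same CATEGORY.
pairs = df[["CATEGORY", "DATE1"]].reset_index().merge(
    df[["CATEGORY", "DATE2"]], on="CATEGORY", how="left"
)

# Keep only DATE2 values earlier than the row's DATE1,
# then take the earliest of them for each original row.
earlier = pairs[pairs["DATE2"] < pairs["DATE1"]]
greater_date = earlier.groupby("index")["DATE2"].min()

# Rows with no earlier DATE2 in their category get NaT and 'N'.
df["GREATERDATE"] = greater_date.reindex(df.index)
df["GREATERTHAN"] = np.where(df["GREATERDATE"].notna(), "Y", "N")
print(df)
```

On the sample data this reproduces rows 0 to 5, with NaT and 'N' for row 1 where the table shows 0. Row 6, the only CATEGORY 15 row, comes out as 'Y' / 2015-02-18 instead of 0 / 0, because its own DATE2 is earlier than its DATE1, so that row of the table may follow a rule not stated in the question. The self-join creates one row per pair within a category, so memory grows with the square of the largest category's size.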

1 answer:

Answer 0 (score: 1)

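import numpy as np
import pandas as pd

df['DATE1'] = pd.to_datetime(df['DATE1'])
df['DATE2'] = pd.to_datetime(df['DATE2'])

## Row-wise comparison: 'Y' where DATE1 is later than DATE2 on the same row
df['GREATERTHAN'] = np.where(df['DATE1'] > df['DATE2'], 'Y', 'N')

## Getting the earliest date for which data is available, per category
## (Series.append was removed in pandas 2.0, so take the minimum over both columns instead)
earliest_dates = df.groupby('CATEGORY')[['DATE1', 'DATE2']].min().min(axis=1).rename('EARLIEST_DATE')

## Merging to get the earliest date column per category (assign the result, merge does not modify df in place)
df = df.merge(earliest_dates, left_on='CATEGORY', right_index=True, how='left')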
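The merge step can also be written with `groupby(...).transform('min')`, which returns a result already aligned with the original rows, so no index-based merge is needed. A minimal sketch of the same quantity the answer computes (the earliest of DATE1 and DATE2 per CATEGORY), on the question's sample data; the column name `EARLIEST_DATE` is chosen here for illustration:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "CATEGORY": [23, 11, 23, 11, 23, 11, 15],
    "DATE1": pd.to_datetime(["2015-01-18", "2015-02-18", "2015-03-18", "2015-04-18",
                             "2015-05-18", "2015-06-18", "2015-07-18"]),
    "DATE2": pd.to_datetime(["2015-01-15", "2015-02-19", "2015-01-10", "2015-08-18",
                             "2015-02-21", "2015-08-18", "2015-02-18"]),
})

# Earliest of the two dates on each row...
row_min = df[["DATE1", "DATE2"]].min(axis=1)

# ...then the minimum of that within each CATEGORY, broadcast back to every row.
df["EARLIEST_DATE"] = row_min.groupby(df["CATEGORY"]).transform("min")
print(df)
```

This is the earliest date in the category overall, which differs from the question's GREATERDATE column: for CATEGORY 11 it gives 2015-02-18 (a DATE1 value), whereas the expected table shows 2015-02-19.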