Creating groups based on multiple DateTime comparisons

Asked: 2019-06-04 18:20:50

Tags: python pandas python-datetime

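I am trying to create a new column and populate it with values based on comparing one date column against three other date columns.

A sample of the DataFrame df is shown below. All of the dates shown have been converted with pd.to_datetime, which produces several NaT values where the individual did not progress.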

    1st_date     2nd_date        3rd_date     action_date
    2015-10-05   NaT             NaT          2015-12-03 
    2015-02-27   2015-03-14      2015-03-15   2015-04-08 
    2015-03-07   2015-03-27      2015-03-28   2015-03-27 
    2015-01-05   2015-01-20      2015-01-21   2015-05-20 
    2015-01-05   2015-01-20      2015-01-21   2015-09-16 
    2015-05-23   2015-06-18      2015-06-19   2015-07-01 
    2015-03-03   NaT             NaT          2015-07-23 
    2015-03-03   NaT             NaT          2015-11-14 
    2015-06-05   2015-06-19      2015-06-20   2015-10-24 
    2015-10-08   2015-10-21      2015-10-22   2015-12-22 
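To make the sample reproducible, here is a minimal sketch of how it could be built. It assumes the raw values arrive as strings, with None wherever a stage was never reached; that input format is an assumption, since only the converted frame is shown above.

```python
import pandas as pd

# Assumed raw input: date strings, with None where a stage was never reached.
raw = {
    "1st_date": ["2015-10-05", "2015-02-27", "2015-03-07", "2015-01-05", "2015-01-05",
                 "2015-05-23", "2015-03-03", "2015-03-03", "2015-06-05", "2015-10-08"],
    "2nd_date": [None, "2015-03-14", "2015-03-27", "2015-01-20", "2015-01-20",
                 "2015-06-18", None, None, "2015-06-19", "2015-10-21"],
    "3rd_date": [None, "2015-03-15", "2015-03-28", "2015-01-21", "2015-01-21",
                 "2015-06-19", None, None, "2015-06-20", "2015-10-22"],
    "action_date": ["2015-12-03", "2015-04-08", "2015-03-27", "2015-05-20", "2015-09-16",
                    "2015-07-01", "2015-07-23", "2015-11-14", "2015-10-24", "2015-12-22"],
}
df = pd.DataFrame(raw)

# pd.to_datetime turns each column into datetime64 and maps None to NaT.
for col in df.columns:
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
print(df.isna().sum())
```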

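I am trying to create a fifth column that holds the result (or group) of comparing the action_date column against the first three date columns: 1st_date, 2nd_date and 3rd_date.

The fifth column, named action_group, should be populated with a string that assigns each date to a group.

Pseudocode for the intended logic (and the expected output): if action_date > 1st_date and action_date < 2nd_date, then action_group = '1st_action_group'

The same comparison is needed for action_date against 2nd_date and 3rd_date, which should put 2nd_action_group in the action_group column.

Finally, if action_date is later than 3rd_date, action_group should be assigned the value 3rd_action_group.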
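Written out for a single row, the rules might look like the sketch below. The helper name assign_group is hypothetical. Two boundary choices are assumptions inferred from the expected output rather than stated rules: an action_date equal to 2nd_date counts as the second group, and 1st_date is not checked because every sample action_date falls after it.

```python
import pandas as pd


def assign_group(action_date, second_date, third_date):
    """Return the group label for one row (hypothetical helper)."""
    # Any ordering comparison against pd.NaT is False, so rows with a
    # missing 2nd_date/3rd_date fall through to the first group.
    if action_date > third_date:
        return "3rd_action_group"
    # Assumption: the boundary is inclusive on 2nd_date.
    if action_date >= second_date:
        return "2nd_action_group"
    return "1st_action_group"


print(assign_group(pd.Timestamp("2015-12-03"), pd.NaT, pd.NaT))
print(assign_group(pd.Timestamp("2015-03-27"),
                   pd.Timestamp("2015-03-27"),
                   pd.Timestamp("2015-03-28")))
```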

A sample of the expected output is shown below.

1st_date     2nd_date        3rd_date     action_date  action_group
2015-10-05   NaT             NaT          2015-12-03   1st_action_group
2015-02-27   2015-03-14      2015-03-15   2015-04-08   3rd_action_group
2015-03-07   2015-03-27      2015-03-28   2015-03-27   2nd_action_group
2015-01-05   2015-01-20      2015-01-21   2015-05-20   3rd_action_group
2015-01-05   2015-01-20      2015-01-21   2015-09-16   3rd_action_group
2015-05-23   2015-06-18      2015-06-19   2015-07-01   3rd_action_group
2015-03-03   NaT             NaT          2015-07-23   1st_action_group
2015-03-03   NaT             NaT          2015-11-14   1st_action_group
2015-06-05   2015-06-19      2015-06-20   2015-10-24   3rd_action_group
2015-10-08   2015-10-21      2015-10-22   2015-12-22   3rd_action_group

Any help anyone can provide would be greatly appreciated.

1 Answer:

Answer 0: (score: 1)

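import numpy as np

# Outer np.where: anything after 3rd_date belongs to the third group.
df['action_group'] = np.where(df['action_date'] > df['3rd_date'],
                              '3rd_action_group',
                              # Inner np.where: from 2nd_date up to (not including) 3rd_date.
                              np.where((df['action_date'] >= df['2nd_date']) & (df['action_date'] < df['3rd_date']),
                                       '2nd_action_group',
                                       # Everything else, including rows where the comparison hits NaT.
                                       '1st_action_group'))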

You can simply nest two np.where calls to get the desired result.

    1st_date    2nd_date    3rd_date    action_date action_group
0   2015-10-05     NaT          NaT     2015-12-03  1st_action_group
1   2015-02-27  2015-03-14  2015-03-15  2015-04-08  3rd_action_group
2   2015-03-07  2015-03-27  2015-03-28  2015-03-27  2nd_action_group
3   2015-01-05  2015-01-20  2015-01-21  2015-05-20  3rd_action_group
4   2015-01-05  2015-01-20  2015-01-21  2015-09-16  3rd_action_group
5   2015-05-23  2015-06-18  2015-06-19  2015-07-01  3rd_action_group
6   2015-03-03     NaT          NaT     2015-07-23  1st_action_group
7   2015-03-03     NaT          NaT     2015-11-14  1st_action_group
8   2015-06-05  2015-06-19  2015-06-20  2015-10-24  3rd_action_group
9   2015-10-08  2015-10-21  2015-10-22  2015-12-22  3rd_action_group
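
If more groups are added later, a flat list of conditions may be easier to read than nested calls. The sketch below swaps in np.select, which is a different technique from the nested np.where above: conditions are checked in order and the first match wins. It uses a three-row subset of the sample data. Note one behavioural difference: here an action_date exactly equal to 3rd_date lands in 2nd_action_group, whereas the nested version sends it to the default; which is correct depends on how that boundary should be treated.

```python
import numpy as np
import pandas as pd

# Three rows taken from the sample: a 3rd-group row, a boundary row, and a NaT row.
df = pd.DataFrame({
    "2nd_date": pd.to_datetime(["2015-03-14", "2015-03-27", None]),
    "3rd_date": pd.to_datetime(["2015-03-15", "2015-03-28", None]),
    "action_date": pd.to_datetime(["2015-04-08", "2015-03-27", "2015-07-23"]),
})

# Conditions are evaluated in order; comparisons against NaT are False,
# so rows with missing dates match nothing and receive the default.
conditions = [
    df["action_date"] > df["3rd_date"],
    df["action_date"] >= df["2nd_date"],
]
choices = ["3rd_action_group", "2nd_action_group"]

df["action_group"] = np.select(conditions, choices, default="1st_action_group")
print(df["action_group"].tolist())
# ['3rd_action_group', '2nd_action_group', '1st_action_group']
```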