pandas: mapping two DataFrames based on date ranges

Time: 2019-09-11 06:51:28

Tags: pandas

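I am trying to classify users by their lifecycle. The pandas DataFrame below shows how many times a customer raised a ticket, depending on how long they have used the product.

Main DataFrame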

cust_id,start_date,end_date
101,02/01/2019,12/01/2019
101,14/02/2019,24/04/2019
101,27/04/2019,02/05/2019
102,25/01/2019,02/02/2019
103,02/01/2019,22/01/2019

Master lookup table

start_date,end_date,project_name
01/01/2019,13/01/2019,project_a
14/01/2019,13/02/2019,project_b
15/02/2019,13/03/2019,project_c
14/03/2019,13/06/2019,project_d
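For anyone who wants to reproduce the problem, the two tables above can be loaded roughly as follows. This is only a sketch: the names `main_csv` and `lookup_csv` are made up for illustration, and `df1` / `df2` are chosen to match the names used in the answer. It assumes the dates are written day-first (dd/mm/yyyy), which the values such as 14/02/2019 suggest.

```python
import io

import pandas as pd

# Hypothetical variable names holding the two tables from the question.
main_csv = """cust_id,start_date,end_date
101,02/01/2019,12/01/2019
101,14/02/2019,24/04/2019
101,27/04/2019,02/05/2019
102,25/01/2019,02/02/2019
103,02/01/2019,22/01/2019
"""

lookup_csv = """start_date,end_date,project_name
01/01/2019,13/01/2019,project_a
14/01/2019,13/02/2019,project_b
15/02/2019,13/03/2019,project_c
14/03/2019,13/06/2019,project_d
"""

df1 = pd.read_csv(io.StringIO(main_csv))
df2 = pd.read_csv(io.StringIO(lookup_csv))

# The dates are day-first, so state the format explicitly; otherwise
# 02/01/2019 could be read as February 1 instead of January 2.
for frame in (df1, df2):
    for col in ("start_date", "end_date"):
        frame[col] = pd.to_datetime(frame[col], format="%d/%m/%Y")
```

Passing an explicit `format` avoids the mixed parsing that can happen when some strings are valid month-first and others are not.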

I am trying to map the two DataFrames above so that I can add project_name to the main DataFrame.

Expected output:

cust_id,start_date,end_date,project_name
101,02/01/2019,12/01/2019,project_a
101,14/02/2019,24/04/2019,project_c
101,14/02/2019,24/04/2019,project_d
101,27/04/2019,02/05/2019,project_d
102,25/01/2019,02/02/2019,project_b
103,02/01/2019,22/01/2019,project_a
103,02/01/2019,22/01/2019,project_b

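I do want duplicate rows in the final output, because a single row in the main DataFrame can fall under multiple rows of the master lookup table.

1 Answer:

Answer 0: (score: 1)

I think you need a cross join, followed by a filter that keeps the rows whose date ranges overlap: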
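To make the duplication concrete, here is a small worked check for customer 103, using only the standard library. The helper name `overlaps` is hypothetical, and the sketch assumes that both ranges are closed (the start and end days are included) and that dates are day-first.

```python
from datetime import date


def overlaps(start_a, end_a, start_b, end_b):
    """Return True when two closed date ranges share at least one day."""
    return start_a <= end_b and start_b <= end_a


# Customer 103's row from the main table: 02/01/2019 to 22/01/2019.
cust_start, cust_end = date(2019, 1, 2), date(2019, 1, 22)

# The four rows of the lookup table.
projects = {
    "project_a": (date(2019, 1, 1), date(2019, 1, 13)),
    "project_b": (date(2019, 1, 14), date(2019, 2, 13)),
    "project_c": (date(2019, 2, 15), date(2019, 3, 13)),
    "project_d": (date(2019, 3, 14), date(2019, 6, 13)),
}

matches = [
    name
    for name, (p_start, p_end) in projects.items()
    if overlaps(cust_start, cust_end, p_start, p_end)
]
print(matches)  # ['project_a', 'project_b']
```

The single row for customer 103 spans the boundary between project_a and project_b, so it appears twice in the expected output.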

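import pandas as pd

# Parse the dd/mm/yyyy strings explicitly so that 02/01/2019 is read as 2 January
for d in (df1, df2):
    d[['start_date', 'end_date']] = d[['start_date', 'end_date']].apply(pd.to_datetime, format='%d/%m/%Y')

# Cross join on a constant key, then keep the pairs whose date ranges overlap
df = df1.assign(a=1).merge(df2.assign(a=1), on='a')
m = (df['start_date_x'] <= df['end_date_y']) & (df['start_date_y'] <= df['end_date_x'])

df = df[m]
print(df)
    cust_id start_date_x end_date_x  a start_date_y end_date_y project_name
0       101   2019-01-02 2019-01-12  1   2019-01-01 2019-01-13    project_a
6       101   2019-02-14 2019-04-24  1   2019-02-15 2019-03-13    project_c
7       101   2019-02-14 2019-04-24  1   2019-03-14 2019-06-13    project_d
11      101   2019-04-27 2019-05-02  1   2019-03-14 2019-06-13    project_d
13      102   2019-01-25 2019-02-02  1   2019-01-14 2019-02-13    project_b
16      103   2019-01-02 2019-01-22  1   2019-01-01 2019-01-13    project_a
17      103   2019-01-02 2019-01-22  1   2019-01-14 2019-02-13    project_b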
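To get from a filtered cross join to the four-column layout in the question's expected output, something like the following may help. It is a sketch under stated assumptions, not the answerer's code: `map_projects` is a hypothetical helper, `how="cross"` requires pandas 1.2 or later, the dates are parsed as dd/mm/yyyy, and two ranges are treated as overlapping when each starts no later than the other ends (which also covers a customer range lying entirely inside a project range).

```python
import pandas as pd


def map_projects(main, lookup):
    """Attach project_name to every main row whose range overlaps a lookup row."""
    # Pair every main row with every lookup row (needs pandas >= 1.2).
    pairs = main.merge(lookup, how="cross", suffixes=("", "_proj"))
    # Two closed ranges overlap when each starts no later than the other ends.
    overlap = (pairs["start_date"] <= pairs["end_date_proj"]) & (
        pairs["start_date_proj"] <= pairs["end_date"]
    )
    cols = ["cust_id", "start_date", "end_date", "project_name"]
    return pairs.loc[overlap, cols].reset_index(drop=True)


main = pd.DataFrame({
    "cust_id": [101, 101, 101, 102, 103],
    "start_date": ["02/01/2019", "14/02/2019", "27/04/2019", "25/01/2019", "02/01/2019"],
    "end_date": ["12/01/2019", "24/04/2019", "02/05/2019", "02/02/2019", "22/01/2019"],
})
lookup = pd.DataFrame({
    "start_date": ["01/01/2019", "14/01/2019", "15/02/2019", "14/03/2019"],
    "end_date": ["13/01/2019", "13/02/2019", "13/03/2019", "13/06/2019"],
    "project_name": ["project_a", "project_b", "project_c", "project_d"],
})

# Day-first dates: give the format explicitly rather than relying on inference.
for frame in (main, lookup):
    for col in ("start_date", "end_date"):
        frame[col] = pd.to_datetime(frame[col], format="%d/%m/%Y")

result = map_projects(main, lookup)
# Write the dates back in the question's dd/mm/yyyy style.
print(result.to_csv(index=False, date_format="%d/%m/%Y"))
```

A cross join grows as the product of the two table sizes, so this approach suits small lookup tables like the one here; for large inputs an interval-based join would likely be a better fit.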