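Similar to an earlier question of mine (Merge dataframes on nearest datetime / timestamp), I want to merge two pandas DataFrames on two datetime columns, using the closest match:
Let A and B be two DataFrames as follows: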
A = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "init_date":["01/01/2015","07/02/2014","08/02/1999","01/01/1991","06/22/2014"], "fin_date":["04/16/1923","09/24/1945","06/24/1952","11/26/1988","10/05/1990"]})
In [15]: A
Out[15]:
ID fin_date init_date
0 A 04/16/1923 01/01/2015
1 A 09/24/1945 07/02/2014
2 C 06/24/1952 08/02/1999
3 B 11/26/1988 01/01/1991
4 B 10/05/1990 06/22/2014
B = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"],"fin_date":["12/10/1926","01/01/1944","08/21/1955","12/12/1987","11/05/1991"], "value": ["3","5","1","7","8"] })
In [11]: B
Out[11]:
ID date fin_date value
0 A 02/15/2015 12/10/1926 3
1 A 06/30/2014 01/01/1944 5
2 C 07/02/1999 08/21/1955 1
3 B 10/05/1990 12/12/1987 7
4 B 06/24/2014 11/05/1991 8
The resulting DataFrame should look like this:
In [21]: C
Out[21]:
ID fin_date init_date value
0 A 04/16/1923 01/01/2015 3
1 A 09/24/1945 07/02/2014 5
2 C 06/24/1952 08/02/1999 1
3 B 11/26/1988 01/01/1991 7
4 B 10/05/1990 06/22/2014 8
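In the general problem there may be no close match for both init_date and fin_date, but I would also be interested in solutions for the case where, for example, init_date has an exact match.
Note that one difficulty is that one candidate may be closer in init_date than in fin_date, while a competing candidate may be the other way around. In that case I prefer the one that is closer in init_date. As far as I can tell, after trying an approach similar to the one in the linked question, reindexing with "nearest" is not implemented for a MultiIndex.
Thanks, I appreciate your help.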
Answer 0 (score: 0)
pd.merge(A, B[['ID', 'fin_date', 'value']], on=['ID', 'fin_date'], how='left')
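An exact merge on ID and fin_date only pairs rows whose keys are identical, so it does not cover the nearest-date case asked about. A minimal sketch of one possible alternative follows, assuming a pandas version that provides pd.merge_asof with the direction argument. It matches each row of A to the row of B with the same ID whose date is nearest to init_date, and it ignores fin_date entirely, which is consistent with the stated preference for init_date but is a simplification, not a full two-column nearest match. The names left and right are only illustrative.

```python
import pandas as pd

A = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "init_date": ["01/01/2015", "07/02/2014", "08/02/1999", "01/01/1991", "06/22/2014"],
    "fin_date": ["04/16/1923", "09/24/1945", "06/24/1952", "11/26/1988", "10/05/1990"],
})
B = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
    "fin_date": ["12/10/1926", "01/01/1944", "08/21/1955", "12/12/1987", "11/05/1991"],
    "value": ["3", "5", "1", "7", "8"],
})

# merge_asof needs real datetimes (not strings) as the matching key.
A["init_date"] = pd.to_datetime(A["init_date"], format="%m/%d/%Y")
B["date"] = pd.to_datetime(B["date"], format="%m/%d/%Y")

# Both inputs must be sorted by their key. Keep A's original row
# position in an "index" column so the order can be restored later.
left = A.reset_index().sort_values("init_date")
right = B[["ID", "date", "value"]].sort_values("date")

C = (
    pd.merge_asof(
        left, right,
        left_on="init_date", right_on="date",
        by="ID",              # only compare rows with the same ID
        direction="nearest",  # closest date, earlier or later
    )
    .sort_values("index")     # restore A's original row order
    .set_index("index")
    .rename_axis(None)
    .drop(columns="date")
)
print(C)
```

Here fin_date in C is the one from A, because B's fin_date is left out before merging. Taking fin_date into account as a secondary criterion would need a different approach, for example merging on ID alone and ranking the candidate pairs by their date differences.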