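Merging on the closest match across two columns (pandas)

Time: 2016-08-29 00:58:04

Tags: python pandas

Similar to a previous question of mine (Merge dataframes on nearest datetime / timestamp), I would like to merge two pandas DataFrames on two datetime columns using the closest match:

Let A and B be the two DataFrames, as follows: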

import pandas as pd

A = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "init_date":["01/01/2015","07/02/2014","08/02/1999","01/01/1991","06/22/2014"], "fin_date":["04/16/1923","09/24/1945","06/24/1952","11/26/1988","10/05/1990"]})

 In [15]: A
Out[15]: 
  ID    fin_date   init_date
0  A  04/16/1923  01/01/2015
1  A  09/24/1945  07/02/2014
2  C  06/24/1952  08/02/1999
3  B  11/26/1988  01/01/1991
4  B  10/05/1990  06/22/2014


B = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["02/15/2015","06/30/2014","07/02/1999","10/05/1990","06/24/2014"],"fin_date":["12/10/1926","01/01/1944","08/21/1955","12/12/1987","11/05/1991"], "value": ["3","5","1","7","8"] })

 In [11]: B
Out[11]: 
  ID        date    fin_date value
0  A  02/15/2015  12/10/1926     3
1  A  06/30/2014  01/01/1944     5
2  C  07/02/1999  08/21/1955     1
3  B  10/05/1990  12/12/1987     7
4  B  06/24/2014  11/05/1991     8

The resulting DataFrame should look like this:

In [21]: C
Out[21]: 
  ID    fin_date   init_date value
0  A  04/16/1923  01/01/2015     3
1  A  09/24/1945  07/02/2014     5
2  C  06/24/1952  08/02/1999     1
3  B  11/26/1988  01/01/1991     7
4  B  10/05/1990  06/22/2014     8

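In the general problem there may be no close match for both init_date and fin_date, but I would also be interested in a solution for the case where, for example, init_date has an exact match.

Note that one difficulty is that one candidate may be closer on init_date than on fin_date, while a competing candidate may be the other way around. In that case I prefer the candidate that is closer on init_date. As far as I can tell, after trying an approach similar to the one in the linked question, reindexing with method "nearest" is not implemented for a MultiIndex.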
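To make the preference concrete, here is a minimal sketch of the rule I have in mind, not a definitive solution. It pairs every row of A with every row of B that shares the same ID, ranks the pairs by the gap on init_date first and by the gap on fin_date second, and keeps the best pair per row of A. The names `pairs`, `init_gap`, and `fin_gap` are made up for illustration, and I am assuming the dates are in month/day/year format.

```python
import pandas as pd

A = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "init_date": ["01/01/2015", "07/02/2014", "08/02/1999", "01/01/1991", "06/22/2014"],
    "fin_date": ["04/16/1923", "09/24/1945", "06/24/1952", "11/26/1988", "10/05/1990"],
})
B = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
    "fin_date": ["12/10/1926", "01/01/1944", "08/21/1955", "12/12/1987", "11/05/1991"],
    "value": ["3", "5", "1", "7", "8"],
})

# Parse the date strings (assumed month/day/year) so gaps can be computed.
for col in ("init_date", "fin_date"):
    A[col] = pd.to_datetime(A[col], format="%m/%d/%Y")
for col in ("date", "fin_date"):
    B[col] = pd.to_datetime(B[col], format="%m/%d/%Y")

# All candidate pairs that share an ID; "index" remembers the original row of A.
pairs = A.reset_index().merge(B, on="ID", suffixes=("", "_B"))

# Absolute gaps: init_date against B's date, and fin_date against B's fin_date.
pairs["init_gap"] = (pairs["init_date"] - pairs["date"]).abs()
pairs["fin_gap"] = (pairs["fin_date"] - pairs["fin_date_B"]).abs()

# Rank by init_gap first, fin_gap only as a tie-breaker, then keep the best pair per row of A.
best = (
    pairs.sort_values(["index", "init_gap", "fin_gap"])
    .drop_duplicates("index")
    .set_index("index")
    .rename_axis(None)
)
C = best[["ID", "fin_date", "init_date", "value"]]
print(C)
```

This builds every same-ID pair, so the intermediate frame can get large when an ID has many rows on both sides; it is only meant to pin down the matching rule.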

Thanks, I appreciate your help.

1 answer:

Answer 0: (score: 0)

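# Left merge on ID and fin_date. B must carry the key columns, not only 'value'.
# Note: this matches exact key values only; it does not perform a nearest match.
pd.merge(A, B[['ID', 'fin_date', 'value']], on=['ID', 'fin_date'], how='left')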
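
Because an exact-key merge cannot pick the closest date, a nearest-match alternative is sketched below. It is not part of the original answer; it swaps in `pd.merge_asof` with `direction="nearest"` and `by="ID"`, which assumes a pandas version that provides that parameter, and it assumes the dates are in month/day/year format. It matches on init_date against B's date only, so fin_date plays no role in choosing the match. The names `left` and `right` are just for illustration.

```python
import pandas as pd

A = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "init_date": ["01/01/2015", "07/02/2014", "08/02/1999", "01/01/1991", "06/22/2014"],
    "fin_date": ["04/16/1923", "09/24/1945", "06/24/1952", "11/26/1988", "10/05/1990"],
})
B = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["02/15/2015", "06/30/2014", "07/02/1999", "10/05/1990", "06/24/2014"],
    "fin_date": ["12/10/1926", "01/01/1944", "08/21/1955", "12/12/1987", "11/05/1991"],
    "value": ["3", "5", "1", "7", "8"],
})

# merge_asof needs real datetimes (assumed month/day/year) rather than strings.
A["init_date"] = pd.to_datetime(A["init_date"], format="%m/%d/%Y")
B["date"] = pd.to_datetime(B["date"], format="%m/%d/%Y")

# merge_asof requires both sides to be sorted by their "on" key.
# reset_index() keeps A's original row order in an "index" column.
left = A.reset_index().sort_values("init_date")
right = B[["ID", "date", "value"]].sort_values("date")

merged = pd.merge_asof(
    left,
    right,
    left_on="init_date",
    right_on="date",
    by="ID",              # only match rows with the same ID
    direction="nearest",  # closest date, whether earlier or later
)

# Restore A's original row order and keep the columns of the desired result.
C = (
    merged.sort_values("index")
    .set_index("index")
    .rename_axis(None)[["ID", "fin_date", "init_date", "value"]]
)
print(C)
```

On the sample data this picks the same `value` for each row as the desired C in the question. If a match should only count when it lies within some window, `merge_asof` also accepts a `tolerance` argument such as `pd.Timedelta("60D")`.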