Pandas merge on overlapping ranges in two DataFrames

Time: 2017-06-22 18:43:45

Tags: python pandas merge range

Given the following:

import pandas as pd
a=pd.DataFrame({' ID':[1,1,2,2],'a.A':[1,5,10,15],'a.B':[3,8,13,18]})
b=pd.DataFrame({' ID':[1,1,2,2],'b.A':[2,2,14,18],'b.B':[3,2,15,20]})
a
    ID  a.A  a.B
0    1    1    3
1    1    5    8
2    2   10   13
3    2   15   18

b
    ID  b.A  b.B
0    1    2    3
1    1    2    2
2    2   14   15
3    2   18   20

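I need to left-join b onto a wherever the range a.A to a.B overlaps the range b.A to b.B for the same ID. Put another way: when the ID matches between a and b, a pair of rows matches if (a.A <= b.A and a.B >= b.A) or (a.A <= b.B and a.B >= b.B).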
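
The rule can be sketched as a plain Python predicate and checked against single rows. The function name `ranges_match` is hypothetical and used only for illustration. It mirrors the condition exactly as stated, so it does not treat a b range that strictly contains the a range as a match.

```python
def ranges_match(a_start, a_end, b_start, b_end):
    """Return True when either endpoint of the b range lies inside the a range."""
    b_start_inside = a_start <= b_start <= a_end
    b_end_inside = a_start <= b_end <= a_end
    return b_start_inside or b_end_inside


# Rows with ID 1 from the example frames
print(ranges_match(1, 3, 2, 3))  # a = (1, 3), b = (2, 3) -> True
print(ranges_match(5, 8, 2, 3))  # a = (5, 8), b = (2, 3) -> False
```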

The final result would look like this:

    ID  a.A  a.B  b.A  b.B
0    1    1    3    2    3
1    1    1    3    2    2
2    1    5    8
3    2   10   13
4    2   15   18   18   20
5    2   15   18   18   20

Thanks in advance!

2 answers:

Answer 0: (score: 2)

I'm not sure this is the best solution, but it can be a good start:

import pandas as pd
a=pd.DataFrame({' ID':[1,1,2,2],'a.A':[1,5,10,15],'a.B':[3,8,13,18]})
b=pd.DataFrame({' ID':[1,1,2,2],'b.A':[2,2,14,18],'b.B':[3,2,15,20]})

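# Inner join on the shared ' ID' column: every (a, b) pair with the same ID
c = a.merge(b)
# True where either endpoint of the b range lies inside the a range
cbAB = (
    ((c["a.A"] <= c["b.A"]) & (c["a.B"] >= c["b.A"]))
    | ((c["a.A"] <= c["b.B"]) & (c["a.B"] >= c["b.B"]))
)
# Keep b values only on overlapping rows; other rows become NaN on assignment
cb = c[["b.A","b.B"]]
cb = cb[cbAB]
c[["b.A","b.B"]] = cb

# Collapse the repeated NaN rows left behind for each unmatched a row
c = c.drop_duplicates()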

The output of c is:

>>> c
    ID  a.A  a.B  b.A  b.B
0    1    1    3    2    3
1    1    1    3    2    2
2    1    5    8  NaN  NaN
4    2   10   13  NaN  NaN
6    2   15   18   14   15
7    2   15   18   18   20
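
One thing to watch when masking and then dropping duplicates: if a row of a overlaps one b row but not another under the same ID, the result holds both the matched row and an extra NaN row for it. An inner merge also drops any row of a whose ID never appears in b. A possible variant, sketched under the assumption that neither effect is wanted, filters the overlapping pairs first and then left-joins them back onto a. The helper column `index` comes from `reset_index()` and is only used to identify rows of a.

```python
import pandas as pd

# Column name ' ID' keeps the leading space used in the original frames
a = pd.DataFrame({" ID": [1, 1, 2, 2], "a.A": [1, 5, 10, 15], "a.B": [3, 8, 13, 18]})
b = pd.DataFrame({" ID": [1, 1, 2, 2], "b.A": [2, 2, 14, 18], "b.B": [3, 2, 15, 20]})

# Carry the row label of a along as an 'index' column, then pair rows by ID
a_keyed = a.reset_index()
pairs = a_keyed.merge(b, on=" ID")

# Same condition as in the question
overlaps = (
    ((pairs["a.A"] <= pairs["b.A"]) & (pairs["a.B"] >= pairs["b.A"]))
    | ((pairs["a.A"] <= pairs["b.B"]) & (pairs["a.B"] >= pairs["b.B"]))
)

# Keep only overlapping pairs, then left-join so unmatched rows of a get NaN
matched = pairs.loc[overlaps, ["index", "b.A", "b.B"]]
result = a_keyed.merge(matched, on="index", how="left").drop(columns="index")
print(result)
```

On the example data this yields the same six rows as above, with a default 0..5 index.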

Answer 1: (score: 1)

import pandas as pd
import numpy as np
a=pd.DataFrame({' ID':[1,1,2,2],'a.A':[1,5,10,15],'a.B':[3,8,13,18]})
b=pd.DataFrame({' ID':[1,1,2,2],'b.A':[2,2,14,18],'b.B':[3,2,15,20]})

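# Left join keeps every row of a, paired with each b row that has the same ID
c = a.merge(b, on=' ID', how='left')
# True where either endpoint of the b range lies inside the a range
range_overlaps = (
    ((c['a.A'] <= c['b.A']) & (c['a.B'] >= c['b.A'])) |
    ((c['a.A'] <= c['b.B']) & (c['a.B'] >= c['b.B']))
)
# Cast to float so the columns can hold NaN, then blank out non-overlapping rows
c[['b.A', 'b.B']] = c[['b.A', 'b.B']].astype(float)
c.loc[~range_overlaps, ['b.A', 'b.B']] = np.nan
# Drop the repeated NaN rows and renumber the index
c = c.drop_duplicates()
c = c.reset_index(drop=True)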

print(c)

This gives:

    ID  a.A  a.B   b.A   b.B
0    1    1    3   2.0   3.0
1    1    1    3   2.0   2.0
2    1    5    8   NaN   NaN
3    2   10   13   NaN   NaN
4    2   15   18  14.0  15.0
5    2   15   18  18.0  20.0
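
The condition from the question checks whether an endpoint of the b range falls inside the a range, so it misses a b range that strictly contains the a range (for example a = 5..8 and b = 2..10). If such pairs should also count as overlapping, which is an assumption beyond what the question states, the usual closed-interval test is a.A <= b.B and b.A <= a.B. A minimal sketch on hypothetical data, assuming each range has its start no greater than its end:

```python
import pandas as pd

# Hypothetical pairs: row 0 has b (2..10) containing a (5..8); row 1 is disjoint
pairs = pd.DataFrame({"a.A": [5, 1], "a.B": [8, 3], "b.A": [2, 4], "b.B": [10, 6]})

# Condition from the question: an endpoint of b lies inside a
endpoint_inside = (
    ((pairs["a.A"] <= pairs["b.A"]) & (pairs["a.B"] >= pairs["b.A"]))
    | ((pairs["a.A"] <= pairs["b.B"]) & (pairs["a.B"] >= pairs["b.B"]))
)

# Symmetric closed-interval test: each range starts no later than the other ends
overlaps = (pairs["a.A"] <= pairs["b.B"]) & (pairs["b.A"] <= pairs["a.B"])

print(endpoint_inside.tolist())  # [False, False]
print(overlaps.tolist())         # [True, False]
```

Swapping this mask in for the longer condition leaves the rest of either approach unchanged.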