Given the following:
import pandas as pd
a=pd.DataFrame({' ID':[1,1,2,2],'a.A':[1,5,10,15],'a.B':[3,8,13,18]})
b=pd.DataFrame({' ID':[1,1,2,2],'b.A':[2,2,14,18],'b.B':[3,2,15,20]})
a
    ID  a.A  a.B
0    1    1    3
1    1    5    8
2    2   10   13
3    2   15   18
b
    ID  b.A  b.B
0    1    2    3
1    1    2    2
2    2   14   15
3    2   18   20
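I need to left join b onto a wherever the range a.A to a.B overlaps the range b.A to b.B for a given ID. The logic can also be stated this way: if ID matches between a and b, then a pair of rows is a match if (a.A <= b.A and a.B >= b.A) or (a.A <= b.B and a.B >= b.B).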
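The matching rule above can be sketched as a vectorized predicate over already-paired rows. The helper names `endpoint_rule` and `symmetric_overlap` and the third test row are hypothetical, not from the question. Note that the rule as stated checks whether an endpoint of the b range falls inside the a range, so it would not flag a b range that fully contains the a range; the symmetric test is shown only as an assumption about what a general overlap check might look like.

```python
import pandas as pd

def endpoint_rule(df):
    # The rule as stated in the question: b.A or b.B lies inside [a.A, a.B].
    return (
        ((df['a.A'] <= df['b.A']) & (df['a.B'] >= df['b.A']))
        | ((df['a.A'] <= df['b.B']) & (df['a.B'] >= df['b.B']))
    )

def symmetric_overlap(df):
    # Assumption, not from the question: two closed ranges overlap
    # when each one starts no later than the other one ends.
    return (df['a.A'] <= df['b.B']) & (df['b.A'] <= df['a.B'])

# Rows 0 and 1 come from the sample data; row 2 is a made-up case
# where the b range (2, 10) fully contains the a range (5, 8).
pairs = pd.DataFrame({
    'a.A': [1, 5, 5],
    'a.B': [3, 8, 8],
    'b.A': [2, 2, 2],
    'b.B': [3, 3, 10],
})

stated = endpoint_rule(pairs)
general = symmetric_overlap(pairs)
print(pd.DataFrame({'stated': stated, 'general': general}))
```

On the sample data in this question the two predicates agree; they differ only in the containment case.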
The final result should look like this:
    ID  a.A  a.B  b.A  b.B
0    1    1    3    2    3
1    1    1    3    2    2
2    1    5    8
3    2   10   13
4    2   15   18   18   20
5    2   15   18   18   20
Thanks in advance!
Answer 0 (score: 2)
I'm not sure this is the best solution, but it could be a good start:
import pandas as pd
a=pd.DataFrame({' ID':[1,1,2,2],'a.A':[1,5,10,15],'a.B':[3,8,13,18]})
b=pd.DataFrame({' ID':[1,1,2,2],'b.A':[2,2,14,18],'b.B':[3,2,15,20]})
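c = a.merge(b)  # inner join on the shared ' ID' column: every a row paired with every b row of the same ID
# True where either endpoint of the b range falls inside the a range
cbAB = ((c["a.A"] <= c["b.A"]) & (c["a.B"] >= c["b.A"])) | ((c["a.A"] <= c["b.B"]) & (c["a.B"] >= c["b.B"]))
# Keep the b values only for matching rows; the assignment aligns on the index,
# so rows missing from cb are filled with NaN
cb = c[["b.A","b.B"]]
cb = cb[cbAB]
c[["b.A","b.B"]] = cb
# Collapse the repeated non-matching rows into one row each
c = c.drop_duplicates()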
c
The output is:
>>> c
    ID  a.A  a.B  b.A  b.B
0    1    1    3    2    3
1    1    1    3    2    2
2    1    5    8  NaN  NaN
4    2   10   13  NaN  NaN
6    2   15   18   14   15
7    2   15   18   18   20
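The step that assigns the filtered frame back works because pandas aligns on the index, which is easy to miss when reading the code. As a minimal sketch, assuming the same sample data, the same masking can be written with `DataFrame.where`, which keeps values where the mask is True and puts NaN elsewhere. The names `mask`, `via_alignment` and `via_where` are chosen here for illustration.

```python
import pandas as pd

a = pd.DataFrame({' ID': [1, 1, 2, 2], 'a.A': [1, 5, 10, 15], 'a.B': [3, 8, 13, 18]})
b = pd.DataFrame({' ID': [1, 1, 2, 2], 'b.A': [2, 2, 14, 18], 'b.B': [3, 2, 15, 20]})

c = a.merge(b)  # joins on the shared ' ID' column
mask = (
    ((c['a.A'] <= c['b.A']) & (c['a.B'] >= c['b.A']))
    | ((c['a.A'] <= c['b.B']) & (c['a.B'] >= c['b.B']))
)

# Route 1 (as in the answer): filter, then assign back.
# Rows absent from the filtered frame become NaN through index alignment.
via_alignment = c.copy()
via_alignment[['b.A', 'b.B']] = c.loc[mask, ['b.A', 'b.B']]

# Route 2: where() keeps values on True rows and inserts NaN on False rows.
via_where = c.copy()
via_where[['b.A', 'b.B']] = c[['b.A', 'b.B']].where(mask)

print(via_where.drop_duplicates())
```

Both routes should produce the same frame; `where` simply makes the masking step explicit.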
Answer 1 (score: 1)
import pandas as pd
import numpy as np
a=pd.DataFrame({' ID':[1,1,2,2],'a.A':[1,5,10,15],'a.B':[3,8,13,18]})
b=pd.DataFrame({' ID':[1,1,2,2],'b.A':[2,2,14,18],'b.B':[3,2,15,20]})
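# Pair every row of a with every row of b that has the same ID
c = a.merge(b, on=' ID', how='left')
# True where either endpoint of the b range falls inside the a range
range_overlaps = (
    ((c['a.A'] <= c['b.A']) & (c['a.B'] >= c['b.A'])) |
    ((c['a.A'] <= c['b.B']) & (c['a.B'] >= c['b.B']))
)
# Blank out the b columns on rows that do not overlap
c.loc[~range_overlaps, ['b.A', 'b.B']] = np.nan
# Collapse the repeated non-matching rows, then renumber the index
c = c.drop_duplicates()
c = c.reset_index(drop=True)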
print(c)
This gives:
    ID  a.A  a.B   b.A   b.B
0    1    1    3   2.0   3.0
1    1    1    3   2.0   2.0
2    1    5    8   NaN   NaN
3    2   10   13   NaN   NaN
4    2   15   18  14.0  15.0
5    2   15   18  18.0  20.0
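One thing to watch with the mask-then-`drop_duplicates` approach: if a row of a overlaps one b row but not another with the same ID, it ends up with both a matched row and a NaN row, which is not what a left join would normally return. The sample data does not hit this case. A possible variant, sketched under the assumption that each a row should get a NaN row only when it has no match at all, is to filter the pairs first and then left-merge a against them. The helper column `a_row` is hypothetical and exists only to tie matches back to their source row.

```python
import pandas as pd

a = pd.DataFrame({' ID': [1, 1, 2, 2], 'a.A': [1, 5, 10, 15], 'a.B': [3, 8, 13, 18]})
b = pd.DataFrame({' ID': [1, 1, 2, 2], 'b.A': [2, 2, 14, 18], 'b.B': [3, 2, 15, 20]})

# Tag each row of a so matches can be tied back to it (a_row is a helper column).
a_keyed = a.assign(a_row=range(len(a)))

# Step 1: build all same-ID pairs, then keep only the overlapping ones.
pairs = a_keyed.merge(b, on=' ID', how='inner')
overlaps = (
    ((pairs['a.A'] <= pairs['b.A']) & (pairs['a.B'] >= pairs['b.A']))
    | ((pairs['a.A'] <= pairs['b.B']) & (pairs['a.B'] >= pairs['b.B']))
)
matched = pairs.loc[overlaps, ['a_row', 'b.A', 'b.B']]

# Step 2: left-merge a against its matches.
# Rows of a with no match appear exactly once, with NaN in the b columns.
result = a_keyed.merge(matched, on='a_row', how='left').drop(columns='a_row')
print(result)
```

On the sample data this yields the same six rows as the output above. It also avoids relying on `drop_duplicates`, which would silently merge genuinely repeated rows in a.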