Python: comparing ranges between two DataFrames

Time: 2019-12-18 07:42:12

Tags: python pandas

Given df1:

  Country   fruit  low   high
0   Spain  orange  100  20000
1   Italy   apple  500  50000
2     Aus   grape  300  10000

and df2:

      City   fruit    low   high
0  sample1  orange     50    200
1  sample1   apple     10    400
2  sample2  orange  25000  50000
3  sample3  orange     50    300
4  sample3   grape    350   1000
5  sample3   grape     10    100
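For anyone who wants to reproduce the problem, the two frames above can be rebuilt roughly as follows. This is only a sketch: it assumes the `low` and `high` columns are plain integers, which the printed tables suggest but do not confirm.

```python
import pandas as pd

# df1: one reference range per fruit, keyed by country
df1 = pd.DataFrame({
    "Country": ["Spain", "Italy", "Aus"],
    "fruit": ["orange", "apple", "grape"],
    "low": [100, 500, 300],
    "high": [20000, 50000, 10000],
})

# df2: several sampled ranges per fruit, keyed by city
df2 = pd.DataFrame({
    "City": ["sample1", "sample1", "sample2", "sample3", "sample3", "sample3"],
    "fruit": ["orange", "apple", "orange", "orange", "grape", "grape"],
    "low": [50, 10, 25000, 50, 350, 10],
    "high": [200, 400, 50000, 300, 1000, 100],
})

print(df1)
print(df2)
```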

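I want to match rows on "fruit" and pull in the corresponding row from df1 whenever the range between "low" and "high" in df2 falls within the range between "low" and "high" in df1. The expected output would therefore be: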

      City   fruit  low  high Country   fruit  low   high
0  sample1  orange   50   200   Spain  orange  100  20000
1  sample3  orange   50   300   Spain  orange  100  20000
2  sample3   grape  350  1000     Aus   grape  300  10000

I think it could start something like this:

for sample, subdf in df2.groupby("fruit"):
    for index, row in subdf.iterrows():
        ...  # compare this row's range with the matching df1 row

2 Answers:

Answer 0 (score: 2)

Use DataFrame.merge with an outer join, then filter with boolean indexing:

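# Pair every df2 row with the df1 row for the same fruit;
# df1's low/high columns get the suffix '1'.
merged = df2.merge(df1, on='fruit', how='outer', suffixes=('', '1'))
# Keep the pairs whose ranges overlap.
result = merged[(merged.low1 <= merged.high) & (merged.high1 >= merged.low)]
print(result)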
      City   fruit  low  high Country  low1  high1
0  sample1  orange   50   200   Spain   100  20000
2  sample3  orange   50   300   Spain   100  20000
4  sample3   grape  350  1000     Aus   300  10000
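The filter in this answer is the usual interval-overlap test: two closed ranges share at least one point exactly when each one starts no later than the other ends. Below is a minimal sketch of that test applied to the question's sample data. The names `ranges_overlap`, `merged`, and `result` are hypothetical and only serve the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["Spain", "Italy", "Aus"],
    "fruit": ["orange", "apple", "grape"],
    "low": [100, 500, 300],
    "high": [20000, 50000, 10000],
})
df2 = pd.DataFrame({
    "City": ["sample1", "sample1", "sample2", "sample3", "sample3", "sample3"],
    "fruit": ["orange", "apple", "orange", "orange", "grape", "grape"],
    "low": [50, 10, 25000, 50, 350, 10],
    "high": [200, 400, 50000, 300, 1000, 100],
})


def ranges_overlap(low_a, high_a, low_b, high_b):
    """Element-wise check that [low_a, high_a] and [low_b, high_b] intersect."""
    return (low_a <= high_b) & (low_b <= high_a)


# Pair each df2 row with the df1 row for the same fruit.
merged = df2.merge(df1, on="fruit", how="outer", suffixes=("", "1"))

# Boolean mask: True where the df2 range overlaps the df1 range.
mask = ranges_overlap(merged["low"], merged["high"], merged["low1"], merged["high1"])

# drop=True discards the merge's row labels instead of keeping them as a column.
result = merged[mask].reset_index(drop=True)
print(result)
```

The row order of an outer merge, and therefore the index labels that survive the filter, can differ between pandas versions, which is why the sketch resets the index instead of relying on the labels.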

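Answer 1 (score: 0)

I would use a left join rather than an outer join. Note that the filter below requires df2's range to lie strictly inside df1's range, so it returns a single row.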

>>> (
    df2
    .merge(df1, how='left', on='fruit', suffixes=('', '_country'))
    .loc[lambda frame: frame.eval('(low > low_country) and (high < high_country)')]
    .reset_index()
    )
   index     City  fruit  low  high Country  low_country  high_country
0      4  sample3  grape  350  1000     Aus          300         10000
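The two answers apply different conditions, which explains the different row counts: Answer 0 keeps rows whose ranges overlap, whereas the `eval` expression here keeps only rows where df2's range lies strictly inside df1's. The sketch below runs both filters side by side on the question's sample data. It uses `DataFrame.query` instead of `.loc` with `eval`, a substitution made only for brevity, and the names `contained` and `overlapping` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["Spain", "Italy", "Aus"],
    "fruit": ["orange", "apple", "grape"],
    "low": [100, 500, 300],
    "high": [20000, 50000, 10000],
})
df2 = pd.DataFrame({
    "City": ["sample1", "sample1", "sample2", "sample3", "sample3", "sample3"],
    "fruit": ["orange", "apple", "orange", "orange", "grape", "grape"],
    "low": [50, 10, 25000, 50, 350, 10],
    "high": [200, 400, 50000, 300, 1000, 100],
})

# A left join keeps every df2 row and attaches the df1 range for its fruit.
merged = df2.merge(df1, how="left", on="fruit", suffixes=("", "_country"))

# Strict containment: the df2 range lies entirely inside the df1 range.
contained = merged.query("low > low_country and high < high_country")

# Overlap: the two ranges share at least one point.
overlapping = merged.query("low <= high_country and high >= low_country")

print(contained[["City", "fruit", "low", "high", "Country"]])
print(overlapping[["City", "fruit", "low", "high", "Country"]])
```

On this data the containment filter keeps only the sample3 grape row, while the overlap filter keeps the three rows listed in the question's expected output. Which condition is appropriate depends on whether partially overlapping ranges such as 50 to 200 against 100 to 20000 should count as a match.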