Given df1:
Country fruit low high
0 Spain orange 100 20000
1 Italy apple 500 50000
2 Aus grape 300 10000
and df2:
City fruit low high
0 sample1 orange 50 200
1 sample1 apple 10 400
2 sample2 orange 25000 50000
3 sample3 orange 50 300
4 sample3 grape 350 1000
5 sample3 grape 10 100
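I would like to match rows on "fruit" and pull in the corresponding row from df1 whenever the range between "low" and "high" in df2 falls within the range between "low" and "high" in df1. The expected output would therefore be: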
City fruit low high Country fruit low high
0 sample1 orange 50 200 Spain orange 100 20000
1 sample3 orange 50 300 Spain orange 100 20000
2 sample3 grape 350 1000 Aus grape 300 10000
I think it could start something like this:
for sample, subdf in df2.groupby("fruit"):
    for index, row in subdf.iterrows():
        ...  # not sure how to continue from here
Answer 0 (score: 2)
Use DataFrame.merge with an outer join, then filter the result with boolean indexing:
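# Pair every df2 row with the df1 row for the same fruit; df1's low/high become low1/high1
merged = df2.merge(df1, on='fruit', how='outer', suffixes=('','1'))
# Keep only the pairs whose two ranges overlap
result = merged[(merged.low1 <= merged.high) & (merged.high1 >= merged.low)]
print (result)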
City fruit low high Country low1 high1
0 sample1 orange 50 200 Spain 100 20000
2 sample3 orange 50 300 Spain 100 20000
4 sample3 grape 350 1000 Aus 300 10000
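For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my own reconstruction of the tables in the question, and the names merged, overlaps and result are only illustrative. The filter is the usual overlap test for two closed intervals: each one has to start no later than the other one ends.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question
df1 = pd.DataFrame({
    "Country": ["Spain", "Italy", "Aus"],
    "fruit": ["orange", "apple", "grape"],
    "low": [100, 500, 300],
    "high": [20000, 50000, 10000],
})
df2 = pd.DataFrame({
    "City": ["sample1", "sample1", "sample2", "sample3", "sample3", "sample3"],
    "fruit": ["orange", "apple", "orange", "orange", "grape", "grape"],
    "low": [50, 10, 25000, 50, 350, 10],
    "high": [200, 400, 50000, 300, 1000, 100],
})

# Pair each df2 row with the df1 row for the same fruit.
# Columns coming from df1 that clash with df2 get the suffix "1".
merged = df2.merge(df1, on="fruit", how="outer", suffixes=("", "1"))

# [low, high] and [low1, high1] overlap when each starts before the other ends.
overlaps = (merged["low1"] <= merged["high"]) & (merged["high1"] >= merged["low"])

result = merged[overlaps].reset_index(drop=True)
print(result)
```

The row order and the original index labels of an outer merge can differ between pandas versions, which is why the sketch resets the index instead of relying on the labels.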
Answer 1 (score: 0)
I would use a left join instead of an outer join.
>>> (
...     df2
...     .merge(df1, how='left', on='fruit', suffixes=('', '_country'))
...     .loc[lambda frame: frame.eval('(low > low_country) and (high < high_country)')]
...     .reset_index()
... )
index City fruit low high Country low_country high_country
0 4 sample3 grape 350 1000 Aus 300 10000
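Note that the strict containment test above keeps only the grape row, whereas the expected output in the question also lists the two orange rows, whose ranges merely overlap Spain's. The sketch below puts both conditions side by side on the same left join. The data is reconstructed from the question's tables, and the names joined, contained and overlapping are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question
df1 = pd.DataFrame({
    "Country": ["Spain", "Italy", "Aus"],
    "fruit": ["orange", "apple", "grape"],
    "low": [100, 500, 300],
    "high": [20000, 50000, 10000],
})
df2 = pd.DataFrame({
    "City": ["sample1", "sample1", "sample2", "sample3", "sample3", "sample3"],
    "fruit": ["orange", "apple", "orange", "orange", "grape", "grape"],
    "low": [50, 10, 25000, 50, 350, 10],
    "high": [200, 400, 50000, 300, 1000, 100],
})

# A left join keeps every df2 row, in df2's original order.
joined = df2.merge(df1, how="left", on="fruit", suffixes=("", "_country"))

# Strict containment: the df2 range lies entirely inside the df1 range.
contained = joined[
    (joined["low"] > joined["low_country"])
    & (joined["high"] < joined["high_country"])
]

# Overlap: the two ranges share at least one value.
overlapping = joined[
    (joined["low"] <= joined["high_country"])
    & (joined["high"] >= joined["low_country"])
]

print(contained)
print(overlapping)
```

With this data, contained holds only the sample3 grape row (350 to 1000), while overlapping holds the three rows from the question's expected output. Which condition is right depends on whether a partial overlap should count as a match.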