How to extract rows of a pandas DataFrame based on conditions from another DataFrame

Time: 2019-12-03 14:46:50

Tags: python pandas dataframe

I have these two dataframes:

import pandas as pd

df1 = pd.DataFrame({'Points':[1,2,3,4,5], 'ColX':[9,8,7,6,5]})
df1
    Points  ColX
0        1     9
1        2     8
2        3     7
3        4     6
4        5     5

df2 = pd.DataFrame({'Points':[2,5], 'Sum':[-1,1], 'ColY':[2,4]}) # ColY is irrelevant here; it only shows that df2 may contain columns beyond the ones that matter for this question
df2
    Points  Sum  ColY
0        2   -1     2
1        5    1     4

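I want to get a dataframe containing the rows of df1 where:

  • the value in df1's Points column also appears in df2's Points column
  • the corresponding value in df2's Sum column is between 0 and 2

So I would like to get this dataframe (the index does not matter):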

    Points  ColX
4        5     5

I tried the following, but it did not work:

df1[df1.merge(df2, on = 'Points')['Sum'] <= 2 and ['Sum']>=0] 

Could you help me find the correct code?

3 answers:

Answer 0 (score: 3)

Try this:

df1[df1['Points'].isin(df2.query('0 <= Sum <= 2')['Points'])]

Output:

   Points  ColX
4       5     5

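Explanation:

  • df2.query('0 <= Sum <= 2') first filters df2 down to the valid records only
  • Boolean indexing with isin then keeps the df1 rows whose Points value appears in the Points column of the filtered df2.

Answer 1 (score: 1)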
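The two steps above can be spelled out one at a time. This is only a sketch using the data from the question; the intermediate names valid and mask are introduced here for illustration and are not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'Points': [1, 2, 3, 4, 5], 'ColX': [9, 8, 7, 6, 5]})
df2 = pd.DataFrame({'Points': [2, 5], 'Sum': [-1, 1], 'ColY': [2, 4]})

# Step 1: keep only the df2 rows whose Sum lies in [0, 2], both ends inclusive.
valid = df2.query('0 <= Sum <= 2')

# Step 2: boolean mask over df1, True where Points appears in the filtered df2.
mask = df1['Points'].isin(valid['Points'])

# Step 3: boolean indexing keeps df1's original index and columns.
result = df1[mask]
print(result)
```

Because the mask is built on df1 itself, the surviving row keeps its original index label.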

Use Series.between to build a boolean mask for boolean indexing on df2, then pass the filtered Points values to Series.isin to build a second mask on df1:

df = df1[df1['Points'].isin(df2.loc[df2['Sum'].between(0,2), 'Points'])]
print (df)
   Points  ColX
4       5     5

Your merge-based solution can be fixed by filtering with DataFrame.query (note that the merged frame gets a new index):

df = df1.merge(df2, on = 'Points').query('0<=Sum<=2')[df1.columns]
print (df)
   Points  ColX
1       5     5
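For context, one likely reason the attempt in the question fails is that Python's `and` asks a Series for a single truth value, which pandas refuses to give. The sketch below reproduces that error on the question's data and then combines the two comparisons element-wise with `&` instead; the names merged, raised and mask are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({'Points': [1, 2, 3, 4, 5], 'ColX': [9, 8, 7, 6, 5]})
df2 = pd.DataFrame({'Points': [2, 5], 'Sum': [-1, 1], 'ColY': [2, 4]})

merged = df1.merge(df2, on='Points')

# `and` calls bool() on the left-hand Series, which raises ValueError.
raised = False
try:
    merged['Sum'] <= 2 and merged['Sum'] >= 0
except ValueError:
    raised = True
print(raised)

# Element-wise combination needs `&`, with each comparison in parentheses.
mask = (merged['Sum'] >= 0) & (merged['Sum'] <= 2)

# The mask belongs to the merged frame, so it is applied there, not to df1.
result = merged.loc[mask, df1.columns]
print(result)
```

The mask is applied to the merged frame rather than to df1 because the two frames no longer share the same index after the merge.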

Answer 2 (score: 0)

This also works (the result keeps the Sum and ColY columns from df2):

df3 = df1.merge(df2, on='Points')
result = df3[(df3.Sum >= 0) & (df3.Sum <= 2)]
result
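If the result should contain only df1's columns and keep df1's original index labels, one possible variation on the merge approach is sketched below. It assumes df1 has a default, unnamed index, so that reset_index() produces a column called 'index'; this variation is an illustration, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'Points': [1, 2, 3, 4, 5], 'ColX': [9, 8, 7, 6, 5]})
df2 = pd.DataFrame({'Points': [2, 5], 'Sum': [-1, 1], 'ColY': [2, 4]})

# Carry df1's index through the merge as an ordinary column, then restore it.
df3 = df1.reset_index().merge(df2, on='Points').set_index('index')

# Keep rows with 0 <= Sum <= 2 and only the columns that came from df1.
result = df3.loc[df3['Sum'].between(0, 2), df1.columns]

# Drop the leftover index name so the frame looks like a slice of df1.
result = result.rename_axis(None)
print(result)
```

With the data from the question this yields the single row for Points 5 under its original label 4.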