Question

I have two DataFrames as follows:

df1 

Index   Fruit
1       Apple
2       Banana
3       Peach

df2 

Index   Taste
1       Tasty
1.5     Rotten
2       Tasty
2.6     Tasty
3       Rotten
3.3     Tasty
4       Tasty

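I want to filter df2 using the indexes of the two DataFrames, for example with df1.index + 0.5 <= df2.index, and take the first matching row for each row of df1. Then I want to combine the two DataFrames.

The resulting DataFrame should look like this: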

df_outcome          

Index   Fruit   Index_df2   Taste
1       Apple   1.5         Rotten
2       Banana  2.6         Tasty
3       Peach   4           Tasty

I tried df2[df2.index>=df1.index + 0.5], but it returns

ValueError: Can only compare identically-labeled Series objects

Any help?

Answer 1

Use searchsorted on the index, then select the rows with iloc, and finally concat:

df = pd.concat([df1.reset_index(), 
                df2.iloc[df2.index.searchsorted(df1.index + .5)].reset_index()], axis=1)
print (df)
   Index   Fruit  Index   Taste
0      1   Apple    1.5  Rotten
1      2  Banana    2.6   Tasty
2      3   Peach    4.0   Tasty

Details:

print (df2.index.searchsorted(df1.index + .5))
[1 3 6]

print (df2.iloc[df2.index.searchsorted(df1.index + .5)])
        Taste
Index        
1.5    Rotten
2.6     Tasty
4.0     Tasty
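For reference, here is a self-contained sketch of the same approach, built on the sample data from the question. Two points in it are assumptions rather than part of the answer above: it treats df2.index as sorted in ascending order (searchsorted requires this), and it adds a guard for rows of df1 that have no match, since searchsorted then returns len(df2) and iloc would raise an IndexError. The names pos, has_match, and df_outcome are only illustrative.

```python
import pandas as pd

# Sample frames from the question; "Index" is the name of the index here.
df1 = pd.DataFrame(
    {"Fruit": ["Apple", "Banana", "Peach"]},
    index=pd.Index([1, 2, 3], name="Index"),
)
df2 = pd.DataFrame(
    {"Taste": ["Tasty", "Rotten", "Tasty", "Tasty", "Rotten", "Tasty", "Tasty"]},
    index=pd.Index([1, 1.5, 2, 2.6, 3, 3.3, 4], name="Index"),
)

# With the default side="left", searchsorted returns the first position
# where df2.index >= target. It assumes df2.index is sorted ascending.
pos = df2.index.searchsorted(df1.index + 0.5)

# A position equal to len(df2) means no row of df2 satisfies the condition.
has_match = pos < len(df2)

left = df1[has_match].reset_index()
right = (
    df2.iloc[pos[has_match]]
    .reset_index()
    .rename(columns={"Index": "Index_df2"})
)

# Both pieces now carry a fresh 0..n-1 index, so concat aligns them row by row.
df_outcome = pd.concat([left, right], axis=1)
print(df_outcome)
```

Renaming the second index column to Index_df2 avoids the duplicate "Index" label seen in the output above and matches the layout requested in the question.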

Answer 2

To get the rows from df2, use numpy broadcasting and argmax. Then join the result with df1 using pd.concat.

r = df2.iloc[(df1.Index.values + 0.5 
       <= df2.Index.values[:, None]).argmax(axis=0)].reset_index(drop=True)

pd.concat([df1, r], axis=1)

   Index   Fruit  Index   Taste
0      1   Apple    1.5  Rotten
1      2  Banana    2.6   Tasty
2      3   Peach    4.0   Tasty

Details:

Broadcasting gives:

x = (df1.Index.values + 0.5 <= df2.Index.values[:, None])
array([[False, False, False],
       [ True, False, False],
       [ True, False, False],
       [ True,  True, False],
       [ True,  True, False],
       [ True,  True, False],
       [ True,  True,  True]], dtype=bool)

Taking the argmax of this along axis 0 gives:

x.argmax(axis=0)
array([1, 3, 6])
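One caveat worth sketching: argmax returns 0 for a column that contains no True at all, so a row of df1 with no match in df2 would silently be paired with the first row of df2. The version below is a minimal sketch that adds an any() check for that case. It assumes, as this answer does, that Index is an ordinary column in both frames. The names mask, first, found, and out are illustrative.

```python
import pandas as pd

# Here "Index" is a regular column, as this answer assumes.
df1 = pd.DataFrame({"Index": [1, 2, 3], "Fruit": ["Apple", "Banana", "Peach"]})
df2 = pd.DataFrame({
    "Index": [1, 1.5, 2, 2.6, 3, 3.3, 4],
    "Taste": ["Tasty", "Rotten", "Tasty", "Tasty", "Rotten", "Tasty", "Tasty"],
})

# mask[i, j] is True when df1.Index[j] + 0.5 <= df2.Index[i]; shape (7, 3).
mask = df1["Index"].to_numpy() + 0.5 <= df2["Index"].to_numpy()[:, None]

first = mask.argmax(axis=0)  # position of the first True in each column
found = mask.any(axis=0)     # False where a column has no True at all

# Keep only the df1 rows that have a match, then line both pieces up by position.
r = (
    df2.iloc[first[found]]
    .reset_index(drop=True)
    .rename(columns={"Index": "Index_df2"})
)
out = pd.concat([df1[found].reset_index(drop=True), r], axis=1)
print(out)
```

Note that the broadcast comparison builds a len(df2) by len(df1) boolean array, so for large frames the searchsorted approach from Answer 1 is likely to use less memory.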

pandas - Filter a DataFrame by the index of another DataFrame, then combine the two DataFrames
