Question

I have two DataFrames like this:

import pandas as pd

df_cells = pd.DataFrame({
    'left': [1095, 257],
    'top': [1247, 1148],
    'right': [1158, 616],
    'bottom': [1273, 1176] 
})

df_text = pd.DataFrame({
    'words': ['Hello', 'world', 'nice day', 'have a'],
    'left': [1097, 1099, 258, 259],
    'top': [1248, 1249, 1156, 1153],
    'right': [1154, 1156, 615, 614],
    'bottom': [1269, 1271, 1175, 1172] 
})

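df_cells contains the bounding-box coordinates of phrases on an image, and df_text contains words and their bounding-box coordinates on the image. I want to combine the two DataFrames into a third one in which the words from df_text that fall inside a bounding box from df_cells are merged into a single phrase, shown together with that phrase's bounding-box coordinates. A word belongs to a cell when the following condition holds:

[(df_text['left'] >= df_cells['left']) & (df_text['top'] >= df_cells['top']) & (df_text['right'] <= df_cells['right']) & (df_text['bottom'] <= df_cells['bottom'])]

The resulting DataFrame should have one row per bounding box in df_cells, holding its coordinates and the combined phrase.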

Any help would be appreciated.

Edit: a cell does not always have to contain several words; sometimes a bounding box holds just a single word.

Answer 1

You can write a function that collects the words for one cell, then use apply to run it on every row of df_cells.

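def filter_text(left, top, right, bottom, df=df_text, **unused):
    # Keep only the words whose bounding box lies inside the cell's box.
    # Extra row fields (for example an existing 'Words' column) land in **unused.
    inside = df[(left <= df.left) & (top <= df.top) & (right >= df.right) & (bottom >= df.bottom)]

    # Read the words top to bottom, then left to right.
    inside = inside.sort_values(['top', 'left'], ignore_index=True)

    # Join the words into one phrase ('' when the cell contains no words).
    return " ".join(inside.words.tolist())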
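
The edit in the question mentions boxes that hold only one word. The sketch below checks that case, and the empty case, with the same containment test. The helper name `words_in_box` and the two extra boxes are made up for illustration; only the `df_text` data comes from the question.

```python
import pandas as pd

df_text = pd.DataFrame({
    'words': ['Hello', 'world', 'nice day', 'have a'],
    'left': [1097, 1099, 258, 259],
    'top': [1248, 1249, 1156, 1153],
    'right': [1154, 1156, 615, 614],
    'bottom': [1269, 1271, 1175, 1172],
})

def words_in_box(df, left, top, right, bottom):
    """Hypothetical helper: join the words whose boxes lie inside the given box."""
    inside = df[
        (df['left'] >= left) & (df['top'] >= top)
        & (df['right'] <= right) & (df['bottom'] <= bottom)
    ]
    # Order the words top to bottom, then left to right, before joining.
    return " ".join(inside.sort_values(['top', 'left'])['words'])

# The first cell from the question contains two words.
full_cell = words_in_box(df_text, 1095, 1247, 1158, 1273)

# A slightly narrower, made-up box that excludes 'world' (its right edge is 1156).
single_word = words_in_box(df_text, 1095, 1247, 1155, 1273)

# A made-up box that contains no words at all yields an empty string.
no_words = words_in_box(df_text, 0, 0, 10, 10)
```

Because the join runs over whatever rows survive the filter, one word comes back unchanged and an empty selection gives an empty string, so no special handling is needed.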

Use apply to run the function on each row:

df_cells['Words'] = df_cells.apply(lambda row: filter_text(**row.to_dict()), axis=1)

df_cells

   left   top  right  bottom            Words
0  1095  1247   1158    1273      Hello world
1   257  1148    616    1176  have a nice day
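
As a possible alternative to the row-wise apply, the same result can be sketched with a cross join followed by a filter and a groupby. This is not part of the answer above; it assumes pandas 1.2 or later (for `how='cross'`), and the names `cell_id`, `pairs`, `inside`, `phrases`, and `result` are chosen here for illustration. The cross join is needed because df_cells and df_text have different lengths, so their columns cannot be compared against each other directly.

```python
import pandas as pd

df_cells = pd.DataFrame({
    'left': [1095, 257],
    'top': [1247, 1148],
    'right': [1158, 616],
    'bottom': [1273, 1176],
})

df_text = pd.DataFrame({
    'words': ['Hello', 'world', 'nice day', 'have a'],
    'left': [1097, 1099, 258, 259],
    'top': [1248, 1249, 1156, 1153],
    'right': [1154, 1156, 615, 614],
    'bottom': [1269, 1271, 1175, 1172],
})

# Pair every cell with every word; cell_id remembers which cell a pair came from.
pairs = (
    df_cells.reset_index()
    .rename(columns={'index': 'cell_id'})
    .merge(df_text, how='cross', suffixes=('_cell', '_word'))
)

# Keep only the pairs where the word box lies inside the cell box.
inside = pairs[
    (pairs['left_word'] >= pairs['left_cell'])
    & (pairs['top_word'] >= pairs['top_cell'])
    & (pairs['right_word'] <= pairs['right_cell'])
    & (pairs['bottom_word'] <= pairs['bottom_cell'])
]

# Order words top to bottom, then left to right, and join them per cell.
phrases = (
    inside.sort_values(['top_word', 'left_word'])
    .groupby('cell_id')['words']
    .agg(' '.join)
)

# Attach the phrases to the cells; cells without any words get an empty string.
result = df_cells.assign(Words=phrases.reindex(df_cells.index, fill_value=''))
```

The cross join builds len(df_cells) × len(df_text) rows, so it trades memory for vectorized comparisons. For a handful of cells the apply version is likely simpler, and for very large inputs neither approach may be a good fit.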
