Question

Below is a nested list named row_list:

[[
        {
            'text': 'Col',
            'x0': Decimal('21.600'),
            'x1': Decimal('30.000')
        },
        {
            'text': '1',
            'x0': Decimal('41.600'),
            'x1': Decimal('51.600')
        }
    ],[
        {
            'text': 'Col',
            'x0': Decimal('21.600'),
            'x1': Decimal('30.000')
        },
        {
            'text': '1',
            'x0': Decimal('41.600'),
            'x1': Decimal('51.600')
        },
        {
            'text': 'Col',
            'x0': Decimal('200.736'),
            'x1': Decimal('210.296')
        },
        {
            'text': '2',
            'x0': Decimal('230.600'),
            'x1': Decimal('240.920')
        }
]]

Each nested list represents one line of text, so the above represents:

Col 1        
Col 1           Col 2
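
As a quick check of that reading, the rows can be joined back into text lines. This is only a minimal sketch: the `rows` literal keeps just the 'text' keys from row_list, and joining with a single space ignores the x0/x1 positions, so the wide gap before the second column is not reproduced.

```python
# Reduced copy of row_list: only the 'text' key is needed here.
rows = [
    [{'text': 'Col'}, {'text': '1'}],
    [{'text': 'Col'}, {'text': '1'}, {'text': 'Col'}, {'text': '2'}],
]

# One string per inner list, words separated by a single space.
lines = [' '.join(word['text'] for word in line) for line in rows]

for line in lines:
    print(line)
```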

Now suppose I have two defined areas (x, y, w, h) that I want to use to "split" the list (much like table columns). For example:

areas = {}
areas[0] = (0, 0, 100, 792)
areas[1] = (100, 0, 300, 792)

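Using the above, I want to select all the text that falls within each defined area, regardless of which nested list it belongs to. That should give me: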

[[
        {
            'text': 'Col',
            'x0': Decimal('21.600'),
            'x1': Decimal('30.000')
        },
        {
            'text': '1',
            'x0': Decimal('41.600'),
            'x1': Decimal('51.600')
        },
        {
            'text': 'Col',
            'x0': Decimal('21.600'),
            'x1': Decimal('30.000')
        },
        {
            'text': '1',
            'x0': Decimal('41.600'),
            'x1': Decimal('51.600')
        }
    ],[
        {
            'text': 'Col',
            'x0': Decimal('200.736'),
            'x1': Decimal('210.296')
        },
        {
            'text': '2',
            'x0': Decimal('230.600'),
            'x1': Decimal('240.920')
        }
]]

I am not sure how to search/select within the nested list and "remap" the data. I have tried something like this:

finalCols = []
for i, area in enumerate(areas):
    area = areas[i]
    for line in row_list:
        for word in line:
            if word['x0'] >= area[0] and word['x1'] <= area[2]:
                finalCols.append(word)

But this just appends every word to a single list and does not create the nested list structure shown above (my expected output).

Answer 1

You are close. It should be something like this:

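finalCols = []
for area in areas.values():
    newWords = []  # one list per area, filled from every row
    for line in row_list:
        for word in line:
            # area[2] is used as the right-hand x bound, as in the question
            if word['x0'] >= area[0] and word['x1'] <= area[2]:
                newWords.append(word)
    finalCols.append(newWords)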
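
For reference, here is the same idea as a self-contained script you can run directly. Treat it as a sketch under a couple of assumptions: the helper name `split_by_areas` is made up for illustration, and index 2 of each area is used as the right-hand x bound, exactly as the comparison in the question does. If that element is really a width, the bound would be `area[0] + area[2]` instead. The sample x1 values follow the question's expected output.

```python
from decimal import Decimal

# Sample data from the question: two text rows, each a list of words.
row_list = [
    [
        {'text': 'Col', 'x0': Decimal('21.600'), 'x1': Decimal('30.000')},
        {'text': '1', 'x0': Decimal('41.600'), 'x1': Decimal('51.600')},
    ],
    [
        {'text': 'Col', 'x0': Decimal('21.600'), 'x1': Decimal('30.000')},
        {'text': '1', 'x0': Decimal('41.600'), 'x1': Decimal('51.600')},
        {'text': 'Col', 'x0': Decimal('200.736'), 'x1': Decimal('210.296')},
        {'text': '2', 'x0': Decimal('230.600'), 'x1': Decimal('240.920')},
    ],
]

areas = {0: (0, 0, 100, 792), 1: (100, 0, 300, 792)}


def split_by_areas(rows, areas):
    """Return one list of words per area, ordered by area key."""
    columns = []
    for key in sorted(areas):
        # Assumption: index 2 is the right-hand x bound, not a width.
        left, right = areas[key][0], areas[key][2]
        columns.append([
            word
            for line in rows
            for word in line
            if word['x0'] >= left and word['x1'] <= right
        ])
    return columns


final_cols = split_by_areas(row_list, areas)
texts = [[word['text'] for word in col] for col in final_cols]
print(texts)
```

Note that a word is kept only if it lies entirely inside an area; a word straddling the boundary at x = 100 would be dropped from both columns.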
