Question

I am currently trying to build a data-capture project in Python. Right now, I am extracting the contact information from invoices.

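For now, my first search is based on the zip code, which helps me locate useful information. Then I use the positions (X-Y) of the nearest words to build the full address.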
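One way to read "nearest words" is as plain Euclidean distance between the centres of the word boxes. The sketch below is written under that assumption. It uses the same dict layout as the `arrayOfLine` entries in the code further down. `center`, `nearest_words` and the sample boxes are hypothetical and exist only for illustration.

```python
import math
from typing import Any, Dict, List, Tuple

# Same dict layout as the entries of arrayOfLine in the question
Word = Dict[str, Any]


def center(w: Word) -> Tuple[float, float]:
    """Centre point of a word's bounding box."""
    return ((w['xTL'] + w['xBR']) / 2, (w['yTL'] + w['yBR']) / 2)


def nearest_words(words: List[Word], ref: Word, k: int = 5) -> List[Word]:
    """The k words whose box centres are closest (Euclidean distance) to ref's centre."""
    cx, cy = center(ref)

    def distance(w: Word) -> float:
        x, y = center(w)
        return math.hypot(x - cx, y - cy)

    others = [w for w in words if w is not ref]  # skip the reference word itself
    return sorted(others, key=distance)[:k]


# Hypothetical sample boxes (made-up coordinates and words, for illustration only)
sample_words = [
    {'xTL': 100, 'yTL': 200, 'xBR': 120, 'yBR': 220, 'content': 'M'},
    {'xTL': 130, 'yTL': 201, 'xBR': 220, 'yBR': 221, 'content': 'DUPONT'},
    {'xTL': 100, 'yTL': 230, 'xBR': 150, 'yBR': 250, 'content': '84000'},
    {'xTL': 160, 'yTL': 231, 'xBR': 260, 'yBR': 251, 'content': 'AVIGNON'},
]
zip_code = sample_words[2]
print([w['content'] for w in nearest_words(sample_words, zip_code, k=2)])  # prints ['M', 'DUPONT']
```

Pure distance ignores the layout. In this sample, the two words on the line above the zip code rank ahead of 'AVIGNON', which sits on the same line. That is why a row or column test is often needed on top of it.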

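But I have a problem. I have this code, which returns the words in the same column as the zip code (xTL means the x position of the top-left corner of the word):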

import re

import pyocr.builders

# `tool` (a pyocr OCR tool) and `img` (a PIL image) are assumed to be defined earlier
line_and_word_boxes = tool.image_to_string(
    img,
    lang="fra",
    builder=pyocr.builders.WordBoxBuilder()
)

arrayOfLine    = []
arrayOfZipCode = []

# xTL stands for x Top Left (the x position of the top-left corner of the word)
# yTL stands for y Top Left
# xBR stands for x Bottom Right (the x position of the bottom-right corner of the word)
# yBR stands for y Bottom Right
for box in line_and_word_boxes:  # Loop over all the words of the document
    word = {
        'xTL'     : box.position[0][0],
        'yTL'     : box.position[0][1],
        'xBR'     : box.position[1][0],
        'yBR'     : box.position[1][1],
        'content' : box.content
    }
    arrayOfLine.append(word)
    if re.match(r"^\d{5}$", box.content) is not None:  # Search for zip code (regex on 5 digits)
        arrayOfZipCode.append(word)

rangeX = 20      # horizontal tolerance in pixels (example value, tune it for your scans)
nearXWord = []

for zipCode in arrayOfZipCode:
    for line in arrayOfLine:
        # Check words in the same column and put them in an array of words
        if abs(line['xTL'] - zipCode['xTL']) < rangeX:
            nearXWord.append(line)

So in the example below, this code returns an array containing 'M', '1762', '1ER', '84000'.

Now my question is: how can I find the words that are on the same line as each of the words found above?
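A possible approach is to mirror the column test on the y axis. The sketch below treats two words as being on the same line when their boxes overlap vertically by at least half the height of the smaller box. The 0.5 ratio is an arbitrary assumption. The function names and sample boxes are hypothetical, and the dicts use the same keys as `arrayOfLine`.

```python
from typing import Any, Dict, List

# Same dict layout as the entries of arrayOfLine in the question
Word = Dict[str, Any]


def vertical_overlap(a: Word, b: Word) -> int:
    """Number of pixels the two boxes share along the y axis (0 if they do not overlap)."""
    return max(0, min(a['yBR'], b['yBR']) - max(a['yTL'], b['yTL']))


def same_line_words(words: List[Word], ref: Word, min_ratio: float = 0.5) -> List[Word]:
    """Words whose boxes overlap ref vertically by at least min_ratio of the
    smaller box height, sorted left to right. ref itself is included."""
    ref_height = ref['yBR'] - ref['yTL']
    result = []
    for w in words:
        smaller = min(w['yBR'] - w['yTL'], ref_height)
        if smaller > 0 and vertical_overlap(w, ref) >= min_ratio * smaller:
            result.append(w)
    return sorted(result, key=lambda w: w['xTL'])


def rebuild_lines(words: List[Word], column_words: List[Word]) -> List[str]:
    """One text line per column word, made by joining the words found on its line."""
    return [' '.join(w['content'] for w in same_line_words(words, c))
            for c in column_words]


# Hypothetical sample boxes (made-up coordinates and words, for illustration only)
sample_words = [
    {'xTL': 100, 'yTL': 200, 'xBR': 120, 'yBR': 220, 'content': 'M'},
    {'xTL': 130, 'yTL': 201, 'xBR': 220, 'yBR': 221, 'content': 'DUPONT'},
    {'xTL': 100, 'yTL': 230, 'xBR': 150, 'yBR': 250, 'content': '84000'},
    {'xTL': 160, 'yTL': 231, 'xBR': 260, 'yBR': 251, 'content': 'AVIGNON'},
]
column = [sample_words[0], sample_words[2]]  # stands in for the column search result
print(rebuild_lines(sample_words, column))  # prints ['M DUPONT', '84000 AVIGNON']
```

With the question's variables, the call would be `rebuild_lines(arrayOfLine, nearXWord)`. An overlap ratio tolerates slightly skewed scans better than comparing `yTL` values with a fixed tolerance, because it scales with the text height.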

PYOCR - Loop over the next word

0 answers: