Question

我使用的是Google Vision API，主要用于提取文字。我工作正常，但对于我需要API来扫描输入行的特定情况，在移动到下一行之前吐出文本。但是，似乎API正在使用某种逻辑，使其在左侧从上到下扫描并向右侧扫描并进行从上到下的扫描。我希望如果API从左向右读，向下移动等等。

例如，考虑图像：

API返回如下文本：

“ Name DOB Gender: Lives In John Doe 01-Jan-1970 LA ”

然而，我会期待这样的事情：

“ Name: John Doe DOB: 01-Jan-1970 Gender: M Lives In: LA ”

我想有一种方法可以定义块大小或边距设置（？）来逐行读取图像/扫描？

感谢您的帮助。亚历

Answer 1

这可能是一个迟到的答案，但添加它以供将来参考。您可以向JSON请求添加功能提示以获得所需的结果。

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://i.stack.imgur.com/TRTXo.png"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}

对于相距很远的文本，DOCUMENT_TEXT_DETECTION也无法提供正确的分割。

以下code根据字符多边形坐标进行简单的线段分割。

https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision

Answer 2

You can extract the text based on the bounds per line too, you can use boundyPoly and concatenate the text in the same line

"boundingPoly": {
        "vertices": [
          {
            "x": 87,
            "y": 148
          },
          {
            "x": 411,
            "y": 148
          },
          {
            "x": 411,
            "y": 206
          },
          {
            "x": 87,
            "y": 206
          }
        ]

for example this 2 words are in the same "line"

"description": "you",
      "boundingPoly": {
        "vertices": [
          {
            "x": 362,
            "y": 1406
          },
          {
            "x": 433,
            "y": 1406
          },
          {
            "x": 433,
            "y": 1448
          },
          {
            "x": 362,
            "y": 1448
          }
        ]
      }
    },
    {
      "description": "start",
      "boundingPoly": {
        "vertices": [
          {
            "x": 446,
            "y": 1406
          },
          {
            "x": 540,
            "y": 1406
          },
          {
            "x": 540,
            "y": 1448
          },
          {
            "x": 446,
            "y": 1448
          }
        ]
      }
    }

Answer 3

这里有一个简单的代码，可以逐行读取。行的y轴和行中每个单词的x轴。

items = []
lines = {}

for text in response.text_annotations[1:]:
    top_x_axis = text.bounding_poly.vertices[0].x
    top_y_axis = text.bounding_poly.vertices[0].y
    bottom_y_axis = text.bounding_poly.vertices[3].y

    if top_y_axis not in lines:
        lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]

    for s_top_y_axis, s_item in lines.items():
        if top_y_axis < s_item[0][1]:
            lines[s_top_y_axis][1].append((top_x_axis, text.description))
            break

for _, item in lines.items():
    if item[1]:
        words = sorted(item[1], key=lambda t: t[0])
        items.append((item[0], ' '.join([word for _, word in words]), words))

print(items)

文本提取 - 逐行

3 个答案: