Question

I am using Firebase ML Kit in an iOS app to recognize text in an image (the image is of a form), following this example: https://www.raywenderlich.com/6565-ml-kit-tutorial-for-ios-recognizing-text-in-images

It works well.

However, I only want to recognize text in specific areas of the form and store it in the Firebase Realtime Database.

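How can I define regions of the form so that only the text inside them is returned, and label that text so it can be pushed to the database?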

I found a Python example that locates a keyword:

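# `document` is assumed to be the full-text annotation returned by the OCR call.

def assemble_word(word):
    # Join the text of every symbol (character) in the word.
    return "".join(symbol.text for symbol in word.symbols)


def find_word_location(document, word_to_find):
    # Walk pages > blocks > paragraphs > words and return the bounding box
    # of the first word whose text matches; return None if nothing matches.
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    if assemble_word(word) == word_to_find:
                        return word.bounding_box
    return None


location = find_word_location(document, 'Overdrafts')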


### OUTPUT
vertices {
  x: 131
  y: 130
}
vertices {
  x: 200
  y: 130
}
vertices {
  x: 200
  y: 145
}
vertices {
  x: 131
  y: 145
}

Then, taking a box that starts at the right edge of the word "Overdrafts" and is about as wide as the word itself, use it to select the text to output:

def text_within(document, x1, y1, x2, y2):
    # Collect every symbol whose bounding box lies entirely inside the
    # rectangle (x1, y1)-(x2, y2), restoring the detected whitespace.
    text = ""
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        xs = [v.x for v in symbol.bounding_box.vertices]
                        ys = [v.y for v in symbol.bounding_box.vertices]
                        if min(xs) >= x1 and max(xs) <= x2 and min(ys) >= y1 and max(ys) <= y2:
                            text += symbol.text
                            # detected_break.type: 1 = space, 2 = sure (wide) space,
                            # 3 = end-of-line space, 5 = line break
                            break_type = symbol.property.detected_break.type
                            if break_type in (1, 3):
                                text += ' '
                            elif break_type == 2:
                                text += '\t'
                            elif break_type == 5:
                                text += '\n'
    return text


# Search a box that starts at the right edge of the keyword and is one
# keyword-width plus 30 pixels wide.
keyword_width = location.vertices[1].x - location.vertices[0].x
text_within(document,
            location.vertices[1].x, location.vertices[1].y,
            location.vertices[1].x + keyword_width + 30, location.vertices[2].y)

### OUTPUT
'$ 511,789.61 '
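
To make the arithmetic in that call explicit, here is a minimal sketch of how I understand the search box is derived from the keyword's vertices. `Vertex` and `region_right_of` are names I made up for illustration, and I am assuming the vertices are ordered top-left, top-right, bottom-right, bottom-left, as in the output shown earlier.

```python
from collections import namedtuple

# Stand-in for a bounding-box vertex; the real API returns objects with .x and .y.
Vertex = namedtuple("Vertex", ["x", "y"])


def region_right_of(vertices, margin=30):
    """Return (x1, y1, x2, y2) for a box starting at the keyword's right edge.

    Assumes the order top-left, top-right, bottom-right, bottom-left.
    The box is one keyword-width wide plus `margin` pixels.
    """
    top_left, top_right, bottom_right, _ = vertices
    width = top_right.x - top_left.x
    return (top_right.x, top_right.y, top_right.x + width + margin, bottom_right.y)


# Vertices of "Overdrafts" taken from the output above.
overdrafts = [Vertex(131, 130), Vertex(200, 130), Vertex(200, 145), Vertex(131, 145)]
region = region_right_of(overdrafts)
print(region)  # (200, 130, 299, 145)
```

With the "Overdrafts" vertices, the keyword is 69 pixels wide, so the box spans x from 200 to 299 and y from 130 to 145.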

Can I do something similar in Swift, or is there a better way?
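
For reference, the kind of result I am after could be sketched as follows (in Python, to match the example I found). Everything here is hypothetical: the region labels and coordinates, the `box_inside` and `fields_from_regions` helpers, and the recognized elements, which are plain tuples rather than any real ML Kit type. The idea is a table of labeled rectangles, with each recognized piece of text assigned to the rectangle that fully contains it.

```python
def box_inside(box, region):
    """True when box (x1, y1, x2, y2) lies entirely within region."""
    bx1, by1, bx2, by2 = box
    rx1, ry1, rx2, ry2 = region
    return bx1 >= rx1 and by1 >= ry1 and bx2 <= rx2 and by2 <= ry2


def fields_from_regions(elements, regions):
    """Group recognized text by the labeled region that contains it.

    elements: iterable of (text, (x1, y1, x2, y2)) in reading order.
    regions:  dict mapping a field label to its (x1, y1, x2, y2) rectangle.
    """
    fields = {label: [] for label in regions}
    for text, box in elements:
        for label, region in regions.items():
            if box_inside(box, region):
                fields[label].append(text)
                break  # each element belongs to at most one field
    return {label: " ".join(words) for label, words in fields.items()}


# Hypothetical form layout and OCR results, in image pixel coordinates.
regions = {"overdrafts": (200, 130, 299, 145), "name": (50, 40, 300, 60)}
elements = [
    ("Jane", (55, 42, 90, 58)),
    ("Doe", (95, 42, 125, 58)),
    ("Overdrafts", (131, 130, 200, 145)),  # the keyword itself is outside both regions
    ("$", (205, 131, 212, 144)),
    ("511,789.61", (216, 131, 290, 144)),
]
fields = fields_from_regions(elements, regions)
print(fields)  # {'overdrafts': '$ 511,789.61', 'name': 'Jane Doe'}
```

A dictionary keyed by field label like this is roughly what I would want to push to the database.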

Defining regions of an image for OCR text with ML Kit

0 answers: