Extracting handwritten text from an application form with the Google Cloud Vision API

Date: 2019-05-09 18:45:51

Tags: python google-cloud-platform google-cloud-vision

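I want to use the text detection feature of the Google Vision API to extract handwritten text from an application form. It extracts the handwritten text very well, but it returns a very unorganized JSON-style response that I don't know how to parse, because I only want to extract specific fields (such as name, contact number, email, etc.) and store them in a MySQL database.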

[Input image]

[Response from the API]

Code (https://cloud.google.com/vision/docs/detecting-fulltext#vision-document-text-detection-python):

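def detect_document(path):
    """Detects document features in an image."""
    import io
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # vision.types.Image is the spelling used by client library versions
    # before 2.0; google-cloud-vision 2.0 and later expose it as vision.Image.
    image = vision.types.Image(content=content)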

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    #for symbol in word.symbols:
                    #    print('\tSymbol: {} (confidence: {})'.format(
                    #        symbol.text, symbol.confidence))

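1 Answer:

Answer 0 (score: 0)

Replace the entire for loop with print(response.full_text_annotation.text). That attribute holds the complete detected text as a single string, which is much easier to work with than the nested page/block/paragraph/word structure.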
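Once you have that string, picking out individual fields might look something like the sketch below. This is only a minimal illustration, assuming the form prints each field as a "Label: value" line. The label patterns in FIELD_LABELS, the extract_fields helper, and the sample text are all made up for the example and are not part of the Vision API; real handwriting output will likely need looser matching.

```python
import re

# Hypothetical label patterns; adjust them to the labels printed on your form.
FIELD_LABELS = {
    "name": r"name",
    "phone": r"(?:contact\s*(?:number|no\.?)|phone)",
    "email": r"e-?mail",
}


def extract_fields(full_text):
    """Parse 'Label: value' lines from OCR text into a dict of field -> value."""
    fields = {}
    for line in full_text.splitlines():
        for key, label in FIELD_LABELS.items():
            # The label must start the line and be followed by ':' or '-'.
            match = re.match(r"\s*" + label + r"\s*[:\-]\s*(.+)", line, re.IGNORECASE)
            if match and key not in fields:
                fields[key] = match.group(1).strip()
    return fields


# Made-up sample standing in for response.full_text_annotation.text.
sample = (
    "APPLICATION FORM\n"
    "Name: Jane Doe\n"
    "Contact Number: 555 0100\n"
    "Email: jane@example.com\n"
)
fields = extract_fields(sample)
print(fields)
```

The resulting dict can then be passed to a parameterized INSERT statement through whichever MySQL driver you use, rather than building the SQL string by hand.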