Question

I'm trying to run a Textract detect_document_text request using boto3.

I'm using the following code:

import boto3

client = boto3.client('textract')
response = client.detect_document_text(
    Document={
        'Bytes': image_b64['document_b64']
    }
)

Here image_b64['document_b64'] is the base64 string of the image, which I generated with the encoder at https://base64.guru/converter/encode/image.

But I'm getting the following error:

UnsupportedDocumentException

What am I doing wrong?

Answer 1

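Per the documentation:

If you're using an AWS SDK to call Amazon Textract, you might not need to base64-encode image bytes passed using the Bytes field.

Base64 encoding is only required when calling the REST API directly. When using the Python or NodeJS SDK, use native bytes (binary bytes).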
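
As a minimal sketch of what "native bytes" means in practice, the snippet below reads an image file in binary mode and passes its contents straight to the Bytes field. The helper names detect_text_from_file and extract_lines, and the optional client parameter, are illustrative assumptions and not part of boto3; running it against AWS also assumes credentials and a region are already configured.

```python
from pathlib import Path


def detect_text_from_file(path, client=None):
    """Send the raw bytes of a local image file to Textract (hypothetical helper)."""
    if client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3
        client = boto3.client("textract")
    raw = Path(path).read_bytes()  # binary file contents, not a base64 string
    return client.detect_document_text(Document={"Bytes": raw})


def extract_lines(response):
    """Collect the text of every LINE block in the response (hypothetical helper)."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

With a real client this could be called as extract_lines(detect_text_from_file("scan.png")), where scan.png is a placeholder file name.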

Answer 2

For future reference, I solved the problem with the following:

import base64
import boto3

client = boto3.client('textract')
# Decode the base64 string back into raw binary before handing it to the SDK.
image_bytes = base64.b64decode(image_b64['document_b64'])
response = client.detect_document_text(
    Document={
        'Bytes': image_bytes
    }
)
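
If the error persists after decoding, it may help to check what the decoded bytes actually are before calling Textract. The sketch below is an assumption-laden diagnostic, not part of the original fix: it strips a data-URI prefix (which some online encoders can prepend) and compares the first bytes against the common file signatures for PNG, JPEG, PDF and TIFF, the formats Textract is documented to accept as far as I know. The names decode_document and sniff_format are hypothetical.

```python
import base64

# Leading "magic" bytes for the formats this sketch assumes Textract accepts.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF": "pdf",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
}


def decode_document(b64_text):
    """Decode a base64 string, dropping a 'data:...;base64,' prefix if present."""
    if b64_text.startswith("data:") and "," in b64_text:
        b64_text = b64_text.split(",", 1)[1]
    return base64.b64decode(b64_text)


def sniff_format(data):
    """Return a format name based on the leading bytes, or None if unrecognized."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```

A result of None from sniff_format(decode_document(image_b64['document_b64'])) would suggest the payload is not one of these formats (for example a GIF or an HTML error page), which would be one plausible cause of UnsupportedDocumentException.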

Textract UnsupportedDocumentException
