Question

I am using boto3 (the AWS SDK for Python) to analyze a document (a PDF) and extract its contents as key:value pairs.

import boto3

def process_text_analysis(bucket, document):
    # Get the document from S3
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, document)
    s3_response = s3_object.get()
    # Analyze the document
    client = boto3.client('textract')
    response = client.analyze_document(Document={'S3Object': {'Bucket': bucket, 'Name': document}},
                                       FeatureTypes=["FORMS"])


process_text_analysis('francismorgan-01', '709 Privado M SURESTE.pdf')

I followed the AWS documentation for Analyze Document, but running the function raises an error.

botocore.errorfactory.UnsupportedDocumentException: An error occurred (UnsupportedDocumentException) when calling the AnalyzeDocument operation: Request has unsupported document format

Am I missing something?

Answer 1

AnalyzeDocument is a synchronous API, and at the time of this answer it supported only PNG and JPG images.

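Because you want to process a PDF file, you need to use the Amazon Textract asynchronous APIs, such as StartDocumentAnalysis or StartDocumentTextDetection. These calls start a job and return a JobId; you then fetch the results with GetDocumentAnalysis or GetDocumentTextDetection.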
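As a rough illustration of the asynchronous flow, here is a minimal sketch rather than a definitive implementation. The function name `analyze_pdf_forms` and the parameters `client`, `poll_seconds`, and `max_polls` are made up for this example; only `start_document_analysis` and `get_document_analysis` are boto3 Textract calls. It assumes the PDF is already in S3 and that simple polling is acceptable (a production setup might use an SNS notification channel instead).

```python
import time


def analyze_pdf_forms(bucket, document, client=None, poll_seconds=5, max_polls=120):
    """Run an asynchronous Textract FORMS analysis and return all result blocks.

    Hypothetical helper: `client`, `poll_seconds` and `max_polls` are
    illustrative parameters, not part of any AWS API.
    """
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3
        client = boto3.client("textract")

    # Start the asynchronous job; the document is read directly from S3.
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": document}},
        FeatureTypes=["FORMS"],
    )
    job_id = job["JobId"]

    # Poll until the job is no longer IN_PROGRESS, or give up after max_polls.
    for _ in range(max_polls):
        response = client.get_document_analysis(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated: keep following NextToken until it disappears.
    blocks = list(response.get("Blocks", []))
    while response.get("NextToken"):
        response = client.get_document_analysis(
            JobId=job_id, NextToken=response["NextToken"]
        )
        blocks.extend(response.get("Blocks", []))
    return blocks
```

With the names from the question, the call would be `analyze_pdf_forms('francismorgan-01', '709 Privado M SURESTE.pdf')`. The key:value pairs are then found among the returned blocks whose `BlockType` is `KEY_VALUE_SET`.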
