Question

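I am using boto3 and trying to retrieve a Microsoft Word document stored in S3. However, when I access the object by calling client.get_object(), the Word document has a content length of 0, while a file with a .txt extension returns the correct content length. Is there a way to decode the Word document so that I can write its contents to a stream?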

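I have tested with both a .txt file and a .docx file, and I also tried calling .decode() after reading the file, but judging by what is returned, there seems to be nothing to decode.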

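When I access the .txt document, I see a content length of 17 (the number of characters in the file), and I can read them by calling txt_file['Body'].read().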

import boto3
s3 = boto3.client('s3')
# client.get_object() already returns the response dict, so no trailing .get() is needed
txt_file = s3.get_object(Bucket="test_bucket", Key="test.txt")
>>> txt_file
{
    u'Body': <botocore.response.StreamingBody object at 0x7fc5f0074f10>, 
    u'AcceptRanges': 'bytes', 
    u'ContentType': 'text/plain', 
    'ResponseMetadata': {
        'HTTPStatusCode': 200,
        'RetryAttempts': 0,
        'HTTPHeaders': {
            'content-length': '17',
            'accept-ranges': 'bytes',
            'server': 'AmazonS3',
            'last-modified': 'Sat, 06 Jul 2019 02:13:45 GMT',
            'date': 'Sat, 06 Jul 2019 15:58:21 GMT',
            'x-amz-server-side-encryption': 'AES256',
            'content-type': 'text/plain'
        }
    }
}

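When I access the .docx document, I see a content-length of 0 (even though the document contains the same string that was written to the .txt file), and calling word_file['Body'].read() returns the empty string u''.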

s3 = boto3.client('s3')
# client.get_object() already returns the response dict, so no trailing .get() is needed
word_file = s3.get_object(Bucket="test_bucket", Key="test.docx")
>>> word_file
{
    u'Body': <botocore.response.StreamingBody object at 0x7fc5f0074f10>, 
    u'AcceptRanges': 'bytes', 
    u'ContentType': 'binary/octet-stream', 
    'ResponseMetadata': {
        'HTTPStatusCode': 200,
        'RetryAttempts': 0,
        'HTTPHeaders': {
            'content-length': '0',
            'accept-ranges': 'bytes',
            'server': 'AmazonS3',
            'last-modified': 'Thu, 04 Jul 2019 21:51:53 GMT',
            'date': 'Sat, 06 Jul 2019 15:58:30 GMT',
            'x-amz-server-side-encryption': 'AES256',
            'content-type': 'binary/octet-stream'
        }
    }
}
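
A content-length of 0 means the object stored under that key holds zero bytes, so there is nothing in the body to decode; it may be worth checking how test.docx was uploaded. As a minimal sketch (not a definitive implementation), the body can be copied into an in-memory stream with an explicit check for the empty case. The helper names read_body_to_stream and fetch_stream are hypothetical, and boto3 is imported lazily so the first helper can be tried without AWS access.

```python
import io


def read_body_to_stream(response):
    """Copy the 'Body' of a get_object-style response into a seekable BytesIO.

    Raises ValueError when the body contains no bytes.
    """
    # 'Body' is a file-like object; read() returns the raw bytes of the object.
    data = response["Body"].read()
    if not data:
        raise ValueError("S3 object is empty (0 bytes); nothing to stream")
    return io.BytesIO(data)


def fetch_stream(bucket, key):
    """Hypothetical wrapper: download an S3 object and return it as BytesIO."""
    import boto3  # lazy import (assumption: boto3 is installed and configured)

    s3 = boto3.client("s3")
    return read_body_to_stream(s3.get_object(Bucket=bucket, Key=key))
```

With the names from the question, fetch_stream("test_bucket", "test.docx") would raise ValueError for the zero-byte object instead of silently returning an empty stream.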

I expected the content length for both files to report the number of bytes in the file; however, only the .txt file returns any data.
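
On the decoding part of the question: a .docx file is a ZIP archive of XML parts, so calling .decode() on its raw bytes would not produce the text even for a non-empty file. A minimal sketch, using only the standard library, of pulling paragraph text out of word/document.xml from a binary stream. The helper name docx_text is hypothetical, and the approach is a simplification that ignores formatting, headers, and footers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(stream):
    """Return the paragraph text of a .docx held in a seekable binary stream."""
    with zipfile.ZipFile(stream) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    # Each w:p element is a paragraph; its visible text sits in w:t elements.
    for para in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Given the bytes read from the response body, docx_text(io.BytesIO(body_bytes)) would return the document text; for a zero-byte object, zipfile raises BadZipFile because there is no archive to open.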

How do I stream a Word document stored in AWS S3 using boto3?

0 Answers: