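I have a numpy array stored on AWS S3. I can retrieve the whole object and rebuild the numpy array from it. However, I can't do the same for only part of the array: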
import boto3
import numpy as np
import sys
# Let's use Amazon S3
aws_session = boto3.Session(profile_name='myprofileAWS')
client = aws_session.client('s3')
resource = aws_session.resource('s3')
bucket_name = 'test'
bucket = resource.Bucket(bucket_name)
# Build a numpy array and upload it to S3
tab = np.arange(100, dtype=np.int16)
tab.tofile('/temp/tab_test.bin')
bucket.upload_file('/temp/tab_test.bin', 'tab_test.bin')
# Check object size (returns 200 Bytes i.e. 100 items of 2 Bytes)
resource.Object(bucket_name=bucket_name, key='tab_test.bin').content_length
# Retrieve object
offset = 0
end = 200
obj_test = client.get_object(Bucket=bucket_name,
                             Key='tab_test.bin',
                             Range='bytes={}-{}'.format(offset, end))
obj_test_string = obj_test['Body'].read()
# Reconstruct Array
# Returns the correctly reconstructed array
np.fromstring(obj_test_string, dtype=np.int16)
#> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
# 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
# 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
# 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
# 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
# 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype=int16)
# Retrieve half of the object
# Here I got a ValueError
offset = 0
end = 100
obj_half_test = client.get_object(Bucket=bucket_name,
                                  Key='tab_test.bin',
                                  Range='bytes={}-{}'.format(offset, end))
# Read the body of the partial response (not the already-consumed obj_test)
obj_half_test_string = obj_half_test['Body'].read()
# Reconstruct Array
np.fromstring(obj_half_test_string, dtype=np.int16)
On the last call, I get the following error:
ValueError: string size must be a multiple of element size
However, when I convert the numpy array to a byte string directly and slice it, it works:
# Returns a numpy array of 50 elements
np.fromstring(tab.tostring()[:100], dtype=np.int16)
> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49], dtype=int16)
*[EDIT]*
Another test, where I change the expected dtype for the half object:
np.fromstring(obj_half_test_string, dtype=np.int8)
> array([ 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8,
0, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0,
17, 0, 18, 0, 19, 0, 20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25,
0, 26, 0, 27, 0, 28, 0, 29, 0, 30, 0, 31, 0, 32, 0, 33, 0,
34, 0, 35, 0, 36, 0, 37, 0, 38, 0, 39, 0, 40, 0, 41, 0, 42,
0, 43, 0, 44, 0, 45, 0, 46, 0, 47, 0, 48, 0, 49, 0, 50], dtype=int8)
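The int8 output above has 101 entries, one more than the 100 bytes that 50 int16 values occupy. A minimal sketch of how to check for that kind of mismatch is given below. It makes no S3 call: `payload` is a locally built stand-in for the object's bytes, and `describe` is a hypothetical helper name, not part of numpy or boto3.

```python
from array import array

# Stand-in for the S3 object: 100 int16 values, 200 bytes (no network call).
payload = array('h', range(100)).tobytes()
itemsize = array('h').itemsize  # 2 bytes per int16 element on common platforms


def describe(buf, itemsize):
    """Hypothetical helper: return (length, whole_elements, leftover_bytes)."""
    whole, leftover = divmod(len(buf), itemsize)
    return len(buf), whole, leftover


# A 101-byte buffer, matching the length of the int8 output above.
half = payload[:101]
print(describe(half, itemsize))  # (101, 50, 1)
```

A non-zero leftover means the buffer cannot be split into whole int16 elements, which is the condition the ValueError reports.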
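*[EDIT2] Solution*
As stated in https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35, byte ranges are inclusive at both ends, so to get the first 500 bytes I need to write bytes=0-499 rather than bytes=0-500. This can be verified by checking len(obj_half_test_string), which returns 101. When I change end from 100 to 99, the request returns exactly 100 bytes and the array is reconstructed as expected.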
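To avoid the off-by-one when requesting a slice of elements, the Range value can be computed from element counts instead of being written by hand. The following is only a sketch: `inclusive_byte_range` is a hypothetical helper name, and the item size of 2 assumes the int16 data from the question.

```python
def inclusive_byte_range(first_item, n_items, itemsize):
    """Hypothetical helper: HTTP Range value covering n_items elements.

    first_item is the index of the first element, itemsize the size of one
    element in bytes (2 for int16, e.g. np.dtype(np.int16).itemsize).
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    start = first_item * itemsize
    # The last byte position is included in the range, hence the -1.
    end = start + n_items * itemsize - 1
    return 'bytes={}-{}'.format(start, end)


print(inclusive_byte_range(0, 50, 2))   # bytes=0-99
print(inclusive_byte_range(50, 50, 2))  # bytes=100-199
```

The resulting string could be passed as the Range argument of client.get_object, in place of the manual 'bytes={}-{}'.format(offset, end) used in the question.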