Python boto3: loading model tar files from S3 and unpacking them

Time: 2019-08-14 18:21:33

Tags: amazon-s3 boto3 tar amazon-sagemaker

I am using SageMaker and have a bunch of model.tar.gz files that I need to unpack and load into sklearn. I have been testing with list_objects and a delimiter to get at the tar.gz files:

response = s3.list_objects(
    Bucket = bucket,
    Prefix = 'aleks-weekly/models/',
    Delimiter = '.csv'
)


for i in response['Contents']:
    print(i['Key'])

Then I plan to extract them with:

import tarfile
tf = tarfile.open(model.read())
tf.extractall()

But how do I get the actual tar.gz file from S3, rather than some boto3 object?

1 Answer:

Answer 0: (score: 1)

You can use s3.download_file() to download the object to a local file. That would make your code look like this:

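import os

import boto3

s3 = boto3.client('s3')
bucket = 'my-bukkit'
prefix = 'aleks-weekly/models/'

# List objects under the prefix. Keys that contain the delimiter '.csv'
# are rolled up into CommonPrefixes, so 'Contents' leaves them out.
response = s3.list_objects(
    Bucket = bucket,
    Prefix = prefix,
    Delimiter = '.csv'
)

# Iterate over each file found and download it
for i in response.get('Contents', []):
    key = i['Key']
    dest = os.path.join('/tmp', key)
    # The key contains slashes, so create the local directories first
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    print("Downloading file", key, "from bucket", bucket)
    s3.download_file(
        Bucket = bucket,
        Key = key,
        Filename = dest
    )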
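
If you would rather not write the archive to disk first, the object can probably also be read into memory and handed to tarfile through its fileobj argument. The following is only a minimal sketch, not part of the original answer: the helper names extract_tar_gz_bytes and extract_model_from_s3 are made up for illustration, the client is assumed to be a boto3.client('s3') passed in by the caller, and the archives are assumed to be trusted and small enough to fit in memory.

```python
import io
import os
import tarfile


def extract_tar_gz_bytes(data, dest_dir):
    """Unpack a gzip-compressed tar archive held in memory into dest_dir.

    Returns the list of member names found in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # tarfile.open() expects a path or a file object, not raw bytes,
    # so the bytes are wrapped in a BytesIO buffer.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tf:
        names = tf.getnames()
        # Assumption: the archive is trusted; extractall() does not
        # protect against malicious member paths on older Pythons.
        tf.extractall(path=dest_dir)
    return names


def extract_model_from_s3(s3_client, bucket, key, dest_dir):
    """Fetch one object with get_object() and unpack it into dest_dir.

    s3_client is assumed to be a boto3 S3 client created by the caller.
    """
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    return extract_tar_gz_bytes(body.read(), dest_dir)
```

With a real client this would be called as extract_model_from_s3(boto3.client('s3'), bucket, key, '/tmp/model'). What ends up in the destination directory depends on how the model was saved, so the file to pass to joblib or pickle afterwards has to be taken from the returned member names rather than assumed.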