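How do I save a sklearn model to S3 using joblib.dump?

Time: 2019-06-12 23:55:36

Tags: python amazon-web-services amazon-s3 scikit-learn joblib

I have a sklearn model, and I want to use joblib.dump to save the pickle file in my S3 bucket.

I save the model locally with joblib.dump(model, 'model.pkl'), but I do not know how to save it to an S3 bucket.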

s3_resource = boto3.resource('s3')
s3_resource.Bucket('my-bucket').Object("model.pkl").put(Body=joblib.dump(model, 'model.pkl'))

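I expect the pickled file to end up in my S3 bucket.

3 answers:

Answer 0 (score: 3)

This is what worked for me. It is simple and direct. I am using joblib (which is better suited to storing large sklearn models), but you could also use pickle.
I am also using a temporary file to transfer the model to and from S3, but you can store the file in a more permanent location if you prefer.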

import tempfile
import boto3
import joblib

# put_object, download_fileobj and delete_object are S3 client methods
s3_client = boto3.client('s3')
bucket_name = "my-bucket"
key = "model.pkl"

# WRITE (`model` is the fitted sklearn model from the question)
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)  # rewind so that read() starts at the first byte
    s3_client.put_object(Body=fp.read(), Bucket=bucket_name, Key=key)

# READ
with tempfile.TemporaryFile() as fp:
    s3_client.download_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)
    fp.seek(0)
    model = joblib.load(fp)

# DELETE
s3_client.delete_object(Bucket=bucket_name, Key=key)
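
If the temporary file is not needed, the same round trip can probably be done with an in-memory buffer. The following is only a minimal sketch under stated assumptions, not the method from the answer: the helper names `save_model_to_s3` and `load_model_from_s3` are hypothetical, `s3_client` is assumed to be an object created with `boto3.client('s3')`, and the standard-library pickle stands in for joblib (the answer notes that either can be used).

```python
import io
import pickle


def save_model_to_s3(model, s3_client, bucket_name, key):
    """Serialize `model` into memory and upload it (hypothetical helper)."""
    buffer = io.BytesIO()
    # joblib.dump(model, buffer) could be used here instead of pickle
    pickle.dump(model, buffer)
    buffer.seek(0)  # rewind so the upload starts at the first byte
    s3_client.upload_fileobj(Fileobj=buffer, Bucket=bucket_name, Key=key)


def load_model_from_s3(s3_client, bucket_name, key):
    """Download the object into memory and deserialize it (hypothetical helper)."""
    buffer = io.BytesIO()
    s3_client.download_fileobj(Bucket=bucket_name, Key=key, Fileobj=buffer)
    buffer.seek(0)  # rewind before reading the downloaded bytes
    return pickle.load(buffer)
```

With a real client this would be called as `save_model_to_s3(model, boto3.client('s3'), 'my-bucket', 'model.pkl')`. Holding the whole pickle in memory is only reasonable for models that fit comfortably in RAM; for larger ones the temporary-file version above is likely the safer choice.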

Answer 1 (score: 2)

Use the following code to dump your model to an S3 location in .pkl or .sav format:

import tempfile
import boto3
import joblib
s3 = boto3.resource('s3')

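# you can dump it in .sav or .pkl format
# note: this string is passed as Key below, so the 's3://bucket_name/' prefix
# becomes part of the object key rather than selecting the bucket
location = 's3://bucket_name/folder_name/'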
model_filename = 'model.sav'  # use any extension you want (.pkl or .sav)
OutputFile = location + model_filename

# WRITE
with tempfile.TemporaryFile() as fp:
    joblib.dump(scikit_learn_model, fp)
    fp.seek(0)
    # use bucket_name and OutputFile - s3 location path in string format.
    s3.Bucket('bucket_name').put_object(Key= OutputFile, Body=fp.read())

Answer 2 (score: 0)

Just correcting Sayali Sonawane's answer:

import tempfile
import boto3
import joblib
s3 = boto3.resource('s3')

# you can dump it in .sav or .pkl format 
location = 'folder_name/' # THIS is the change to make the code work
model_filename = 'model.sav'  # use any extension you want (.pkl or .sav)
OutputFile = location + model_filename

# WRITE
with tempfile.TemporaryFile() as fp:
    joblib.dump(scikit_learn_model, fp)
    fp.seek(0)
    # use bucket_name and OutputFile - s3 location path in string format.
    s3.Bucket('bucket_name').put_object(Key= OutputFile, Body=fp.read())
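
The correction above comes down to the fact that the Key argument is a path inside the bucket, not a full `s3://` URI. If the location is only available as a URI, one option is to split it into a bucket name and a key first. This is a small sketch using only the standard library; the helper name `split_s3_uri` is hypothetical and not part of boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/object' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # netloc holds the bucket name; the key is the path without its leading slash
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://bucket_name/folder_name/model.sav")
```

The resulting `bucket` and `key` could then be passed as `s3.Bucket(bucket).put_object(Key=key, Body=...)`.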