Currently, I use the following to save a pickle file:

```python
with open('model/tokenizer.pickle', 'wb') as handle:
    pickle.dump(t, handle, protocol=pickle.HIGHEST_PROTOCOL)
```
This stores the file in my local directory, and later I upload it from local disk to Minio with:

```python
minioClient = Minio(endpoint=endpoint, access_key=minio_access_key, secret_key=minio_secret_key)
minioClient.fput_object(bucket_name='model', object_name='tokenizer.pickle', file_path='model/tokenizer.pickle')
```
How can I save the file directly to Minio, without writing it locally first?
Answer 0: (score: 2)
You can first convert your object to bytes with

```python
bytes_file = pickle.dumps(t)
```

and then wrap it in `io.BytesIO(bytes_file)` like this:

```python
client.put_object(
    bucket_name=bucket_name,
    object_name=object_name,
    data=io.BytesIO(bytes_file),
    length=len(bytes_file)
)
```

To load it back, simply do:

```python
pickle.loads(client.get_object(bucket_name=bucket_name,
                               object_name=object_name).read())
```
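The in-memory round trip at the heart of this answer can be checked without a running Minio server. A minimal sketch, where the dict is a hypothetical stand-in for the tokenizer `t`:

```python
import io
import pickle

obj = {"vocab": {"hello": 1, "world": 2}}  # hypothetical stand-in for the tokenizer t

# serialize to bytes in memory instead of to a file on disk
bytes_file = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

# put_object wants a file-like stream plus its length in bytes
stream = io.BytesIO(bytes_file)
assert stream.getbuffer().nbytes == len(bytes_file)

# this is what client.get_object(...).read() would hand back
restored = pickle.loads(stream.read())
print(restored == obj)  # True
```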
Answer 1: (score: 1)
The top answer has the right idea, but it is incorrect. It won't even run, because the arguments passed to the `put_object` method are invalid. Also, since the OP wants to write the file to Minio (which is hosted locally), you have to specify the `endpoint_url`.

Here is some sample code that should work end to end. Replace `endpoint_url` with whatever public IP your ec2 is on; I used `localhost` as a simple example.
```python
import io
import pickle

import boto3
import numpy as np
import pandas as pd

ACCESS_KEY = 'BLARG'
SECRET_ACCESS_KEY = 'KWARG'

# sample dataframe
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)),
                  columns=list('ABCD'))

bytes_file = pickle.dumps(df)
bucket_name = 'mlflow-minio'
object_name = 'df.pkl'

s3client = boto3.client(
    's3',
    endpoint_url='http://localhost:9000/',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_ACCESS_KEY,
)

# places file in the Minio bucket
s3client.put_object(
    Bucket=bucket_name,
    Key=object_name,
    Body=io.BytesIO(bytes_file)
)

# now to load the pickled file
response = s3client.get_object(Bucket=bucket_name, Key=object_name)
body = response['Body'].read()
data = pickle.loads(body)

# sample records
print(data.head())
```
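If this pattern comes up often, the put/get calls can be wrapped in a small pair of helpers. This is only a sketch; `save_pickle` and `load_pickle` are hypothetical names, and the client is assumed to be configured as in the snippet above:

```python
import io
import pickle

def save_pickle(s3client, bucket, key, obj):
    """Pickle obj in memory and upload it; nothing is written to local disk."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    s3client.put_object(Bucket=bucket, Key=key, Body=io.BytesIO(payload))

def load_pickle(s3client, bucket, key):
    """Download an object and unpickle it."""
    body = s3client.get_object(Bucket=bucket, Key=key)['Body'].read()
    return pickle.loads(body)
```

Keeping the serialization inside the helpers means callers never touch raw bytes or temporary files.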