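I am using AWS SageMaker and trying to upload a data folder from SageMaker to S3. What I want is to upload my data to the s3_train_data directory (which already exists in S3). However, the data does not go to that bucket. Instead it is stored in the default bucket that was created, and a new folder path is created there from the s3_train_data variable.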
The code I entered:
import os
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
bucket = '<bucket name>'
prefix = '<folders1/folders2>'
key = '<input>'
s3_train_data = 's3://{}/{}/{}/'.format(bucket, prefix, key)
# 'data' is the folder on the Jupyter instance that contains all the training data
inputs = sagemaker_session.upload_data(path='data', key_prefix=s3_train_data)
Is the problem in the code, or is it more about the way I created the notebook?
Answer 0 (score: 0)
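You can look at the sample notebooks to see how to upload data to an S3 bucket. There are many ways to do it, so I am only giving you a hint toward the answer. You forgot to create a boto3 session to access the S3 bucket.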
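If you would rather keep `upload_data`, note that the SageMaker Python SDK method takes the bucket and the key prefix as separate arguments, and falls back to the session's default bucket when `bucket` is omitted. That matches what the question describes: the full `s3://...` string was passed as `key_prefix`, so it became a folder path inside the default bucket. Below is a minimal sketch of the corrected call. The helper name `upload_folder` is hypothetical, and the bucket, prefix and key values are the question's placeholders.

```python
def upload_folder(session, local_path, bucket, prefix, key):
    """Upload local_path to s3://<bucket>/<prefix>/<key> through a SageMaker session.

    `session` is expected to be a sagemaker.Session (or any object exposing
    the same upload_data method). The helper name is hypothetical.
    """
    # key_prefix is only the path inside the bucket: no "s3://", no bucket name.
    key_prefix = "/".join(part.strip("/") for part in (prefix, key) if part)
    # Passing bucket explicitly stops the session from using its default bucket.
    return session.upload_data(path=local_path, bucket=bucket, key_prefix=key_prefix)


# Usage in a notebook (placeholders as in the question):
# import sagemaker
# inputs = upload_folder(sagemaker.Session(), "data",
#                        "<bucket name>", "<folders1/folders2>", "<input>")
```

Building the prefix separately from the bucket name keeps the two concerns apart, which is the part the original call mixed up.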
Here is one way to achieve your goal.
import os
import urllib.request
import boto3

bucket = '<your_s3_bucket_name_here>'  # placeholder: the S3 bucket to upload into

def download(url):
    # Fetch the file only if it is not already present locally
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(channel, file):
    # Store the file under <channel>/<file> in the bucket
    s3 = boto3.resource('s3')
    key = channel + '/' + file
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)

# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
upload_to_s3('train', 'caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
upload_to_s3('validation', 'caltech-256-60-val.rec')
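The snippet above uploads one file at a time, while the question is about a whole folder. A rough sketch of extending the same boto3 approach to a directory tree could look like this. The helper names `iter_upload_pairs` and `upload_folder_to_s3` are hypothetical, and the bucket name and prefix in the usage comment are placeholders.

```python
import os


def iter_upload_pairs(local_dir, key_prefix):
    """Yield (local_file_path, s3_key) for every file under local_dir."""
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # Keep the folder layout, but always use "/" in S3 keys.
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            yield local_path, key_prefix.rstrip("/") + "/" + relative


def upload_folder_to_s3(s3_bucket, local_dir, key_prefix):
    """Upload each file with upload_file; s3_bucket is a boto3 Bucket resource."""
    for local_path, key in iter_upload_pairs(local_dir, key_prefix):
        s3_bucket.upload_file(local_path, key)


# Usage (bucket name and prefix are placeholders):
# import boto3
# upload_folder_to_s3(boto3.resource("s3").Bucket("<bucket name>"),
#                     "data", "folders1/folders2/input")
```

Keeping the key computation in its own generator makes it easy to print the planned keys and check them before anything is sent to S3.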
Another way:
import io
import os
import boto3
import sagemaker.amazon.common as smac
bucket = '<your_s3_bucket_name_here>'  # enter the S3 bucket where you will copy data and model artifacts
prefix = 'sagemaker/breast_cancer_prediction'  # place to upload training files within the bucket
# Do some processing, then prepare to push the data.
# train_X, train_y and train_file come from that processing step (not shown here).
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, train_X.astype('float32'), train_y.astype('float32'))
f.seek(0)  # rewind so the upload starts from the beginning of the buffer
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', train_file)).upload_fileobj(f)
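Two details in that snippet are easy to miss: the `f.seek(0)` before the upload, and how the object key is joined. The small sketch below shows both with plain standard-library calls and no S3 access. The payload bytes and the file name `train.data` are made up for illustration.

```python
import io
import posixpath

buf = io.BytesIO()
buf.write(b"training-bytes")

# After writing, the position is at the end, so a read returns nothing.
at_end = buf.read()

# Rewinding makes the whole payload available to whatever reads the buffer next
# (in the snippet above, that reader is upload_fileobj).
buf.seek(0)
after_rewind = buf.read()

# posixpath.join always uses "/", which is what S3 keys expect on any OS.
# 'train.data' is a hypothetical file name used only for this sketch.
key = posixpath.join("sagemaker/breast_cancer_prediction", "train", "train.data")
```

On a SageMaker notebook instance (Linux) `os.path.join` gives the same key, so `posixpath` only matters if the code is also run on Windows.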
YouTube link: https://www.youtube.com/watch?v=-YiHPIGyFGo - how to pull data in an S3 bucket.