Question

I'm trying to link an S3 bucket to a notebook instance, but I can't get it to work.

Here is what I have:

from sagemaker import get_execution_role

role = get_execution_role
bucket = 'atwinebankloadrisk'
datalocation = 'atwinebankloadrisk'

data_location = 's3://{}/'.format(bucket)
output_location = 's3://{}/'.format(bucket)

Reading the data from the bucket:

df_test = pd.read_csv(data_location/'application_test.csv')
df_train = pd.read_csv('./application_train.csv')
df_bureau = pd.read_csv('./bureau_balance.csv')

However, I keep getting error messages and can't get any further. I haven't found an answer that helps.

PS: I'm new to AWS.

Answer 1

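You can use the following sample code to load data from S3 into an AWS SageMaker notebook. Make sure the Amazon SageMaker role has a policy attached that gives it access to S3 [1].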

[1] https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html

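import pandas as pd
from sagemaker import get_execution_role

# IAM role of this notebook instance; it needs a policy that grants S3 access [1]
role = get_execution_role()

bucket = 'Your_bucket_name'      # replace with your bucket name
data_key = 'your_data_file.csv'  # replace with the object key (path inside the bucket)
data_location = 's3://{}/{}'.format(bucket, data_key)

# pandas reads s3:// URLs through the s3fs package, which must be installed
df = pd.read_csv(data_location)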
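The question builds the path with `data_location/'application_test.csv'`; Python does not define `/` between two `str` objects, so that line raises `TypeError` before pandas is ever called. A minimal sketch of building (and splitting) the `s3://bucket/key` URI that `pd.read_csv` expects; the helper names `s3_uri` and `split_s3_uri` are hypothetical and not part of pandas, boto3 or the SageMaker SDK.

```python
from typing import Tuple


def s3_uri(bucket: str, key: str) -> str:
    """Build an s3://bucket/key URI, tolerating stray slashes around the parts."""
    bucket = bucket.strip("/")
    key = key.lstrip("/")  # "/data/x.csv" and "data/x.csv" give the same URI
    return f"s3://{bucket}/{key}"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/key back into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


# Hypothetical usage with the bucket name from the question:
# df_test = pd.read_csv(s3_uri("atwinebankloadrisk", "application_test.csv"))
```

Plain string formatting is enough here; `os.path.join` is avoided on purpose because on Windows it would insert backslashes, which S3 keys do not use as separators.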

Answer 2

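You are trying to read files from S3 with pandas. Pandas can read files from your local disk, but (without extra packages) not directly from S3.
Instead, download the files from S3 to your local disk, then read them with pandas.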

import boto3
import botocore

BUCKET_NAME = 'my-bucket' # replace with your bucket name
KEY = 'my_image_in_s3.jpg' # replace with your object key

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
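The answer above stops after the download. A minimal sketch of the full download-then-read flow, assuming `s3_client` is a boto3 S3 client (`boto3.client("s3")`), whose `download_file(Bucket, Key, Filename)` method saves the object to a local file. The helper name `download_to_temp` is hypothetical; it takes the client as a parameter so it can be exercised without AWS access.

```python
import os
import tempfile


def download_to_temp(s3_client, bucket: str, key: str) -> str:
    """Download s3://bucket/key into a fresh temporary directory and return the local path.

    `s3_client` is assumed to be a boto3 S3 client (boto3.client("s3")).
    """
    local_dir = tempfile.mkdtemp(prefix="s3-download-")
    # Keep only the file-name part of the key, e.g. "data/train.csv" -> "train.csv"
    local_path = os.path.join(local_dir, os.path.basename(key))
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Hypothetical usage inside the notebook:
# import boto3
# import pandas as pd
# df_train = pd.read_csv(download_to_temp(boto3.client("s3"), "my-bucket", "application_train.csv"))
```

Downloading first costs local disk space, but once the file is local, pandas needs no S3 support at all.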

Answer 3

You can use s3fs (https://s3fs.readthedocs.io/en/latest/) to read S3 files directly with pandas. The following code is taken from here:

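import os
import pandas as pd
from s3fs.core import S3FileSystem

# Point botocore (used by s3fs) at a custom AWS config file
os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'

s3 = S3FileSystem(anon=False)  # anon=False: sign requests with your AWS credentials
key = 'path/to/your-csv.csv'   # S3 keys use forward slashes ('\t' would be a tab character)
bucket = 'your-bucket-name'

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))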
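The code above points `AWS_CONFIG_FILE` at `aws_config.ini` without showing what that file contains. A minimal sketch of writing such a file with `configparser`, assuming it follows the standard AWS config-file layout (a `[default]` profile holding `region` and key entries). `write_aws_config` is a hypothetical helper and every value passed to it is a placeholder.

```python
import configparser


def write_aws_config(path: str, region: str, access_key_id: str, secret_access_key: str) -> None:
    """Write a minimal AWS config file containing a single [default] profile.

    On a SageMaker notebook instance the execution role normally supplies
    credentials, so explicit keys like these are usually unnecessary there.
    """
    config = configparser.ConfigParser()
    config["default"] = {
        "region": region,
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    with open(path, "w") as f:
        config.write(f)


# Hypothetical usage, before creating the S3FileSystem:
# write_aws_config("aws_config.ini", "us-east-1", "<your-access-key-id>", "<your-secret-access-key>")
# import os
# os.environ["AWS_CONFIG_FILE"] = "aws_config.ini"
```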

Answer 4

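In pandas 1.0.5, if you have already given the notebook instance access to the bucket, reading a CSV from S3 is this simple, provided the s3fs package is installed (https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-remote-files):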

df = pd.read_csv('s3://<bucket-name>/<filepath>.csv')

During notebook setup, I attached the SageMakerFullAccess policy to the notebook instance, granting it access to the S3 bucket. You can also do this through the IAM Management Console.

If you need credentials, there are three ways to provide them (https://s3fs.readthedocs.io/en/latest/#credentials):

The AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables
Configuration files, such as ~/.aws/credentials
For nodes on EC2, the IAM metadata provider
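For the environment-variable option, boto3 (which s3fs builds on) reads the uppercase names `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and, for temporary credentials only, `AWS_SESSION_TOKEN`. A minimal sketch of setting them from Python before pandas or s3fs opens a connection; `set_aws_env_credentials` is a hypothetical helper, and the `env` parameter exists only so the function can be tried on a plain dict.

```python
import os
from typing import MutableMapping, Optional


def set_aws_env_credentials(
    access_key_id: str,
    secret_access_key: str,
    session_token: Optional[str] = None,
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Export AWS credentials under the environment variable names boto3 reads."""
    env["AWS_ACCESS_KEY_ID"] = access_key_id
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    if session_token is not None:
        # Only temporary (STS) credentials come with a session token
        env["AWS_SESSION_TOKEN"] = session_token
    else:
        # Drop any stale token so it is not sent alongside the new long-term keys
        env.pop("AWS_SESSION_TOKEN", None)
```

On a SageMaker notebook instance with a suitable IAM policy attached, none of this is normally needed, because the instance metadata provider supplies the role's credentials automatically.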

How to link an S3 bucket to a SageMaker notebook