Question

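I have working code that downloads a file from an S3 bucket and then does some transformations in Python. I don't embed the access key and secret key in the code; those keys live in my AWS CLI configuration.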

import boto3
import botocore
import pandas
import pyarrow.parquet as pq


BUCKET_NAME = 'converted-parquet-bucket' # replace with your own bucket name
KEY = 'json-to-parquet/names.snappy.parquet' # replace with the full object key (prefix plus file name)

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'names.snappy.parquet') # second argument is the local file name to save to
except botocore.exceptions.ClientError as e: # exception handling
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.") # if object that you are looking for does not exist it will print this
    else:
        raise

# Uncomment the next two lines to convert a CSV file to Parquet
# dataframe = pandas.read_csv('names.csv')
# dataframe.to_parquet('names.snappy.parquet' ,engine='auto', compression='snappy')

data = pq.read_pandas('names.snappy.parquet', columns=['Year of Birth', 'Gender', 'Ethnicity', "Child's First Name", 'Count', 'Rank']).to_pandas()


#print(data) # prints ALL the data in the parquet file

print(data.loc[data['Gender'] == 'MALE']) # prints only the rows matching the filter (like a SQL WHERE clause)

Can someone help me get this code working without having the access and secret keys embedded in the code or in the AWS configuration?

Answer 1

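If you are running the function locally, you need credentials in your local credentials/config file to interact with AWS resources.

An alternative is to run it on AWS Lambda (if the function runs periodically, you can schedule it with CloudWatch Events) and then use environment variables or the AWS Security Token Service (STS) to generate temporary credentials.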
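A minimal sketch of the STS route, assuming a role already exists that the caller is allowed to assume. The role ARN, the session name, and the helper names `session_kwargs_from_sts` and `session_for_role` are hypothetical and only for illustration; `assume_role` and the `Credentials` keys in its response come from the boto3 STS client.

```python
def session_kwargs_from_sts(response):
    """Map an STS assume_role response onto boto3.Session keyword arguments."""
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def session_for_role(role_arn, session_name="s3-download"):
    """Assume the given role and return a Session backed by its temporary credentials."""
    import boto3  # only needed when actually talking to AWS

    sts = boto3.client("sts")
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    return boto3.Session(**session_kwargs_from_sts(response))


# Hypothetical usage; the account ID and role name are placeholders:
# session = session_for_role("arn:aws:iam::123456789012:role/ExampleRole")
# s3 = session.resource("s3")
```

The temporary credentials expire, so a long-running job would need to call `session_for_role` again once they do.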

Answer 2

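If you don't want to use secret/access keys, you should use roles and policies. Here's the deal:

Define a role (e.g. RoleWithAccess) and make sure that your user (the one defined in your credentials) can assume this role

Set a policy for RoleWithAccess that gives read/write access to your buckets

If you are executing it on your local machine, run the necessary AWS CLI commands to create a profile that assumes RoleWithAccess (e.g. ProfileWithAccess)

Execute your script using a session with this profile passed as a parameter, which means you need to replace: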

s3 = boto3.resource('s3')

with

session = boto3.session.Session(profile_name='ProfileWithAccess')
s3 = session.resource('s3')
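For the profile step above, here is a sketch of what the entry in `~/.aws/config` might contain. `role_arn` and `source_profile` are the standard AWS CLI settings for a profile that assumes a role. The account ID, the `default` source profile, and the helper name `render_assume_role_profile` are assumptions made for illustration.

```python
import configparser
import io


def render_assume_role_profile(profile_name, role_arn, source_profile="default"):
    """Return the ~/.aws/config text for a profile that assumes role_arn."""
    config = configparser.ConfigParser()
    config[f"profile {profile_name}"] = {
        "role_arn": role_arn,              # role to assume
        "source_profile": source_profile,  # profile whose keys are used to assume it
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder account ID; substitute the ARN of your own role.
print(render_assume_role_profile(
    "ProfileWithAccess",
    "arn:aws:iam::123456789012:role/RoleWithAccess",
))
```

The printed section can be appended to `~/.aws/config`. The source profile still needs its own keys in the local credentials file, but the script itself only ever names the profile.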

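The benefit of this approach is that if you run it on an EC2 instance, you can tie the instance to a specific role (e.g. RoleWithAccess) when you build it. In that case you can completely ignore the session, the profile, and all the AWS CLI hocus pocus, and just run s3 = boto3.resource('s3').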

You can also use AWS Lambda, setting up a role and a policy with read/write permission to the bucket.
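The read/write policy mentioned above could look roughly like the document built below. This is a sketch under assumptions, not the exact policy the answer had in mind: the helper name `bucket_read_write_policy` is hypothetical, and the chosen actions (`s3:ListBucket`, `s3:GetObject`, `s3:PutObject`) are one reasonable reading of "read/write access".

```python
import json


def bucket_read_write_policy(bucket_name):
    """Build an IAM policy document granting read/write access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading and writing apply to the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


print(json.dumps(bucket_read_write_policy("converted-parquet-bucket"), indent=2))
```

The resulting JSON can be attached to the role in the IAM console or CLI. Including `s3:ListBucket` matters for the question's error handling: without it, a missing key is typically reported as 403 rather than 404.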

Download files from AWS S3 without access and secret keys in Python
