How to read files from a folder structure in an S3 bucket using Python

Asked: 2016-06-02 17:33:12

Tags: python amazon-web-services amazon-s3

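I'm new to Python and trying to understand how to access files inside the folder structure of an S3 bucket using Python. Should I specify something like bucket_key = "path/folder/file"? Please help.

I'm mainly trying to get the row count of a CSV file, but I get an error when reading the file.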

import os
import sys
import string
import urllib
import urllib2
import boto
import boto.cloudformation
import boto.exception       
import boto.sns
import logging
from boto.s3.key import Key
from boto.s3.connection import S3Connection

#?pass this as a parameter?
bucket_name = "reporting"
bucket_key = "/compliance/testfile.csv"


def read_contents(bucket_key):
    # connect to the bucket
    conn = boto.connect_s3()
    bucket = conn.get_bucket(bucket_name)
    key = bucket_key
    # create a key to keep track
    k = Key(bucket)
    k.key = key
    testfile = k.get_contents_as_string()
    return testfile


test = read_contents(bucket_key)
print test

1 Answer:

Answer 0: (score: -2)

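High-level access to S3 with safe operations

Hi, let's define a few functions for accessing and downloading files from AWS S3. First, you have to connect to Amazon.

Connecting to S3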

import boto
from boto.s3.connection import S3Connection

def s3_conn(conf):
  """ Connect to S3
  :param conf - dict: contains AWS credentials
  :return S3Connection:
  """
  try:
    # Let boto look up credentials itself (environment variables, boto config)
    s3 = boto.connect_s3()
    return s3
  except Exception:
    # Fall back to the credentials passed in explicitly
    key_id = conf.get('AWS_ACCESS_KEY_ID')
    access_key = conf.get('AWS_SECRET_ACCESS_KEY')
    return S3Connection(key_id, access_key)

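The conf object is a dictionary containing your AWS credentials. s3_conn only uses it when boto.connect_s3() cannot connect on its own. Once this is done, we can focus on the download step.

Downloading a file from S3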

import os
from boto.s3.key import Key

def download_file_s3(filename, dirs3, output_path, buckets3, conf):
  """
  :param filename - str: filename
  :param dirs3 - str: folder path (key prefix) that contains the file
  :param output_path - str: output path
  :param buckets3 - str: bucket name
  :param conf - dict: contains AWS credentials
  """

  print('Downloading file from s3, filename={}, output_path={}, dirs3={}, buckets3={}'.format(
    filename, output_path, dirs3, buckets3))
  filepath = '/'.join([dirs3, filename])
  s3 = s3_conn(conf)
  bucket = s3.get_bucket(buckets3)
  key = bucket.get_key(filepath)
  key.get_contents_to_filename(os.path.join(output_path, filename))
  print('File saved, output path={}'.format(os.path.join(output_path, filename)))
  key.close()
  s3.close()

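download_file_s3 takes filename as a parameter, and dirs3 is the path to the folder inside the bucket that holds the file (the function joins the two with '/' to form the key). You also set output_path and the bucket name buckets3, and finally you have to provide the dictionary containing your AWS credentials. The function therefore builds the file path, connects to Amazon S3, gets the requested bucket, and then downloads your file.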
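S3 has no real folders: the "path" is simply part of the key name. The question uses bucket_key = "/compliance/testfile.csv", and a leading slash makes that a different key from "compliance/testfile.csv". Assuming the file was stored without a leading slash (not confirmed by the question), a minimal sketch for building the key could look like this. build_s3_key is a hypothetical helper, not part of boto.

```python
def build_s3_key(dirs3, filename):
    """Join a folder prefix and a file name into an S3 key name.

    Hypothetical helper for illustration; it is not part of boto.
    """
    # Drop empty pieces so that leading, trailing, or doubled slashes
    # in the prefix do not end up in the key name.
    parts = [piece for piece in dirs3.split('/') if piece]
    parts.append(filename.strip('/'))
    return '/'.join(parts)


print(build_s3_key('/compliance/', 'testfile.csv'))  # compliance/testfile.csv
```

The result can be passed to bucket.get_key(...) in place of the plain '/'.join([dirs3, filename]).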

Safe download (error handling)

If you want to download files from S3 safely, just use this function:

import sys  # needed for sys.exc_info() below
from boto.exception import S3ResponseError

def safe_download_from_s3(filename, output_path, buckets3, dirs3, conf):
  """
  :param filename - str: filename
  :param dirs3 - str: full path to file
  :param output_path - str: output path
  :param buckets3 - str: bucket name
  :param conf - dict: contains AWS credentials
  """
  print('Trying to download file from s3, filename={}, output_path={}, dirs3={}, buckets3={}'.format(
    filename, output_path, dirs3, buckets3))
  try:
    download_file_s3(filename, dirs3, output_path, buckets3, conf)
    print('File downloaded successfully')
  except S3ResponseError as err:
    print('An S3ResponseError occurred while downloading, err={}'.format(err))
  except TypeError as err:
    print('A TypeError occurred while downloading, err={}'.format(err))
  except NameError as err:
    print('A NameError occurred while downloading, err={}'.format(err))
  except:
    print('Unexpected error, exec_info={}'.format(sys.exc_info()[0]))

In this example, conf is a dictionary like this:

conf = {'AWS_ACCESS_KEY_ID':'<your_aws_access_key_id>',
        'AWS_SECRET_ACCESS_KEY':'<your_aws_secret_access>'}

But you can also export your credentials as environment variables:

Just do this:

export AWS_ACCESS_KEY_ID=<your_aws_access_key_id>
export AWS_SECRET_ACCESS_KEY=<your_aws_secret_access>

This is what I recommend.
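If you use environment variables but still want a conf dictionary to pass to the functions above, a sketch along these lines could build one. conf_from_env is a hypothetical helper, and raising KeyError on a missing variable is a design choice made here, not something boto requires.

```python
import os


def conf_from_env(environ=None):
    """Build the conf dictionary from environment variables.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced by a plain dict when testing.
    """
    if environ is None:
        environ = os.environ
    names = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')
    # Fail early, and name every variable that is missing or empty.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError('missing environment variables: {}'.format(', '.join(missing)))
    return {name: environ[name] for name in names}


# Example with a stand-in mapping instead of the real environment
example = conf_from_env({'AWS_ACCESS_KEY_ID': 'id', 'AWS_SECRET_ACCESS_KEY': 'secret'})
print(sorted(example))
```

Calling conf_from_env() with no argument reads the real environment, so the credentials never need to appear in source code.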

I hope this helps. If you have any questions, feel free to ask.
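Coming back to the goal stated in the question, counting the rows of the CSV file: once the contents are available as text (for example the value returned by get_contents_as_string in the question's code), the csv module can do the counting. The sketch below assumes Python 3, text that has already been decoded to str, and a file whose first row is a header. count_csv_rows is a hypothetical helper.

```python
import csv
import io


def count_csv_rows(text, has_header=True):
    """Count the data rows in CSV text (hypothetical helper).

    Using csv.reader instead of counting newlines keeps quoted fields
    that contain line breaks from being counted as extra rows.
    """
    reader = csv.reader(io.StringIO(text))
    # csv.reader yields an empty list for a blank line; skip those.
    rows = sum(1 for row in reader if row)
    if has_header and rows > 0:
        rows -= 1
    return rows


sample = "id,name\n1,alice\n2,bob\n"
print(count_csv_rows(sample))  # 2
```

For a file saved by download_file_s3, the same function works after reading the file into a string. For very large files, iterating over the open file object with csv.reader avoids holding everything in memory.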