Zipping files in S3 using AWS Lambda (Python)

Date: 2019-11-04 11:31:13

Tags: python amazon-web-services amazon-s3 aws-lambda

I have a few hundred PDFs in an S3 bucket, and I want a Lambda function that creates a single zip file containing all of them.

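Doing this in local Python is obviously easy, and I had assumed the logic would carry over to AWS Lambda in a very straightforward way. So far, however, I haven't managed to get it working.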

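I have been using the zipfile Python library together with boto3. My logic is simple: find all the files, add them to a `files_to_zip` list, then iterate over that list and write each file into a new zip file.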

However, this has raised a number of issues, which I think come from gaps in my understanding of how Lambda invocations and file loading work.

Here is the code I have tried so far:

    import os
    import boto3
    from io import BytesIO, StringIO
    from zipfile import ZipFile, ZIP_DEFLATED

    def zipping_files(event, context):
        s3 = boto3.resource('s3')

        BUCKET = 'BUCKET NAME'
        PREFIX_1 = 'KEY NAME'
        new_zip = r'NEW KEY NAME' 
        s3_client = boto3.client('s3')
        files_to_zip = []
        response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)

        all = response['Contents']     
        for i in all:
            files_to_zip.append(str(i['Key']))



        with ZipFile(new_zip, 'w',  compression=ZIP_DEFLATED, allowZip64=True) as new_zip:
            for file in files_to_zip:
                new_zip.write(file) 

I get errors such as the `new_zip` string not existing (FileNotFoundError), and that this is a read-only operation.

2 Answers:

Answer 0 (score: 0)

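This code sample tries to create a local file named `NEW KEY NAME` in the default directory of the Lambda function container's local file system, which is `/var/task` afaik. That directory is read-only; in Lambda, only `/tmp` is writable.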

Step 1: Build a proper file path in the `/tmp` directory (e.g. `os.path.join('/tmp', target_filename)`).
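Step 1 can be sketched as a small helper that maps an S3 key to a path under `/tmp` and creates any missing parent directories. Because `ZipFile.write()` reads from the local file system, each object has to be downloaded to such a path before it can be added to the archive. `local_path_for_key` is a hypothetical name, and the check that rejects keys containing `..` is an extra precaution rather than something the answer requires.

```python
import os


def local_path_for_key(key: str, base_dir: str = "/tmp") -> str:
    """Map an S3 key to a path under base_dir and create its parent directories.

    Hypothetical helper. base_dir defaults to /tmp because that is the only
    writable location inside a Lambda container.
    """
    # Drop leading slashes so an absolute-looking key cannot replace base_dir
    path = os.path.normpath(os.path.join(base_dir, key.lstrip("/")))
    # Reject keys such as "../escape.pdf" that would resolve outside base_dir
    if os.path.commonpath([base_dir, path]) != os.path.normpath(base_dir):
        raise ValueError(f"key escapes {base_dir}: {key!r}")
    # Recreate the S3 "subdirectory" structure locally (no error if it exists)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```

Usage, assuming `s3_client = boto3.client('s3')`: `s3_client.download_file(BUCKET, key, local_path_for_key(key))`.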

Step 2: Your code never uploads the zip file to S3. Add a call to `s3_client.put_object`.
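An alternative to going through `/tmp`: if the PDFs together fit comfortably in the function's memory, the archive can be built in an `io.BytesIO` buffer and handed straight to `put_object`, as step 2 suggests. This is a minimal sketch, assuming `s3_client` is a boto3 S3 client (e.g. `boto3.client('s3')`) and that the whole archive fits in memory; `zip_keys_in_memory` is a hypothetical helper name.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED


def zip_keys_in_memory(s3_client, bucket, keys, dest_key):
    """Zip the given S3 objects in memory and upload the archive with put_object.

    Hypothetical helper; assumes the whole archive fits in the Lambda's memory.
    Returns the size of the uploaded archive in bytes.
    """
    buffer = io.BytesIO()
    with ZipFile(buffer, "w", compression=ZIP_DEFLATED, allowZip64=True) as zf:
        for key in keys:
            # get_object returns a streaming body; read() loads the object into memory
            data = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            # Store each file in the archive under its S3 key
            zf.writestr(key, data)
    archive = buffer.getvalue()
    s3_client.put_object(Bucket=bucket, Key=dest_key, Body=archive,
                         ContentType="application/zip")
    return len(archive)
```

The trade-off is memory versus disk: this avoids the `/tmp` size limit and the download step, but the function's memory setting must be large enough for the PDFs plus the finished archive.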

Answer 1 (score: 0)

Here is how we solved this problem:

    import os
    import tempfile
    from zipfile import ZipFile, ZIP_DEFLATED

    import boto3
    from botocore.exceptions import ClientError

    def zipping_files(event, context):
        s3_resource = boto3.resource('s3')
        s3_client = s3_resource.meta.client

        BUCKET = 'BUCKET NAME'
        PREFIX_1 = 'KEY NAME'
        files_to_zip = []
        response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX_1)

        for obj in response.get('Contents', []):
            # Skip "folder" placeholder keys that end with a slash
            if not obj['Key'].endswith('/'):
                files_to_zip.append(obj['Key'])

        # Download all files to Lambda's /tmp directory (the only writable path),
        # recreating the S3 "subdirectory" structure under /tmp
        downloaded = []
        for key in files_to_zip:
            local_file_name = os.path.join('/tmp', key)
            try:
                os.makedirs(os.path.dirname(local_file_name), exist_ok=True)
                s3_resource.Bucket(BUCKET).download_file(key, local_file_name)
                downloaded.append(key)
            except ClientError as e:
                print(e.response)

        # Now create an empty .zip file in /tmp and add the downloaded files to it
        with tempfile.NamedTemporaryFile(suffix='.zip', dir='/tmp', delete=False) as f:
            zip_path = f.name
        with ZipFile(zip_path, 'w', compression=ZIP_DEFLATED, allowZip64=True) as zf:
            for key in downloaded:
                # arcname stores each file under its S3 key instead of /tmp/<key>
                zf.write(os.path.join('/tmp', key), arcname=key)

        # Once zipped in /tmp, copy it to your preferred S3 location
        s3_client.upload_file(zip_path, BUCKET, 'destination_s3_path ex. out/filename.zip')
        print('All files zipped successfully!')
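A single `list_objects_v2` call returns at most 1,000 keys, so with a larger or growing set of PDFs the listing step should follow the pagination. A minimal sketch using boto3's built-in paginator, assuming `s3_client` is a boto3 S3 client; `list_keys` is a hypothetical helper that could replace the `files_to_zip` loop in the code above.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every object key under prefix, following list_objects_v2 pagination.

    Hypothetical helper; s3_client is assumed to be boto3.client('s3').
    """
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry at all
        for obj in page.get("Contents", []):
            # Skip zero-byte "folder" placeholder keys such as "pdfs/"
            if not obj["Key"].endswith("/"):
                keys.append(obj["Key"])
    return keys
```

Usage: `files_to_zip = list_keys(s3_client, BUCKET, PREFIX_1)`.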