Reading a file triggered by an S3 event

Date: 2017-10-25 08:57:40

Tags: python csv amazon-s3 aws-lambda serverless-framework

Here is what I am trying to do:

  1. A user uploads a csv file to an AWS S3 bucket.
  2. Once the file is uploaded, the S3 bucket invokes a lambda function that I created.
  3. My lambda function reads the contents of the csv file, then sends an email containing the file contents and its metadata (a trimmed sample of the triggering event is shown after this list).

Local environment:

    Serverless Framework version 1.22.0

    Python 2.7
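
For reference, the event an S3 put notification delivers to the handler has roughly this shape (a trimmed sketch limited to the fields used in the code below; all values here are illustrative):

    event = {
        'Records': [{
            'eventTime': '2017-10-25T08:57:40.000Z',
            'requestParameters': {'sourceIPAddress': '203.0.113.10'},
            's3': {
                'bucket': {'name': 'mine2'},
                'object': {'key': 'data.csv', 'size': 1024}  # illustrative key and size
            }
        }]
    }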

Here is my serverless.yml file:

    service: aws-python # NOTE: update this with your service name
    
    provider:
      name: aws
      runtime: python2.7
      stage: dev
      region: us-east-1
      iamRoleStatements:
        - Effect: "Allow"
          Action:
            - "s3:*"
            - "ses:SendEmail"
            - "ses:SendRawEmail"
            - "s3:PutBucketNotification"
          Resource: "*"
    
    functions:
      csvfile:
        handler: handler.csvfile
        description: send mail whenever a csv file is uploaded on S3 
        events:
          - s3:
              bucket: mine2
              event: s3:ObjectCreated:*
              rules:
                - suffix: .csv
    

Here is my lambda function:

    import json
    import boto3
    import botocore
    import logging
    import sys
    import traceback
    import csv
    
    from botocore.exceptions import ClientError
    from pprint import pprint
    from time import strftime, gmtime
    from json import dumps, loads, JSONEncoder, JSONDecoder
    
    
    #setup simple logging for INFO
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    def csvfile(event, context):
        """Send email whenever a csvfile is uploaded to S3 """
        body = {}
        emailcontent = ''
        status_code = 200
        #set email information
        email_from = '****@*****.com'
        email_to = '****@****.com'
        email_subject = 'new file is uploaded'
        try:
            s3 = boto3.resource(u's3')
            s3 = boto3.client('s3')
            for record in event['Records']:
                filename = record['s3']['object']['key']
                filesize = record['s3']['object']['size']
                source = record['requestParameters']['sourceIPAddress']
                eventTime = record['eventTime']
            # get a handle on the bucket that holds your file
            bucket = s3.Bucket(u'mine2')
            # get a handle on the object you want (i.e. your file)
            obj = bucket.Object(key= event[u'Records'][0][u's3'][u'object'][u'key'])
            # get the object
            response = obj.get()
            # read the contents of the file and split it into a list of lines
            lines = response[u'Body'].read().split()
            # now iterate over those lines
            for row in csv.DictReader(lines):    
                print(row)
                emailcontent = emailcontent + '\n' + row 
        except Exception as e:
            print(traceback.format_exc())
            status_code = 500
            body["message"] = json.dumps(e)
    
        email_body = "File Name: " + filename + "\n" + "File Size: " + str(filesize) + "\n" +  "Upload Time: " + eventTime + "\n" + "User Details: " + source + "\n" + "content of the csv file :" + emailcontent
        ses = boto3.client('ses')
        ses.send_email(Source = email_from,
            Destination = {'ToAddresses': [email_to,],}, 
                Message = {'Subject': {'Data': email_subject}, 'Body':{'Text' : {'Data': email_body}}}
                )
        print('Function execution Completed')
    

I don't know what I am doing wrong. When I only collect information about the uploaded file, everything works fine; as soon as I add the part that reads the file contents, the lambda function stops returning anything.

1 Answer:

Answer 0 (score: 16):

I would suggest adding CloudWatch access to your IAM policy. Your lambda function does not actually return anything, but you can see your log output in CloudWatch. Since you have already set up logger.info(message), I recommend using logger rather than print.
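
If you prefer pulling those logs programmatically rather than through the console, something along these lines works (a minimal sketch, assuming Lambda's default /aws/lambda/<function-name> log group; with the Serverless Framework's service-stage-function naming, the deployed function here would be aws-python-dev-csvfile):

    import boto3

    logs = boto3.client('logs')
    # fetch recent log events from the function's log group
    resp = logs.filter_log_events(logGroupName='/aws/lambda/aws-python-dev-csvfile')
    for e in resp['events']:
        print(e['message'])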

I hope this helps you debug your function.

I would rewrite it like this, leaving out the sending part (tested only in the AWS console):

    import logging
    import boto3

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    s3 = boto3.client('s3')

    def lambda_handler(event, context):
        email_content = ''

        # retrieve bucket name and file_key from the S3 event
        bucket_name = event['Records'][0]['s3']['bucket']['name']
        file_key = event['Records'][0]['s3']['object']['key']
        logger.info('Reading {} from {}'.format(file_key, bucket_name))
        # get the object
        obj = s3.get_object(Bucket=bucket_name, Key=file_key)
        # get lines inside the csv
        lines = obj['Body'].read().split(b'\n')
        for r in lines:
            logger.info(r.decode())
            email_content = email_content + '\n' + r.decode()
        logger.info(email_content)
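
To complete the flow, the sending part from the question can be appended at the end of lambda_handler, for example (a minimal sketch: the addresses are placeholders, and SES requires them to be verified while the account is in sandbox mode):

    # runs inside lambda_handler, after email_content has been built
    ses = boto3.client('ses')
    ses.send_email(
        Source='sender@example.com',
        Destination={'ToAddresses': ['recipient@example.com']},
        Message={
            'Subject': {'Data': 'new file is uploaded'},
            'Body': {'Text': {'Data': email_content}}
        }
    )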