Question

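Is there a way to use Boto3 to stream data to and from S3 inside an AWS Lambda function? I have working code, but it loads the whole CSV into memory, processes it, and then puts the result into an S3 object. I would rather find a way to stream the object from S3 with Boto3 and stream the result back to S3.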

import csv
import json
import boto3

def lambda_handler(event, context):

    targetbucket = 'AWS_BUCKET_NAME'
    csvkey = 'CSV_FILENAME.csv'
    jsonkey = 'JSON_FILENAME.json'

    s3 = boto3.resource('s3')
    csv_object = s3.Object(targetbucket, csvkey)
    # Reads the entire object into memory; this is the step I want to avoid.
    csv_content = csv_object.get()['Body'].read().decode('utf-8').splitlines()
    s3_client = boto3.client('s3')
    result = []

    # Each CSV row holds three columns: Name, Title, Age.
    for row in csv.reader(csv_content):
        if not row:
            continue  # skip blank lines
        result.append({'Name': row[0], 'Title': row[1], 'Age': row[2]})

    # json.dumps handles quoting and escaping, so the output is valid JSON.
    s3_client.put_object(
        Bucket=targetbucket,
        Body=json.dumps(result),
        Key=jsonkey
    )

Answer 1

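To stream data from CSV/JSON files in S3, you can use S3 Select. With it, you can stream the data directly into your code and work with it there, instead of downloading the whole file into memory and processing it.

On top of that, you can run basic SQL statements against the object's contents.

The following gist has sample code for reference: https://gist.github.com/SrushithR/1dbb6d3521383c259b47756506cf5955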
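
As a rough illustration of the S3 Select approach (this is a sketch, not the code from the gist), the generator below calls boto3's select_object_content and yields one parsed record at a time. The function name iter_s3_select_rows and the default SQL expression are assumptions made for this example. It also assumes the CSV has no header row, in which case S3 Select should name the columns _1, _2, _3.

```python
import json


def iter_s3_select_rows(s3_client, bucket, key,
                        expression="SELECT * FROM s3object s"):
    """Yield one dict per CSV row, streamed through S3 Select.

    s3_client is expected to be boto3.client('s3'); it is passed in so the
    function can also be exercised with a stub.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the CSV has no header row.
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
        # Ask for newline-delimited JSON records.
        OutputSerialization={"JSON": {}},
    )

    buffer = b""
    # The payload is an event stream; only 'Records' events carry data.
    for event in response["Payload"]:
        if "Records" not in event:
            continue
        buffer += event["Records"]["Payload"]
        # A chunk can end in the middle of a record, so keep the
        # unfinished tail in the buffer until the next chunk arrives.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line:
                yield json.loads(line)

    # Flush a final record that was not newline-terminated.
    if buffer.strip():
        yield json.loads(buffer)
```

Inside a Lambda handler this could be used as `for row in iter_s3_select_rows(boto3.client('s3'), 'AWS_BUCKET_NAME', 'CSV_FILENAME.csv'): ...`. Note that this only covers the read side; writing the result back to S3 still needs a separate upload step.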

Answer 2

I ended up using smart_open: https://github.com/RaRe-Technologies/smart_open. Here is an example from its README.

>>> from smart_open import open
>>> # can use context managers too:
>>> with open('smart_open/tests/test_data/1984.txt.gz') as fin:
...    with open('smart_open/tests/test_data/1984.txt.bz2', 'w') as fout:
...        for line in fin:
...           fout.write(line)

You can open a file in an S3 bucket by passing a URL of the form s3://my_bucket/my_key.
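
Applied to the CSV-to-JSON case from the question, the smart_open approach might look like the sketch below. The helper name csv_rows_to_json_array is made up for this example, and the column names and bucket/key placeholders are taken from the question. It assumes smart_open is packaged with the Lambda deployment, since it is a third-party library.

```python
import csv
import json


def csv_rows_to_json_array(fin, fout, fieldnames=("Name", "Title", "Age")):
    """Read CSV rows from fin and write them to fout as one JSON array.

    Rows are written one at a time, so only a single row is held in
    memory regardless of the file size.
    """
    fout.write("[")
    first = True
    for row in csv.reader(fin):
        if not row:
            continue  # skip blank lines
        if not first:
            fout.write(",\n")
        fout.write(json.dumps(dict(zip(fieldnames, row))))
        first = False
    fout.write("]")


def lambda_handler(event, context):
    # Imported here so the helper above stays usable without the
    # third-party dependency installed.
    from smart_open import open as s3_open

    source = "s3://AWS_BUCKET_NAME/CSV_FILENAME.csv"
    target = "s3://AWS_BUCKET_NAME/JSON_FILENAME.json"
    # Both sides are file-like objects backed by S3.
    with s3_open(source, "r", encoding="utf-8") as fin:
        with s3_open(target, "w", encoding="utf-8") as fout:
            csv_rows_to_json_array(fin, fout)
```

Keeping the conversion in a function that takes plain file objects means it can be tried locally with io.StringIO or ordinary files before it is deployed.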
