AWS Lambda - read a CSV and convert it to a pandas DataFrame

Date: 2020-10-27 21:23:44

Tags: python pandas aws-lambda

I have a simple Lambda function that reads a CSV file from an S3 bucket. Everything works fine, but when I try to get the CSV data into a pandas DataFrame I get the error `string indices must be integers`.

My code is bog standard, but I need the CSV as a DataFrame for further operations. The marked line is the source of the error. I can print the data without any problem, so the bucket and file details are configured correctly.

Updated code:

import json
import pandas as pd
import numpy as np
import requests
import glob
import time
import os
from datetime import datetime
from csv import reader
import boto3
import traceback
import io

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
            
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)
        
        data = resp['Body'].read().decode('utf-8')
        df=pd.DataFrame( list(reader(data)))
        print (df.head())

    except Exception as err:
        print(err)
        traceback.print_exc()

    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

3 Answers:

Answer 0 (score: 2)

I believe your problem is most likely with this line in your function: `df = pd.DataFrame(list(reader(data)))`. `csv.reader` expects an iterable of lines, but iterating over a plain string yields individual characters, so each "row" is built from a single character. The answer below should let you read the CSV file into a pandas DataFrame for processing.
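To illustrate the failure mode, here is a minimal sketch with made-up sample data standing in for `resp['Body'].read().decode('utf-8')`:

```python
from csv import reader

# What resp['Body'].read().decode('utf-8') returns: one long string
data = "a,b\n1,2\n"

# Iterating a str yields characters, so csv.reader sees one-character "lines"
rows = list(reader(data))
print(rows[0])  # ['a'] -- a row built from a single character

# Splitting into lines first gives the expected rows
rows_ok = list(reader(data.splitlines()))
print(rows_ok)  # [['a', 'b'], ['1', '2']]
```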

import boto3
import pandas as pd
from io import StringIO

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        resp = s3_client.get_object(Bucket=bucket_name, Key=s3_file_name)

        # Method 1: pandas can read the streaming body directly
        df_s3_data = pd.read_csv(resp['Body'], sep=',')

        # Method 2: decode the bytes yourself and wrap the str in StringIO
        # df_s3_data = pd.read_csv(StringIO(resp['Body'].read().decode('utf-8')))

        print(df_s3_data.head())

    except Exception as err:
        print(err)

Answer 1 (score: 0)

You can read the S3 file directly from pandas with `read_csv`:

import boto3
import pandas as pd

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:            
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]

        # This 'magic' needs s3fs (https://pypi.org/project/s3fs/)
        df=pd.read_csv(f's3://{bucket_name}/{s3_file_name}', sep=',')

        print (df.head())

    except Exception as err:
        print(err)

Things to keep in mind:

    # Track memory usage at the cost of CPU. Great for troubleshooting. Use wisely.
    # (df.info prints to stdout and returns None, so don't wrap it in print())
    df.info(verbose=True, memory_usage='deep')
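As a self-contained illustration (the DataFrame here is made up), `memory_usage='deep'` makes pandas inspect the actual contents of object columns instead of only counting pointer sizes:

```python
import pandas as pd

df = pd.DataFrame({"id": range(3), "name": ["alpha", "beta", "gamma"]})

# The deep measurement includes the strings themselves, so it reports a
# larger (and more accurate) figure than the shallow estimate
shallow = df.memory_usage(deep=False).sum()
deep = df.memory_usage(deep=True).sum()
print(shallow, deep)

df.info(verbose=True, memory_usage='deep')
```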

Answer 2 (score: 0)

import json
import pandas as pd
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
            
        bucket_name = event["Records"][0]["s3"]["bucket"]["name"]
        s3_file_name = event["Records"][0]["s3"]["object"]["key"]
        obj = s3_client.get_object(Bucket=bucket_name, Key= s3_file_name)
        df = pd.read_csv(obj['Body']) # 'Body' is a key word
        print(df.head())

    except Exception as err:
        print(err)
        
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
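All three handlers pull the bucket name and object key out of the S3 event notification. A minimal event skeleton (the bucket and key values below are placeholders, not from the question) can be used to exercise that parsing locally:

```python
# Minimal shape of the S3 put-event record that triggers the Lambda;
# "my-bucket" and "data/input.csv" are placeholder values
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "data/input.csv"},
            }
        }
    ]
}

bucket_name = sample_event["Records"][0]["s3"]["bucket"]["name"]
s3_file_name = sample_event["Records"][0]["s3"]["object"]["key"]
print(bucket_name, s3_file_name)  # my-bucket data/input.csv
```

Note that in real S3 events the object key is URL-encoded (spaces become `+`), so it is common to run it through `urllib.parse.unquote_plus` before calling `get_object`.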