I have an SNS notification set up that triggers a Lambda function whenever a .xlsx file is uploaded to an S3 bucket.
The Lambda function reads the .xlsx file into a Pandas DataFrame:
import os
import pandas as pd
import json
import xlrd
import boto3

def main(event, context):
    message = event['Records'][0]['Sns']['Message']
    parsed_message = json.loads(message)
    src_bucket = parsed_message['Records'][0]['s3']['bucket']['name']
    filepath = parsed_message['Records'][0]['s3']['object']['key']
    s3 = boto3.resource('s3')
    s3_client = boto3.client('s3')
    obj = s3_client.get_object(Bucket=src_bucket, Key=filepath)
    print(obj['Body'])
    df = pd.read_excel(obj, header=2)
    print(df.head(2))
I get the following error:
Invalid file path or buffer object type: <type 'dict'>: ValueError
Traceback (most recent call last):
  File "/var/task/handler.py", line 26, in main
    df = pd.read_excel(obj, header=2)
  File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/var/task/pandas/io/excel.py", line 307, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/var/task/pandas/io/excel.py", line 376, in __init__
    io, _, _, _ = get_filepath_or_buffer(self._io)
  File "/var/task/pandas/io/common.py", line 218, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <type 'dict'>
How can I fix this?
Answer 0 (score: 2)
Pandas now supports S3 URLs as file paths, so it can read the Excel file directly from S3 without downloading it first.
See here for a CSV example: https://stackoverflow.com/a/51777553/52954
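
For the Excel case in the question, a rough sketch could look like the following. It assumes Python 3 and that s3fs is packaged with the function, since pandas relies on it for s3:// paths. The helper name s3_url_from_sns_event is made up for illustration. Keys in S3 event notifications arrive URL-encoded, so the key is decoded before the URL is built.

```python
import json
from urllib.parse import unquote_plus


def s3_url_from_sns_event(event):
    """Build an s3:// URL from an S3 notification delivered through SNS.

    Hypothetical helper, not part of pandas or boto3.
    """
    # The SNS message body is itself a JSON-encoded S3 event.
    s3_event = json.loads(event["Records"][0]["Sns"]["Message"])
    record = s3_event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded
    # (for example, a space arrives as "+").
    key = unquote_plus(record["object"]["key"])
    return "s3://{}/{}".format(bucket, key)


def main(event, context):
    # Imported here so the helper above can be exercised without pandas installed.
    import pandas as pd

    # Assumption: s3fs is available in the deployment package.
    df = pd.read_excel(s3_url_from_sns_event(event), header=2)
    print(df.head(2))
```

The Lambda execution role still needs permission to read the object, exactly as it does with get_object.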
Answer 1 (score: 1)
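This is expected! obj is a dictionary (the response returned by get_object), not a file path or file-like object. Have you tried passing its Body entry instead? Note that the key is spelled with a capital B.
df = pd.read_excel(obj['Body'], header=2)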
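
If your pandas version rejects the streaming body (it may not support seeking, which some Excel readers need), one workaround is to read the whole object into an in-memory buffer first. This is only a sketch: the function names are made up, and s3_client is assumed to be the boto3.client('s3') from the question.

```python
import io


def fetch_object_buffer(s3_client, bucket, key):
    """Download an S3 object fully into a seekable in-memory buffer.

    Hypothetical helper; s3_client is assumed to be boto3.client('s3').
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # get_object returns a dict; the payload is a stream under "Body" (capital B).
    return io.BytesIO(response["Body"].read())


def read_excel_from_s3(s3_client, bucket, key, header=2):
    # Imported here so fetch_object_buffer can be exercised without pandas installed.
    import pandas as pd

    return pd.read_excel(fetch_object_buffer(s3_client, bucket, key), header=header)
```

In the handler from the question this would replace the failing line with df = read_excel_from_s3(s3_client, src_bucket, filepath). The whole file is held in memory, so this only suits workbooks that fit comfortably within the Lambda memory limit.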
Answer 2 (score: 0)