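I am trying to insert data from an S3 bucket into a MySQL RDS instance on AWS using a Lambda function. I connect to the MySQL endpoint with SQLAlchemy. I want to modify the data along the way: I rename the columns and then reindex them so that they map onto the table in the RDS instance. The problem is in the df.columns line. Instead of getting the column names as strings, I get them as tuples.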
+-------------+----------------------+------------+------------+-----------------+
| ('col_a',)  | ('date_timestamp',)  | ('col_b',) | ('col_c',) | ('vehicle_id',) |
+-------------+----------------------+------------+------------+-----------------+
| 0.180008333 | 2017-09-28T20:36:00Z | -6.1487501 | 38.35      | 1004            |
| 0.809708333 | 2017-06-17T14:16:00Z | 8.189424   | -6.8732784 | NominalValue    |
+-------------+----------------------+------------+------------+-----------------+
Here is the code:
from __future__ import print_function
import boto3
import json
import logging
import pymysql
from sqlalchemy import create_engine
from pandas.io import sql
from pandas.io.json import json_normalize
from datetime import datetime

print('Loading function')
s3 = boto3.client('s3')


def getEngine(endpoint):
    engine_ = None
    try:
        engine_ = create_engine(endpoint)
    except Exception as e:
        # The original message formatted `key` and `bucket`, which are not defined in this scope.
        print('Error creating the SQLAlchemy engine. Check the endpoint URL.')
        raise e
    return engine_


engine = getEngine('mysql+pymysql://username:password@endpoint/database')

# For each top-level key in the JSON file: incoming column names ("from"),
# the column order expected by the SQL table ("to"), and the target table name.
configuration = {
    "aTable": {
        "from": ['col_1', 'col_2', 'date_timestamp', 'operator_id'],
        "to": ['date_timestamp', 'operator_id', 'col_1', 'col_2'],
        "sql_table_name": 'sql_table_a'
    },
    "bTable": {
        "from": ['col_a', 'date_timestamp', 'col_b', 'col_c', 'vehicle_id'],
        "to": ['date_timestamp', 'col_a', 'col_b', 'vehicle_id', 'col_c'],
        "sql_table_name": 'sql_table_b'
    }
}


def handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3_object_key = event['Records'][0]['s3']['object']['key']
    obj = s3.get_object(Bucket=bucket, Key=s3_object_key)
    data = json.loads(obj['Body'].read())
    for _key in data:
        if _key not in configuration:
            print("No configuration found for {0}".format(_key))
            continue  # skip keys without a configuration entry
        df = json_normalize(data[str(_key)])
        df.columns=[configuration[_key]['from']]
        #df = df.reindex(indexlist,axis="columns")
        #df['date_timestamp'] = df['date_timestamp'].apply(lambda x: datetime.strptime(x, "%Y-%m-%dT%H:%M:%SZ"))
        df.to_sql(name=configuration[_key]['sql_table_name'], con=engine, if_exists='append', index=False)
        print(df)
    return "Loaded data in RDS"
Answer 0 (score: 0)
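configuration[_key]['from'] is already a list, so wrapping it in another [] makes pandas treat it as a single level of a MultiIndex, and every column label becomes a 1-tuple. Remove the outer [] from this line: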
df.columns=[configuration[_key]['from']]
The correct code is:
df.columns=configuration[_key]['from']
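To see the difference in isolation, here is a minimal sketch that needs only pandas (no S3 or RDS). The sample rows, the raw keys "a" and "b", and the variable names are made up for illustration; only the column names come from the question. It assigns the same list of names once wrapped in an extra [] and once as-is, then reorders the flat columns the way the commented-out reindex line in the question seems to intend.

```python
import pandas as pd

# Hypothetical sample rows standing in for the normalized JSON from S3.
rows = [
    {"a": 0.180008333, "b": "2017-09-28T20:36:00Z"},
    {"a": 0.809708333, "b": "2017-06-17T14:16:00Z"},
]
new_names = ["col_a", "date_timestamp"]

# Nested list: pandas builds a one-level MultiIndex, so each label is a 1-tuple.
nested = pd.DataFrame(rows)
nested.columns = [new_names]
nested_labels = list(nested.columns)

# Flat list: a plain Index of strings, which is what to_sql needs.
flat = pd.DataFrame(rows)
flat.columns = new_names
flat_labels = list(flat.columns)

# Reorder the columns to match the target table (the "to" list in the question).
reordered = flat.reindex(columns=["date_timestamp", "col_a"])

print(nested_labels)
print(flat_labels)
print(list(reordered.columns))
```

With the flat assignment the labels are ordinary strings, so df['date_timestamp'] and the reindex step behave as expected.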