Table column names come back as tuples instead of strings in AWS

Asked: 2017-12-06 20:18:47

Tags: python mysql amazon-web-services amazon-s3 aws-lambda

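I am trying to use a Lambda function to insert data from an S3 bucket into a MySQL RDS instance on AWS. I connect to the MySQL endpoint with SQLAlchemy. Before inserting, I want to modify the data: I rename the columns and then reindex them so that they map onto the table in the RDS instance. The problem is in the df.columns line. Instead of getting the column names as strings, I get them as tuples.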

+-------------+----------------------+------------+------------+-----------------+
| ('col_a',)  | ('date_timestamp',)  | ('col_b',) | ('col_c',) | ('vehicle_id',) |
+-------------+----------------------+------------+------------+-----------------+
| 0.180008333 | 2017-09-28T20:36:00Z | -6.1487501 | 38.35      | 1004            |
| 0.809708333 | 2017-06-17T14:16:00Z | 8.189424   | -6.8732784 | NominalValue    |
+-------------+----------------------+------------+------------+-----------------+

Here is the code:

from __future__ import print_function
import boto3
import json
import logging
import pymysql
from sqlalchemy import create_engine
from pandas.io import sql
from pandas.io.json import json_normalize
from datetime import datetime
print('Loading function')

s3 = boto3.client('s3')
def getEngine(endpoint):
    engine_ = None
    try:
        engine_ = create_engine(endpoint)
    except Exception as e:
        # key and bucket are not defined in this scope; report the engine error instead
        print('Error creating SQLAlchemy engine: {}'.format(e))
        raise e
    return engine_
engine = getEngine('mysql+pymysql://username:password@endpoint/database')

configuration = {
    "aTable":
    {
        "from" : ['col_1','col_2','date_timestamp','operator_id'],
        "to" : ['date_timestamp','operator_id','col_1','col_2'],
        "sql_table_name" : 'sql_table_a'
    },
    "bTable" : {
        "from" : ['col_a','date_timestamp','col_b','col_c','vehicle_id'],
        "to" : ['date_timestamp','col_a','col_b','vehicle_id','col_c'],
        "sql_table_name" : 'sql_table_b'
    }
}

def handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3_object_key = event['Records'][0]['s3']['object']['key']
    obj = s3.get_object(Bucket=bucket, Key=s3_object_key)
    data = json.loads(obj['Body'].read())
    for _key in data:
        if _key not in configuration:
            print("No configuration found for {0}".format(_key))
            # skip keys without a mapping; configuration[_key] below would raise KeyError
            continue
        df = json_normalize(data[str(_key)])
        df.columns=[configuration[_key]['from']]
        #df = df.reindex(indexlist,axis="columns")
        #df['date_timestamp'] = df['date_timestamp'].apply(lambda x: datetime.strptime(x, "%Y-%m-%dT%H:%M:%SZ"))
        df.to_sql(name=configuration[_key]['sql_table_name'], con=engine, if_exists='append', index=False)
    print(df)
    return "Loaded data in RDS"

1 answer:

Answer 0 (score: 0)

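Remove the surrounding [] from the line below. Wrapping the list of names in another list makes pandas build a one-level MultiIndex, so each column label becomes a 1-tuple instead of a string: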

    df.columns=[configuration[_key]['from']]

The corrected code is:

    df.columns=configuration[_key]['from']
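
To make the difference concrete, here is a minimal sketch that reproduces both assignments on a small in-memory frame. The sample rows and the two-column subset of the "bTable" mapping are assumptions chosen for illustration; no S3 or RDS access is involved, and the variable names nested, flat and reordered are hypothetical.

```python
import pandas as pd

# Hypothetical sample mirroring two of the "bTable" columns from the question.
from_cols = ['col_a', 'date_timestamp']
to_cols = ['date_timestamp', 'col_a']
rows = [[0.180008333, '2017-09-28T20:36:00Z'],
        [0.809708333, '2017-06-17T14:16:00Z']]

# A list wrapped in another list is read as "one array per index level",
# so pandas builds a one-level MultiIndex whose labels are 1-tuples.
nested = pd.DataFrame(rows)
nested.columns = [from_cols]

# A flat list gives an ordinary Index of strings.
flat = pd.DataFrame(rows)
flat.columns = from_cols

# With string labels, the columns can be reordered to match the SQL table.
reordered = flat[to_cols]

print(type(nested.columns).__name__, list(nested.columns))
print(type(flat.columns).__name__, list(flat.columns))
print(list(reordered.columns))
```

If a frame already carries the tuple labels, one way to recover plain strings is nested.columns.get_level_values(0), although assigning the flat list in the first place is simpler.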