Loading multiple files from S3 into Redshift: the table is empty when queried

Time: 2019-03-23 20:35:18

Tags: python amazon-s3 amazon-redshift

I have a script, triggered from Lambda, that executes the following query:

 COPY test.error_log__c
        FROM 's3://sfdc-etl-jp-test/sfdc_etl/json/error_log__c_json/2019/03/23/'
        iam_role 'arn:aws:iam::<account>:role/LambdaFullAccessRole'
        TRUNCATECOLUMNS
        JSON 'auto'

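However, the table is still empty even though the query completes successfully. Each of these paths contains between 1 and 100 files. My guess is that the COPY command isn't smart enough to work out the file names, and that this is why it doesn't work. Am I right? If so, how do I tell it to load multiple files?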

In case the problem is not the query itself, here is the code that executes it:

igers = create_engine('postgres://{}:{}@{}/ibdrs'.format(igersUser, igersPwd, igersHost), encoding="utf-8")

loadQuery = '''
        COPY {}.{}
        FROM '{}{}/{}'
        iam_role 'arn:aws:iam::<account>:role/LambdaFullAccessRole' 
        TRUNCATECOLUMNS
        JSON 'auto'
        EMPTYASNULL
        TIMEFORMAT 'auto'
        DATEFORMAT 'auto'
        COMPUPDATE OFF
        STATUPDATE ON
    '''.format(igersSchema, nextObj, s3Destination, s3Path.format(nextObj), dated_path)

with igers.connect() as conn:
    try:
        conn.execute(drop_table)
        print('completed drop table')
        conn.execute(ddl_str)
        print('completed create table')
        conn.execute(loadQuery).execution_options(autocommit=True)
        print('completed load query')
        for row in range(len(groupPerms)):
            perms_statement = grantPerms.format(
                groupPerms['namespace'].iloc[row],
                groupPerms['item'].iloc[row],
                groupPerms['groname'].iloc[row])
            conn.execute(perms_statement)
        print('completed grant group permissions')
        conn.close()
    except exc.SQLAlchemyError as e:
        print(e)

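Note: yes, I know other queries are being executed that aren't shown here. Dropping the table, recreating it, and reapplying the permissions all work and have been verified. Only the COPY from S3 does nothing.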

1 answer:

Answer 0: (score: 0)

The answer is... apparently the order of the calls matters to SQLAlchemy:

igers = create_engine('postgres://{}:{}@{}/ibdrs'.format(igersUser, igersPwd, igersHost), encoding="utf-8")

loadQuery = '''
        COPY {}.{}
        FROM '{}{}/{}'
        iam_role 'arn:aws:iam::<account>:role/LambdaFullAccessRole' 
        TRUNCATECOLUMNS
        JSON 'auto'
        EMPTYASNULL
        TIMEFORMAT 'auto'
        DATEFORMAT 'auto'
        COMPUPDATE OFF
        STATUPDATE ON
    '''.format(igersSchema, nextObj, s3Destination, s3Path.format(nextObj), dated_path)

with igers.connect() as conn:
    try:
        conn.execute(drop_table)
        print('completed drop table')
        conn.execute(ddl_str)
        print('completed create table')
        # Set autocommit on the connection first, then execute the COPY.
        conn.execution_options(autocommit=True).execute(loadQuery)
        print('completed load query')
        for row in range(len(groupPerms)):
            perms_statement = grantPerms.format(
                groupPerms['namespace'].iloc[row],
                groupPerms['item'].iloc[row],
                groupPerms['groname'].iloc[row])
            conn.execute(perms_statement)
        print('completed grant group permissions')
        conn.close()
    except exc.SQLAlchemyError as e:
        print(e)

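Notice the change in order for autocommit: execution_options(autocommit=True) is now called on the connection before execute(loadQuery), rather than on the result after the COPY has already run. Presumably the COPY was previously executed without autocommit and never committed, which would explain why the table stayed empty.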
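
An alternative to the autocommit option is to run the COPY inside an explicit transaction, so the commit does not depend on call order. The following is only a minimal sketch, assuming SQLAlchemy 1.4 or later (for Connection.exec_driver_sql) and an engine like igers above. The helper names build_copy_statement and run_copy are hypothetical and not part of the original script. It also relies on the fact that COPY treats the FROM value as a key prefix and loads every object under it, so no per-file list is needed.

```python
def build_copy_statement(schema, table, s3_prefix, iam_role):
    """Return a Redshift COPY statement for every object under s3_prefix.

    Hypothetical helper: the option list mirrors the query in the question.
    """
    return (
        f"COPY {schema}.{table}\n"
        f"FROM '{s3_prefix}'\n"
        f"iam_role '{iam_role}'\n"
        "TRUNCATECOLUMNS\n"
        "JSON 'auto'"
    )


def run_copy(engine, statement):
    """Execute the statement in an explicit transaction.

    engine.begin() commits when the block exits normally and rolls back
    if an exception is raised, so the load cannot silently stay uncommitted.
    """
    with engine.begin() as conn:
        # exec_driver_sql passes the string to the DBAPI driver untouched,
        # so the colons in the IAM role ARN are not parsed as bind parameters.
        conn.exec_driver_sql(statement)
```

It could be called as run_copy(igers, build_copy_statement(igersSchema, nextObj, prefix, role)), where prefix and role stand in for the S3 path and IAM role ARN built in the script. If the table is still empty after a committed COPY, Redshift's stl_load_commits and stl_load_errors system tables are the usual places to check which files were loaded and why rows were rejected.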