Copying data from Amazon S3 to Redshift

Date: 2018-04-03 15:19:11

Tags: python amazon-web-services amazon-s3 amazon-redshift airflow

I am trying to use Airflow to copy data from an S3 bucket into a Redshift database. Here is my code:

from airflow.hooks import PostgresHook
path = 's3://my_bucket/my_file.csv'

redshift_hook = PostgresHook(postgres_conn_id='table_name')
access_key='abcd' 
secret_key='aaaa'
query= """
copy my_table 
FROM '%s' 
ACCESS_KEY_ID '%s' 
SECRET_ACCESS_KEY '%s' 
REGION 'eu-west-1' 
ACCEPTINVCHARS 
IGNOREHEADER 1 
FILLRECORD 
CSV
BLANKSASNULL 
EMPTYASNULL 
MAXERROR 100 
DATEFORMAT 'MM/DD/YYYY'
""" % ( path,
        access_key,
        secret_key) 

redshift_hook.run(query)

But when I run this script, it raises the following error:

    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection [SQL: 'SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted \nFROM connection \nWHERE connection.conn_id = ?'] [parameters: ('elevaate_uk_production',)]

Can I get some help? Thanks in advance.

1 Answer:

Answer 0: (score: 1)

Is your connection ID the same as the table name? You need to go to your Airflow UI at http://........./admin/connections/ and add a Postgres connection ID for your Redshift cluster. Then put the name of that connection ID where you wrote table_name.

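While you are there, define an S3 connection and put the access key and secret key in it. Load it by instantiating an SSHHook with the connection ID name, then get the keys from it.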
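As a rough illustration of reading the keys back from a stored connection, here is a minimal sketch. It swaps in the generic `BaseHook.get_connection` lookup instead of instantiating an SSHHook, and it assumes the access key was saved in the connection's Login field and the secret key in its Password field. The connection ID `my_s3` is a hypothetical name, and the import path shown is the Airflow 1.x one.

```python
from types import SimpleNamespace


def s3_credentials(conn):
    """Map an Airflow-style connection object to the two COPY credentials.

    Assumption: the access key is stored as the connection's login and
    the secret key as its password.
    """
    return {"access_key": conn.login, "secret_key": conn.password}


def load_s3_credentials(conn_id="my_s3"):
    """Look up a stored connection by ID ('my_s3' is a hypothetical ID)."""
    # Imported lazily so the helper above can be tried without Airflow.
    # Airflow 1.x path; in Airflow 2.x the class lives in airflow.hooks.base.
    from airflow.hooks.base_hook import BaseHook

    return s3_credentials(BaseHook.get_connection(conn_id))


# Stand-in object with the same two attributes, for a quick local check.
fake_conn = SimpleNamespace(login="abcd", password="aaaa")
print(s3_credentials(fake_conn))
```

Inside a DAG file you would call `load_s3_credentials()` rather than hard-coding `access_key` and `secret_key` in the script.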

Finally, replace your …run(query) call with a PostgresOperator. Put the keys in a parameters dict, then reference them in the SQL string.
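The snippet that originally followed is not preserved here, so the following is only a sketch of what passing the keys through a parameters dict might look like. It assumes the Airflow 1.x import path for `PostgresOperator` and pyformat-style placeholders (`%(name)s`), which the Postgres driver fills in and quotes itself. The connection ID `my_redshift` and the task ID are hypothetical names; the COPY options are taken from the question.

```python
# COPY statement from the question, with driver-side placeholders instead of
# Python string formatting. The driver quotes the values, so the placeholders
# are not wrapped in quotes here.
COPY_SQL = """
COPY my_table
FROM %(path)s
ACCESS_KEY_ID %(access_key)s
SECRET_ACCESS_KEY %(secret_key)s
REGION 'eu-west-1'
ACCEPTINVCHARS
IGNOREHEADER 1
FILLRECORD
CSV
BLANKSASNULL
EMPTYASNULL
MAXERROR 100
DATEFORMAT 'MM/DD/YYYY'
"""


def copy_parameters(path, access_key, secret_key):
    """Build the parameters dict whose keys match the placeholders above."""
    return {"path": path, "access_key": access_key, "secret_key": secret_key}


def make_copy_task(dag, params, conn_id="my_redshift"):
    """Create the task; 'my_redshift' is a hypothetical connection ID."""
    # Imported lazily so the rest of the sketch can be tried without Airflow.
    # Airflow 1.x path; Airflow 2.x ships this in the postgres provider package.
    from airflow.operators.postgres_operator import PostgresOperator

    return PostgresOperator(
        task_id="copy_s3_to_redshift",  # hypothetical task name
        postgres_conn_id=conn_id,
        sql=COPY_SQL,
        parameters=params,
        dag=dag,
    )


params = copy_parameters("s3://my_bucket/my_file.csv", "abcd", "aaaa")
```

Keeping the credentials out of the SQL text also means they are not baked into the rendered query string.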