I'm trying to run a Redshift COPY from SQLAlchemy.
The following SQL correctly copies objects from my S3 bucket into my Redshift table when I execute it in psql:
COPY posts FROM 's3://mybucket/the/key/prefix'
WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey'
JSON AS 'auto';
I have several files named
s3://mybucket/the/key/prefix.001.json
s3://mybucket/the/key/prefix.002.json
etc.
I can verify that the new rows were added to the table with select count(*) from posts.
However, when I execute exactly the same SQL expression from SQLAlchemy, execution completes without error, but no rows are added to my table.
session = get_redshift_session()
session.bind.execute("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';")
session.commit()
It makes no difference whether I do the above or
from sqlalchemy.sql import text
session = get_redshift_session()
session.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))
session.commit()
Answer 0 (score: 6)
I ran into basically the same problem, though in my case it was more like:
engine = create_engine('...')
engine.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))
Stepping through with pdb, the problem was clearly the lack of a .commit() being invoked. I don't know why session.commit() isn't working in your case (maybe the session "lost track" of the commands it sent?), so it may not actually fix your problem.
Anyhow, as explained in the sqlalchemy docs:
Given this requirement, SQLAlchemy implements its own "autocommit" feature which works completely consistently in all backends. This is achieved by detecting statements which represent data-changing operations, i.e. INSERT, UPDATE, DELETE [...] If the statement is a text-only statement and the flag is not set, a regular expression is used to detect INSERT, UPDATE, DELETE, as well as a variety of other commands for a particular backend.
So, there are two fixes: issue the commit yourself (as above), or set the autocommit execution option on the statement:
text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';").execution_options(autocommit=True)
Answer 1 (score: 1)
Adding a commit at the end of the copy worked for me:
<your copy sql>;commit;
Answer 2 (score: 0)
I successfully copied delimited files into Redshift using the core expression language and Connection.execute() (rather than the ORM and sessions) with the code below. Perhaps you can adapt it for JSON.
from sqlalchemy.sql import text

def copy_s3_to_redshift(conn, s3path, table, aws_access_key, aws_secret_key, delim='\t', uncompress='auto', ignoreheader=None):
    """Copy a TSV file from S3 into redshift.

    Note the CSV option is not used, so quotes and escapes are ignored. Empty fields are loaded as null.
    Does not commit a transaction.
    :param Connection conn: SQLAlchemy Connection
    :param str uncompress: None, 'gzip', 'lzop', or 'auto' to autodetect from `s3path` extension.
    :param int ignoreheader: Ignore this many initial rows.
    :return: Whatever a copy command returns.
    """
    if uncompress == 'auto':
        uncompress = 'gzip' if s3path.endswith('.gz') else 'lzop' if s3path.endswith('.lzo') else None
    copy = text("""
        copy "{table}"
        from :s3path
        credentials 'aws_access_key_id={aws_access_key};aws_secret_access_key={aws_secret_key}'
        delimiter :delim
        emptyasnull
        ignoreheader :ignoreheader
        compupdate on
        comprows 1000000
        {uncompress};
        """.format(uncompress=uncompress or '', table=text(table), aws_access_key=aws_access_key, aws_secret_key=aws_secret_key))  # copy command doesn't like table name or keys single-quoted
    return conn.execute(copy, s3path=s3path, delim=delim, ignoreheader=ignoreheader or 0)
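A hypothetical call might look like this (bucket, table name, and credentials are placeholders); engine.begin() is used so the transaction commits on success, since the helper itself does not commit:
from sqlalchemy import create_engine

engine = create_engine('redshift+psycopg2://user:password@host:5439/dbname')  # placeholder URL
with engine.begin() as conn:  # commits on success, rolls back on error
    copy_s3_to_redshift(conn, 's3://mybucket/data/posts.tsv.gz', 'posts',
                        aws_access_key='myaccesskey', aws_secret_key='mysecretaccesskey',
                        ignoreheader=1)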