Redshift + SQLAlchemy long query hangs

Asked: 2017-04-24 17:25:12

Tags: python python-3.x sqlalchemy amazon-redshift

I'm doing something along the lines of:

import sqlalchemy

conn_string = "postgresql+pg8000://%s:%s@%s:%d/%s" % (db_user, db_pass, host, port, schema)
engine = sqlalchemy.create_engine(conn_string, execution_options={'autocommit': True},
                                  encoding='utf-8', isolation_level="AUTOCOMMIT")
conn = engine.connect()
rows = conn.execute(sql_query)

to run queries on a Redshift cluster. Lately, I've been doing maintenance tasks such as running VACUUM REINDEX on large tables that get truncated and reloaded every day.

The problem is that the command above takes around 7 minutes for a particular table (the table is huge: 60 million rows across 15 columns), and when I run it using the method above it never finishes and just hangs. I can see in the AWS cluster dashboard that parts of the vacuum command run for about 5 minutes, and then it just stops. No Python errors, no errors on the cluster, nothing at all.

My guess is that the connection is lost during the command. So, how do I prove my theory? Has anybody else had this issue? What do I change in the connection string to keep it alive longer?

EDIT:

I changed my connection to this after the comments here:

conn = sqlalchemy.engine.create_engine(conn_string,
                                       execution_options={'autocommit': True},
                                       encoding='utf-8',
                                       connect_args={"keepalives": 1,
                                                     "keepalives_idle": 60,
                                                     "keepalives_interval": 60},
                                       isolation_level="AUTOCOMMIT")

And it worked for a while. However, the same behaviour has now started again with even larger tables, where the VACUUM REINDEX actually takes around 45 minutes (at least that is my estimate; the command never finishes running in Python).

How can I make this work regardless of the query runtime?
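One workaround (a sketch, not from the accepted answer): since Redshift keeps executing a submitted statement even if the client session stalls, you can run the vacuum on one connection and watch its progress from a second connection via the `SVV_VACUUM_PROGRESS` system view. Everything here is illustrative; `vacuum_and_poll` and the connection details are hypothetical, and the view's columns are taken from the Redshift system-table documentation.

```python
import threading
import time

# SVV_VACUUM_PROGRESS reports the vacuum currently running on the cluster,
# with a status string and a rough time-remaining estimate.
PROGRESS_SQL = ("SELECT table_name, status, time_remaining_estimate "
                "FROM svv_vacuum_progress")


def vacuum_and_poll(engine, table_name, poll_seconds=60):
    """Hypothetical helper: run VACUUM REINDEX in a background thread and
    poll its progress on a fresh connection, so a dropped/hung client
    connection does not leave you blind for 45 minutes."""

    def run_vacuum():
        # Each connect() opens its own session; autocommit is assumed
        # to be configured on the engine, as in the question.
        with engine.connect() as conn:
            conn.execute("VACUUM REINDEX %s" % table_name)

    worker = threading.Thread(target=run_vacuum)
    worker.start()
    while worker.is_alive():
        with engine.connect() as conn:
            for row in conn.execute(PROGRESS_SQL):
                print(row)  # e.g. table name, 'Sorting rows', estimate
        time.sleep(poll_seconds)
    worker.join()
```

Even if the vacuum connection itself dies, you can re-run the `PROGRESS_SQL` query from any new session to see whether the vacuum is still making progress on the cluster.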

1 Answer:

Answer 0 (score: 1)

It is most likely not a connection-drop issue. To confirm this, try pushing a few million rows into a dummy table (something that takes more than 5 minutes) and see whether the statement fails. Once a query has been submitted to Redshift, it executes in the background regardless of whether your connection stays open.

Now, to the problem itself: my guess is that you are running out of memory or disk space. Could you give more detail and list your Redshift setup (dc1/ds2, how many nodes)? Also, try running some admin queries to see how much space you have left on disk. Sometimes, when a cluster is loaded to the edge, it throws a disk-full error, but in your case the connection may well be dropped before that error ever reaches your Python shell.
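An admin query along these lines can show per-node disk usage via the `STV_PARTITIONS` system table (the `used`/`capacity` columns come from the Redshift docs; the helper function and the 80% threshold are my own illustrative choices):

```python
# Per-node disk usage: STV_PARTITIONS has one row per disk partition,
# with blocks 'used' out of total 'capacity'; grouping by owner node
# gives each node's fill percentage.
DISK_USAGE_SQL = """
SELECT owner AS node,
       SUM(used)::float / SUM(capacity) * 100 AS pct_disk_used
FROM stv_partitions
GROUP BY owner
ORDER BY owner
"""


def nodes_over_threshold(engine, threshold_pct=80.0):
    """Hypothetical helper: return (node, pct) pairs for nodes whose disk
    usage exceeds threshold_pct. Requires access to system tables."""
    with engine.connect() as conn:
        rows = conn.execute(DISK_USAGE_SQL).fetchall()
    return [(node, pct) for node, pct in rows if pct > threshold_pct]
```

If any node is near 100% during the vacuum, a disk-full condition on the cluster side is the likelier explanation than a dropped connection.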