Question

I am trying to retrieve data from Redshift into Python using psycopg2. For some reason, loading a 35 GB database onto my Python server takes a very long time (40 minutes).

import pandas as pd
import psycopg2

# db_connection_info is a placeholder for the connection string
con = psycopg2.connect(db_connection_info)

# Passing a name to cursor() creates a server-side (named) cursor
query_cursor = con.cursor('query_cursor')
query_cursor.execute('my_query')

batch_size = 10000
batch = 0
data = []

print('Starting to retrieve data')

while True:
    tmp = query_cursor.fetchmany(batch_size)

    # An empty batch means the cursor is exhausted
    if not tmp:
        break

    if batch % 100 == 0:
        print(str(batch * batch_size) + ' rows loaded')

    # Extend in place instead of rebuilding the list on every batch
    data.extend(tmp)
    batch += 1

print('Transferring data to dataframe')

df = pd.DataFrame.from_records(data, columns=manually_selected_features, coerce_float=True)
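
For reference, the batch loop above can be factored into a small generator that works with any DB-API cursor exposing `fetchmany`. This is only a sketch: `iter_batches`, `fetch_all`, and `FakeCursor` are hypothetical names introduced for illustration, and `FakeCursor` stands in for the real psycopg2 cursor so the snippet runs without a database.

```python
def iter_batches(cursor, batch_size=10000):
    """Yield lists of rows from cursor.fetchmany until no rows remain."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows


def fetch_all(cursor, batch_size=10000):
    """Collect every row into one list, growing it in place per batch."""
    data = []
    for rows in iter_batches(cursor, batch_size):
        data.extend(rows)
    return data


class FakeCursor:
    """Hypothetical stand-in for a database cursor, backed by a list."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def fetchmany(self, size):
        chunk = self._rows[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


if __name__ == "__main__":
    cursor = FakeCursor([(i,) for i in range(25)])
    print(len(fetch_all(cursor, batch_size=10)))
```

With a real connection, the same functions would be called with the named cursor after `execute`, for example `fetch_all(query_cursor)`.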

I am not using pd.read_sql because, for memory reasons, I need to use a server-side cursor.

I cannot understand why the first iteration of fetchmany takes so long compared to the others.
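
One way to see where the time goes is to time each `fetchmany` call separately. With a named cursor, psycopg2 sends a DECLARE when `execute` is called and only requests rows with FETCH afterwards, so the first fetch may include the time the server needs to produce the result. The Amazon Redshift documentation for DECLARE also notes that the whole result set is materialized on the leader node when the first row is fetched, which would be consistent with a slow first batch. The sketch below assumes nothing about the real query: `timed_fetches` and `SlowFirstCursor` are hypothetical names, and the fake cursor only simulates a delay on its first fetch.

```python
import time


def timed_fetches(cursor, batch_size=10000):
    """Yield (rows, seconds) for each fetchmany call until no rows remain."""
    while True:
        start = time.perf_counter()
        rows = cursor.fetchmany(batch_size)
        elapsed = time.perf_counter() - start
        if not rows:
            return
        yield rows, elapsed


class SlowFirstCursor:
    """Hypothetical stand-in for a cursor whose first fetch is slow."""

    def __init__(self, rows, first_delay=0.05):
        self._rows = list(rows)
        self._pos = 0
        self._first_delay = first_delay
        self._fetched_once = False

    def fetchmany(self, size):
        if not self._fetched_once:
            # Simulate the server preparing the result on the first fetch
            time.sleep(self._first_delay)
            self._fetched_once = True
        chunk = self._rows[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


if __name__ == "__main__":
    cursor = SlowFirstCursor(range(25))
    for i, (rows, seconds) in enumerate(timed_fetches(cursor, batch_size=10)):
        print(f"batch {i}: {len(rows)} rows in {seconds:.3f} s")
```

Run against the real `query_cursor`, the per-batch timings would show whether the delay sits in the first fetch only or is spread across all batches.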

Is there a good way to speed up the query?

Regards, Xavier

psycopg2 - Poor performance when fetching data from Redshift into Python

0 answers: