I have the following Postgres query, which fetches data from table1 (about 25 million rows). I would like to write the output of this query to multiple files.
query = """
WITH sequence AS (
    SELECT
        a,
        b,
        c
    FROM table1
)
SELECT * FROM sequence;
"""
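Below is the Python script that fetches the complete dataset. How can I modify it so that the result is written to multiple files (for example, 10000 rows per file)?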
#IMPORT LIBRARIES ########################
import psycopg2
from pandas import DataFrame
#CREATE DATABASE CONNECTION ########################
connect_str = "dbname='x' user='x' host='x' " "password='x' port = x"
conn = psycopg2.connect(connect_str)
cur = conn.cursor()
conn.autocommit = True
cur.execute(query)
df = DataFrame(cur.fetchall())
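Thanks

Answer 0: (score: 3)
The following three approaches may help: iterating over a server-side (named) cursor, fetching in batches with fetchmany, and declaring a scroll cursor in SQL.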
Code snippet
with conn.cursor(name='fetch_large_result') as cursor:
    # number of rows pulled from the server per network round trip
    cursor.itersize = 20000
    query = "SELECT * FROM ..."
    cursor.execute(query)
    for row in cursor:
        ...  # process each row here
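To turn the row-by-row loop above into file-sized groups, the rows can be collected into fixed-size batches first. The following is only a minimal sketch: `batched_rows` is a hypothetical helper name, and a plain list of tuples stands in for the named cursor so the example runs without a database. With psycopg2 you would presumably pass the named cursor itself as `rows`.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of at most batch_size rows from any row iterator."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a cursor: 25 fake rows with columns (a, b, c).
fake_rows = [(i, i * 2, i * 3) for i in range(25)]

batches = list(batched_rows(fake_rows, 10))
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be written to its own file, so memory use stays bounded by the batch size rather than by the full result set.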
Code snippet
import psycopg2

# conn_url is a placeholder for your own connection string
conn = psycopg2.connect(conn_url)
cursor = conn.cursor(name='fetch_large_result')
cursor.execute('SELECT * FROM <large_table>')
while True:
    # consume the result over a series of iterations,
    # with each iteration fetching 2000 records
    records = cursor.fetchmany(size=2000)
    if not records:
        break
    for r in records:
        ...  # process each record here
cursor.close()  # cleanup
conn.close()
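To answer the "one file per 10000 rows" part of the question directly, the fetchmany loop can write each batch to its own CSV file. This is a sketch under stated assumptions, not a definitive implementation: `export_in_chunks`, the `part_00000.csv` naming scheme, and the CSV output format are my own choices, and the demo uses an in-memory sqlite3 database only so that it runs without a Postgres server. The function relies only on the DB-API `fetchmany` and `description` members, so a psycopg2 named cursor should be usable in place of the sqlite3 one.

```python
import csv
import os
import sqlite3
import tempfile


def export_in_chunks(cursor, out_dir, rows_per_file=10000, prefix="part"):
    """Write an executed cursor's rows to numbered CSV files; return the paths."""
    paths = []
    header = None
    while True:
        records = cursor.fetchmany(rows_per_file)
        if not records:
            break
        if header is None:
            # Read column names after the first fetch, in case the driver
            # only fills in description once rows have been retrieved.
            header = [column[0] for column in cursor.description]
        path = os.path.join(out_dir, f"{prefix}_{len(paths):05d}.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(records)
        paths.append(path)
    return paths


# Demo with sqlite3 standing in for Postgres: 25 rows, 10 rows per file.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
demo_conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(i, i + 1, i + 2) for i in range(25)],
)
demo_cursor = demo_conn.cursor()
demo_cursor.execute("SELECT a, b, c FROM table1")

out_dir = tempfile.mkdtemp()
paths = export_in_chunks(demo_cursor, out_dir, rows_per_file=10)
demo_conn.close()
```

In the demo, 25 rows at 10 rows per file produce three files, the last one holding the remaining 5 rows. Every file repeats the header row so that each one can be loaded on its own.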
Finally, you can declare a scroll cursor in SQL.
Code snippet
BEGIN WORK;
-- Set up a cursor:
DECLARE scroll_cursor_bd SCROLL CURSOR FOR SELECT * FROM My_Table;
-- Fetch the first 5 rows in the cursor scroll_cursor_bd:
FETCH FORWARD 5 FROM scroll_cursor_bd;
CLOSE scroll_cursor_bd;
COMMIT WORK;
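Please note: if you do not name the cursor in psycopg2, it is a client-side cursor rather than a server-side one, so the whole result set is transferred to the client when the query is executed.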