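I have a table of about 900,000 rows in a PostgreSQL database. I want to copy it row by row into another table that has some extra columns, transforming each row and filling in the new columns along the way. The problem is that RAM fills up.
Here is the relevant part of the code: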
import datetime

import sqlalchemy
from sqlalchemy.engine.url import URL
from sqlalchemy.orm import sessionmaker

engine = sqlalchemy.create_engine(URL(**REMOTE), echo=False)
Session = sessionmaker(bind=engine)
session = Session()
n = 1000
counter = 1
for i in range(1, total + 1, n):
    # Build "1,2,...,1000" for the IN (...) clause
    ids = ",".join(str(j) for j in range(i, i + n))
    q = "SELECT * from table_parts where id in (%s)" % ids
    r = session.execute(q).fetchall()
    for element in r:
        data = {}
        # ....
        # [taking data from each row, extracting string, calculation,
        # and filling extra columns that the new table has]
        # ...
        query = query.bindparams(**data)
        try:
            session.execute(query)
        except:
            session.rollback()
            raise
        # Commit once every n rows
        if counter % n == 0:
            print("COMMITTING.... {} {}".format(
                counter, datetime.datetime.now().strftime("%H:%M:%S")))
            session.commit()
        counter += 1
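The queries are correct, so there are no errors. Until I press Ctrl+C, the new table is updated correctly.
The problem seems to be with the query:
"SELECT * from table_parts where id in (1,2,3,4...1000)"
I have also tried PostgreSQL arrays.
Things I have already tried: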
results = (connection
           .execution_options(stream_results=True)  # Added this line
           .execute(query))
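(taken from here). As far as I know, this uses a server-side cursor when used with PostgreSQL. For this attempt I dropped the session object from the code posted above and used engine.connect() instead.
According to the docs, yield_per in the Query API does the same thing as the stream_results option mentioned above.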
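For comparison, the general idea behind streaming results is to pull rows from a cursor in bounded chunks instead of calling fetchall(). The following is only a sketch of that pattern, and it uses the standard-library sqlite3 module in place of PostgreSQL, so it does not show how psycopg2 buffers results or how a server-side cursor behaves. The table name parts_demo and the helper iter_chunks are made up for illustration.

```python
import sqlite3


def iter_chunks(cursor, size):
    """Yield lists of at most `size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            return
        yield rows


# Hypothetical demo table; sqlite3 stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts_demo (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO parts_demo (data) VALUES (?)",
    [("row %d" % i,) for i in range(10)],
)

# Only one chunk of rows is materialised as a Python list at a time.
cur = conn.execute("SELECT id, data FROM parts_demo ORDER BY id")
chunk_sizes = [len(chunk) for chunk in iter_chunks(cur, 4)]
```

Whether this actually bounds memory against PostgreSQL depends on the driver: a plain client-side cursor may already have loaded the whole result set before fetchmany() is called, which is the case the stream_results option is meant to address.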
Thanks
Answer 0 (score: 0)
create table table_parts ( id serial primary key, data text );
-- Insert 1M rows of about 32kB data =~ 32GB of data
-- Needs only 0.4GB of disk space because of builtin compression
-- Might take a few minutes
insert into table_parts(data)
select rpad('',32*1024,'A') from generate_series(1,1000000);
The following code, which uses SQLAlchemy Core, does not use a lot of memory:
# Written against the SQLAlchemy 1.x Core API (select([...]) and
# bind parameters passed to execute() as keyword arguments).
import sqlalchemy
import datetime
import getpass

metadata = sqlalchemy.MetaData()
table_parts = sqlalchemy.Table('table_parts', metadata,
    sqlalchemy.Column('id', sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column('data', sqlalchemy.String)
)

engine = sqlalchemy.create_engine(
    'postgresql:///'+getpass.getuser(),
    echo=False
)
connection = engine.connect()

n = 1000

# Fetch the next n rows after the last id that was processed
select_table_parts_n = sqlalchemy.sql.select([table_parts]).\
    where(table_parts.c.id > sqlalchemy.bindparam('last_id')).\
    order_by(table_parts.c.id).\
    limit(n)

update_table_parts = table_parts.update().\
    where(table_parts.c.id == sqlalchemy.bindparam('table_part_id')).\
    values(data=sqlalchemy.bindparam('table_part_data'))

last_id = 0
while True:
    # One transaction per batch of n rows
    with connection.begin() as transaction:
        row = None
        for row in connection.execute(select_table_parts_n, last_id=last_id):
            data = row.data.replace('A', 'B')
            connection.execute(
                update_table_parts,
                table_part_id=row.id,
                table_part_data=data
            )
        if row is None:
            # The query returned no rows: everything has been processed
            break
        else:
            print("COMMITTING {} {:%H:%M:%S}".format(
                row.id, datetime.datetime.now()))
            transaction.commit()
            last_id = row.id
You don't seem to be using any ORM features, so I think you should use SQLAlchemy Core as well.
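The batching pattern in the code above (select rows with id greater than the last id seen, ordered by id and limited to n, then commit once per batch) can also be sketched without SQLAlchemy or PostgreSQL. The version below is a minimal sketch using the standard-library sqlite3 module so that it runs anywhere. It copies into a second table with an extra column, as the question describes, rather than updating in place. The table names parts_src and parts_dst, the column data_upper, and the function copy_in_batches are assumptions made for the example, not part of the original code.

```python
import sqlite3


def copy_in_batches(conn, batch_size):
    """Copy parts_src into parts_dst in id order, one transaction per batch.

    Returns the number of batches committed.
    """
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id, data FROM parts_src WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()  # at most batch_size rows are held in memory
        if not rows:
            return batches
        # Hypothetical transformation: fill the extra column from `data`.
        conn.executemany(
            "INSERT INTO parts_dst (id, data, data_upper) VALUES (?, ?, ?)",
            [(row_id, data, data.upper()) for row_id, data in rows],
        )
        conn.commit()
        last_id = rows[-1][0]  # resume after the last id seen
        batches += 1


# Demo data: 25 source rows, copied 10 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts_src (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute(
    "CREATE TABLE parts_dst (id INTEGER PRIMARY KEY, data TEXT, data_upper TEXT)"
)
conn.executemany(
    "INSERT INTO parts_src (data) VALUES (?)",
    [("part %d" % i,) for i in range(25)],
)
conn.commit()

batches = copy_in_batches(conn, batch_size=10)
copied = conn.execute("SELECT COUNT(*) FROM parts_dst").fetchone()[0]
```

Because each query is bounded by LIMIT and keyed on the primary key, memory use depends on the batch size rather than on the size of the table, and the copy can be resumed from the last committed id if it is interrupted.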