I am moving data from MySQL to Postgres, and my code is as follows -
import os, re, time, codecs, glob, sqlite3
from StringIO import StringIO
import psycopg2, MySQLdb, datetime, decimal
from datetime import date
import gc
tables = (['table1' , 27],)
conn = psycopg2.connect("dbname='xxx' user='xxx' host='localhost' password='xxx' ")
curpost = conn.cursor()
db = MySQLdb.connect(host="127.0.0.1", user="root", passwd="root" , unix_socket='/var/mysql/mysql.sock', port=3306 )
cur = db.cursor()
cur.execute('use xxx;')
for t in tables:
    print t
    curpost.execute("truncate table " + t[0])
    cur.execute("select * from " + t[0])
    a = ','.join('%s' for i in range(t[1]))
    qry = "insert into " + t[0] + " values ( " + a + " )"
    print qry
    i = 0
    while True:
        rows = cur.fetchmany(5000)
        if not rows: break
        string = ''
        for row in rows:
            string = string + ('|'.join([str(x) for x in row])) + "\n"
        curpost.copy_from(StringIO(string), t[0], sep="|", null="None")
        i += curpost.rowcount
        print i, " loaded"
        curpost.connection.commit()
        del string, row, rows
        gc.collect()
curpost.close()
cur.close()
The code works fine for small tables. For the larger ones (3.6 million records), however, memory usage on the machine balloons the moment the MySQL select executes (cur.execute("select * from " + t[0])). This happens even though I am using fetchmany and records should only arrive in batches of 5000. I have also tried batches of 500, with the same result. For large tables, fetchmany does not seem to work as documented..

Edit - I added the garbage collection and del statements. Memory still balloons until all the records have been processed.
Any ideas?
Answer 0 (score: 0)
Sorry if I have this wrong - you said you don't want to change the query, but if you have no other choice, you can try replacing this snippet:
cur.execute("select * from " + t[0])
a = ','.join('%s' for i in range(t[1]))
qry = "insert into " + t[0] + " values ( " + a + " )"
print qry
i = 0
while True:
    rows = cur.fetchmany(5000)
with this one:
a = ','.join('%s' for i in range(t[1]))
qry = "insert into " + t[0] + " values ( " + a + " )"
print qry
i = 0
while True:
    cur.execute("select * from " + t[0] + " LIMIT " + str(i) + ", 5000")
    rows = cur.fetchall()
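Two details matter for this to work: the offset i must advance after each page, and the loop must stop on an empty page. A minimal sketch of the complete loop after the substitution, keeping the copy_from logic from the question (table name, column count, and 5000-row batch size as assumed there):

i = 0
while True:
    # Page through the table 5000 rows at a time; only one page
    # is ever held in client memory.
    cur.execute("select * from " + t[0] + " LIMIT " + str(i) + ", 5000")
    rows = cur.fetchall()
    if not rows:
        break
    buf = StringIO()
    for row in rows:
        buf.write('|'.join([str(x) for x in row]) + "\n")
    buf.seek(0)
    curpost.copy_from(buf, t[0], sep="|", null="None")
    i += len(rows)  # advance the LIMIT offset to the next page
    print i, " loaded"
    curpost.connection.commit()

Without an ORDER BY, MySQL does not guarantee a stable row order between the paged queries, so adding an ORDER BY on a key column is advisable. Alternatively, the original fetchmany approach does work if the cursor is server-side: MySQLdb's default cursor materializes the whole result set on the client when execute() runs, which is why memory balloons regardless of the fetchmany batch size. A server-side cursor streams rows instead:

import MySQLdb.cursors
cur = db.cursor(MySQLdb.cursors.SSCursor)  # fetchmany now streams from the server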