I am moving data from MySQL to Postgres, and my code is as follows -
import os, re, time, codecs, glob, sqlite3
from StringIO import StringIO
import psycopg2, MySQLdb, datetime, decimal
from datetime import date
import gc
tables = (['table1' , 27],)
conn = psycopg2.connect("dbname='xxx' user='xxx' host='localhost' password='xxx' ")
curpost = conn.cursor()
db = MySQLdb.connect(host="127.0.0.1", user="root", passwd="root" , unix_socket='/var/mysql/mysql.sock', port=3306 )
cur = db.cursor()
cur.execute('use xxx;')
for t in tables:
    print t
    curpost.execute("truncate table " + t[0])
    cur.execute("select * from " + t[0])
    a = ','.join('%s' for i in range(t[1]))
    qry = "insert into " + t[0] + " values ( " + a + " )"
    print qry
    i = 0
    while True:
        rows = cur.fetchmany(5000)
        if not rows: break
        string = ''
        for row in rows:
            string = string + ('|'.join([str(x) for x in row])) + "\n"
        curpost.copy_from(StringIO(string), t[0], sep="|", null="None")
        i += curpost.rowcount
        print i, " loaded"
        curpost.connection.commit()
        del string, row, rows
        gc.collect()
curpost.close()
cur.close()
The code works fine for small tables. For the larger ones (3.6 million records), however, memory usage on the machine balloons the moment the MySQL select executes (cur.execute("select * from " + t[0])). This happens even though I am using fetchmany and records should only arrive in batches of 5000. I have also tried batches of 500, with the same result. For large tables, fetchmany does not seem to work as documented..

Edit - I added the garbage collection and del statements. Memory still balloons until all the records have been processed.
Any ideas?
Answer 0 (score: 0)
Sorry if I have this wrong - you said you don't want to change the query, but if you have no other choice, you can try replacing this snippet:
cur.execute("select * from " + t[0])
a = ','.join('%s' for i in range(t[1]))
qry = "insert into " + t[0] + " values ( " + a + " )"
print qry
i = 0
while True:
    rows = cur.fetchmany(5000)
with this one:
a = ','.join('%s' for i in range(t[1]))
qry = "insert into " + t[0] + " values ( " + a + " )"
print qry
i = 0
while True:
    cur.execute("select * from " + t[0] + " LIMIT " + str(i) + ", 5000")
    rows = cur.fetchall()
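Two details matter for this to work: the offset i must advance after each page, and the loop must stop on an empty page. A minimal sketch of the complete loop after the substitution, keeping the copy_from logic from the question (table name, column count, and 5000-row batch size as assumed there):

i = 0
while True:
    # Page through the table 5000 rows at a time; only one page
    # is ever held in client memory.
    cur.execute("select * from " + t[0] + " LIMIT " + str(i) + ", 5000")
    rows = cur.fetchall()
    if not rows:
        break
    buf = StringIO()
    for row in rows:
        buf.write('|'.join([str(x) for x in row]) + "\n")
    buf.seek(0)
    curpost.copy_from(buf, t[0], sep="|", null="None")
    i += len(rows)  # advance the LIMIT offset to the next page
    print i, " loaded"
    curpost.connection.commit()

Without an ORDER BY, MySQL does not guarantee a stable row order between the paged queries, so adding an ORDER BY on a key column is advisable. Alternatively, the original fetchmany approach does work if the cursor is server-side: MySQLdb's default cursor materializes the whole result set on the client when execute() runs, which is why memory balloons regardless of the fetchmany batch size. A server-side cursor streams rows instead:

import MySQLdb.cursors
cur = db.cursor(MySQLdb.cursors.SSCursor)  # fetchmany now streams from the server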