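When creating/populating tables with csv_copy, I find that it is sometimes very slow. Below are the core code and some sample output.
I have two questions:
Code: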
def create_populate_table(table_name,fields,types,cur):
    sql = 'CREATE TABLE IF NOT EXISTS ' + table_name + ' (\n'
    for i in xrange(len(fields)):
        if i==0:
            sql += fields[i]+' '+types[i]+' NOT NULL PRIMARY KEY,\n'
        elif i==len(fields)-1:
            sql += fields[i]+' '+types[i]+')'
        else:
            sql += fields[i]+' '+types[i]+',\n'
    #print sql
    cur.execute(sql)
    conn.commit()
    print "Table ",table_name," created ",timer()
    cur.execute("SELECT count(*) from "+table_name)
    if cur.fetchone()[0]>0:
        return
    # populate data into created table
    fr= open(file, 'r')
    fr.readline()
    # parse and convert data into unicode
    #data = unicode_csv_reader(fr, delimiter='|')
    # anything can be used as a file if it has .read() and .readline() methods
    data = StringIO.StringIO()
    s=''.join(fr.readlines())
    while(s.find('\r\n')<>-1):
        s=s.replace('\r\n','\n')
    #timer()
    while(s.find('||')<>-1 or s.find('|\n')<>-1 ):
        s=s.replace('||','|0|')
        s=s.replace('|\n','|0\n')
    #timer()
    #print s.split('\t')[:2]
    #exit(0)
    data.write(s)
    data.seek(0)
    try:
        cur.copy_from(data, table_name,sep='|')
        conn.commit()
        print "Table ",table_name," populated ",timer()
    except psycopg2.DatabaseError, e:
        if conn:
            conn.rollback()
        print 'Error %s' % e
    fr.close()
The output I see:
ME_Features_20121001.txt Table ME_Features_20121001 created 1.44s None Table ME_Features_20121001 populated 1.48s None
FM_Features_20121001.txt Table FM_Features_20121001 created 67.92s None Table FM_Features_20121001 populated 0.22s None
NationalFile_20121001.txt (700mb) Table NationalFile_20121001 created 9.34s None Table NationalFile_20121001 populated 4963.18s None
NJ_Features_20121001.txt Table NJ_Features_20121001 created 1.65s None Table NJ_Features_20121001 populated 41.11s None
PW_Features_20121001.txt Table PW_Features_20121001 created 1.73s None Table PW_Features_20121001 populated 0.20s None
Answer 0 (score: 1)
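How is timer() defined? My blind guess (since you did not provide its code) is that this function calls print directly to output the measured time but does not explicitly return anything, which is why None gets printed. If that is still unclear, take a look at the following example: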
>>> def test():
...     print 'test'
...
>>> print 'This is a', test()
This is a test
None
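The same effect can be reproduced in Python 3, where print is a function. The sketch below is only an illustration: `timer_printing` and `timer_returning` are hypothetical helpers, not your actual timer() (whose code was not shown), and the module-level `_start` reference point is an assumption too. It contrasts a function that prints its result with one that returns it.

```python
import time

# Reference point for the measurement; an assumption made for this sketch.
_start = time.perf_counter()


def timer_printing():
    """Hypothetical timer that prints the elapsed time and returns None."""
    print("%.2fs" % (time.perf_counter() - _start))


def timer_returning():
    """Hypothetical timer that returns the elapsed time as a string."""
    return "%.2fs" % (time.perf_counter() - _start)


# The call prints its own line first, then the outer print shows None.
print("Table created", timer_printing())

# The returned string ends up on the same line, with no stray None.
print("Table created", timer_returning())
```

If timer() is changed to return its value like the second helper, the None lines should disappear from the output.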
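I am not sure what you mean by the time varying between creating and populating tables. The time needed to populate a table obviously depends on the amount of data being inserted. Creating a table should take more or less the same time in every case, so the 67.92s output does look suspicious, but... are you sure it was measured correctly?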
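Again, my blind guess is that timer() prints the time elapsed since it was last called. Maybe you should explicitly reset it before starting the operation you want to measure? I suppose about 60 seconds passed before create_populate_table() was called...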
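To make that suggestion concrete, here is a minimal Python 3 sketch of a timer that is reset explicitly right before the measured operation. `Stopwatch` and `timed` are hypothetical names invented for this example, and the `time.sleep` calls merely stand in for real work; none of this is your actual timer().

```python
import time


class Stopwatch:
    """Hypothetical helper: measures time since the last explicit reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Restart the clock right before the operation to be measured.
        self._start = time.perf_counter()

    def elapsed(self):
        # Return seconds since the last reset; neither prints nor resets.
        return time.perf_counter() - self._start


def timed(label, operation, stopwatch):
    """Run operation() and report only the time spent inside it."""
    stopwatch.reset()
    result = operation()
    seconds = stopwatch.elapsed()
    print("%s took %.2fs" % (label, seconds))
    return result, seconds


watch = Stopwatch()
time.sleep(0.2)  # stands in for unrelated work done before the call
# Only the 0.05s operation is measured, not the 0.2s that came before it.
result, seconds = timed("create table", lambda: time.sleep(0.05), watch)
```

Because the reset happens inside `timed`, whatever ran before the call is not counted toward the reported time.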