Performance: how to quickly insert CLOBs with cx_Oracle and executemany()?

Time: 2014-03-25 14:57:30

Tags: performance python-3.x cx-oracle oracle12c

The cx_Oracle API was very fast for me until I tried to work with CLOB values.

I do it as follows:

import time
import cx_Oracle

num_records = 100
con = cx_Oracle.connect('user/password@sid')
cur = con.cursor()
cur.prepare("insert into table_clob (msg_id, message) values (:msg_id, :msg)")
cur.bindarraysize = num_records
msg_arr = cur.var(cx_Oracle.CLOB, arraysize=num_records)
text = '$'*2**20    # 1 MB of text
rows = []

start_time = time.perf_counter()
for id in range(num_records):
    msg_arr.setvalue(id, text)
    rows.append( (id, msg_arr) )    # ???

print('{} records prepared, {:.3f} s'
    .format(num_records, time.perf_counter() - start_time))
start_time = time.perf_counter()
cur.executemany(None, rows)
con.commit()
print('{} records inserted, {:.3f} s'
    .format(num_records, time.perf_counter() - start_time))

cur.close()
con.close()
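  1. The main problem that worries me is performance:

    100 records prepared, 25.090 s - far too long for copying 100 MB in memory!
    100 records inserted, 23.503 s - seems too long for sending 100 MB over the network.

    The problematic step is msg_arr.setvalue(id, text). If I comment it out, the script finishes in a few milliseconds (inserting NULL into the CLOB column, of course).

  2. Second, it seems odd to add a reference to the same CLOB variable to every tuple in the rows list. I found this example on the internet and it works correctly, but am I doing it right?

  3. Are there ways to improve performance in my case?

  4. UPDATE: I tested the network throughput: a 107 MB file is copied to the same host over SMB in 11 seconds. But again, network transfer is not the main problem. Data preparation takes an abnormally long time.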
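To put the timings above side by side, here is a small back-of-the-envelope calculation. It is only a sketch: `throughput_mb_per_s` is a hypothetical helper, and the payload is treated as a round 100 MB, as in the output above. It works out to roughly 9.7 MB/s for the SMB copy versus about 4.3 MB/s for the insert step and about 4.0 MB/s for the in-memory preparation step.

```python
def throughput_mb_per_s(megabytes, seconds):
    """Average throughput in MB/s for a transfer of the given size."""
    return megabytes / seconds

# Figures taken from the question; sizes are treated as round megabytes.
smb_copy = throughput_mb_per_s(107, 11)          # file copy over SMB
clob_insert = throughput_mb_per_s(100, 23.503)   # executemany() step
clob_prepare = throughput_mb_per_s(100, 25.090)  # setvalue() loop

print('SMB copy:     {:.1f} MB/s'.format(smb_copy))
print('CLOB insert:  {:.1f} MB/s'.format(clob_insert))
print('CLOB prepare: {:.1f} MB/s'.format(clob_prepare))
```

So even the step that never touches the network runs at less than half the speed of the plain file copy, which is why the preparation step looks like the real bottleneck.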

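1 Answer:

Answer 0: (score: 0)

A strange workaround (thanks to Avinash Nandakumar from the cx_Oracle mailing list), but it is a real way to greatly improve performance when inserting CLOBs: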

import time
import cx_Oracle

num_records = 100
con = cx_Oracle.connect('user/password@sid')
cur = con.cursor()
cur.bindarraysize = num_records
text = '$'*2**20    # 1 MB of text

# Step 1: insert the rows with empty CLOBs in a single batch.
start_time = time.perf_counter()
cur.executemany(
    "insert into table_clob (msg_id, message) values (:msg_id, empty_clob())",
    [(i,) for i in range(1, num_records + 1)])
print('{} records prepared, {:.3f} s'
      .format(num_records, time.perf_counter() - start_time))

# Step 2: select the rows for update and write the text into each LOB.
start_time = time.perf_counter()
selstmt = ("select message from table_clob "
           "where msg_id between 1 and :1 for update")
cur.execute(selstmt, [num_records])
for _ in range(num_records):
    results = cur.fetchone()
    results[0].write(text)
con.commit()
print('{} records inserted, {:.3f} s'
      .format(num_records, time.perf_counter() - start_time))

cur.close()
con.close()

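Semantically, this is not exactly the same as the code in my original post; I wanted to keep the example as simple as possible to show the principle. The point is that you should insert empty_clob(), then select the row and write the CLOB's contents.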
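When every row carries a different text, the same principle can be wrapped in a small helper that matches each LOB to its row by ID. This is only a sketch under assumptions: `insert_clobs` and `texts_by_id` are hypothetical names, the table is the `table_clob (msg_id, message)` from the question, and the connection is assumed to behave like a cx_Oracle connection (iterable cursor, fetched LOB objects with a `write()` method).

```python
def insert_clobs(con, texts_by_id):
    """Insert CLOB values via empty_clob() placeholders (hypothetical helper).

    con         -- open connection, assumed to behave like cx_Oracle's
    texts_by_id -- dict mapping msg_id to the text to store
    Returns the number of LOBs written.
    """
    if not texts_by_id:
        return 0
    ids = sorted(texts_by_id)
    cur = con.cursor()
    try:
        # Step 1: create all rows in one batch, each with an empty LOB.
        cur.executemany(
            "insert into table_clob (msg_id, message) "
            "values (:1, empty_clob())",
            [(msg_id,) for msg_id in ids])

        # Step 2: lock the rows and fetch their LOB locators.
        cur.execute(
            "select msg_id, message from table_clob "
            "where msg_id between :1 and :2 for update",
            [ids[0], ids[-1]])

        # Step 3: write each text into the LOB that belongs to its msg_id.
        written = 0
        for msg_id, lob in cur:
            if msg_id in texts_by_id:   # skip unrelated rows in the range
                lob.write(texts_by_id[msg_id])
                written += 1
        con.commit()
        return written
    finally:
        cur.close()
```

A call might look like `insert_clobs(con, {1: text, 2: other_text})`. Selecting `msg_id` together with `message` avoids relying on the order in which the rows come back, which the simplified example above can afford to ignore only because it writes the same text everywhere.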