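I am trying to update roughly 500k rows in a SQLite database. I can create them quickly, but when I update, it seems to hang indefinitely, and I get no error message. (An insert of the same size took 35 seconds; this update has been running for more than 12 hours.)
The part of my code that performs the update is: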
for line in result:
    if --- blah blah blah ---:
        stuff
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line),counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
        print("Committing.")
        conn.commit()
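It prints "Executing SQL..." and presumably attempts the executemany, which is where it gets stuck. The variable "result" is a list of records and, as far as I can tell, it is essentially the same as the one used by the insert statement, which works fine.
Am I misusing executemany? I have seen many threads about executemany(), but as far as I can tell all of them involve an error message rather than an indefinite hang.
For reference, my full code is below. Basically I am trying to convert an ASCII file into a SQLite database. I know I could technically insert all the columns at once, but the machines I have access to are all limited to 32-bit Python and run out of memory (the file is very large, close to 1 GB of text).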
import pandas as pd
import sqlite3

ascii_file = r'c:\Path\to\file.ASC_'
sqlite_file = r'c:\path\to\sqlite.db'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# Taken from https://www.cdc.gov/brfss/annual_data/2015/llcp_varlayout_15_onecolumn.html
raw_list = [[1,"_STATE",2],
            [17,"FMONTH",2],
            ... many other values here
            [2154,"_AIDTST3",1],]

col_list = []
for col in raw_list:
    begin = (col[0] - 1)
    col_name = col[1]
    end = (begin + col[2])
    col_list.append([(begin, end,), col_name,])

for col in col_list:
    print(col)
    col_specification = [col[0]]
    print("Parsing...")
    data = pd.read_fwf(ascii_file, colspecs=col_specification)
    print("Done")
    result = data.iloc[:,[0]]
    result = result.values.flatten()
    sql = '''CREATE table if not exists BRFSS2015
             (row_id integer NOT NULL,
             ''' + col[1] + ' text)'
    print(sql)
    c.execute(sql)
    conn.commit()
    sql = '''ALTER TABLE
             BRFSS2015 ADD COLUMN ''' + col[1] + ' text'
    try:
        c.execute(sql)
        print(sql)
        conn.commit()
    except Exception as e:
        print("Error Happened instead")
        print(e)
    counter = 1
    result_list = []
    for line in result:
        result_list.append((counter, str(line)))
        counter += 1
    if '_STATE' in col:
        counter = 1
        result_list = []
        for line in result:
            result_list.append((counter, str(line)))
            counter += 1
        sql = 'INSERT into BRFSS2015 (row_id,' + col[1] + ')'\
              + 'values (?,?)'
        c.executemany(sql, result_list)
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line),counter))
            counter += 1
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
              ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
    print("Committing.")
    conn.commit()
    print("Comitted... moving on to next column...")
Answer 0 (score: 2)
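For every row to be updated, the database has to search for that row. (This is not needed when inserting.) If there is no index on the row_id column, the database has to scan the entire table for each update. With about 500k rows and 500k updates, the amount of work grows roughly with the square of the row count, which is why the update appears to hang while an insert of the same size finishes in seconds.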
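One way to check this is to ask SQLite for its query plan. The following is a minimal sketch, assuming an in-memory database with a small stand-in for the BRFSS2015 table; the index name `idx_brfss_row_id` and the helper `plan` are made up for illustration. The exact wording of the plan text differs between SQLite versions, so the sketch only relies on it starting with SCAN (full table scan) or SEARCH (indexed lookup).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Same shape as the table in the question: row_id has no index at all.
c.execute("CREATE TABLE BRFSS2015 (row_id integer NOT NULL, _STATE text)")
c.executemany(
    "INSERT INTO BRFSS2015 (row_id, _STATE) VALUES (?, ?)",
    [(i, str(i % 50)) for i in range(1, 1001)],
)
conn.commit()


def plan(cursor):
    """Return the plan detail strings for the UPDATE used in the question."""
    rows = cursor.execute(
        "EXPLAIN QUERY PLAN UPDATE BRFSS2015 SET _STATE = ? WHERE row_id = ?",
        ("1", 1),  # placeholder values; EXPLAIN does not modify the table
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]


plan_without_index = plan(c)

# Hypothetical index name, chosen for this sketch.
c.execute("CREATE INDEX idx_brfss_row_id ON BRFSS2015 (row_id)")
plan_with_index = plan(c)

print(plan_without_index)  # expected to start with SCAN
print(plan_with_index)     # expected to start with SEARCH and name the index
```

If the first plan reports a scan, each of the 500k statements passed to executemany walks the whole table, which matches the behavior described in the question.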
It would be best to insert entire rows at once. If that is not possible, create an index on row_id or, better, declare it as INTEGER PRIMARY KEY.
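Inserting entire rows does not have to mean loading the whole file into memory. Below is a rough sketch, not the asker's code: it reads the fixed-width file line by line, slices every field out of each line, and inserts complete rows in batches. The helpers `iter_rows` and `load`, the `batch_size` value, and the two-entry layout are assumptions for illustration; the real layout would be the full raw_list from the question. Unlike pd.read_fwf, this stores the raw field text (for example '01' rather than '1').

```python
import sqlite3
from itertools import islice

# [start, name, width] as in the question's raw_list (1-based start column).
# Truncated to two fields for illustration only.
raw_list = [[1, "_STATE", 2], [17, "FMONTH", 2]]


def iter_rows(lines, layout):
    """Yield one tuple of field strings per fixed-width line."""
    slices = [slice(start - 1, start - 1 + width) for start, _, width in layout]
    for line in lines:
        yield tuple(line[s].strip() for s in slices)


def load(conn, lines, layout, batch_size=10000):
    """Create the table once and insert whole rows, batch_size rows at a time."""
    names = ['"%s"' % name for _, name, _ in layout]
    # INTEGER PRIMARY KEY makes row_id the table's own key, so lookups by
    # row_id are indexed and SQLite numbers the rows automatically.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS BRFSS2015 (row_id INTEGER PRIMARY KEY, %s)"
        % ", ".join(name + " text" for name in names)
    )
    sql = "INSERT INTO BRFSS2015 (%s) VALUES (%s)" % (
        ", ".join(names),
        ", ".join("?" for _ in names),
    )
    rows = iter_rows(lines, layout)
    while True:
        batch = list(islice(rows, batch_size))  # only one batch is held in memory
        if not batch:
            break
        conn.executemany(sql, batch)
        conn.commit()


# Two made-up 18-character records standing in for the real ASCII file.
sample_lines = ["01" + "x" * 14 + "03\n", "02" + "x" * 14 + "11\n"]
conn = sqlite3.connect(":memory:")
load(conn, sample_lines, raw_list, batch_size=1)
rows = conn.execute(
    'SELECT row_id, "_STATE", "FMONTH" FROM BRFSS2015 ORDER BY row_id'
).fetchall()
print(rows)
```

With a real file, an open file object can be passed in place of sample_lines, since iterating over it yields one line at a time. If the table already exists without a key, running CREATE INDEX on row_id before the updates is the smaller change.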