I want to use readline_google_store (which is a generator) to build a database from its records. My code looks like this:
```python
import re
import sqlite3
import time

from google_ngram_downloader import readline_google_store


def Main():
    start_time = time.time()
    p = re.compile(r'^[a-z]*$', re.IGNORECASE)
    el = 'abcdefghijklmnopqrstuvwxyz'
    con = None  # defined up front so except/finally can test it safely
    try:
        # Open database connection
        con = sqlite3.connect('test.db')
        # Create a cursor object
        cur = con.cursor()
        for l in el:
            # Take the first (fname, url, records) triple for this index letter
            fname, url, records = next(readline_google_store(ngram_len=1, indices=l))
            for r in records:
                # time.sleep(0.0001)
                if r.year >= 2000:
                    w = r.ngram.lower()
                    if p.match(w):
                        cur.execute('SELECT ngram, match_counts FROM Unigram WHERE ngram = ?', (w,))
                        results = cur.fetchone()
                        # print(results)
                        if not results:  # or: if results is None
                            cur.execute('INSERT INTO Unigram VALUES (?, ?);', (w, r.match_count))
                            con.commit()
                        else:
                            match_count_sum = results[1] + r.match_count
                            cur.execute('UPDATE Unigram SET match_counts = ? WHERE ngram = ?;',
                                        (match_count_sum, w))
                            con.commit()
    except sqlite3.Error as e:
        if con is not None:
            con.rollback()
        print('There was a problem with sql:', e)
    finally:
        if con is not None:
            con.close()
    end_time = time.time()
    print("--- It took %s seconds ---" % (end_time - start_time))


if __name__ == '__main__':
    Main()
```
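The input records have the format:
(ngram, year, match_count, page_count)
Ignoring year and page_count, I want a table with records of the form (ngram, match_count_sum), where match_count_sum is the sum of all match_count values across the different years.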
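One way to build the (ngram, match_count_sum) table without a SELECT plus a commit for every record is to sum in memory and then write each batch in a single transaction. The sketch below assumes a `Unigram(ngram, match_counts)` schema matching the column names in the question's code; `Record`, `sum_match_counts` and `write_counts` are hypothetical names for illustration, and `Record` only mimics the fields the question's code reads (`ngram`, `year`, `match_count`).

```python
import re
import sqlite3
from collections import Counter, namedtuple

# Hypothetical stand-in for the records the generator yields.
Record = namedtuple('Record', 'ngram year match_count volume_count')

WORD = re.compile(r'^[a-z]*$', re.IGNORECASE)  # same filter as the question


def sum_match_counts(records, min_year=2000):
    """Sum match_count per lower-cased alphabetic ngram, for years >= min_year."""
    counts = Counter()
    for r in records:
        if r.year >= min_year:
            w = r.ngram.lower()
            if WORD.match(w):
                counts[w] += r.match_count
    return counts


def write_counts(con, counts):
    """Add counts to Unigram in one transaction (assumed schema)."""
    with con:  # commits on success, rolls back if an exception escapes
        con.execute('CREATE TABLE IF NOT EXISTS Unigram '
                    '(ngram TEXT PRIMARY KEY, match_counts INTEGER NOT NULL)')
        # Make sure every ngram has a row, then add this batch's totals to it.
        con.executemany('INSERT OR IGNORE INTO Unigram VALUES (?, 0)',
                        ((w,) for w in counts))
        con.executemany('UPDATE Unigram SET match_counts = match_counts + ? '
                        'WHERE ngram = ?', ((c, w) for w, c in counts.items()))


records = [
    Record('Apple', 1999, 5, 1),   # dropped: before 2000
    Record('apple', 2000, 3, 1),
    Record('Apple', 2008, 4, 2),
    Record('x-ray', 2005, 7, 1),   # dropped: not purely alphabetic
]
con = sqlite3.connect(':memory:')
write_counts(con, sum_match_counts(records))
print(con.execute('SELECT * FROM Unigram').fetchall())  # [('apple', 7)]
```

`INSERT OR IGNORE` followed by an additive `UPDATE` works on any SQLite version; on SQLite 3.24+ an `INSERT ... ON CONFLICT(ngram) DO UPDATE` upsert would do the same in one statement.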
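The error that comes up is:
requests.exceptions.ChunkedEncodingError: ("Connection broken: error(54, 'Connection reset by peer')", error(54, 'Connection reset by peer'))
I tried adding time.sleep(0.0001) to adjust thread scheduling and give the socket I/O time to complete, but then I got a timeout error...
How can I fix this?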
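Answer 0 (score: 1)
Since SQLite reads and writes locally, the error most likely comes from the remote API. That is normally the slow part of an application like this, but I would expect reads from it to block (wait) rather than fail.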
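To check whether the remote read really is the slow (and failing) part, one option is to measure how long the loop spends waiting on the record iterator compared with the total run time. This is a minimal sketch; `TimedIterator` and `slow_records` are hypothetical helpers, not part of google_ngram_downloader, and `slow_records` only simulates network latency with `time.sleep`.

```python
import time


class TimedIterator:
    """Wrap an iterator and accumulate the seconds spent waiting in next()."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.seconds = 0.0
        self.items = 0

    def __iter__(self):
        return self

    def __next__(self):
        start = time.perf_counter()
        try:
            item = next(self._it)  # StopIteration propagates after timing
        finally:
            self.seconds += time.perf_counter() - start
        self.items += 1
        return item


def slow_records(n, delay):
    """Hypothetical source: sleeps before each item to mimic network latency."""
    for i in range(n):
        time.sleep(delay)
        yield i


timed = TimedIterator(slow_records(3, 0.01))
total_start = time.perf_counter()
for item in timed:
    pass  # the per-record SQLite work would go here
total = time.perf_counter() - total_start
print(f'{timed.items} records, {timed.seconds:.3f}s waiting on the source, '
      f'{total:.3f}s total')
```

In the question's loop this would mean wrapping the stream as `records = TimedIterator(records)` before `for r in records:`; if `seconds` is close to the total, the database is not the bottleneck.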
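"Connection reset by peer" usually indicates a network error somewhere along the way. So the question is where the reset comes from (it could be a firewall, API rate limiting, etc.). I can't tell from the information given, but I can give you an initial checklist.
This is beyond the explicit control of your code, but you can handle the failures more gracefully.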
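A common way to handle such failures is to retry with exponential backoff. Because a reset can happen halfway through a file, the sketch below re-reads the whole file on each attempt and throws away the partial result, so nothing is counted twice. It assumes that each call to `readline_google_store(...)` opens a fresh download, and it relies on the requests exceptions (including `ChunkedEncodingError`) subclassing `IOError`, which is `OSError` in Python 3. `count_with_retries`, `FlakySource` and `total_per_word` are hypothetical names; `FlakySource` simulates a connection that drops partway through the stream.

```python
import time
from collections import Counter


def count_with_retries(open_records, aggregate, max_attempts=5,
                       base_delay=1.0, retry_on=(OSError,)):
    """Consume one remote file completely, restarting it from scratch on failure.

    open_records: zero-argument callable returning a fresh record iterator
    aggregate:    function that consumes the iterator and returns a result
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # A failed attempt's partial result is discarded with its exception.
            return aggregate(open_records())
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f'attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s')
            time.sleep(delay)


class FlakySource:
    """Hypothetical source whose first `failures` opens reset after one item."""

    def __init__(self, items, failures):
        self.items, self.failures, self.opens = items, failures, 0

    def __call__(self):
        self.opens += 1
        should_fail = self.opens <= self.failures

        def gen():
            for i, item in enumerate(self.items):
                if should_fail and i == 1:
                    raise ConnectionResetError(54, 'Connection reset by peer')
                yield item
        return gen()


def total_per_word(rows):
    counts = Counter()
    for word, n in rows:
        counts[word] += n
    return counts


source = FlakySource([('apple', 3), ('apple', 4), ('bee', 2)], failures=2)
print(count_with_retries(source, total_per_word, base_delay=0.01))
# Counter({'apple': 7, 'bee': 2})
```

With the real library, `open_records` could be something like `lambda: next(readline_google_store(ngram_len=1, indices=l))[2]` inside the question's `for l in el:` loop, and the database is only written after a file has been read successfully.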