Question

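I am using Python's built-in sqlite3 module to access a database. My query performs a join between a table with 150,000 entries and a table with 40,000 entries, and the result again contains about 150,000 entries. If I execute the query in SQLite Manager it takes a few seconds, but if I execute the same query from Python, it has not finished after a minute. Here is the code I use: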

cursor = self._connection.cursor()
annotationList = cursor.execute("SELECT PrimaryId, GOId " + 
                                "FROM Proteins, Annotations " +
                                "WHERE Proteins.Id = Annotations.ProteinId")
annotations = defaultdict(list)
for protein, goterm in annotationList:
    annotations[protein].append(goterm)

I did the fetchall only to measure the execution time. Does anyone have an explanation for the huge difference in performance? I am using Python 2.6.1 on Mac OS X 10.6.4.

Edit

I implemented the join manually, and that runs much faster. The code looks like this:

cursor = self._connection.cursor()
proteinList = cursor.execute("SELECT Id, PrimaryId FROM Proteins ").fetchall()
annotationList = cursor.execute("SELECT ProteinId, GOId FROM Annotations").fetchall()
proteins = dict(proteinList)
annotations = defaultdict(list)
for protein, goterm in annotationList:
    annotations[proteins[protein]].append(goterm)

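So when I fetch the tables myself and then do the join in Python, it takes about 2 seconds. The first version, which lets SQLite do the join, takes forever. Am I missing something here?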

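Second edit: I have now tried apsw. It works fine (the code did not need to change at all) and the performance is great. I am still wondering why the sqlite3 module is so slow.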

Answer 1

There is a discussion about this here: http://www.mail-archive.com/python-list@python.org/msg253067.html

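It seems there is a performance bottleneck in the sqlite3 module. The thread gives some advice on how to make your queries faster:

Make sure you have indexes on the join columns
Use pysqlite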
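
One rough way to check the first point is to ask SQLite for its query plan. This is only a sketch: the schema below is an assumption (the question never shows the real one), and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

# Hypothetical schema: the real tables are not shown in the question,
# so the column types here are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Proteins (Id INTEGER, PrimaryId TEXT);
    CREATE TABLE Annotations (ProteinId INTEGER, GOId TEXT);
    CREATE INDEX index_Annotations_ProteinId ON Annotations (ProteinId);
""")

query = ("SELECT PrimaryId, GOId "
         "FROM Proteins, Annotations "
         "WHERE Proteins.Id = Annotations.ProteinId")

# Each row describes one step of the plan; the last column is a
# human-readable description (a SCAN, or a SEARCH that uses an index).
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

If none of the printed steps mentions an index on the join columns, the missing index is a likely suspect.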

Answer 2

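You haven't posted the schema of the tables in question, but I think there might be a problem with indexes, specifically a missing index on Proteins.Id or Annotations.ProteinId (or both).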

Create the SQLite indexes like this:

CREATE INDEX IF NOT EXISTS index_Proteins_Id ON Proteins (Id);
CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId ON Annotations (ProteinId);
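
If you would rather create the indexes from Python, something like the following should work. It is a minimal sketch, not your actual setup: the in-memory database, the column types and the sample rows are all made up for illustration, and the ORDER BY is only there to make the result deterministic.

```python
import sqlite3
from collections import defaultdict

# Stand-in for the real database file; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Proteins (Id INTEGER, PrimaryId TEXT);
    CREATE TABLE Annotations (ProteinId INTEGER, GOId TEXT);
""")
conn.executemany("INSERT INTO Proteins VALUES (?, ?)",
                 [(1, "P12345"), (2, "Q67890")])
conn.executemany("INSERT INTO Annotations VALUES (?, ?)",
                 [(1, "GO:0000001"), (1, "GO:0000002"), (2, "GO:0000003")])

# The two index statements from this answer; IF NOT EXISTS makes them
# safe to run every time the program starts.
conn.execute("CREATE INDEX IF NOT EXISTS index_Proteins_Id ON Proteins (Id)")
conn.execute("CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId "
             "ON Annotations (ProteinId)")
conn.commit()

# Same join as in the question, collected into a dict of lists.
annotations = defaultdict(list)
rows = conn.execute("SELECT PrimaryId, GOId "
                    "FROM Proteins, Annotations "
                    "WHERE Proteins.Id = Annotations.ProteinId "
                    "ORDER BY PrimaryId, GOId")
for protein, goterm in rows:
    annotations[protein].append(goterm)
```

On the real tables the indexes only need to be built once; after that the join should be able to look up matching rows instead of scanning the whole table for each one.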

Joining with Python's sqlite3 module is slower than doing it manually