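I have an 800MB MS Access database that I migrated to SQLite. The structure of the database is as follows (the SQLite database after migration is around 330MB):
Table Occurrence
has 1,600,000 records. The table looks like: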
CREATE TABLE Occurrence
(
SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
PRIMARY KEY (SimulationID, SimRunID, OccurrenceID)
)
It has the following indexes:
CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded" ASC)
CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" ("OccurrenceID" ASC)
CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" ASC)
CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" ("SimulationID" ASC)
Table OccurrenceParticipant
has 3,400,000 records. The table looks like:
CREATE TABLE OccurrenceParticipant
(
SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
RoleTypeID INTEGER, ParticipantID INTEGER
)
It has the following indexes:
CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON "OccurrenceParticipant" ("OccurrenceID" ASC)
CREATE INDEX "OccurrenceParticipant_ParticipantID_idx" ON "OccurrenceParticipant" ("ParticipantID" ASC)
CREATE INDEX "OccurrenceParticipant_RoleType_idx" ON "OccurrenceParticipant" ("RoleTypeID" ASC)
CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant" ("SimRunID" ASC)
CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON "OccurrenceParticipant" ("SimulationID" ASC)
Table InitialParticipant
has 130 records. The structure of the table is:
CREATE TABLE InitialParticipant
(
ParticipantID INTEGER PRIMARY KEY, ParticipantTypeID INTEGER,
ParticipantGroupID INTEGER
)
The table has the following indexes:
CREATE INDEX "initialpart_participantTypeID_idx" ON "InitialParticipant" ("ParticipantGroupID" ASC)
CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant" ("ParticipantID" ASC)
Table ParticipantGroup
has 22 records. It looks like:
CREATE TABLE ParticipantGroup (
ParticipantGroupID INTEGER, ParticipantGroupTypeID INTEGER,
Description varchar (50), PRIMARY KEY( ParticipantGroupID )
)
The table has the following index:
CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON "ParticipantGroup" ("ParticipantGroupID" ASC)
Table tmpSimArgs
has 18 records. It has the following structure:
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10))
and the following indexes:
CREATE INDEX tmpSimArgs_SimRunID_idx ON tmpSimArgs(SimRunID ASC)
CREATE INDEX tmpSimArgs_SimulationID_idx ON tmpSimArgs(SimulationID ASC)
Table tmpPartArgs
has 80 records. It has the following structure:
CREATE TABLE tmpPartArgs(participantID INT)
and the following index:
CREATE INDEX tmpPartArgs_participantID_idx ON tmpPartArgs(participantID ASC)
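I have a query involving multiple INNER JOINs. The problem I am facing is that the Access version of the query takes about one second, whereas the SQLite version of the same query takes 10 seconds (about 10 times slower!). Migrating back to Access is not possible for me, so SQLite is my only option.
I am new to writing database queries, so these queries may look silly. Please point out anything that looks wrong or naive.
The query in Access is (the entire query takes 1 second to execute):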
SELECT ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period, Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID FROM
(
ParticipantGroup INNER JOIN InitialParticipant ON ParticipantGroup.ParticipantGroupID = InitialParticipant.ParticipantGroupID
) INNER JOIN
(
tmpPartArgs INNER JOIN
(
(
tmpSimArgs INNER JOIN Occurrence ON (tmpSimArgs.SimRunID = Occurrence.SimRunID) AND (tmpSimArgs.SimulationID = Occurrence.SimulationID)
) INNER JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID = OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID = OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID = OccurrenceParticipant.SimulationID)
) ON tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID
) ON InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID WHERE (((OccurrenceParticipant.RoleTypeID)=52 Or (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, Occurrence.Period;
The SQLite query is as follows (this query takes about 10 seconds):
SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period, Count(ij2.occpParticipantID) AS CountOfParticipantID FROM
(
SELECT ip.ParticipantGroupID AS ipParticipantGroupID, ip.ParticipantID AS ipParticipantID, ip.ParticipantTypeID, pg.ParticipantGroupID AS pgParticipantGroupID, pg.ParticipantGroupTypeID, pg.Description FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1 INNER JOIN
(
SELECT tpa.participantID AS tpaParticipantID, ij3.* FROM tmpPartArgs AS tpa INNER JOIN
(
SELECT ij4.*, occp.SimulationID as occpSimulationID, occp.SimRunID AS occpSimRunID, occp.OccurrenceID AS occpOccurrenceID, occp.ParticipantID AS occpParticipantID, occp.RoleTypeID FROM
(
SELECT tsa.SimulationID AS tsaSimulationID, tsa.SimRunID AS tsaSimRunID, occ.SimulationID AS occSimulationID, occ.SimRunID AS occSimRunID, occ.OccurrenceID AS occOccurrenceID, occ.OccurrenceTypeID, occ.Period, occ.HasSucceeded FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID)
) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID = occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID)
) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1 GROUP BY ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period;
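I don't know what I am doing wrong here. I have all the indexes, but I suspect I am missing some key index that would do the trick. Interestingly, before migrating, my "research" on SQLite suggested that SQLite is faster, smaller and better than Access in every respect. But when it comes to querying, I can't seem to get SQLite to run faster than Access. I reiterate that I am new to SQLite and obviously don't have much idea or experience, so if any learned soul could help me out with this, I would be very grateful.
Answer 0 (score: 2)
I have reformatted your code (using my home-brewed SQL formatter) to hopefully make it easier for others to read.
Reformatted query: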
SELECT
ij1.Description,
ij2.occSimulationID,
ij2.occSimRunID,
ij2.Period,
Count(ij2.occpParticipantID) AS CountOfParticipantID
FROM (
SELECT
ip.ParticipantGroupID AS ipParticipantGroupID,
ip.ParticipantID AS ipParticipantID,
ip.ParticipantTypeID,
pg.ParticipantGroupID AS pgParticipantGroupID,
pg.ParticipantGroupTypeID,
pg.Description
FROM ParticipantGroup AS pg
INNER JOIN InitialParticipant AS ip
ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1
INNER JOIN (
SELECT
tpa.participantID AS tpaParticipantID,
ij3.*
FROM tmpPartArgs AS tpa
INNER JOIN (
SELECT
ij4.*,
occp.SimulationID AS occpSimulationID,
occp.SimRunID AS occpSimRunID,
occp.OccurrenceID AS occpOccurrenceID,
occp.ParticipantID AS occpParticipantID,
occp.RoleTypeID
FROM (
SELECT
tsa.SimulationID AS tsaSimulationID,
tsa.SimRunID AS tsaSimRunID,
occ.SimulationID AS occSimulationID,
occ.SimRunID AS occSimRunID,
occ.OccurrenceID AS occOccurrenceID,
occ.OccurrenceTypeID,
occ.Period,
occ.HasSucceeded
FROM tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
ON (tsa.SimRunID = occ.SimRunID)
AND (tsa.SimulationID = occ.SimulationID)
) AS ij4
INNER JOIN OccurrenceParticipant AS occp
ON (occOccurrenceID = occpOccurrenceID)
AND (occSimRunID = occpSimRunID)
AND (occSimulationID = occpSimulationID)
) AS ij3
ON tpa.participantID = ij3.occpParticipantID
) AS ij2
ON ij1.ipParticipantID = ij2.occpParticipantID
WHERE (
(
(ij2.RoleTypeID) = 52
OR
(ij2.RoleTypeID) = 49
)
)
AND ij2.HasSucceeded = 1
GROUP BY
ij1.Description,
ij2.occSimulationID,
ij2.occSimRunID,
ij2.Period;
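As per JohnFx (above), I was confused by the derived views. I don't think they are actually needed, especially since they are all inner joins. So below I have attempted to reduce the complexity. Please review it and test it for performance. I had to do a cross join with tmpSimArgs since it is only joined to Occurrence - I assume this is the desired behaviour.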
SELECT
pg.Description,
occ.SimulationID,
occ.SimRunID,
occ.Period,
COUNT(occp.ParticipantID) AS CountOfParticipantID
FROM ParticipantGroup AS pg
INNER JOIN InitialParticipant AS ip
ON pg.ParticipantGroupID = ip.ParticipantGroupID
CROSS JOIN tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
ON tsa.SimRunID = occ.SimRunID
AND tsa.SimulationID = occ.SimulationID
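INNER JOIN OccurrenceParticipant AS occp
ON occ.OccurrenceID = occp.OccurrenceID
AND occ.SimRunID = occp.SimRunID
AND occ.SimulationID = occp.SimulationID
-- keeps the InitialParticipant link from the original query (ij1.ipParticipantID = ij2.occpParticipantID)
AND ip.ParticipantID = occp.ParticipantID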
INNER JOIN tmpPartArgs AS tpa
ON tpa.participantID = occp.ParticipantID
WHERE occ.HasSucceeded = 1
AND (occp.RoleTypeID = 52 OR occp.RoleTypeID = 49 )
GROUP BY
pg.Description,
occ.SimulationID,
occ.SimRunID,
occ.Period;
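One more thing that may be worth checking alongside the simplified query: OccurrenceParticipant is joined on three columns (SimulationID, SimRunID, OccurrenceID), but the schema in the question only lists single-column indexes on that table. The following is a minimal sketch, not a measured fix. It assumes a composite index (the name OccurrenceParticipant_occ_key_idx and the column order are my own assumptions, not taken from the question) and uses EXPLAIN QUERY PLAN through Python's sqlite3 module to show which indexes SQLite decides to use. The tables here are empty, so the plan on the real 330MB database may well differ; running ANALYZE there first is probably a good idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Occurrence (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
    PRIMARY KEY (SimulationID, SimRunID, OccurrenceID)
);
CREATE TABLE OccurrenceParticipant (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    RoleTypeID INTEGER, ParticipantID INTEGER
);
-- Hypothetical composite index: name and column order are assumptions.
CREATE INDEX OccurrenceParticipant_occ_key_idx
    ON OccurrenceParticipant (SimulationID, SimRunID, OccurrenceID);
""")

# Ask SQLite how it would execute the three-column join.
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT occ.Period, COUNT(occp.ParticipantID)
FROM Occurrence AS occ
INNER JOIN OccurrenceParticipant AS occp
    ON occ.OccurrenceID = occp.OccurrenceID
   AND occ.SimRunID = occp.SimRunID
   AND occ.SimulationID = occp.SimulationID
WHERE occ.HasSucceeded = 1
GROUP BY occ.Period
""").fetchall()

# The last column of each row holds the human-readable plan step.
plan_text = "\n".join(row[-1] for row in plan)
print(plan_text)
```

If the printed plan on the real data shows a SCAN of one of the two big tables where a SEARCH ... USING INDEX was expected, that is usually the place to look first.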
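Answer 1 (score: 0)
I have provided a smaller, scaled-down version of the query. Hopefully this is clearer and more readable than my earlier one.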
SELECT5 * FROM
(
SELECT4 FROM ParticipantGroup as pg INNER JOIN InitialParticipant AS ip ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1 INNER JOIN
(
SELECT3 * FROM tmpPartArgs AS tpa INNER JOIN
(
SELECT2 * FROM
(
SELECT1 * FROM tmpSimArgs AS tsa INNER JOIN Occurrence AS occ ON (tsa.SimRunID = occ.SimRunID) AND (tsa.SimulationID = occ.SimulationID)
) AS ij4 INNER JOIN OccurrenceParticipant AS occp ON (occOccurrenceID = occpOccurrenceID) AND (occSimRunID = occpSimRunID) AND (occSimulationID = occpSimulationID)
) AS ij3 ON tpa.participantID = ij3.occpParticipantID
) AS ij2 ON ij1.ipParticipantID = ij2.occpParticipantID WHERE (((ij2.RoleTypeID)=52 Or (ij2.RoleTypeID)=49)) AND ij2.HasSucceeded = 1
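The application I am working on is a simulation application, and to understand the context of the above query I think it is necessary to describe the application briefly. Let us suppose there is a planet with some initial resources and living agents. The planet is allowed to exist for 1000 years, and the actions performed by the agents are monitored and stored in the database. After 1000 years the planet is destroyed and re-created with the same initial resources and living agents as the first time. This (creation and destruction) is repeated 18 times, and all the actions the agents perform during those 1000 years are stored in the database.
Thus our entire experiment consists of 18 re-creations and is termed a 'Simulation'. Each of the 18 times the planet is re-created is termed a run, and each of the 1000 years of a run is termed a period. So a 'Simulation' consists of 18 runs, and each run consists of 1000 periods.
At the start of each run, we assign the 'Simulation' an initial set of knowledge items and dynamic agents that interact with each other and with the items. A knowledge item is stored by an agent inside a knowledge store. The knowledge store is also considered a participating entity in our simulation, but this concept (the knowledge store) is not important here. I have tried to describe each SELECT statement and the tables involved in detail.
SELECT1: I think this query could be replaced by just the table 'Occurrence', since it does not do much. The Occurrence table stores the different actions taken by the agents in each period of each run of a particular 'Simulation'. Normally each 'Simulation' consists of 18 runs, and each run consists of 1000 periods. An agent is allowed to take an action in each period of each run of a 'Simulation'. However, the Occurrence table does not store any details about the agents that perform the actions. The Occurrence table may store data relating to multiple 'Simulations'.
SELECT2: This query simply returns the details of the actions performed in each period of each run of a 'Simulation', along with the details of all the participants of the 'Simulation', such as their respective ParticipantID. The OccurrenceParticipant table stores a record for every participating entity of the Simulation, including agents, knowledge stores, knowledge items, etc.
SELECT3: This query returns only those records from the pseudo-table ij3 that are due to agents and knowledge items. All records in ij3 concerning knowledge items will be filtered out.
SELECT4: This query attaches the 'Description' field to every record of 'InitialParticipant'. Note that the 'Description' column is an output column of the overall query. The InitialParticipant table contains a record for every agent and every knowledge item initially assigned to the 'Simulation'.
SELECT5: This final query returns all the records from the pseudo-table ij2 for which the RoleType of the participating entity (which can be either an agent or a knowledge item) is 49 or 52.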
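A note on SELECT1: as written it is an inner join against tmpSimArgs, so it seems to act as a filter that keeps only the Occurrence rows whose (SimulationID, SimRunID) pair is listed in tmpSimArgs. Whether it can be replaced by the bare Occurrence table therefore depends on whether tmpSimArgs lists every run stored in Occurrence. A minimal sketch with made-up rows (the data below is purely illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Occurrence (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
    PRIMARY KEY (SimulationID, SimRunID, OccurrenceID)
);
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10));

-- Illustrative rows: two runs of simulation 1 and one run of simulation 2.
INSERT INTO Occurrence VALUES (1, 1, 100, 1, 5, 1);
INSERT INTO Occurrence VALUES (1, 2, 101, 1, 5, 1);
INSERT INTO Occurrence VALUES (2, 1, 102, 1, 5, 1);
-- Only simulation 1, run 1 is requested.
INSERT INTO tmpSimArgs VALUES (1, 1);
""")

all_rows = conn.execute("SELECT COUNT(*) FROM Occurrence").fetchone()[0]

# Same join as SELECT1, reduced to a row count.
selected = conn.execute("""
SELECT COUNT(*)
FROM tmpSimArgs AS tsa
INNER JOIN Occurrence AS occ
    ON tsa.SimRunID = occ.SimRunID
   AND tsa.SimulationID = occ.SimulationID
""").fetchone()[0]

print(all_rows, selected)  # 3 occurrences in total, 1 kept by the join
```

If Occurrence only ever holds the runs listed in tmpSimArgs, the two counts match and the join is redundant; otherwise dropping it changes the result.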
Answer 2 (score: 0)
I would suggest moving the ij2.RoleTypeID filter from the outermost query into ij3, using IN instead of OR, and moving the HasSucceeded filter into ij4.
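A minimal sketch of what that could look like, run through Python's sqlite3 module so it is easy to try. The sample rows are made up, and I have shortened the column aliases compared with the question's occ/occp prefixes, so treat the names inside the subqueries as my own simplification rather than the original query. Whether this is actually faster on the real data would need to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Occurrence (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    OccurrenceTypeID INTEGER, Period INTEGER, HasSucceeded BOOL,
    PRIMARY KEY (SimulationID, SimRunID, OccurrenceID));
CREATE TABLE OccurrenceParticipant (
    SimulationID INTEGER, SimRunID INTEGER, OccurrenceID INTEGER,
    RoleTypeID INTEGER, ParticipantID INTEGER);
CREATE TABLE InitialParticipant (
    ParticipantID INTEGER PRIMARY KEY, ParticipantTypeID INTEGER,
    ParticipantGroupID INTEGER);
CREATE TABLE ParticipantGroup (
    ParticipantGroupID INTEGER, ParticipantGroupTypeID INTEGER,
    Description varchar (50), PRIMARY KEY (ParticipantGroupID));
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10));
CREATE TABLE tmpPartArgs (participantID INT);

-- Made-up sample rows, only to exercise the query.
INSERT INTO ParticipantGroup VALUES (1, 1, 'Group A'), (2, 1, 'Group B');
INSERT INTO InitialParticipant VALUES (10, 1, 1), (11, 1, 2);
INSERT INTO tmpSimArgs VALUES (1, 1);
INSERT INTO tmpPartArgs VALUES (10), (11);
INSERT INTO Occurrence VALUES
    (1, 1, 100, 1, 5, 1), (1, 1, 101, 1, 5, 0), (1, 2, 102, 1, 7, 1);
INSERT INTO OccurrenceParticipant VALUES
    (1, 1, 100, 52, 10), (1, 1, 100, 49, 11), (1, 1, 100, 7, 10),
    (1, 1, 101, 52, 10), (1, 2, 102, 52, 11);
""")

rows = conn.execute("""
SELECT ij1.Description, ij2.SimulationID, ij2.SimRunID, ij2.Period,
       COUNT(ij2.ParticipantID) AS CountOfParticipantID
FROM (
    SELECT ip.ParticipantID AS ParticipantID, pg.Description AS Description
    FROM ParticipantGroup AS pg
    INNER JOIN InitialParticipant AS ip
        ON pg.ParticipantGroupID = ip.ParticipantGroupID
) AS ij1
INNER JOIN (
    SELECT ij3.*
    FROM tmpPartArgs AS tpa
    INNER JOIN (
        SELECT ij4.SimulationID AS SimulationID, ij4.SimRunID AS SimRunID,
               ij4.Period AS Period, occp.ParticipantID AS ParticipantID
        FROM (
            SELECT occ.SimulationID AS SimulationID, occ.SimRunID AS SimRunID,
                   occ.OccurrenceID AS OccurrenceID, occ.Period AS Period
            FROM tmpSimArgs AS tsa
            INNER JOIN Occurrence AS occ
                ON tsa.SimRunID = occ.SimRunID
               AND tsa.SimulationID = occ.SimulationID
            WHERE occ.HasSucceeded = 1           -- filter moved into ij4
        ) AS ij4
        INNER JOIN OccurrenceParticipant AS occp
            ON ij4.OccurrenceID = occp.OccurrenceID
           AND ij4.SimRunID = occp.SimRunID
           AND ij4.SimulationID = occp.SimulationID
        WHERE occp.RoleTypeID IN (49, 52)        -- filter moved into ij3, IN instead of OR
    ) AS ij3
        ON tpa.participantID = ij3.ParticipantID
) AS ij2
    ON ij1.ParticipantID = ij2.ParticipantID
GROUP BY ij1.Description, ij2.SimulationID, ij2.SimRunID, ij2.Period
ORDER BY ij1.Description
""").fetchall()

for row in rows:
    print(row)
```

The idea is simply that each subquery hands fewer rows to the join above it, instead of carrying every row up to the outermost WHERE.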