Very slow multi-table join in SQLite

Date: 2019-01-04 23:02:25

Tags: performance sqlite

I don't understand why this query is so slow:

SELECT count(*) FROM PanelsMeta
INNER JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
INNER JOIN Geographies ON Geographies.geo = Publishers.geo;

Using the query analyzer, I can see that every step of the query uses an index:

QUERY PLAN
|--SCAN TABLE PanelsMeta USING COVERING INDEX PanPubId
|--SEARCH TABLE Publishers USING INTEGER PRIMARY KEY (rowid=?)
`--SEARCH TABLE Geographies USING COVERING INDEX geos (geo=?)
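
For readers who want to reproduce a plan like the one above, here is a minimal sketch using Python's standard `sqlite3` module. It assumes a toy in-memory database that keeps only the join columns and three of the indexes from the schema posted in the update; the helper name `query_plan` is hypothetical. The exact wording of the plan lines depends on the SQLite version, so no output is shown.

```python
import sqlite3

# Toy in-memory schema: only the join columns from the question are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publishers(id INTEGER PRIMARY KEY AUTOINCREMENT, geo TEXT);
CREATE TABLE PanelsMeta(id INTEGER PRIMARY KEY AUTOINCREMENT, publisherid INTEGER);
CREATE TABLE Geographies(id INTEGER PRIMARY KEY AUTOINCREMENT, geo TEXT NOT NULL);
CREATE INDEX pp1 ON PanelsMeta (publisherid);
CREATE INDEX zgeo ON Publishers (geo);
CREATE INDEX g ON Geographies (geo);
""")

QUERY = """
SELECT count(*) FROM PanelsMeta
INNER JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
INNER JOIN Geographies ON Geographies.geo = Publishers.geo
"""

def query_plan(connection, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row (hypothetical helper)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of every row.
    return [row[-1] for row in rows]

plan = query_plan(conn, QUERY)
for step in plan:
    print(step)
```

On an empty toy database the planner may choose a different join order than on the real data, so this only shows how to obtain the plan, not what it will be.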

The table sizes are as follows:

sqlite> select count(*) from Publishers;
55
sqlite> select count(*) from PanelsMeta;
2948875
sqlite> select count(*) from Geographies;
37323

What am I doing wrong?

The variations I tried produce the same kind of query plan and are also slow, taking tens of minutes:

SELECT count(*) FROM Geographies
LEFT JOIN Publishers ON Publishers.geo = Geographies.geo 
LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id;

# QUERY PLAN
# |--SCAN TABLE Geographies USING COVERING INDEX geos
# |--SEARCH TABLE Publishers USING COVERING INDEX PubGeo (geo=?)
# `--SEARCH TABLE PanelsMeta USING COVERING INDEX PanPubId (publisherid=?)

SELECT count(*) FROM Publishers
LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id
LEFT JOIN Geographies ON Geographies.geo = Publishers.geo;

# QUERY PLAN
# |--SCAN TABLE Publishers USING COVERING INDEX PubGeo
# |--SEARCH TABLE PanelsMeta USING COVERING INDEX PanPubId (publisherid=?)
# `--SEARCH TABLE Geographies USING COVERING INDEX geos (geo=?)
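
One thing worth keeping in mind when comparing these variations: a LEFT JOIN also counts rows that have no match, so the three queries are not guaranteed to return the same number. The sketch below illustrates this on a few hypothetical rows (the data and the stripped-down tables are assumptions made for the example, not taken from the real database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publishers(id INTEGER PRIMARY KEY, geo TEXT);
CREATE TABLE PanelsMeta(id INTEGER PRIMARY KEY, publisherid INTEGER);
CREATE TABLE Geographies(id INTEGER PRIMARY KEY, geo TEXT NOT NULL);

-- Hypothetical toy data: publisher 2 has no panels and no matching geography,
-- and geography 'DE' has no publisher.
INSERT INTO Publishers(id, geo) VALUES (1, 'US'), (2, 'FR');
INSERT INTO PanelsMeta(publisherid) VALUES (1), (1);
INSERT INTO Geographies(geo) VALUES ('US'), ('US'), ('DE');
""")

def count(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

# Original query: only fully matching combinations are counted.
inner = count("""
SELECT count(*) FROM PanelsMeta
INNER JOIN Publishers ON PanelsMeta.publisherid = Publishers.id
INNER JOIN Geographies ON Geographies.geo = Publishers.geo
""")

# First variation: every Geographies row is kept, matched or not.
left_from_geo = count("""
SELECT count(*) FROM Geographies
LEFT JOIN Publishers ON Publishers.geo = Geographies.geo
LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id
""")

# Second variation: every Publishers row is kept, matched or not.
left_from_pub = count("""
SELECT count(*) FROM Publishers
LEFT JOIN PanelsMeta ON PanelsMeta.publisherid = Publishers.id
LEFT JOIN Geographies ON Geographies.geo = Publishers.geo
""")

print(inner, left_from_geo, left_from_pub)
```

With this toy data the INNER JOIN version counts 4 rows while both LEFT JOIN versions count 5, because each keeps one unmatched row.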

Update

The schema is as follows:

CREATE TABLE PanelsMeta(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  f1 TEXT, 
  f2 TEXT, 
  f3 TEXT, 
  f4 DATETIME,
  f5 DATETIME,
  f6 TEXT, 
  f7 TEXT,
  publisherid INTEGER,
  FOREIGN KEY(publisherid) REFERENCES Publishers(id) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE INDEX ids ON PanelsMeta (id);
CREATE INDEX pp1 ON PanelsMeta (publisherid);
CREATE INDEX pp2 ON PanelsMeta (f1);
CREATE INDEX pp3 ON PanelsMeta (f1,publisherid);

CREATE TABLE Publishers(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  geo TEXT,
  f3 TEXT NOT NULL, 
  f4 TEXT NOT NULL,
  f5 TEXT,
  f6 TEXT
);

CREATE INDEX zf3 ON Publishers (f3);
CREATE INDEX zgeo ON Publishers (Geo);
CREATE INDEX zf6 ON Publishers (f6);
CREATE INDEX zid ON Publishers (id);
CREATE INDEX zf3g ON Publishers (f3,geo);
CREATE INDEX zf3gf6 ON Publishers (f3,geo,f6);

CREATE TABLE Geographies(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  geo TEXT NOT NULL,
  f3 TEXT NOT NULL,
  f4 TEXT,
  f5 DATETIME,
  f6 TEXT,
  f7 TEXT,
  f7 JSON DEFAULT '{}',
  f8 TEXT
);

CREATE INDEX g ON Geographies (geo);
CREATE INDEX gf3 ON Geographies (f3);

1 answer:

Answer 0: (score: 1)

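I ran into the same problem when I tried to INNER JOIN 6 tables, each containing between 1 and 100 rows. Each table has only one column.

My full dataset, however, is 18 GB and about 11 million rows.

I solved it by putting all the data into a single table and then using a "where in" clause. It is strange, but it is much faster (about 1 second instead of several minutes).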
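
The answer gives no schema, so the following is only a sketch of what "one table plus where in" could look like. The table `PanelsFlat`, its columns, the index `pf_geo`, and the helper `count_in_geos` are all hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical denormalized table: one row per panel, with the publisher's
# geo copied onto the row so that no join is needed at query time.
conn.execute(
    "CREATE TABLE PanelsFlat(id INTEGER PRIMARY KEY, publisherid INTEGER, geo TEXT)"
)
conn.execute("CREATE INDEX pf_geo ON PanelsFlat (geo)")
conn.executemany(
    "INSERT INTO PanelsFlat(publisherid, geo) VALUES (?, ?)",
    [(1, "US"), (1, "US"), (2, "FR"), (3, "DE")],
)

def count_in_geos(connection, geos):
    """Count rows whose geo is one of the given values, using WHERE ... IN."""
    geos = list(geos)
    if not geos:
        # An empty list can never match anything, so skip the query.
        return 0
    # One bound parameter per value keeps the query safe from SQL injection.
    placeholders = ",".join("?" for _ in geos)
    sql = f"SELECT count(*) FROM PanelsFlat WHERE geo IN ({placeholders})"
    return connection.execute(sql, geos).fetchone()[0]

matching = count_in_geos(conn, ["US", "DE"])
print(matching)
```

Whether this is faster on a given dataset depends on the data and the indexes; the timing in the answer is the answerer's own observation.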