I have a very basic implementation of an image upload service where you can upload images and tag them. Here is my schema:
CREATE TABLE Tag(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    name STRING NOT NULL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE TagBridge(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    image_id_high UNSIGNED BIG INT NOT NULL,
    image_id_low UNSIGNED BIG INT NOT NULL,
    tag_id_high UNSIGNED BIG INT NOT NULL,
    tag_id_low UNSIGNED BIG INT NOT NULL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE Image(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    filehash STRING NOT NULL,
    mime STRING NOT NULL,
    uploadedDate INTEGER NOT NULL,
    ratingsAverage REAL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
And the indexes:
CREATE INDEX ImageTest on Image(pid_high, pid_low, uploadedDate DESC);
CREATE INDEX ImagefilehashIndex ON Image (filehash);
CREATE INDEX ImageuploadedDateIndex ON Image (uploadedDate);
CREATE INDEX TagnameIndex ON Tag (name);
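The reason for the pid_high / pid_low fields instead of a standard primary key is that this service uses client-authoritative 128-bit GUIDs, but this does not significantly affect query speed.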
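For readers unfamiliar with this layout, here is a minimal sketch of how a 128-bit GUID could be split into two 64-bit halves and put back together. The question does not say how the service actually encodes the halves, so the helper names (`split_guid`, `join_guid`) and the signed reinterpretation are assumptions. The reinterpretation is there because SQLite stores integers as signed 64-bit values, so an unsigned half above 2^63 - 1 cannot be stored as-is.

```python
import uuid
from typing import Tuple

MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed (SQLite integers are signed)."""
    return value - (1 << 64) if value >= (1 << 63) else value


def to_unsigned64(value: int) -> int:
    """Inverse of to_signed64: map a signed 64-bit value back to 0 .. 2**64 - 1."""
    return value & MASK64


def split_guid(guid: uuid.UUID) -> Tuple[int, int]:
    """Return (high, low) halves of a GUID, each as a signed 64-bit integer."""
    high = (guid.int >> 64) & MASK64
    low = guid.int & MASK64
    return to_signed64(high), to_signed64(low)


def join_guid(high: int, low: int) -> uuid.UUID:
    """Rebuild the GUID from the two halves produced by split_guid."""
    return uuid.UUID(int=(to_unsigned64(high) << 64) | to_unsigned64(low))


if __name__ == "__main__":
    guid = uuid.uuid4()
    high, low = split_guid(guid)
    print(guid, high, low, join_guid(high, low) == guid)
```

The two values returned by `split_guid` are what would go into columns such as pid_high and pid_low under this assumed encoding.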
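Since this is the internet, the vast majority of images on this service are cats and are tagged 'cat'. In fact, about 47,000 of the 50,000 images are tagged 'cat'. The query to get the most recent images tagged 'cat' is: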
select i.* from Tag t, TagBridge b, Image i
where
b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low
AND b.image_id_high = i.pid_high and b.image_id_low = i.pid_low
AND t.name ='cat'
order by uploadedDate DESC LIMIT 20;
The query plan for this is:
sele order from deta
---- ------------- ---- ----
0 0 0 SEARCH TABLE Tag AS t USING INDEX TagnameIndex (name=?) (~1 rows)
0 1 1 SCAN TABLE TagBridge AS b (~472 rows)
0 2 2 SEARCH TABLE Image AS i USING INDEX ImageTest (pid_high=? AND pid_low=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
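The main problem here is the last row, USE TEMP B-TREE FOR ORDER BY, which slows the query down significantly. Without the ORDER BY clause the whole query takes about 0.001 seconds to run. With the ORDER BY clause it takes 0.483 seconds, a slowdown of more than 400x.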
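I would like to get this query under 0.1 seconds, but I am not sure how. I have tried many other queries, as well as adding and removing indexes, but this is the fastest I have been able to get it to run.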
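For anyone who wants to reproduce this on a small scale, a sketch along the following lines could be used. It loads the schema and indexes from the question into an in-memory database and prints the plan via EXPLAIN QUERY PLAN. The helper names (`build_db`, `query_plan`), the row count and the ID values are made up for illustration, and the plan text depends on the SQLite version and on the data, so the output will not necessarily match the plan shown above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Tag(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    name STRING NOT NULL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE TagBridge(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    image_id_high UNSIGNED BIG INT NOT NULL,
    image_id_low UNSIGNED BIG INT NOT NULL,
    tag_id_high UNSIGNED BIG INT NOT NULL,
    tag_id_low UNSIGNED BIG INT NOT NULL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE Image(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    filehash STRING NOT NULL,
    mime STRING NOT NULL,
    uploadedDate INTEGER NOT NULL,
    ratingsAverage REAL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE INDEX ImageTest ON Image(pid_high, pid_low, uploadedDate DESC);
CREATE INDEX ImagefilehashIndex ON Image (filehash);
CREATE INDEX ImageuploadedDateIndex ON Image (uploadedDate);
CREATE INDEX TagnameIndex ON Tag (name);
"""

QUERY = """
SELECT i.* FROM Tag t, TagBridge b, Image i
WHERE b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low
  AND b.image_id_high = i.pid_high AND b.image_id_low = i.pid_low
  AND t.name = 'cat'
ORDER BY uploadedDate DESC LIMIT 20
"""


def build_db(n_images=1000):
    """Create an in-memory database with n_images images, all tagged 'cat'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Tag(pid_high, pid_low, name) VALUES (0, 1, 'cat')")
    for n in range(n_images):
        conn.execute(
            "INSERT INTO Image(pid_high, pid_low, filehash, mime, uploadedDate) "
            "VALUES (1, ?, ?, 'image/jpeg', ?)",
            (n, "hash%d" % n, 1000 + n),
        )
        conn.execute(
            "INSERT INTO TagBridge(pid_high, pid_low, image_id_high, image_id_low, "
            "tag_id_high, tag_id_low) VALUES (2, ?, 1, ?, 0, 1)",
            (n, n),
        )
    conn.commit()
    return conn


def query_plan(conn, sql):
    """Return the 'detail' column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = build_db()
    for line in query_plan(conn, QUERY):
        print(line)
```

Wrapping `conn.execute(QUERY).fetchall()` in `time.perf_counter()` calls, with and without the ORDER BY clause, gives a rough way to compare timings on a given machine.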
Answer 0 (score: 3)
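This is the general problem of choosing between an index for filtering and an index for ordering. When almost every image carries the tag, walking the images in date order finds 20 matches almost immediately, whereas filtering by tag first produces tens of thousands of rows that then have to be sorted.
You should keep a list of popular tags (for which the ordering index is more beneficial) and, if the tag is popular, somehow prevent the filtering index from being used, for example: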
SELECT i.*
FROM Tag t, TagBridge b, Image i
WHERE b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low
AND b.image_id_high = i.pid_high AND b.image_id_low = i.pid_low
AND t.name || '' = 'cat'
ORDER BY
i.uploadedDate DESC
LIMIT 20
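To see what the `|| ''` trick does, here is a small sketch on the Tag table alone. It compares the plans for the plain predicate and for the concatenated one. The `POPULAR_TAGS` set and the `name_predicate` helper are hypothetical stand-ins for the "list of popular tags" mentioned above, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    name STRING NOT NULL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE INDEX TagnameIndex ON Tag (name);
INSERT INTO Tag(pid_high, pid_low, name) VALUES (0, 1, 'cat');
INSERT INTO Tag(pid_high, pid_low, name) VALUES (0, 2, 'dog');
""")

# Hypothetical: maintained by the application, e.g. from periodic tag counts.
POPULAR_TAGS = {"cat"}


def name_predicate(tag_name):
    """Pick the predicate form: the concatenation hides the column from the index."""
    return "t.name || '' = ?" if tag_name in POPULAR_TAGS else "t.name = ?"


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a single string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


indexed = "SELECT * FROM Tag WHERE name = 'cat'"
suppressed = "SELECT * FROM Tag WHERE name || '' = 'cat'"

if __name__ == "__main__":
    print(plan(indexed))     # expected to mention TagnameIndex
    print(plan(suppressed))  # expected to fall back to a table scan
```

Both statements return the same rows. Only the access path differs, because the planner can match an index on name against the bare column but not against the expression `name || ''`.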
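Alternatively, you can denormalize the schema: add uploadedDate to TagBridge and populate it with a trigger or some other mechanism. Then create a composite index on TagBridge (tag_id_high, tag_id_low, uploadedDate, image_id_high, image_id_low), so that the tag columns used for filtering lead the index and uploadedDate follows them for ordering, and rewrite the query slightly: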
SELECT i.*
FROM TagBridge b, Image i
WHERE b.tag_id_high =
      (
      SELECT t.pid_high
      FROM Tag t
      WHERE t.name = 'cat'
      )
  AND b.tag_id_low =
      (
      SELECT t.pid_low
      FROM Tag t
      WHERE t.name = 'cat'
      )
  AND i.pid_high = b.image_id_high
  AND i.pid_low = b.image_id_low
ORDER BY
  b.uploadedDate DESC
LIMIT 20;
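The double subquery is needed because SQLite does not understand tuple syntax (row values were only added in SQLite 3.15.0).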
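A rough sketch of the denormalized variant, put together end to end, might look like the following. The trigger name, index name, helper names and sample data are all made up for illustration, and the trigger only covers inserts into TagBridge. If uploadedDate can change after upload, a matching update trigger on Image would be needed as well.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Tag(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    name STRING NOT NULL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE Image(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    filehash STRING NOT NULL,
    mime STRING NOT NULL,
    uploadedDate INTEGER NOT NULL,
    ratingsAverage REAL,
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE TagBridge(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL,
    image_id_high UNSIGNED BIG INT NOT NULL,
    image_id_low UNSIGNED BIG INT NOT NULL,
    tag_id_high UNSIGNED BIG INT NOT NULL,
    tag_id_low UNSIGNED BIG INT NOT NULL,
    uploadedDate INTEGER,  -- denormalized copy of Image.uploadedDate
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
-- Filter columns first, then the sort column, then the join columns.
CREATE INDEX TagBridgeTagDateIndex ON TagBridge
    (tag_id_high, tag_id_low, uploadedDate, image_id_high, image_id_low);
-- Copy the upload date from the image whenever a bridge row is inserted.
CREATE TRIGGER TagBridgeFillDate AFTER INSERT ON TagBridge
BEGIN
    UPDATE TagBridge
       SET uploadedDate = (SELECT Image.uploadedDate FROM Image
                            WHERE Image.pid_high = NEW.image_id_high
                              AND Image.pid_low = NEW.image_id_low)
     WHERE orm_id = NEW.orm_id;
END;
"""

QUERY = """
SELECT i.*
FROM TagBridge b, Image i
WHERE b.tag_id_high = (SELECT t.pid_high FROM Tag t WHERE t.name = ?)
  AND b.tag_id_low = (SELECT t.pid_low FROM Tag t WHERE t.name = ?)
  AND i.pid_high = b.image_id_high
  AND i.pid_low = b.image_id_low
ORDER BY b.uploadedDate DESC
LIMIT ?
"""


def build_db(n_images=100):
    """Sample data: every tenth image is tagged 'dog', all others 'cat'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Tag(pid_high, pid_low, name) VALUES (0, 1, 'cat')")
    conn.execute("INSERT INTO Tag(pid_high, pid_low, name) VALUES (0, 2, 'dog')")
    for n in range(n_images):
        conn.execute(
            "INSERT INTO Image(pid_high, pid_low, filehash, mime, uploadedDate) "
            "VALUES (1, ?, ?, 'image/jpeg', ?)",
            (n, "hash%d" % n, 1000 + n),
        )
        tag_low = 2 if n % 10 == 0 else 1
        conn.execute(
            "INSERT INTO TagBridge(pid_high, pid_low, image_id_high, image_id_low, "
            "tag_id_high, tag_id_low) VALUES (2, ?, 1, ?, 0, ?)",
            (n, n, tag_low),
        )
    conn.commit()
    return conn


def newest_tagged(conn, tag_name, limit=20):
    """Return the most recently uploaded images carrying the given tag."""
    return conn.execute(QUERY, (tag_name, tag_name, limit)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("cat", "cat", 20)):
        print(row[-1])
    print([row[5] for row in newest_tagged(conn, "cat")])
```

The intent is that the planner can read TagBridge through the composite index in date order for the given tag and stop after 20 rows, without a separate sort step. Whether it actually does so is worth confirming with EXPLAIN QUERY PLAN on the real data.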