I have 3 tables: Pi, Pidl and Pirl (the full schema is below).
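Basically, an image is downloaded and a log record is created in Pidl. After that, the image is resized and a record is created in Pirl, which is linked to the corresponding Pidl record.
I'm writing a query to find the images that need to be resized; it essentially queries Pidl. The algorithm I designed is simple: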
for each Image in Pi {
    pidlA = newest_pidl(Image);
    if (pidlA.status == success) {
        pirlA = newest_pirl(Image);
        if (pirlA.pidl.hash != pidlA.hash) {
            go;
        }
        else if (pirlA.status != success) {
            failed_attempts = failed_pirl_count(pirlA, newest_successful_pirl(Image));
            decide, based on pirlA.time and failed_attempts, whether to go or not;
        }
        else {
            dont go;
        }
    }
    else {
        dont go;
    }
}
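The decision above can be sketched in Python over in-memory records, which is handy for checking what the SQL ought to return. This is a minimal sketch, not the asker's code: the `Pidl`/`Pirl` dataclasses, `needs_resize` and the `retry_policy` callback are hypothetical names, status `1` is assumed to mean success (as in the query), a missing resize is treated as "go" (as the query's `sub_pirl_id IS NULL` does), and `failed_pirl_count` is read as "failed resizes newer than the newest successful one". The question leaves the retry rule open, so it is passed in as a callback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SUCCESS = 1  # assumption: status 1 means success, as in the query's "= 1" tests


@dataclass
class Pidl:
    pidl_id: int
    pi_id: int
    status: int
    time: int
    hash: str


@dataclass
class Pirl:
    pirl_id: int
    pidl: Pidl  # the download record this resize was made from
    status: int
    time: int


def newest_pidl(pidls: List[Pidl]) -> Optional[Pidl]:
    # Newest by time; equal times are broken by the higher id, as in the query.
    return max(pidls, key=lambda d: (d.time, d.pidl_id), default=None)


def newest_pirl(pirls: List[Pirl]) -> Optional[Pirl]:
    return max(pirls, key=lambda r: (r.time, r.pirl_id), default=None)


def needs_resize(pi_id: int, pidls: List[Pidl], pirls: List[Pirl],
                 retry_policy: Callable[[int, int], bool]) -> bool:
    """Return True ("go") if image pi_id should be resized."""
    pidl_a = newest_pidl([d for d in pidls if d.pi_id == pi_id])
    if pidl_a is None or pidl_a.status != SUCCESS:
        return False  # no successful newest download: don't go

    own_pirls = [r for r in pirls if r.pidl.pi_id == pi_id]
    pirl_a = newest_pirl(own_pirls)
    if pirl_a is None or pirl_a.pidl.hash != pidl_a.hash:
        return True  # never resized, or resized from different content: go

    if pirl_a.status != SUCCESS:
        # Hypothetical reading of failed_pirl_count(): failed resizes that are
        # newer than the newest successful one (all of them if none succeeded).
        last_ok = newest_pirl([r for r in own_pirls if r.status == SUCCESS])
        failed_attempts = sum(
            1 for r in own_pirls
            if r.status != SUCCESS
            and (last_ok is None
                 or (r.time, r.pirl_id) > (last_ok.time, last_ok.pirl_id)))
        return retry_policy(pirl_a.time, failed_attempts)

    return False  # newest resize succeeded for the current content: don't go


# Sample data from the question.
pidls = [Pidl(1, 3, 1, 100, "hashA"), Pidl(2, 3, 1, 150, "hashB"),
         Pidl(3, 4, 2, 200, "hashC"), Pidl(4, 3, 1, 200, "hashA")]
by_id = {d.pidl_id: d for d in pidls}
pirls = [Pirl(1, by_id[2], 0, 100), Pirl(2, by_id[3], 1, 150),
         Pirl(3, by_id[1], 2, 200)]


def allow_three_attempts(pirl_time: int, failed: int) -> bool:
    # Hypothetical policy: keep retrying until three resizes have failed.
    return failed < 3


for pi_id in (3, 4, 5):
    print(pi_id, needs_resize(pi_id, pidls, pirls, allow_three_attempts))
```

With the sample data this prints `True` for image 3 (its newest resize failed, two attempts have failed so far, and the hypothetical policy allows three), and `False` for image 4 (failed download) and image 5 (never downloaded).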
Now my query. It isn't finished yet (the failed-attempts part is missing), but it is already too slow, so I want to fix that first:
SELECT
    pidl1A.pidl_id
FROM Pidl AS pidl1A
LEFT JOIN Pidl AS pidl2A
    ON (
        pidl1A.pidl_pi_id = pidl2A.pidl_pi_id AND
        pidl2A.pidl_status = 1 AND
        (pidl2A.pidl_time > pidl1A.pidl_time OR
            (pidl2A.pidl_id > pidl1A.pidl_id AND pidl1A.pidl_time = pidl2A.pidl_time)
        )
    )
LEFT JOIN (
    #newest pirl subquery#
    SELECT
        pidl1B.pidl_pi_id AS sub_pi_id,
        pidl1B.pidl_hash AS sub_pidl_hash,
        pirl1B.pirl_id AS sub_pirl_id,
        pirl1B.pirl_status AS sub_pirl_status
    FROM Pirl AS pirl1B
    INNER JOIN Pidl AS pidl1B ON (pirl1B.pirl_pidl_id = pidl1B.pidl_id)
    LEFT JOIN (
        SELECT
            pidl2B.pidl_pi_id AS sub_pi_id,
            pirl2B.pirl_id AS sub_pirl_id,
            pirl2B.pirl_time AS sub_pirl_time
        FROM Pirl AS pirl2B
        INNER JOIN Pidl AS pidl2B ON (pirl2B.pirl_pidl_id = pidl2B.pidl_id)
        WHERE 1
    ) AS pirl3B
        ON (
            pirl3B.sub_pi_id = pidl1B.pidl_pi_id AND
            (pirl3B.sub_pirl_time > pirl1B.pirl_time OR
                (pirl3B.sub_pirl_time = pirl1B.pirl_time AND
                 pirl3B.sub_pirl_id > pirl1B.pirl_id)
            )
        )
    WHERE
        pirl3B.sub_pirl_id IS NULL
) AS pirl1A
    ON (pirl1A.sub_pi_id = pidl1A.pidl_pi_id)
WHERE
    pidl1A.pidl_status = 1 AND pidl2A.pidl_id IS NULL
    AND (
        pirl1A.sub_pirl_id IS NULL
        OR (pidl1A.pidl_hash != pirl1A.sub_pidl_hash)
        OR (pirl1A.sub_pirl_status != 1)
    )
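The query relies twice on the same pattern: a LEFT JOIN to a "newer" row of the same image, followed by an `IS NULL` test, which keeps only the rows for which no newer row exists (an anti-join). The sketch below isolates the outer Pidl part and runs it on the sample data in an in-memory SQLite database through Python's standard `sqlite3` module, only so it can be tried without a MySQL server; using SQLite is an assumption, not the question's setup. Each row is paired with every newer successful row of the same image, so images with long histories produce many intermediate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pidl (
    pidl_id     INTEGER PRIMARY KEY,
    pidl_pi_id  INTEGER,
    pidl_status INTEGER,
    pidl_time   INTEGER,
    pidl_hash   VARCHAR(16)
);
INSERT INTO Pidl VALUES
    (1, 3, 1, 100, 'hashA'),
    (2, 3, 1, 150, 'hashB'),
    (3, 4, 2, 200, 'hashC'),
    (4, 3, 1, 200, 'hashA');
""")

# Outer part of the question's query: a successful Pidl row survives only if
# no other successful row of the same image is newer (time, then higher id).
NEWEST_SUCCESSFUL_PIDL = """
SELECT pidl1A.pidl_id
FROM Pidl AS pidl1A
LEFT JOIN Pidl AS pidl2A
  ON pidl1A.pidl_pi_id = pidl2A.pidl_pi_id
 AND pidl2A.pidl_status = 1
 AND (pidl2A.pidl_time > pidl1A.pidl_time
      OR (pidl2A.pidl_time = pidl1A.pidl_time AND pidl2A.pidl_id > pidl1A.pidl_id))
WHERE pidl1A.pidl_status = 1
  AND pidl2A.pidl_id IS NULL
ORDER BY pidl1A.pidl_id
"""

print(conn.execute(NEWEST_SUCCESSFUL_PIDL).fetchall())
```

On the sample data only download 4 survives: it is the newest successful download of image 3, and image 4 has no successful download at all.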
Here is my database schema:
CREATE TABLE Pi (
`pi_id` int,
PRIMARY KEY (`pi_id`)
)
;
CREATE TABLE Pidl
(
`pidl_id` int,
`pidl_pi_id` int,
`pidl_status` int,
`pidl_time` int,
`pidl_hash` varchar(16),
PRIMARY KEY (`pidl_id`)
)
;
alter table Pidl
add constraint fk1_branchNo foreign key (pidl_pi_id) references Pi (pi_id);
CREATE TABLE Pirl
(
`pirl_id` int,
`pirl_pidl_id` int,
`pirl_status` int,
`pirl_time` int,
PRIMARY KEY (`pirl_id`)
)
;
alter table Pirl
add constraint fk2_branchNo foreign key (pirl_pidl_id) references Pidl (pidl_id);
INSERT INTO Pi
(`pi_id`)
VALUES
(3),
(4),
(5);
INSERT INTO Pidl
(`pidl_id`, `pidl_pi_id`,`pidl_status`,`pidl_time`, `pidl_hash`)
VALUES
(1, 3, 1,100, 'hashA'),
(2, 3, 1,150,'hashB'),
(3, 4, 2, 200,'hashC'),
(4, 3, 1, 200,'hashA')
;
INSERT INTO Pirl
(`pirl_id`, `pirl_pidl_id`,`pirl_status`,`pirl_time`)
VALUES
(1, 2, 0,100),
(2, 3, 1,150),
(3, 1, 2, 200)
;
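With 3 records it is fast, of course. But with about 10k to 30k records it takes more than 5 seconds. What I found is that the part making it slow is the last part of the query: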
AND (
    pirl1A.sub_pirl_id IS NULL
    OR (pidl1A.pidl_hash != pirl1A.sub_pidl_hash)
    OR (pirl1A.sub_pirl_status != 1)
)
Another odd thing I noticed: adding DISTINCT makes the query faster, but still not fast enough.
Answer 0 (score: 1)
Reading your requirements, I came up with a query like this:
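select pidl.*
from Pidl pidl join
     (select pidl_pi_id, max(pidl_time) as pidl_time
      from Pidl
      group by pidl_pi_id
     ) maxpidl
     on pidl.pidl_pi_id = maxpidl.pidl_pi_id and pidl.pidl_time = maxpidl.pidl_time left join
     (Pirl pirl join Pidl src on src.pidl_id = pirl.pirl_pidl_id)
     on src.pidl_pi_id = pidl.pidl_pi_id and src.pidl_hash = pidl.pidl_hash
where pirl.pirl_id is null;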
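I think there are some other conditions you haven't fully explained (for example, the role of status). You should be able to add them to this query.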
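The answer's idea can be tried on the question's sample data in an in-memory SQLite database (Python's standard `sqlite3` module; SQLite stands in for MySQL here, which is an assumption). Following the suggestion to add the status conditions, this sketch counts only successful downloads and successful resizes; those conditions, the aliases `d` and `r`, and the `good` derived table are assumptions, not part of the answer. It is not exactly the question's pseudocode either: it skips an image if any successful resize of the same hash exists, instead of looking only at the newest resize.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pidl (pidl_id INTEGER PRIMARY KEY, pidl_pi_id INTEGER,
                   pidl_status INTEGER, pidl_time INTEGER, pidl_hash TEXT);
CREATE TABLE Pirl (pirl_id INTEGER PRIMARY KEY, pirl_pidl_id INTEGER,
                   pirl_status INTEGER, pirl_time INTEGER);
INSERT INTO Pidl VALUES (1, 3, 1, 100, 'hashA'), (2, 3, 1, 150, 'hashB'),
                        (3, 4, 2, 200, 'hashC'), (4, 3, 1, 200, 'hashA');
INSERT INTO Pirl VALUES (1, 2, 0, 100), (2, 3, 1, 150), (3, 1, 2, 200);
""")

# Latest successful download per image (MAX + GROUP BY), then an anti-join
# against successful resizes made from a download with the same hash.
LATEST_WITHOUT_GOOD_RESIZE = """
SELECT d.pidl_id
FROM Pidl AS d
JOIN (SELECT pidl_pi_id, MAX(pidl_time) AS pidl_time
      FROM Pidl
      WHERE pidl_status = 1
      GROUP BY pidl_pi_id) AS maxpidl
  ON d.pidl_pi_id = maxpidl.pidl_pi_id
 AND d.pidl_time = maxpidl.pidl_time
LEFT JOIN (SELECT src.pidl_pi_id, src.pidl_hash
           FROM Pirl AS r
           JOIN Pidl AS src ON src.pidl_id = r.pirl_pidl_id
           WHERE r.pirl_status = 1) AS good
  ON good.pidl_pi_id = d.pidl_pi_id
 AND good.pidl_hash = d.pidl_hash
WHERE d.pidl_status = 1
  AND good.pidl_pi_id IS NULL
ORDER BY d.pidl_id
"""

print(conn.execute(LATEST_WITHOUT_GOOD_RESIZE).fetchall())
```

On the sample data this returns download 4, the same row the question's query selects.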
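In MySQL, you should avoid subqueries in the FROM clause. They are materialized, which adds extra work, and (at least in older MySQL versions) the engine cannot use indexes on the materialized result afterwards.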
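One way to follow that advice (a swapped-in technique, not something the answer shows) is to express both "is there a newer row?" checks as correlated NOT EXISTS subqueries in the WHERE clause, so there is no FROM-clause subquery to materialize. The sketch again uses SQLite through Python's `sqlite3` as a stand-in for MySQL. It assumes hashes and statuses are never NULL, and the two indexes are guesses to verify with EXPLAIN in MySQL, not measured improvements. It keeps the question's semantics: go if the image's newest resize is missing, came from a different hash, or failed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pidl (pidl_id INTEGER PRIMARY KEY, pidl_pi_id INTEGER,
                   pidl_status INTEGER, pidl_time INTEGER, pidl_hash TEXT);
CREATE TABLE Pirl (pirl_id INTEGER PRIMARY KEY, pirl_pidl_id INTEGER,
                   pirl_status INTEGER, pirl_time INTEGER);
-- Guessed indexes for the correlated lookups below (verify with EXPLAIN).
CREATE INDEX idx_pidl_image ON Pidl (pidl_pi_id, pidl_status, pidl_time);
CREATE INDEX idx_pirl_pidl ON Pirl (pirl_pidl_id);
INSERT INTO Pidl VALUES (1, 3, 1, 100, 'hashA'), (2, 3, 1, 150, 'hashB'),
                        (3, 4, 2, 200, 'hashC'), (4, 3, 1, 200, 'hashA');
INSERT INTO Pirl VALUES (1, 2, 0, 100), (2, 3, 1, 150), (3, 1, 2, 200);
""")

NEEDS_RESIZE = """
SELECT d.pidl_id
FROM Pidl AS d
WHERE d.pidl_status = 1
  -- d is the newest successful download of its image
  AND NOT EXISTS (
      SELECT 1 FROM Pidl AS newer
      WHERE newer.pidl_pi_id = d.pidl_pi_id
        AND newer.pidl_status = 1
        AND (newer.pidl_time > d.pidl_time
             OR (newer.pidl_time = d.pidl_time AND newer.pidl_id > d.pidl_id)))
  -- and the image's newest resize is not a successful one of this hash
  AND NOT EXISTS (
      SELECT 1
      FROM Pirl AS r
      JOIN Pidl AS src ON src.pidl_id = r.pirl_pidl_id
      WHERE src.pidl_pi_id = d.pidl_pi_id
        AND src.pidl_hash = d.pidl_hash
        AND r.pirl_status = 1
        -- r must be the newest resize of the image
        AND NOT EXISTS (
            SELECT 1
            FROM Pirl AS r2
            JOIN Pidl AS src2 ON src2.pidl_id = r2.pirl_pidl_id
            WHERE src2.pidl_pi_id = d.pidl_pi_id
              AND (r2.pirl_time > r.pirl_time
                   OR (r2.pirl_time = r.pirl_time AND r2.pirl_id > r.pirl_id))))
ORDER BY d.pidl_id
"""

print(conn.execute(NEEDS_RESIZE).fetchall())
```

On the sample data this also returns download 4. Unlike the previous sketch, a newer failed resize of the same hash makes the image eligible again, because only the newest resize is considered.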
Answer 1 (score: 0)
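Your query does not use indexes; instead it relies on subqueries in the FROM clause that act like views, and that can be very slow. I would suggest creating new tables, or materialized views, that hold the information you need and are indexed on it.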
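A dedicated table along those lines could look like the sketch below: a hypothetical `ImageState` table with one row per image, kept up to date by the code that writes Pidl and Pirl rows, so that finding images to resize becomes a filter over a single table instead of several self-joins. The table, its columns and the `on_download`/`on_resize` hooks are illustrative names, and the sketch assumes records are written in time order. It uses SQLite (via Python's `sqlite3`) and its `INSERT OR IGNORE`; in MySQL, `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE` would play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical summary table: one row per image with only the fields the
# "needs resize?" decision reads; pi_id is the (indexed) primary key.
conn.executescript("""
CREATE TABLE ImageState (
    pi_id       INTEGER PRIMARY KEY,
    pidl_id     INTEGER,  -- newest successful download
    pidl_hash   TEXT,
    pirl_hash   TEXT,     -- hash of the download the newest resize used
    pirl_status INTEGER   -- status of the newest resize
);
""")


def on_download(conn, pi_id, pidl_id, status, pidl_hash):
    """Call right after inserting a Pidl row; assumes rows arrive in time order."""
    if status != 1:
        return  # a failed download does not replace the last good one
    conn.execute("INSERT OR IGNORE INTO ImageState (pi_id) VALUES (?)", (pi_id,))
    conn.execute("UPDATE ImageState SET pidl_id = ?, pidl_hash = ? WHERE pi_id = ?",
                 (pidl_id, pidl_hash, pi_id))


def on_resize(conn, pi_id, source_hash, status):
    """Call right after inserting a Pirl row; assumes rows arrive in time order."""
    conn.execute("INSERT OR IGNORE INTO ImageState (pi_id) VALUES (?)", (pi_id,))
    conn.execute("UPDATE ImageState SET pirl_hash = ?, pirl_status = ? WHERE pi_id = ?",
                 (source_hash, status, pi_id))


NEEDS_RESIZE = """
SELECT pidl_id FROM ImageState
WHERE pidl_id IS NOT NULL
  AND (pirl_status IS NULL OR pirl_hash != pidl_hash OR pirl_status != 1)
ORDER BY pidl_id
"""


def needs_resize(conn):
    return [row[0] for row in conn.execute(NEEDS_RESIZE)]


on_download(conn, 3, 1, 1, "hashA")
print(needs_resize(conn))  # [1]: downloaded, never resized
on_resize(conn, 3, "hashA", 1)
print(needs_resize(conn))  # []: resized successfully
on_download(conn, 3, 4, 1, "hashB")
print(needs_resize(conn))  # [4]: content changed, needs a new resize
```

The trade-off is that every writer of Pidl and Pirl must call the hooks (or equivalent triggers), in exchange for a query that no longer has to work out "newest" at read time.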