I have a table like this:
create table images (
image_id serial primary key,
user_id int references users(user_id),
date_created timestamp with time zone
);
Then I have a table that links each image to the tags it can have:
create table images_tags (
images_tag_id serial primary key,
image_id int references images(image_id),
tag_id int references tags(tag_id)
);
To get the results I want, I run a query like this:
select image_id,user_id,tag_id from images left join images_tags using(image_id)
where (?=-1 or user_id=?)
and (?=-1 or tag_id in (?, ?, ?, ?)) --have up to 4 tag_ids to search for
order by date_created desc limit 100;
The problem is that I want the limit to count unique image_id values, because my output will look like this:
{"images":[
{"image_id":1, "tag_ids":[1, 2, 3]},
....
]}
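Note how I group the tag_id values into an array for output, even though the SQL returns one row for every combination of tag_id and image_id.
So when I say limit 100, I want it to apply to 100 unique image_id values, not to 100 joined rows.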
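As a rough sketch of the grouping step described above: the function name group_rows and the sample rows below are made up for illustration, and the real application code may look different. It assumes the query returns (image_id, tag_id) pairs already in the desired order.

```python
import json

def group_rows(rows):
    """Collapse (image_id, tag_id) rows into one entry per image.

    Images keep the order in which they first appear in the rows.
    """
    grouped = {}
    for image_id, tag_id in rows:
        tags = grouped.setdefault(image_id, [])
        if tag_id is not None:  # a LEFT JOIN yields NULL for untagged images
            tags.append(tag_id)
    return {"images": [{"image_id": i, "tag_ids": t} for i, t in grouped.items()]}

# Hypothetical result set: image 1 has three tags, image 2 has none.
rows = [(1, 1), (1, 2), (1, 3), (2, None)]
result = group_rows(rows)
print(json.dumps(result))
```

Four joined rows collapse into two images here, which is why a row-based limit does not match the number of images in the output.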
Answer 0 (score: 2)
Perhaps you could return one image per row? If that works for you, you can do this:
select image_id, user_id, string_agg(cast(tag_id as varchar(2000)), ',') as tags
from images left join
images_tags
using (image_id)
where (?=-1 or user_id=?) and
(?=-1 or tag_id in (?, ?, ?, ?)) --have up to 4 tag_ids to search for
group by image_id, user_id
order by date_created desc
limit 100;
If that doesn't work, use a CTE:
with cte as (
select image_id, user_id, tag_id,
dense_rank() over (order by date_created desc, image_id) as seqnum -- image_id breaks ties in date_created
from images left join
images_tags
using (image_id)
where (?=-1 or user_id=?) and
(?=-1 or tag_id in (?, ?, ?, ?)) --have up to 4 tag_ids to search for
)
select *
from cte
where seqnum <= 100
order by seqnum;
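Answer 1 (score: 1)
Select the 100 qualifying images first, then join to images_tags. Use an EXISTS semi-join to apply the condition on images_tags, and take care to get the parentheses right.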
SELECT i.*, t.tag_id
FROM (
SELECT i.image_id, i.user_id
FROM images i
WHERE (? = -1 OR i.user_id = ?)
AND (? = -1 OR EXISTS (
SELECT 1
FROM images_tags t
WHERE t.image_id = i.image_id
AND t.tag_id IN (?, ?, ?, ?)
))
ORDER BY i.date_created DESC
LIMIT 100
) i
LEFT JOIN images_tags t
ON t.image_id = i.image_id
AND (? = -1 OR t.tag_id in (?, ?, ?, ?)) -- repeat condition
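This should be faster than the solutions with window functions and CTEs.
Test performance with EXPLAIN ANALYZE. As always, run it a few times to warm the cache.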
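To see why limiting before the join matters, here is a small self-contained sketch. It is not the PostgreSQL setup from the question: it uses Python's sqlite3 module with an in-memory database as a stand-in, a trimmed-down schema, made-up sample rows, and a limit of 2 instead of 100. The user and tag filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (image_id INTEGER PRIMARY KEY, user_id INT, date_created TEXT);
    CREATE TABLE images_tags (image_id INT, tag_id INT);
    INSERT INTO images VALUES (1, 10, '2024-01-01'), (2, 10, '2024-01-02'), (3, 11, '2024-01-03');
    INSERT INTO images_tags VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
""")

# LIMIT applied to the joined rows: it counts (image, tag) pairs, not images.
naive = conn.execute("""
    SELECT i.image_id, t.tag_id
    FROM images i
    LEFT JOIN images_tags t ON t.image_id = i.image_id
    ORDER BY i.date_created DESC, t.tag_id
    LIMIT 2
""").fetchall()

# LIMIT applied to images in a subquery first, then joined to the tags.
limited_first = conn.execute("""
    SELECT i.image_id, t.tag_id
    FROM (
        SELECT image_id, date_created
        FROM images
        ORDER BY date_created DESC
        LIMIT 2
    ) i
    LEFT JOIN images_tags t ON t.image_id = i.image_id
    ORDER BY i.date_created DESC, t.tag_id
""").fetchall()

print(len({image_id for image_id, _ in naive}))          # distinct images, naive limit
print(len({image_id for image_id, _ in limited_first}))  # distinct images, limit first
```

With this sample data the naive query returns two rows that both belong to the newest image, while the limit-first query returns every tag row for the two newest images.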