Removing duplicate results from a MySQL query when using UNION

Time: 2009-10-01 12:43:23

Tags: mysql union duplicate-removal

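I have a MySQL query that fetches items with recent activity. Basically, users can post a comment on an item or add it to their wishlist, and I want to get all items that have received a new comment or been put on someone's wishlist within the last x days.

The query looks something like this (slightly simplified):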

SELECT items.*, reactions.timestamp AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
WHERE reactions.timestamp > 1251806994
GROUP BY items.id

UNION

SELECT items.*, wishlists.timestamp AS date FROM items
LEFT JOIN wishlists ON wishlists.item_id = items.id
WHERE wishlists.timestamp > 1251806994
GROUP BY items.id

ORDER BY date DESC LIMIT 5

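This works, but when an item has been placed on someone's wishlist and a comment has also been posted on it, the item is returned twice. UNION normally removes duplicates, but because date differs between the two rows, both rows are returned. Can I somehow tell MySQL to ignore the date when removing duplicate rows?

I also tried something like this: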

SELECT items.*, IF(wishlists.id IS NOT NULL, wishlists.timestamp, reactions.timestamp) AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
LEFT JOIN wishlists ON wishlists.item_id = items.id

WHERE (wishlists.id IS NOT NULL AND wishlists.timestamp > 1251806994) OR
(reactions.id IS NOT NULL AND reactions.timestamp > 1251806994)
GROUP BY items.id

ORDER BY date DESC LIMIT 5

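But for some reason this turned out to be very slow (it takes about half a minute).

3 answers:

Answer 0: (score: 5)

I solved this myself, based on larryb82's idea. I basically did the following: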

SELECT * FROM (
    SELECT items.*, reactions.timestamp AS date FROM items
    LEFT JOIN reactions ON reactions.item_id = items.id
    WHERE reactions.timestamp > 1251806994
    GROUP BY items.id

    UNION

    SELECT items.*, wishlists.timestamp AS date FROM items
    LEFT JOIN wishlists ON wishlists.item_id = items.id
    WHERE wishlists.timestamp > 1251806994
    GROUP BY items.id

    ORDER BY date DESC LIMIT 5
) AS items

GROUP BY items.id
ORDER BY date DESC LIMIT 5

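Although I realize this probably doesn't take into account which date is the highest for each item... I'm not sure whether that matters, or how to do it if it does.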
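One possible way to pick the highest date per item, sketched under the assumption that the tables are named items, reactions and wishlists and that both activity tables have item_id and timestamp columns as in the queries above: aggregate the timestamps first, then join the result back to items. The derived-table aliases recent and activity are made up for illustration. Note that this sketch also drops the inner ORDER BY date DESC LIMIT 5, because limiting before the duplicates are collapsed can leave fewer than 5 distinct items.

```sql
-- Sketch only: table and column names follow the queries above;
-- the aliases "recent" and "activity" are hypothetical.
SELECT items.*, activity.date
FROM items
JOIN (
    -- One row per item: the newest timestamp from either source.
    SELECT item_id, MAX(timestamp) AS date
    FROM (
        SELECT item_id, timestamp FROM reactions WHERE timestamp > 1251806994
        UNION ALL
        SELECT item_id, timestamp FROM wishlists WHERE timestamp > 1251806994
    ) AS recent
    GROUP BY item_id
) AS activity ON activity.item_id = items.id
ORDER BY activity.date DESC
LIMIT 5;
```

UNION ALL is used here because the GROUP BY already collapses the rows to one per item, so the duplicate removal that plain UNION performs is not needed.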

Answer 1: (score: 1)

I'm not sure whether this would cause a huge performance hit, but you could try

SELECT item_field_1, item_field_2, ..., max(date) as date
FROM
  (the query you posted) AS combined
GROUP BY item_field_1, item_field_2, ...

Answer 2: (score: 1)

I don't think you need a UNION at all.


SELECT items.*, GREATEST(COALESCE(wishlists.timestamp, 0), COALESCE(reactions.timestamp, 0)) as date
FROM items
LEFT JOIN reactions ON reactions.item_id = items.id AND reactions.timestamp > 1251806994
LEFT JOIN wishlists ON wishlists.item_id = items.id AND wishlists.timestamp > 1251806994
ORDER BY date DESC limit 5

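Your attempt above with LEFT JOIN is probably very slow because of the predicate with the ORs in it. You are asking the database to join the three tables together and then examine that result for the timestamp information. My statement should form a smaller intermediate table. Items with no reactions or wishlists will get a date of 0, which will probably keep them from being reported.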
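If the 0-date rows do turn out to matter, a variant along the following lines might help. It is only a sketch: it assumes the timestamps are positive Unix timestamps, and it assumes items.id is the primary key of items (or that ONLY_FULL_GROUP_BY is disabled) so that selecting items.* together with GROUP BY items.id is accepted. It takes the newest timestamp per item, so an item with several recent reactions or wishlist entries appears once, and it filters out items that matched neither join.

```sql
-- Sketch only: assumes positive Unix timestamps and that items.id
-- is the primary key of items.
SELECT items.*,
       GREATEST(COALESCE(MAX(wishlists.timestamp), 0),
                COALESCE(MAX(reactions.timestamp), 0)) AS date
FROM items
LEFT JOIN reactions ON reactions.item_id = items.id AND reactions.timestamp > 1251806994
LEFT JOIN wishlists ON wishlists.item_id = items.id AND wishlists.timestamp > 1251806994
GROUP BY items.id
-- A date of 0 means neither join found recent activity for the item.
HAVING date > 0
ORDER BY date DESC
LIMIT 5;
```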