Question

I have the following tables:

TABLE product
id int(11)
title varchar(400)

TABLE tag
id int(11)
text varchar(100)

TABLE product_tag_map
product_id int(11)
tag_id int(11)

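product_tag_map maps tags to products. Tags are not evenly distributed in the system, i.e. some tags have far more products than others.

I'm trying to write SQL that fetches 25 random products: 5 products per tag, for 5 tags (so 5x5 = 25).

I found an answer here: How can I get an even distribution using WHERE id IN(1,2,3,4)

But that doesn't yield random products - it always pulls the same products for each tag.

Here is my SQL: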

SET @last_tag = 0;
SET @count_tag = 0;

SELECT DISTINCT id FROM (
SELECT
  product.*,
  @count_tag := IF(@last_tag = product_tag_map.tag_id, @count_tag, 0) + 1 AS tag_row_number,
  @last_tag := product_tag_map.tag_id
FROM product
  LEFT JOIN product_tag_map ON (product_tag_map.product_id=product.id)
WHERE
  product_tag_map.tag_id IN (245,255,259,281,296)
) AS subquery WHERE tag_row_number <= 5;

How can I make it return random products for each tag?

Any help would be much appreciated! Thanks.

Answer 1

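This query uses several tricks:

Adding a level of nesting so that LIMIT can be used in a subquery: mySQL subquery limit
Adding row_number functionality to MySQL: How to select the first/least/max row per group in SQL

The end result is a lot of subqueries: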

SELECT tag.Name, t0.Id as MapId
FROM
(
    SELECT * 
         , @num := if(@type = tag_id, @num + 1, 1) as row_number
         , @type := tag_id as dummy
    FROM (
        SELECT *
        FROM map m
        WHERE tag_id in
        (
            SELECT *
            FROM
            (
                SELECT id
                FROM tag
                ORDER BY RAND() LIMIT 5
            ) t
        )
     ORDER BY tag_id, RAND()  
  ) mainTable
  , (SELECT @num:=0) foo
  , (SELECT @type:=0) foo2
) t0 
    INNER JOIN tag
        ON t0.tag_id = tag.id
WHERE row_number <= 5

SQL Fiddle

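The idea is to first select 5 random tags. That's not difficult, just a simple ORDER BY RAND() LIMIT 5.

The tricky part is then emulating ROW_NUMBER() OVER (PARTITION BY tag_id ORDER BY RAND()), because numbering the items in random order, but partitioned by tag, is exactly what you need. So you declare the variables and proceed as the query shows.

Finally, filter on row_number and you have 25 random items!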
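
On MySQL 8.0 or later, window functions are available, so the user-variable emulation should not be needed (and since ROW_NUMBER is a reserved word there, the row_number alias in the query above would have to be quoted with backticks). A minimal sketch, assuming the question's product and product_tag_map tables and its five fixed tag ids; the alias names ranked and rn are made up for illustration:

```sql
-- Sketch only: assumes MySQL 8.0+ and the schema from the question.
SELECT p.id, p.title, ranked.tag_id
FROM (
    SELECT
        m.product_id,
        m.tag_id,
        -- Number the rows of each tag in a random order,
        -- re-shuffled every time the query runs.
        ROW_NUMBER() OVER (PARTITION BY m.tag_id ORDER BY RAND()) AS rn
    FROM product_tag_map AS m
    WHERE m.tag_id IN (245, 255, 259, 281, 296)
) AS ranked
INNER JOIN product AS p ON p.id = ranked.product_id
-- Keep at most 5 rows per tag.
WHERE ranked.rn <= 5;
```

A product that carries more than one of the listed tags can appear once per tag, so the result may contain fewer than 25 distinct products. A tag with fewer than 5 products simply contributes fewer rows.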

Answer 2

I'd also like to offer the "brute force" approach. This works in most databases (although the rand() function may be named something else).

-- MySQL requires parentheses around each SELECT that has its own ORDER BY / LIMIT
(select content_item_id from content_item where tag_id = 245 order by RAND() limit 5)
union all
(select content_item_id from content_item where tag_id = 255 order by RAND() limit 5)
union all
(select content_item_id from content_item where tag_id = 259 order by RAND() limit 5)
union all
(select content_item_id from content_item where tag_id = 281 order by RAND() limit 5)
union all
(select content_item_id from content_item where tag_id = 296 order by RAND() limit 5)

If you have an index on content_item(tag_id), the performance of this approach is probably acceptable.
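
The answer's content_item table does not appear in the question, so here is a rough sketch of the same idea against the question's product_tag_map table. The index name idx_product_tag_map_tag is hypothetical, and including product_id in the index is an assumption intended to let each branch be answered from the index alone:

```sql
-- Hypothetical index; assumes the question's product_tag_map table.
CREATE INDEX idx_product_tag_map_tag
    ON product_tag_map (tag_id, product_id);

-- One branch per tag; each branch sorts only that tag's rows.
(SELECT product_id FROM product_tag_map WHERE tag_id = 245 ORDER BY RAND() LIMIT 5)
UNION ALL
(SELECT product_id FROM product_tag_map WHERE tag_id = 255 ORDER BY RAND() LIMIT 5);
-- Repeat for the remaining tag ids (259, 281, 296).
```

ORDER BY RAND() still has to shuffle every row of a tag before taking 5, so very large tags remain the expensive case.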

SQL to get an even distribution from n groups - fetching random items
