Question: Find the first 2 users in each category who reach at least 10 items in that category.
Table structure:
CREATE TABLE items(
    id INT AUTO_INCREMENT PRIMARY KEY,
    datetime datetime,
    category INT,
    user INT,
    items_count INT
);
Sample data:
INSERT INTO items (datetime, category, user, items_count) VALUES
('2013-01-01 00:00:00', 1, 1, 10),
('2013-01-01 00:00:01', 1, 2, 1),
('2013-01-01 00:00:02', 1, 3, 10),
('2013-01-01 00:00:03', 1, 2, 9),
('2013-01-01 00:00:00', 2, 4, 10),
('2013-01-01 00:00:01', 2, 1, 10),
('2013-01-01 00:00:01', 2, 5, 10);
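Expected result:
category  user
1         1
1         3
2         4
2         5
Note: as the result shows, when several users meet the requirement at the same time, I need to be able to give preference to a particular user.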
SQL Fiddle:
http://sqlfiddle.com/#!2/58e60
This is what I tried:
SELECT
    Derived.*,
    IF (@category != Derived.category, @rank := 1, @rank := @rank + 1) AS rank,
    @category := category
FROM(
    SELECT
        category,
        user,
        SUM(items_count) AS items_count,
        MAX(datetime) AS datetime
    FROM items
    GROUP BY
        category,
        user
    HAVING
        SUM(items_count) >= 10
) AS Derived
JOIN(SELECT @rank := 0, @category := 0) AS r
HAVING
    rank <= 2
ORDER BY
    Derived.category,
    Derived.datetime
But this is wrong. Not only does it ignore the user preference, it also produces incorrect results with data such as:
('2013-01-01 00:00:00', 1, 1, 10),
('2013-01-01 00:00:01', 1, 2, 1),
('2013-01-01 00:00:02', 1, 3, 10),
('2013-01-01 00:00:03', 1, 2, 9),
('2013-01-01 00:00:10', 1, 3, 1);
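Additional information: I don't know whether a stored procedure would make a difference here, but unfortunately that is not an option either. The user running this query has only SELECT privileges.
Answer 0 (score: 2)
To find the users that meet your requirement, you need a cumulative sum of the counts. The following query finds the row on which each user first reaches 10 units. If the counts are never negative, there is only one such row per user and category: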
select i.*
from (select i.*,
             (select sum(items_count)
              from items i2
              where i2.user = i.user and
                    i2.category = i.category and
                    i2.datetime <= i.datetime
             ) as cumsum
      from items i
     ) i
where cumsum - items_count < 10 and cumsum >= 10
order by datetime;
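To make the condition `cumsum - items_count < 10 and cumsum >= 10` concrete, here is a rough Python sketch of the same idea, run against the sample rows from the question. It is only an illustration, not part of the query: the names `ROWS` and `threshold_crossings` are made up for this example, and it adds rows one at a time, whereas the SQL sums all rows with the same timestamp together.

```python
from collections import defaultdict

# (datetime, category, user, items_count): the sample rows from the question
ROWS = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]


def threshold_crossings(rows, threshold=10):
    """Return the rows on which a (category, user) running total first reaches threshold."""
    running = defaultdict(int)  # (category, user) -> running total so far
    crossings = []
    # ISO-formatted datetime strings sort chronologically
    for row in sorted(rows):
        dt, category, user, count = row
        before = running[(category, user)]
        after = before + count
        running[(category, user)] = after
        # Same test as: cumsum - items_count < 10 and cumsum >= 10
        if before < threshold <= after:
            crossings.append(row)
    return crossings


for row in threshold_crossings(ROWS):
    print(row)
```

On the sample data every row qualifies except user 2's first row in category 1, because that user only reaches 10 on the second row at 00:00:03.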
To get the first two, you need to use a MySQL trick to enumerate rows within a group. Here is an example that generally works:
select i.*
from (select i.*, if(@prevc = category, @rn := @rn + 1, @rn := 1) as rn, @prevc := category
      from (select i.*,
                   (select sum(items_count)
                    from items i2
                    where i2.user = i.user and
                          i2.category = i.category and
                          i2.datetime <= i.datetime
                   ) as cumsum
            from items i
           ) i
           cross join
           (select @rn := 0) const
      where cumsum - items_count < 10 and cumsum >= 10
     ) i
where rn <= 2
order by category, datetime;
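I have reservations about this approach, because nothing in MySQL guarantees that the if() expression is actually evaluated before @prevc := category. However, that does seem to be the case, and the query seems to work in practice.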
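For readers unfamiliar with the variable trick, the numbering step behaves roughly like the loop below. This is a Python sketch under stated assumptions, not the MySQL implementation: `CROSSINGS` is a hand-written list of the threshold-crossing rows for the question's sample data, and `number_within_groups` is a hypothetical helper. It also shows why order matters: the comparison must see the previous row's category before it is overwritten, and the rows must arrive sorted by category and datetime.

```python
# (category, datetime, user): rows on which each user reaches 10 items,
# written out by hand from the question's sample data
CROSSINGS = [
    (1, "2013-01-01 00:00:00", 1),
    (1, "2013-01-01 00:00:02", 3),
    (1, "2013-01-01 00:00:03", 2),
    (2, "2013-01-01 00:00:00", 4),
    (2, "2013-01-01 00:00:01", 1),
    (2, "2013-01-01 00:00:01", 5),
]


def number_within_groups(rows):
    """Number rows 1, 2, 3, ... within each category; rows must be pre-sorted."""
    numbered = []
    prev_category = None  # plays the role of @prevc
    rn = 0                # plays the role of @rn
    for category, dt, user in rows:
        # Compare against the previous category first, then overwrite it
        rn = rn + 1 if category == prev_category else 1
        prev_category = category
        numbered.append((category, dt, user, rn))
    return numbered


numbered = number_within_groups(sorted(CROSSINGS))
first_two = [(category, user) for category, _, user, rn in numbered if rn <= 2]
print(first_two)
```

Note that nothing here expresses a preference between users 1 and 5, who tie at 00:00:01 in category 2; the sort simply puts the lower user id first.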
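Answer 1 (score: 0)
I tried Gordon's query, but unfortunately it does not seem to work well on large tables; after waiting 15 minutes I decided to kill it. The query below, however, works well for me: it chews through a table of about 6M rows in roughly 8 seconds.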
# Variable
SET @min_items = 10,
    @max_users = 2,
    @preferred_user = 5,
# Static
    @category = 0,
    @inner_category = 0,
    @user = 0,
    @items = 0,
    @row_num = 1;
--
SELECT
    category,
    user,
    datetime
FROM(
    SELECT
        category,
        user,
        datetime,
        IF (@category = category, @row_num := @row_num + 1, @row_num := 1) AS row_num,
        @category := category
    FROM(
        SELECT
            category,
            user,
            datetime,
            # Reset the running total when the user or the category changes
            IF (@user != user OR @inner_category != category, @items := 0, NULL),
            IF (@items < @min_items, @items := @items + items_count, NULL) AS items_cumulative,
            @user := user,
            @inner_category := category
        FROM items
        ORDER BY
            category,
            user,
            datetime
    ) AS Derived
    WHERE items_cumulative >= @min_items
    ORDER BY
        category,
        datetime,
        FIELD(user, @preferred_user, user)
) AS Derived
WHERE row_num <= @max_users;
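As a cross-check of what the query is meant to return, here is a small Python sketch of the same steps: a running total per category and user, then the earliest users per category with the preferred user winning ties. The function name `first_users` and its parameters are made up for this illustration and simply mirror @min_items, @max_users and @preferred_user; it is not a replacement for the SQL.

```python
from collections import defaultdict

# (datetime, category, user, items_count): the sample rows from the question
ROWS = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]


def first_users(rows, min_items=10, max_users=2, preferred_user=None):
    """Return (category, user) pairs for the first users to reach min_items."""
    running = defaultdict(int)   # (category, user) -> running total
    reached = defaultdict(list)  # category -> [(datetime, user)]
    for dt, category, user, count in sorted(rows):
        before = running[(category, user)]
        running[(category, user)] = before + count
        if before < min_items <= before + count:
            reached[category].append((dt, user))

    result = []
    for category in sorted(reached):
        # Earliest first; on a tie the preferred user sorts ahead (False < True),
        # then the lower user id, which is an arbitrary choice for this sketch
        ranked = sorted(
            reached[category],
            key=lambda r: (r[0], r[1] != preferred_user, r[1]),
        )
        result.extend((category, user) for _, user in ranked[:max_users])
    return result


print(first_users(ROWS, preferred_user=5))
```

With `preferred_user=5` this gives the expected result from the question: users 1 and 3 in category 1, users 4 and 5 in category 2. Appending the extra row `('2013-01-01 00:00:10', 1, 3, 1)` from the question's second data set does not change the answer for category 1, because user 3 had already reached 10 items at 00:00:02.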