The following query:
SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC
yields:
year id rate
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2009 p01 4.4
2002 p01 3.9
2004 p01 3.5
2005 p01 2.1
2000 p01 0.8
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
2006 p02 4.6
2007 p02 3.3
What I want is only the top 5 results for each id:
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
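Is there a way to do this with some kind of LIMIT-like modifier that works within the GROUP BY?
Answer 0 (score: 96)
You could use the GROUP_CONCAT aggregate function to collect all years into a single column, grouped by id and ordered by rate: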
SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM yourtable
GROUP BY id
Result:
-----------------------------------------------------------
| ID | GROUPED_YEAR |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007 |
-----------------------------------------------------------
And then you could use FIND_IN_SET, which returns the position of the first argument inside the second one, e.g.:
SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1
SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6
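Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by FIND_IN_SET, you can then use this query, which returns only the first 5 years for every id: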
SELECT
yourtable.*
FROM
yourtable INNER JOIN (
SELECT
id,
GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM
yourtable
GROUP BY id) group_max
ON yourtable.id = group_max.id
AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
yourtable.id, yourtable.year DESC;
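See the fiddle here.
Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate DESC) on the rate column instead of the year column, and filtering with FIND_IN_SET(rate, ...).
The maximum length of the string returned by GROUP_CONCAT is limited (by the group_concat_max_len system variable, 1024 bytes by default), so this works well if you only need to select a few records for every group.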
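To make the mechanics concrete, here is a minimal Python sketch (an illustration, not MySQL itself) that replays the query on the sample rows from the question: it builds the rate-ordered, comma-separated year list per id, as GROUP_CONCAT(year ORDER BY rate DESC) does, and keeps each row whose 1-based position in that list, as FIND_IN_SET reports it, is between 1 and 5. The helper names find_in_set, group_concat_years and top_n_per_id are hypothetical.

```python
# Sample rows from the question: (year, id, rate)
ROWS = [
    (2006, "p01", 8.0), (2003, "p01", 7.4), (2008, "p01", 6.8),
    (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2002, "p01", 3.9), (2004, "p01", 3.5), (2005, "p01", 2.1),
    (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7),
    (2006, "p02", 4.6), (2007, "p02", 3.3),
]


def find_in_set(needle, csv):
    """Mimic MySQL FIND_IN_SET: 1-based position of needle in a comma list, 0 if absent."""
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


def group_concat_years(rows):
    """Mimic GROUP_CONCAT(year ORDER BY rate DESC) ... GROUP BY id."""
    grouped = {}
    for year, id_, _rate in sorted(rows, key=lambda r: r[2], reverse=True):
        grouped.setdefault(id_, []).append(str(year))
    return {id_: ",".join(years) for id_, years in grouped.items()}


def top_n_per_id(rows, n=5):
    """Join back to the rows and keep positions BETWEEN 1 AND n."""
    grouped_year = group_concat_years(rows)
    return [r for r in rows if 1 <= find_in_set(str(r[0]), grouped_year[r[1]]) <= n]


print(group_concat_years(ROWS)["p02"])
for row in top_n_per_id(ROWS):
    print(row)
```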
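Answer 1 (score: 78)
The original query used user variables and ORDER BY on derived tables; the behavior of both quirks is not guaranteed. The revised answer follows.
In MySQL 5.x you can use a poor man's rank over partition to achieve the desired result. Just outer join the table with itself and, for each row, count the number of rows ranked ahead of it. In this case, a row ranks ahead if it has the same id and a higher rate: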
SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rank
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year
| id | rate | year | rank |
|-----|------|------|------|
| p01 | 8.0 | 2006 | 0 |
| p01 | 7.4 | 2003 | 1 |
| p01 | 6.8 | 2008 | 2 |
| p01 | 5.9 | 2001 | 3 |
| p01 | 5.3 | 2007 | 4 |
| p02 | 12.5 | 2001 | 0 |
| p02 | 12.4 | 2004 | 1 |
| p02 | 12.2 | 2002 | 2 |
| p02 | 10.3 | 2003 | 3 |
| p02 | 8.7 | 2000 | 4 |
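As a runnable check of the self-join idea, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL 5.x (an assumption; the query is standard SQL, so SQLite accepts it with only the alias changed). The rank column is called rnk here because RANK is a reserved word in MySQL 8.0. Each row's rnk is the number of rows of the same id with a strictly higher rate, so HAVING COUNT(l.rate) < 5 keeps the top five.

```python
import sqlite3

# Sample rows from the question: (year, id, rate)
ROWS = [
    (2006, "p01", 8.0), (2003, "p01", 7.4), (2008, "p01", 6.8),
    (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2002, "p01", 3.9), (2004, "p01", 3.5), (2005, "p01", 2.1),
    (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7),
    (2006, "p02", 4.6), (2007, "p02", 3.3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# For each row, count the rows of the same id with a strictly higher rate
QUERY = """
SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rnk
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```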
Note that if the rates have ties, for example:
100, 90, 90, 80, 80, 80, 70, 60, 50, 40, ...
the above query will return 6 rows:
100, 90, 90, 80, 80, 80
Change it to HAVING COUNT(DISTINCT l.rate) < 5 to get 8 rows:
100, 90, 90, 80, 80, 80, 70, 60
Or change the join condition to ON t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key)), where pri_key is the table's primary key, to get 5 rows:
100, 90, 90, 80, 80
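The three tie-handling variants can be compared side by side. This sketch again assumes Python's sqlite3 module as a stand-in for MySQL, with a hypothetical single-id table whose pri_key column plays the role of the primary key; it groups by pri_key so that tied rows stay separate. The top_rates helper and the VARIANTS mapping are illustrative names only.

```python
import sqlite3

# Hypothetical tie-heavy data for a single id; pri_key stands in for the primary key
RATES = [100, 90, 90, 80, 80, 80, 70, 60, 50, 40]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pri_key INTEGER PRIMARY KEY, id TEXT, rate INTEGER)")
conn.executemany("INSERT INTO t (id, rate) VALUES ('p01', ?)", [(r,) for r in RATES])

# Same self-join shape as the answer; grouping by pri_key keeps tied rows apart
BASE = """
SELECT t.rate
FROM t
LEFT JOIN t AS l ON {on}
GROUP BY t.pri_key, t.rate
HAVING {having}
ORDER BY t.rate DESC, t.pri_key
"""

VARIANTS = {
    "count": ("t.id = l.id AND t.rate < l.rate", "COUNT(l.rate) < 5"),
    "distinct": ("t.id = l.id AND t.rate < l.rate", "COUNT(DISTINCT l.rate) < 5"),
    "tiebreak": (
        "t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key))",
        "COUNT(l.rate) < 5",
    ),
}


def top_rates(variant):
    """Run one variant and return the surviving rates in rank order."""
    on, having = VARIANTS[variant]
    sql = BASE.format(on=on, having=having)
    return [rate for (rate,) in conn.execute(sql)]


for name in VARIANTS:
    print(name, top_rates(name))
```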
In MySQL 8 or later, just use the RANK, DENSE_RANK or ROW_NUMBER function:
SELECT *
FROM (
SELECT *, RANK() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
FROM t
) AS x
WHERE rnk <= 5
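The tie example from this answer also shows how the three window functions differ. Assuming an SQLite build with window-function support (3.25 or later, which recent Python releases normally bundle) as a stand-in for MySQL 8, RANK keeps the same rows as the plain COUNT variant, DENSE_RANK the same as COUNT(DISTINCT ...), and ROW_NUMBER the same number as the tie-breaker variant. The function name is interpolated into the SQL only from a fixed allow-list; top_n is a hypothetical helper.

```python
import sqlite3

# Hypothetical data: the rates from the tie example, all for a single id
RATES = [100, 90, 90, 80, 80, 80, 70, 60, 50, 40]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, rate INTEGER)")
conn.executemany("INSERT INTO t VALUES ('p01', ?)", [(r,) for r in RATES])

ALLOWED = {"RANK", "DENSE_RANK", "ROW_NUMBER"}


def top_n(func, n=5):
    """Keep rows whose window-function rank within their id is <= n."""
    if func not in ALLOWED:
        raise ValueError(func)
    sql = f"""
        SELECT rate FROM (
            SELECT *, {func}() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
            FROM t
        ) AS x
        WHERE rnk <= ?
        ORDER BY rate DESC
    """
    return [rate for (rate,) in conn.execute(sql, (n,))]


for func in sorted(ALLOWED):
    print(func, top_n(func))
```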
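Answer 2 (score: 14)
For me, something like
SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N)
works perfectly, with no complicated query. For N > 1 the result is a comma-separated list of N values, so match against it with FIND_IN_SET rather than IN.
For example, to get the top 1 for each group (here, the row with the highest rate in each year):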
SELECT
*
FROM
yourtable
WHERE
-- match on (year, id) so only the top row of each year is returned
(year, id) IN (SELECT
year,
SUBSTRING_INDEX(GROUP_CONCAT(id
ORDER BY rate DESC),
',',
1) id
FROM
yourtable
GROUP BY year)
ORDER BY rate DESC;
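SUBSTRING_INDEX(str, delim, count) returns everything to the left of the count-th delimiter (or, for a negative count, everything to the right of it counting from the end). A small Python re-implementation, written here only for illustration, makes it easy to see why the trick combined with IN is limited to N = 1: for N > 1 the result is still one comma-separated string.

```python
def substring_index(s, delim, count):
    """Mimic MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delimiter.

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the |count|-th delimiter, counting from the right.
    count = 0: the empty string.
    """
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


# Top-1 per group: only the first element of the rate-ordered list survives
print(substring_index("p02,p01", ",", 1))

# Top-2 per group: the result is still ONE string, so `id IN (...)` compares
# each id against "p02,p01" as a whole and never matches a single id
top_two = substring_index("p02,p01,p03", ",", 2)
print(top_two, "p01" in [top_two])
```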
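Answer 3 (score: 8)
No, you cannot LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQL versions, but not to get 5 results per group).
This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle it, some of which can be more efficient in particular cases, but for top-n in general you'll want to look at Bill's answer to a similar earlier question.
As with most solutions to this problem, it can return more than five rows if multiple rows share the same rate value, so you may still need some post-processing to check for that.
Answer 4 (score: 8)
This requires a series of subqueries to rank the values, limit them, and then perform the sum while grouping: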
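SET @Rnk := 0;
SET @N := 2;
SET @last_id := NULL;
select
c.id,
sum(c.val)
from (
select
b.id,
b.val
from (
select
-- restart the counter at 1 whenever the id changes
@Rnk := if(@last_id = a.id, @Rnk + 1, 1) as Rnk,
a.id,
a.val,
@last_id := a.id as last_id
from (
select
id,
val
from list
order by id, val desc) as a) as b
-- keep the top @N values per id
where b.Rnk <= @N) as c
group by c.id;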
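Stripped of the user-variable bookkeeping (whose evaluation order MySQL does not guarantee), the intent is: order by id and val descending, number the rows within each id, keep the first N, and sum them. A plain-Python sketch of that logic, with a hypothetical sum_top_n helper taking (id, val) pairs:

```python
from itertools import groupby


def sum_top_n(rows, n):
    """Sum the n largest vals for each id; rows are (id, val) pairs."""
    # Mirror ORDER BY id, val DESC so that groupby sees each id contiguously
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    totals = {}
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        # enumerate(..., start=1) plays the role of the Rnk counter
        totals[id_] = sum(val for rnk, (_, val) in enumerate(group, start=1) if rnk <= n)
    return totals


print(sum_top_n([("a", 1), ("a", 5), ("a", 3), ("b", 2)], 2))
```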
Answer 5 (score: 8)
Try this:
SELECT h.year, h.id, h.rate
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx
FROM (SELECT h.year, h.id, h.rate
FROM h
WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
GROUP BY id, h.year
ORDER BY id, rate DESC
) h, (SELECT @lastid:='', @index:=0) AS a
) h
-- indx starts at 0, so < 5 keeps five rows per id
WHERE h.indx < 5;
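The IF(@lastid = (@lastid := h.id), ...) expression is a running counter that resets whenever the id changes. Assuming the left-hand @lastid is read before the assignment (MySQL does not guarantee this order), it behaves like the following Python sketch; running_index is a hypothetical name. Because the counter starts at 0, the first five rows of each id carry indexes 0 through 4.

```python
def running_index(ids):
    """0-based position of each row within its run of equal ids (input pre-sorted by id)."""
    last_id, index, out = "", 0, []
    for id_ in ids:
        # Same id as the previous row: increment; new id: reset to 0
        index = index + 1 if id_ == last_id else 0
        last_id = id_
        out.append(index)
    return out


ids = ["p01"] * 7 + ["p02"] * 3
indexes = running_index(ids)
kept = [i for i in indexes if i < 5]  # five rows for p01, all three for p02
print(indexes, len(kept))
```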
Answer 6 (score: 4)
Build a virtual row-number column (similar to ROWNUM in Oracle).
Table:
CREATE TABLE `stack`
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Data:
insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);
SQL like this:
select t3.year,t3.id,t3.rate
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3
where rownum <=3 order by id,rate DESC;
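If you remove the WHERE clause on t3, the query returns every row together with its computed rownum (the rank of its rate within its id, starting at 1).
To get the "TOP N records": add "rownum <= 3" to the WHERE clause of t3.
To select the years: add "year BETWEEN 2000 AND 2009" to both t1 and the counting subquery on t2, so that out-of-range years are neither returned nor counted toward rownum.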
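A runnable version of this correlated-subquery approach, using Python's sqlite3 module in place of MySQL (an assumption; the SELECT is unchanged, only the table definition is simplified to SQLite types). rownum counts the rows of the same id whose rate is greater than or equal to the current one, so the best row of each id gets 1.

```python
import sqlite3

# The "stack" rows from the answer, as (year, id, rate)
STACK_ROWS = [
    (2006, "p01", 8), (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2001, "p02", 12.5), (2004, "p02", 12.4), (2005, "p01", 2.1), (2000, "p01", 0.8),
    (2002, "p02", 12.2), (2002, "p01", 3.9), (2004, "p01", 3.5), (2003, "p02", 10.3),
    (2000, "p02", 8.7), (2006, "p02", 4.6), (2007, "p02", 3.3), (2003, "p01", 7.4),
    (2008, "p01", 6.8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO stack VALUES (?, ?, ?)", STACK_ROWS)

# rownum = number of rows of the same id with a rate >= this row's rate
QUERY = """
select t3.year, t3.id, t3.rate
from (select t1.*,
             (select count(*) from stack t2
              where t1.rate <= t2.rate and t1.id = t2.id) as rownum
      from stack t1) t3
where rownum <= 3
order by id, rate DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```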
Answer 7 (score: 3)
It took some work, but I thought my solution was worth sharing, as it seems elegant and is quite fast.
SELECT h.year, h.id, h.rate
FROM (
SELECT id,
SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
FROM h
WHERE year BETWEEN 2000 AND 2009
GROUP BY id
ORDER BY id
) AS h_temp
LEFT JOIN h ON h.id = h_temp.id
AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l
Note that this example was written for the purpose of this question and can easily be modified for other, similar purposes.
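The core of that query is a string membership test: SUBSTRING_INDEX(l, token, 1) returns the part of l before the first occurrence of token, or the whole of l when token does not occur, so "!= h_temp.l" means "token appears in l". The Python sketch below mirrors that; top_tokens is a hypothetical helper. Being a substring match, it relies on one id-year token never occurring inside another token, which holds for ids and four-digit years like those in the question.

```python
def top_tokens(rows, n=5):
    """Keep rows whose "id-year" token is among the n highest-rate tokens of its id."""
    by_id = {}
    # GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC) ... GROUP BY id
    for year, id_, _rate in sorted(rows, key=lambda r: r[2], reverse=True):
        by_id.setdefault(id_, []).append(f"{id_}-{year}")
    # SUBSTRING_INDEX(..., ',', n): keep only the first n tokens
    lists = {id_: ",".join(tokens[:n]) for id_, tokens in by_id.items()}
    # SUBSTRING_INDEX(l, token, 1) != l  <=>  token occurs somewhere in l
    return [r for r in rows if f"{r[1]}-{r[0]}" in lists[r[1]]]


SAMPLE = [(2006, "p01", 8.0), (2003, "p01", 7.4), (2001, "p02", 12.5), (2004, "p02", 12.4)]
print(top_tokens(SAMPLE, n=1))
```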
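Answer 8 (score: 2)
The following post, "sql: selcting top N record per group", describes a complicated way of achieving this without subqueries.
It improves on the other solutions offered here by:
Answer 9 (score: 2)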
SELECT year, id, rate
FROM (SELECT
year, id, rate, row_number() over (partition by id order by rate DESC) AS rn
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC) as subquery
WHERE rn <= 5
The subquery is almost identical to your query. The only change is adding
row_number() over (partition by id order by rate DESC) AS rn
(the alias is needed so the outer query can filter on it; ROW_NUMBER itself is a reserved word in MySQL 8).
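Answer 10 (score: 1)
For those who, like me, had queries time out: I made the procedure below to apply LIMIT (and anything else) per specific group.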
DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
DECLARE a INT Default 0;
DECLARE stop_loop INT Default 0;
DECLARE domain_val VARCHAR(250);
DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;
OPEN domain_list;
SELECT COUNT(DISTINCT(domain)) INTO stop_loop
FROM db.one;
-- BEGIN LOOP
loop_thru_domains: LOOP
FETCH domain_list INTO domain_val;
SET a=a+1;
INSERT INTO db.two(book,artist,title,title_count,last_updated)
SELECT * FROM
(
SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW()
FROM db.one
WHERE book = domain_val
GROUP BY artist,title
ORDER BY book,titleCount DESC
LIMIT 200
) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();
IF a = stop_loop THEN
LEAVE loop_thru_domains;
END IF;
END LOOP loop_thru_domains;
END $$
It loops through a list of domains and inserts at most 200 rows for each.
Answer 11 (score: 1)
Try this:
SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
@num := if(@type = `id`, @num + 1, 1) AS `row_number`,
@type := `id` AS `dummy`
FROM (
SELECT *
FROM `h`
WHERE (
`year` BETWEEN '2000' AND '2009'
AND `id` IN (SELECT `rid` FROM `table2`)
)
ORDER BY `id`, `rate` DESC
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number` <= 5
ORDER BY `id`, `rate` DESC;
Answer 12 (score: 0)
Please try the stored procedure below. I have verified it: it gives the correct result without using GROUP BY.
CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
DECLARE query_string text;
DECLARE datasource1 varchar(24);
DECLARE done INT DEFAULT 0;
DECLARE tenants varchar(50);
DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
SET @query_string='';
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO tenants ;
IF done THEN
LEAVE read_loop;
END IF;
SET @datasource1 = tenants;
SET @query_string = concat(@query_string,'(select * from demo where `id` = ''',@datasource1,''' order by rate desc LIMIT 5) UNION ALL ');
END LOOP;
close cur1;
SET @query_string = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));
select @query_string;
PREPARE stmt FROM @query_string;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
Answer 13 (score: 0)
How to get N results in each group
You can use UNION instead of GROUP BY and set a LIMIT in each SELECT statement.
Example matching the values of an array:
(
SELECT * FROM tablename
WHERE column = '".$myValueArray[$n]."'
ORDER BY column DESC
LIMIT 4
)
UNION
(
SELECT * FROM tablename
WHERE column = '".$myValueArray[$n+1]."'
ORDER BY column DESC
LIMIT 4
)
UNION
(
SELECT * FROM tablename
WHERE column = '".$myValueArray[$n+2]."'
ORDER BY column DESC
LIMIT 4
);
This is somewhat intensive/expensive for large sets, but it may be a good solution for smaller ones.
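The same idea can be written with bound parameters instead of concatenating values into the SQL text, which also avoids SQL injection. The sketch below assumes Python's sqlite3 module and a small demo table shaped like the question's h table. SQLite does not accept ORDER BY or LIMIT on an individual arm of a compound SELECT, so each arm is wrapped in a subquery, and UNION ALL is used because, as long as the values are distinct, the per-value result sets cannot overlap. top_n_per_value is a hypothetical helper.

```python
import sqlite3


def top_n_per_value(conn, values, n):
    """Return up to n highest-rate rows of table h for each id in values."""
    if not values:
        return []
    # One arm per value; each arm is a subquery so it may carry ORDER BY and LIMIT
    arm = (
        "SELECT * FROM ("
        "SELECT year, id, rate FROM h WHERE id = ? ORDER BY rate DESC LIMIT ?"
        ")"
    )
    sql = " UNION ALL ".join([arm] * len(values))
    params = []
    for value in values:
        params.extend([value, n])
    return conn.execute(sql, params).fetchall()


# Small demo table shaped like the question's h table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE h (year INTEGER, id TEXT, rate REAL)")
conn.executemany(
    "INSERT INTO h VALUES (?, ?, ?)",
    [
        (2006, "p01", 8.0), (2003, "p01", 7.4), (2008, "p01", 6.8),
        (2001, "p02", 12.5), (2004, "p02", 12.4), (2002, "p02", 12.2),
    ],
)
result = top_n_per_value(conn, ["p01", "p02"], 2)
print(sorted(result))
```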