Group by range in MySQL

Time: 2011-07-14 01:31:56

Tags: mysql group-by range

Table:   
new_table                                                    
user_number  | diff                  
     2       |  0                      
     1       |  28  
     2       |  32  
     1       |  40  
     1       |  53  
     1       |  59  
     1       |  101  
     1       |  105  
     2       |  108  
     2       |  129  
     2       |  130    
     1       |  144  


            |(result)
            v

range  | number of users  
0-20   |  2  
21-41  |  3  
42-62  |  1  
63-83  |  2  
84-104 |  1  
105-135|  0  
136-156|  3


select t.range as [range], count(*) as [number of users]  
from (  
  select case    
    when diff between 0 and 20 then ' 0-20'  
    when diff between 21 and 41 then ' 21-41'  
    when diff between 42 and 62 then ' 42-62'  
    when diff between 63 and 83 then ' 63-83'  
    when diff between 84 and 104 then ' 84-104'  
    when diff between 105 and 135 then ' 105-135'  
    else '136-156'   
     end as range  
  from new_table) t  
group by t.diff  

Error:

You have an error in your SQL syntax, near '[range], count(*) as [number of users]  
from (  
  select case  
    when' at line 1  

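10 answers:

Answer 0 (score: 38)

Here is general code for grouping by range, since writing a case statement gets very tedious.

The function 'floor' can be used to find the bottom of the range (not 'round', as Bohemian used), and adding an amount (20 in the example below) gives the top of the range. Remember not to let the bottom and top of adjacent ranges overlap!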

mysql> create table new_table (user_number int, diff int);
Query OK, 0 rows affected (0.14 sec)

mysql>  insert into new_table values (2, 0), (1, 28), (2, 32), (1, 40), (1, 53),
        (1, 59), (1, 101), (1, 105), (2, 108), (2, 129), (2, 130), (1, 144);
Query OK, 12 rows affected (0.01 sec)
Records: 12  Duplicates: 0  Warnings: 0

mysql> select concat(21*floor(diff/21), '-', 21*floor(diff/21) + 20) as `range`,
       count(*) as `number of users` from new_table group by 1 order by diff;
+---------+-----------------+
| range   | number of users |
+---------+-----------------+
| 0-20    |               1 |
| 21-41   |               3 |
| 42-62   |               2 |
| 84-104  |               1 |
| 105-125 |               2 |
| 126-146 |               3 |
+---------+-----------------+
6 rows in set (0.01 sec)
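
On MySQL 5.7.5 and later, where ONLY_FULL_GROUP_BY is enabled by default, the `order by diff` in the query above is rejected because diff is neither aggregated nor part of the group by. A minimal sketch of one workaround, assuming the same new_table, is to order by an aggregate such as min(diff):

```sql
-- Same binning as above. Ordering by min(diff) keeps the ranges in numeric
-- order and stays valid under ONLY_FULL_GROUP_BY (assumption: MySQL 5.7+).
select concat(21 * floor(diff / 21), '-', 21 * floor(diff / 21) + 20) as `range`,
       count(*) as `number of users`
from new_table
group by `range`
order by min(diff);
```

With the sample data this should return the same six rows, in the same order, as the session above.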

Answer 1 (score: 8)

Here is a solution that works for any size of diff:

select
  concat(21 * round(diff / 21), '-', 21 * round(diff / 21) + 20) as `range`,
  count(*) as `number of users`
from new_table
group by 1
order by diff;

Here is some testable code and its output:

create table new_table (user_number int, diff int);
insert into new_table values (2, 0), (1, 28), (2, 32), (1, 40), (1, 53), (1, 59), (1, 101), (1, 105), (2, 108), (2, 129), (2, 130), (1, 144); 
-- run query, output is: 
+---------+-----------------+
| range   | number of users |
+---------+-----------------+
| 0-20    |               1 |
| 21-41   |               1 |
| 42-62   |               2 |
| 63-83   |               2 |
| 105-125 |               3 |
| 126-146 |               2 |
| 147-167 |               1 |
+---------+-----------------+

Answer 2 (score: 6)

MySQL uses backticks (`) as the delimiter for keywords, not square brackets (as SQL Server does).
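
Applied to the query in the question, that means replacing every [name] with `name`. The sketch below also groups by the derived label instead of t.diff, since the subquery exposes only the `range` column; that second change is an assumption about the intended query, not something stated in this answer.

```sql
-- The question's query with MySQL-style identifier quoting.
-- Assumption: the grouping was meant to be on the label, not on t.diff,
-- which does not exist in the derived table t.
select t.`range` as `range`, count(*) as `number of users`
from (
  select case
    when diff between 0 and 20 then ' 0-20'
    when diff between 21 and 41 then ' 21-41'
    when diff between 42 and 62 then ' 42-62'
    when diff between 63 and 83 then ' 63-83'
    when diff between 84 and 104 then ' 84-104'
    when diff between 105 and 135 then ' 105-135'
    else '136-156'
  end as `range`
  from new_table) t
group by t.`range`;
```

Ranges that contain no rows will not appear in the result, because there is nothing to count for them.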

Answer 3 (score: 6)

If you have regular ranges, a quicker solution is to group with the help of the div operator.

For example:

select diff div 20 as `range`, sum(user_number)
from new_table
group by diff div 20;

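In this case the ranges are represented by single numbers, and you have to know what they mean: 0 = 0-19, 1 = 20-39, 2 = 40-59, ...

If you need different ranges, use a different divisor, or subtract some number from diff. For example, "(diff - 1) div 10" gives you the ranges 1-10, 11-20, 21-30, ...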
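
If the bare bucket numbers are too cryptic, the same div expression can be turned back into a readable label. This is only a sketch under the assumptions of this answer (bins of width 20 starting at 0); it uses count(*) to count rows per bin, which is a choice made here rather than part of the answer's query.

```sql
-- diff div 20 is the bucket number; multiplying by 20 gives the lower bound
-- and adding 19 gives the inclusive upper bound of each bin.
select concat(20 * (diff div 20), '-', 20 * (diff div 20) + 19) as `range`,
       count(*) as `number of users`
from new_table
group by `range`
order by min(diff);

-- With the sample data from the question this gives:
--   0-19    | 1
--   20-39   | 2
--   40-59   | 3
--   100-119 | 3
--   120-139 | 2
--   140-159 | 1
```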

Answer 4 (score: 1)

range is a MySQL keyword. You should "escape" it using backticks (`):

select t.`range` as `range`, ...

Answer 5 (score: 0)

One obvious mistake: MySQL uses backticks (`), not [] (as SQL Server does). Change t.range as [range], count(*) as [number of users] to

t.range as `range`, count(*) as `number of users`

Answer 6 (score: 0)

select
  case
    when diff between 0 and 20 then ' 0-20'
    when diff between 21 and 41 then ' 21-41'
    when diff between 42 and 62 then ' 42-62'
    when diff between 63 and 83 then ' 63-83'
    when diff between 84 and 104 then ' 84-104'
    when diff between 105 and 135 then ' 105-135'
    else '136-156'
  end as `range`,
  count(*) as `number of users`
from new_table
group by `range`

Answer 7 (score: 0)

You may want to look at Are square brackets valid in an SQL query?

I suspect '[' and ']' are used in Microsoft's SQL but not in MySQL.

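Answer 8 (score: 0)

This is not an exact solution to this question, just a similar suggestion for others. I also needed to create buckets of numbers, because grouping by the number itself gave me 9k distinct values.

I needed a smaller number of groups.

I managed it by grouping on the logarithm (rounded). Now I have only 18 groups instead of 9k. (I then use it for a PDF or CDF to compute a score on a 1-x scale.)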

SELECT COUNT(*) AS `Rows`, round(log(`diff`)) f FROM `users` GROUP BY f ORDER BY f
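
To try the same idea on the new_table data from the question, a rough sketch follows. It assumes only positive values need bucketing: MySQL's log() is the natural logarithm and returns NULL for 0 or negative input, so the row with diff = 0 is filtered out here rather than placed in a bucket.

```sql
-- Logarithmic buckets: each bucket covers roughly e^(b - 0.5) up to e^(b + 0.5),
-- so bucket widths grow as the values grow.
select round(log(diff)) as bucket, count(*) as `rows`
from new_table
where diff > 0          -- log(0) is NULL in MySQL
group by bucket
order by bucket;

-- With the sample data from the question this gives:
--   bucket 3 | 2 rows   (28, 32)
--   bucket 4 | 3 rows   (40, 53, 59)
--   bucket 5 | 6 rows   (101 through 144)
```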


Answer 9 (score: 0)

Here is a more general approach to binning in SQL:

SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as `range`,
    count(*) as number_of_rows
FROM
    new_table,
    (
        SELECT
            21 as binsize
        FROM dual
    ) as prm
GROUP BY 1
ORDER BY floor(diff / binsize)

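This way you only have to supply the size of the ranges (called bins) once, in the subquery that selects from dual.

The subquery returns a two-dimensional table of size 1: a single row with a single column. That table is cross joined with every row of the other table, so its value is accessible in each row of the first table. This works without specifying a join condition.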
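
The implicit join described above can also be written with an explicit cross join, which some readers find easier to follow. This is a sketch only: the alias t, the backticks around `range` (needed on MySQL 8.0, where RANGE is a reserved word) and the ordering by min(t.diff) are choices made here, not part of the original answer.

```sql
-- prm is a one-row, one-column derived table holding the bin size.
-- CROSS JOIN pairs that single row with every row of new_table,
-- so prm.binsize is available in every expression without a join condition.
select
    concat(
        prm.binsize * floor(t.diff / prm.binsize),
        ' - ',
        prm.binsize * floor(t.diff / prm.binsize) + prm.binsize - 1
    ) as `range`,
    count(*) as number_of_rows
from new_table as t
cross join (select 21 as binsize) as prm
group by `range`
order by min(t.diff);
```

Changing the bin width then only requires editing the single literal 21.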

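As long as only one row is returned, you can add more parameters to the subquery. For example, you can define upper and lower bounds this way to exclude certain values from the results: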

SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as `range`,
    count(*) as number_of_rows
FROM
    new_table,
    (
        SELECT
            21 as binsize,
            21 as above,
            83 as below
        FROM dual
    ) as prm
WHERE
    diff >= above
    AND diff <= below
GROUP BY 1
ORDER BY floor(diff / binsize)

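If your RDBMS supports it, consider refactoring your query to use a CTE (common table expression), which makes the statement look tidier by putting the parameter declarations at the very start: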

WITH prms as (
    SELECT
        21 as binsize,
        21 as above,
        83 as below
    FROM dual
)
SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as `range`,
    count(*) as number_of_rows
FROM
    new_table, prms
WHERE
    diff >= above
    AND diff <= below
GROUP BY 1
ORDER BY floor(diff / binsize)