Question

I have a very simple table:

CREATE TABLE IF NOT EXISTS LuxLog (
  Sensor TINYINT,
  Lux INT,
  PRIMARY KEY(Sensor)
)

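It contains thousands of log entries from different sensors.

I would like to get Q1 and Q3 for all sensors.

I could run one query per sensor, but I would prefer a single query that covers all sensors (getting Q1 and Q3 back from one query).

I thought this would be a fairly simple operation, since quartiles are widely used and are among the main statistical measures in frequency calculations. In practice I found lots of overly complicated solutions, while I was hoping for something neat and simple.

Can anyone give me a hint?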

Edit: This is a piece of code I found online, but it does not work for me:

SELECT  SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            GROUP_CONCAT(                 -- 1) make a sorted list of values
                Lux
                ORDER BY Lux
                SEPARATOR ','
            )
        ,   ','                           -- 2) cut at the comma
        ,   75/100 * COUNT(*)        --    at the position beyond the 75% portion
        )
    ,   ','                               -- 3) cut at the comma
    ,   -1                                --    right after the desired list entry
    )                 AS `75th Percentile`
    FROM    LuxLog
    WHERE   Sensor=12
    AND     Lux<>0

I get 1 as the return value, while it should be a number divisible by 10 (10, 20, 30, ..., 1000).

Answer 1

See the SqlFiddle: http://sqlfiddle.com/#!9/accca6/2/6 Note: for the sqlfiddle I generated 100 rows, one for each integer between 1 and 100, in random order (shuffled in Excel).

Here is the code:

SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;

Edit:

SET @current_sensor := 101;
SET @quartile := (ROUND((SELECT COUNT(*) FROM LuxLog WHERE Sensor = @current_sensor)*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;

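The basic reasoning is as follows: for quartile 1 we want the value 25% of the way down from the top, so we need to know how many rows there are, that is: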

SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);

Now that we know the number of rows, we want to know what 25% of that number is, which is this line:

SET @quartile := (ROUND(@number_of_rows*0.25));

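Then, to find a quartile, we order the LuxLog table by Lux and fetch the row at position @quartile. To do that we set OFFSET to @quartile, which skips the first @quartile rows so the selection starts right after them, and we use LIMIT 1 to say that we only want to retrieve one row. That is: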

SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));

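For the other quartile we do (almost) the same, but instead of starting from the top (from higher values to lower) we start from the bottom (which explains the ASC). Note that most statistics texts call the lower value Q1 and the upper one Q3, so swap the two labels if you follow that convention.

So far we have only stored strings in the variables @sql_q1 and @sql_q3, so we concatenate them with UNION to combine the results of the two queries, then prepare the statement and execute it.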
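The same offset rule can also be applied to every sensor at once. The following is a minimal sketch, assuming MySQL 8.0 or later (window functions are not available in earlier versions) and a LuxLog table that really holds many rows per sensor. The aliases rn_asc, rn_desc, q_offset and ranked are made up for illustration, and Q1 is taken from the low end here, following the usual convention.

```sql
-- Sketch only: Q1 and Q3 per sensor in one statement (assumes MySQL 8.0+).
-- "LIMIT 1 OFFSET k" returns row k + 1, so we look for row number q_offset + 1.
SELECT
    Sensor,
    MAX(CASE WHEN rn_asc  = q_offset + 1 THEN Lux END) AS Q1,  -- counted from the bottom
    MAX(CASE WHEN rn_desc = q_offset + 1 THEN Lux END) AS Q3   -- counted from the top
FROM (
    SELECT
        Sensor,
        Lux,
        ROW_NUMBER() OVER (PARTITION BY Sensor ORDER BY Lux ASC)  AS rn_asc,
        ROW_NUMBER() OVER (PARTITION BY Sensor ORDER BY Lux DESC) AS rn_desc,
        ROUND(COUNT(*) OVER (PARTITION BY Sensor) * 0.25)         AS q_offset
    FROM LuxLog
) AS ranked
GROUP BY Sensor;
```

This avoids the prepared statement entirely, because the offset is computed per sensor inside the query instead of being concatenated into a LIMIT clause.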

Answer 2

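With NTILE this is really easy, but it is a Postgres feature (MySQL only gained window functions such as NTILE in version 8.0). You basically just do something like this: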

SELECT value_you_are_NTILING,
    NTILE(4) OVER (ORDER BY value_you_are_NTILING DESC) AS tiles
FROM
(SELECT math_that_gives_you_the_value_you_are_NTILING_here AS value_you_are_NTILING FROM tablename) AS t;

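Here is a simple example I made for you on SQLFiddle: http://sqlfiddle.com/#!15/7f05a/1

In MySQL you would use RANK... here is the SQLFiddle: http://www.sqlfiddle.com/#!2/d5587/1 (this comes from the question linked below)

This use of RANK() in MySQL comes from a Stack Overflow question answered here: Rank function in MySQL

Look for the answer by Salman A.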
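Applied to the table from the question, the NTILE approach might look like the sketch below. It assumes a server where NTILE is available as a window function (Postgres, or MySQL 8.0 and later), and it sorts ascending so that tile 1 holds the lowest Lux values; the aliases quartile and tiled are illustrative.

```sql
-- Sketch only: assign each reading to a quartile within its own sensor,
-- then report the Lux range covered by each quartile.
SELECT
    Sensor,
    quartile,
    MIN(Lux) AS lowest_lux,
    MAX(Lux) AS highest_lux
FROM (
    SELECT
        Sensor,
        Lux,
        NTILE(4) OVER (PARTITION BY Sensor ORDER BY Lux) AS quartile
    FROM LuxLog
) AS tiled
GROUP BY Sensor, quartile
ORDER BY Sensor, quartile;
```

With this layout, highest_lux of tile 1 and tile 3 give the values at the top of the first and third quarters of each sensor's readings, which is one common way of reading off Q1 and Q3.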

Answer 3

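Something like this should do it. Note that the join conditions compare Lux itself against fractions of the row count, so this relies on the Lux values running from 1 to N, as in the sample data below: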

select
    ll.*,
    if (a.position is not null, 1,
        if (b.position is not null, 2, 
        if (c.position is not null, 3, 
        if (d.position is not null, 4, 0)))
    ) as quartile
from
    luxlog ll
    left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
    left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
    left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
    left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;

Here is the full example:

use example;

drop table if exists luxlog;

CREATE TABLE luxlog (
  Sensor TINYINT,
  Lux INT,
  position int,
  PRIMARY KEY(Position)
);

insert into luxlog values (0, 1, 10);
insert into luxlog values (0, 2, 20);
insert into luxlog values (0, 3, 30);
insert into luxlog values (0, 4, 40);
insert into luxlog values (0, 5, 50);
insert into luxlog values (0, 6, 60);
insert into luxlog values (0, 7, 70);
insert into luxlog values (0, 8, 80);

select count(*)*.25 from luxlog;
select count(*)*.50 from luxlog;

select
    ll.*,
    a.position,
    b.position,
    if(
        a.position is not null, 1,
        if (b.position is not null, 2, 0)
    ) as quartile
from
    luxlog ll
    left outer join luxlog a on ll.position = a.position and a.lux >= (select count(*)*0.00 from luxlog) and a.lux < (select count(*)*0.25 from luxlog)
    left outer join luxlog b on ll.position = b.position and b.lux >= (select count(*)*0.25 from luxlog) and b.lux < (select count(*)*0.50 from luxlog)
    left outer join luxlog c on ll.position = c.position and c.lux >= (select count(*)*0.50 from luxlog) and c.lux < (select count(*)*0.75 from luxlog)
    left outer join luxlog d on ll.position = d.position and d.lux >= (select count(*)*0.75 from luxlog) and d.lux < (select count(*)*1.00 from luxlog)
;    


select
    ll.*,
    if (a.position is not null, 1,
        if (b.position is not null, 2, 
        if (c.position is not null, 3, 
        if (d.position is not null, 4, 0)))
    ) as quartile
from
    luxlog ll
    left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
    left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
    left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
    left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;

Answer 4

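Or you could use a rank like this (there is no ORDER BY, so the rank follows the order in which the rows are read; in the sample table that happens to match Lux order):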

select
    ll.*,
    @curRank := @curRank + 1 as rank,
    if (@curRank <= (select count(*)*0.25 from luxlog), 1,
        if (@curRank <= (select count(*)*0.50 from luxlog), 2, 
        if (@curRank <= (select count(*)*0.75 from luxlog), 3, 4))
    ) as quartile
from
    luxlog ll,
    (SELECT @curRank := 0) r
;

This gives one record per quartile:

select
    x.quartile, group_concat(position)
from (
    select
        ll.*,
        @curRank := @curRank + 1 as rank,
        if (@curRank > 0 and @curRank <= (select count(*)*0.25 from luxlog), 1,
            if (@curRank > 0 and @curRank <= (select count(*)*0.50 from luxlog), 2, 
            if (@curRank > 0 and @curRank <= (select count(*)*0.75 from luxlog), 3, 4))
        ) as quartile
    from
        luxlog ll,
        (SELECT @curRank := 0) r
) x
group by quartile

+ ------------- + --------------------------- +
| quartile      | group_concat(position)      |
+ ------------- + --------------------------- +
| 1             | 10,20                       |
| 2             | 30,40                       |
| 3             | 50,60                       |
| 4             | 70,80                       |
+ ------------- + --------------------------- +
4 rows

Edit: this is the sqlFiddle example (http://sqlfiddle.com/#!9/a14a4/17) after removing the following:

/*SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;*/


Answer 5

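Here is the query I came up with for calculating quartiles; it runs in ~0.04s with ~5000 table rows. I included the min/max values because I ultimately use this data to build the four quartile ranges: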

   SELECT percentile_table.percentile, avg(ColumnName) AS percentile_values
    FROM   
        (SELECT @rownum := @rownum + 1 AS `row_number`, 
                   d.ColumnName 
            FROM   PercentileTestTable d, 
                   (SELECT @rownum := 0) r 
            WHERE  ColumnName IS NOT NULL 
            ORDER  BY d.ColumnName
        ) AS t1, 
        (SELECT count(*) AS total_rows 
            FROM   PercentileTestTable d 
            WHERE  ColumnName IS NOT NULL 
        ) AS t2, 
        (SELECT 0 AS percentile 
            UNION ALL 
            SELECT 0.25
            UNION ALL 
            SELECT 0.5
            UNION ALL 
            SELECT 0.75
            UNION ALL 
            SELECT 1
        ) AS percentile_table  
    WHERE  
        (percentile_table.percentile != 0 
            AND percentile_table.percentile != 1 
            AND t1.row_number IN 
            ( 
                floor(( total_rows + 1 ) * percentile_table.percentile), 
                floor(( total_rows + 2 ) * percentile_table.percentile)
            ) 
        ) OR (
            percentile_table.percentile = 0 
            AND t1.row_number = 1
        ) OR (
            percentile_table.percentile = 1 
            AND t1.row_number = total_rows
        )
    GROUP BY percentile_table.percentile;

Fiddle here: http://sqlfiddle.com/#!9/58c0e2/1

There are certainly performance issues; if anyone has feedback on how to improve this, I would be glad to hear it.

Sample data list:

 3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18

Sample query output:

| percentile | percentile_values |
|------------|-------------------|
|          0 |                 3 |
|       0.25 |                 4 |
|        0.5 |              10.5 |
|       0.75 |                15 |
|          1 |                18 |
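To see where these numbers come from, it may help to compute the two row positions that the IN (...) list selects for each percentile. The snippet below is only an illustration for the 12 sample values (total_rows = 12); the aliases p, first_row, second_row and percentiles are made up.

```sql
-- Row positions matched by the WHERE clause when total_rows = 12.
SELECT
    p                   AS percentile,
    FLOOR((12 + 1) * p) AS first_row,
    FLOOR((12 + 2) * p) AS second_row
FROM (
    SELECT 0.25 AS p
    UNION ALL SELECT 0.5
    UNION ALL SELECT 0.75
) AS percentiles;
```

For 0.25 both positions are 3, so the result is the third sorted value, 4. For 0.5 the positions are 6 and 7 (values 10 and 11), and AVG gives 10.5. For 0.75 they are 9 and 10 (values 14 and 16), and AVG gives 15. When the two positions differ, the query averages two neighbouring rows; when they coincide, it returns a single row.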

Answer 6

I use this solution with a MySQL function:

x is the percentile you want

array_values is your GROUP_CONCAT of the values, sorted and separated by commas (,)

DROP FUNCTION IF EXISTS centile;

delimiter $$
CREATE FUNCTION `centile`(x Text, array_values TEXT) RETURNS text DETERMINISTIC
BEGIN

Declare DIFF_RANK TEXT;
Declare RANG_FLOOR INT;
Declare COUNT INT;
Declare VALEUR_SUP TEXT;
Declare VALEUR_INF TEXT;

SET COUNT = LENGTH(array_values) - LENGTH(REPLACE(array_values, ',', '')) + 1;
SET RANG_FLOOR = FLOOR(ROUND((x) * (COUNT-1),2));
SET DIFF_RANK = ((x) * (COUNT-1)) - FLOOR(ROUND((x) * (COUNT-1),2));

SET VALEUR_SUP = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+2),',',-1) AS DECIMAL);
SET VALEUR_INF = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+1),',',-1) AS DECIMAL);

/****
    https://fr.wikipedia.org/wiki/Quantile
    x_j+1 + g (x_j+2 - x_j+1)       
***/
RETURN  Round((VALEUR_INF + (DIFF_RANK* (VALEUR_SUP-VALEUR_INF) ) ),2);

END$$

delimiter ;

Example:

Select centile(3/4,GROUP_CONCAT(lux ORDER BY lux SEPARATOR ',')) as quartile_3
FROM LuxLog
WHERE Sensor=12 AND Lux<>0
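One thing to watch with any GROUP_CONCAT based approach: the concatenated string is truncated at group_concat_max_len, which defaults to 1024 bytes, so with thousands of readings the list may be cut short and the function would then work on incomplete data. The sketch below raises the limit for the current session and computes both quartiles for all sensors in one query. The value 1000000 is an arbitrary assumption, not a recommendation; pick something that fits your data.

```sql
-- Raise the GROUP_CONCAT limit for this session only (value chosen arbitrarily).
SET SESSION group_concat_max_len = 1000000;

-- Q1 and Q3 for every sensor in one query, using the centile function above.
SELECT
    Sensor,
    centile(1/4, GROUP_CONCAT(Lux ORDER BY Lux SEPARATOR ',')) AS quartile_1,
    centile(3/4, GROUP_CONCAT(Lux ORDER BY Lux SEPARATOR ',')) AS quartile_3
FROM LuxLog
WHERE Lux <> 0
GROUP BY Sensor;
```

Grouping by Sensor makes GROUP_CONCAT build one sorted list per sensor, so the function is called once per sensor and per quartile.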

Quartiles in SQL query