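What is the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) to find the mean, but I'm having a hard time finding a simple way to calculate the median. For now, I return all the rows to PHP, sort them, and pick the middle row, but surely there must be some simple way of doing it in a single MySQL query.
Example data: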
id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3
Sorting on val gives 2 2 3 4 7 8 9, so the median should be 4, whereas SELECT AVG(val) returns 5.
Answer 0 (score: 204)
In MariaDB / MySQL:
SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM data d, (SELECT @rownum:=0) r
WHERE d.val is NOT NULL
-- put some where clause here
ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
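Steve Cohen points out that after the first pass, @rownum will contain the total number of rows. This can be used to determine the median, so no second pass or join is needed.
AVG(dd.val) and dd.row_number IN (...) are also used so that the median is produced correctly when there is an even number of records. Reasoning: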
SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
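The row-number arithmetic can be checked outside the database. The following is a minimal Python sketch, not part of the original answer; the function name `median_by_row_numbers` is made up for illustration, and it applies the same 1-based row selection to a sorted list:

```python
from math import floor


def median_by_row_numbers(values):
    """Average the rows at 1-based positions floor((n+1)/2) and floor((n+2)/2)."""
    # Mirrors WHERE val IS NOT NULL followed by ORDER BY val.
    ordered = sorted(v for v in values if v is not None)
    n = len(ordered)
    if n == 0:
        return None  # AVG over an empty set is NULL in SQL
    # A set collapses to one position when n is odd and keeps two when n is even.
    wanted = {floor((n + 1) / 2), floor((n + 2) / 2)}
    picked = [ordered[i - 1] for i in wanted]
    return sum(picked) / len(picked)


print(median_by_row_numbers([4, 7, 2, 2, 9, 8, 3]))  # sample data from the question
```

With an odd count both expressions name the same row, so the average is just that row; with an even count they name the two middle rows.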
Answer 1 (score: 56)
I just found another answer online in the comments:
For medians in almost any SQL:
SELECT x.val from data x, data y GROUP BY x.val HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2
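Make sure your columns are well indexed and that the index is used for filtering and sorting. Verify this with the explain plans.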
select count(*) from table --find the number of rows
Calculate the "median" row number. Maybe use: median_row = floor(count / 2).
Then pick it out of the list:
select val from table order by val asc limit median_row,1
This should return one row containing just the value you want.
Jacob
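As a rough illustration of the count-then-LIMIT idea, here is a small Python sketch; the function name is hypothetical and a list stands in for the table:

```python
def median_row_by_offset(values):
    """Return the row at 0-based offset floor(count / 2) of the sorted values."""
    ordered = sorted(values)        # ORDER BY val ASC
    if not ordered:
        return None
    median_row = len(ordered) // 2  # floor(count / 2)
    return ordered[median_row]      # LIMIT median_row, 1


print(median_row_by_offset([4, 7, 2, 2, 9, 8, 3]))
```

Note that for an even number of rows this picks the upper of the two middle values rather than their average.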
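Answer 2 (score: 29)
I found that the accepted solution did not work on my MySQL installation and returned an empty set, but this query worked for me in every situation I tested it on: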
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1
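Answer 3 (score: 18)
Unfortunately, neither TheJacobTaylor's nor velcro's answer returns accurate results for current versions of MySQL.
Velcro's answer above is close, but it does not calculate correctly for result sets with an even number of rows. The median is defined as either 1) the middle number of an odd-sized set, or 2) the average of the two middle numbers of an even-sized set.
So, here is velcro's solution patched to handle both odd- and even-sized sets: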
SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.median_column AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.median_column
FROM median_table AS x, (SELECT @row:=0) AS r
WHERE 1
-- put some where clause here
ORDER BY x.median_column
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM median_table x
WHERE 1
-- put same where clause here
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;
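To use this, replace median_table and median_column with your own table and column names, and put your filter, if any, in both WHERE clauses.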
Answer 4 (score: 9)
I propose a faster way.
Get the row count:
SELECT CEIL(COUNT(*)/2) FROM data;
Then take the middle value in a sorted subquery:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 dataset of random numbers, and it found the median in under 10 seconds.
Answer 5 (score: 7)
A comment on this page in the MySQL documentation has the following suggestion:
-- (mostly) High Performance scaling MEDIAN function per group
-- Median defined in http://en.wikipedia.org/wiki/Median
--
-- by Peter Hlavac
-- 06.11.2008
--
-- Example Table:
DROP table if exists table_median;
CREATE TABLE table_median (id INTEGER(11),val INTEGER(11));
COMMIT;
INSERT INTO table_median (id, val) VALUES
(1, 7), (1, 4), (1, 5), (1, 1), (1, 8), (1, 3), (1, 6),
(2, 4),
(3, 5), (3, 2),
(4, 5), (4, 12), (4, 1), (4, 7);
-- Calculating the MEDIAN
SELECT @a := 0;
SELECT
id,
AVG(val) AS MEDIAN
FROM (
SELECT
id,
val
FROM (
SELECT
-- Create an index n for every id
@a := (@a + 1) mod o.c AS shifted_n,
IF(@a mod o.c=0, o.c, @a) AS n,
o.id,
o.val,
-- the number of elements for every id
o.c
FROM (
SELECT
t_o.id,
val,
c
FROM
table_median t_o INNER JOIN
(SELECT
id,
COUNT(1) AS c
FROM
table_median
GROUP BY
id
) t2
ON (t2.id = t_o.id)
ORDER BY
t_o.id,val
) o
) a
WHERE
IF(
-- if there is an even number of elements
-- take the lower and the upper median
-- and use AVG(lower,upper)
c MOD 2 = 0,
n = c DIV 2 OR n = (c DIV 2)+1,
-- if its an odd number of elements
-- take the first if its only one element
-- or take the one in the middle
IF(
c = 1,
n = 1,
n = c DIV 2 + 1
)
)
) a
GROUP BY
id;
-- Explanation:
-- The Statement creates a helper table like
--
-- n id val count
-- ----------------
-- 1, 1, 1, 7
-- 2, 1, 3, 7
-- 3, 1, 4, 7
-- 4, 1, 5, 7
-- 5, 1, 6, 7
-- 6, 1, 7, 7
-- 7, 1, 8, 7
--
-- 1, 2, 4, 1
-- 1, 3, 2, 2
-- 2, 3, 5, 2
--
-- 1, 4, 1, 4
-- 2, 4, 5, 4
-- 3, 4, 7, 4
-- 4, 4, 12, 4
-- from there we can select the n-th element on the position: count div 2 + 1
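Answer 6 (score: 5)
Most of the solutions above work for only one field of the table; you might need to get the median (50th percentile) for many fields in the query.
I use this: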
SELECT CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(field_name ORDER BY field_name SEPARATOR ','),
',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `Median`
FROM table_name;
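You can replace the "50" in the example above with any percentile; it is very efficient.
Just make sure you have enough memory for the GROUP_CONCAT; you can change the limit with: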
SET group_concat_max_len = 10485760; #10MB max length
Answer 7 (score: 4)
I found the code below on HackerRank; it is pretty simple and works in each and every case.
SELECT M.MEDIAN_COL FROM MEDIAN_TABLE M WHERE
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL < M.MEDIAN_COL ) =
(SELECT COUNT(MEDIAN_COL) FROM MEDIAN_TABLE WHERE MEDIAN_COL > M.MEDIAN_COL );
Answer 8 (score: 4)
Building on velcro's answer, for those of you who need the median of something that is grouped by another parameter:
SELECT grp_field, t1.val FROM (
SELECT grp_field, @rownum:=IF(@s = grp_field, @rownum + 1, 0) AS row_number
,
@s:=IF(@s = grp_field, @s, grp_field) AS sec, d.val
FROM data d, (SELECT @rownum:=0, @s:=0) r
ORDER BY grp_field, d.val
) as t1 JOIN (
SELECT grp_field, count(*) as total_rows
FROM data d
GROUP BY grp_field
) as t2
ON t1.grp_field = t2.grp_field
WHERE t1.row_number=floor(total_rows/2)+1;
Answer 9 (score: 3)
This takes care of an even number of values: in that case it gives the average of the two middle values.
SELECT AVG(val) FROM
( SELECT x.id, x.val from data x, data y
GROUP BY x.id, x.val
HAVING SUM(SIGN(1-SIGN(IF(y.val-x.val=0 AND x.id != y.id, SIGN(x.id-y.id), y.val-x.val)))) IN (ROUND((COUNT(*))/2), ROUND((COUNT(*)+1)/2))
) sq
Answer 10 (score: 3)
You can use the user-defined function found here.
Answer 11 (score: 2)
Install and use these MySQL statistical functions: http://www.xarg.org/2012/07/statistical-functions-in-mysql/
After that, calculating the median is easy:
SELECT median(x) FROM t1
Answer 12 (score: 2)
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT(field ORDER BY field),
',',
((
ROUND(
LENGTH(GROUP_CONCAT(field)) -
LENGTH(
REPLACE(
GROUP_CONCAT(field),
',',
''
)
)
) / 2) + 1
)),
',',
-1
)
FROM
table
The above seems to work for me.
Answer 13 (score: 2)
A single query to achieve the perfect median:
SELECT
COUNT(*) as total_rows,
IF(count(*)%2 = 1, CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL), ROUND((CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) + CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val SEPARATOR ','), ',', 50/100 * COUNT(*)), ',', -1) AS DECIMAL)) / 2)) as median,
AVG(val) as average
FROM
data
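Answer 14 (score: 2)
Another riff on Velcrow's answer, but this one uses a single intermediate table and takes advantage of the variable used for row numbering to get the count, rather than running an extra query to calculate it. It also starts the count so that the first row is row 0, which allows simply using Floor and Ceil to select the median row(s).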
SELECT Avg(tmp.val) as median_val
FROM (SELECT inTab.val, @rows := @rows + 1 as rowNum
FROM data as inTab, (SELECT @rows := -1) as init
-- Replace with better where clause or delete
WHERE 2 > 1
ORDER BY inTab.val) as tmp
WHERE tmp.rowNum in (Floor(@rows / 2), Ceil(@rows / 2));
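The zero-based variant can be sketched in Python as follows. This is an illustration only: the function name is invented, and `last` plays the role that @rows has after the scan when it starts at -1.

```python
from math import ceil, floor


def median_zero_based(values):
    """Average the rows at 0-based positions floor(last/2) and ceil(last/2)."""
    ordered = sorted(values)
    last = len(ordered) - 1  # index of the final row, i.e. the row count minus one
    if last < 0:
        return None
    # The two positions coincide when the row count is odd.
    wanted = {floor(last / 2), ceil(last / 2)}
    picked = [ordered[i] for i in wanted]
    return sum(picked) / len(picked)


print(median_zero_based([4, 7, 2, 2, 9, 8, 3]))
```

Starting the counter at -1 is what makes the final counter value usable directly as the last index, so no "+1" or "+2" adjustments are needed.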
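Answer 15 (score: 2)
The solution I present below works in a single query, without creating a table, variables, or even a subquery. In addition, it lets you get the median for each group in a grouped query (which is what I needed!):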
SELECT `columnA`,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(`columnB` ORDER BY `columnB`), ',', CEILING((COUNT(`columnB`)/2))), ',', -1) medianOfColumnB
FROM `tableC`
-- some where clause if you want
GROUP BY `columnA`;
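It works through a smart use of group_concat and substring_index.
However, to allow a large group_concat, you have to set group_concat_max_len to a higher value (1024 characters by default). You can set it like this (for the current SQL session):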
SET SESSION group_concat_max_len = 10000;
-- up to 4294967295 in 32-bits platform.
More information on group_concat_max_len: https://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_group_concat_max_len
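To make the string trick easier to follow, here is a hedged Python sketch of the same idea. The helper `substring_index` is only a rough stand-in for MySQL's SUBSTRING_INDEX with a non-zero integer count, and the function names and sample rows are made up for illustration:

```python
from collections import defaultdict
from math import ceil


def substring_index(text, delim, count):
    """Rough stand-in for MySQL SUBSTRING_INDEX with a non-zero integer count."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # everything left of the count-th delimiter
    return delim.join(parts[count:])      # everything right of the count-th delimiter from the end


def median_per_group(rows):
    """rows is an iterable of (group_key, value) pairs."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    result = {}
    for key, values in groups.items():
        # GROUP_CONCAT(col ORDER BY col): one sorted, comma-separated string per group.
        concatenated = ",".join(str(v) for v in sorted(values))
        k = ceil(len(values) / 2)  # CEILING(COUNT(col) / 2)
        # Keep the first k items, then take the last item of what is left.
        result[key] = substring_index(substring_index(concatenated, ",", k), ",", -1)
    return result


sample = [("book", 4), ("book", 7), ("book", 2), ("book", 2), ("book", 9),
          ("book", 8), ("book", 3), ("note", 11), ("bike", 22), ("bike", 26)]
print(median_per_group(sample))
```

Two properties of the approach show up in the sketch: the result is a string, and for a group with an even number of values it is the lower of the two middle values rather than their average.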
Answer 16 (score: 2)
Or, you could also do this in a stored procedure:
DROP PROCEDURE IF EXISTS median;
DELIMITER //
CREATE PROCEDURE median (table_name VARCHAR(255), column_name VARCHAR(255), where_clause VARCHAR(255))
BEGIN
-- Set default parameters
IF where_clause IS NULL OR where_clause = '' THEN
SET where_clause = 1;
END IF;
-- Prepare statement
SET @sql = CONCAT(
"SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.", column_name, " AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.", column_name, "
FROM ", table_name," AS x, (SELECT @row:=0) AS r
WHERE ", where_clause, " ORDER BY x.", column_name, "
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM ", table_name, " x
WHERE ", where_clause, "
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2
AND t1.row <= ((t2.count/2)+1)) AS t3
");
-- Execute statement
PREPARE stmt FROM @sql;
EXECUTE stmt;
END//
DELIMITER ;
-- Sample usage:
-- median(table_name, column_name, where_condition);
CALL median('products', 'price', NULL);
Answer 17 (score: 2)
My code, which is efficient and needs no tables or additional variables:
SELECT
((SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', floor(1+((count(val)-1) / 2))), ',', -1))
+
(SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(val order by val), ',', ceiling(1+((count(val)-1) / 2))), ',', -1)))/2
as median
FROM table;
Answer 18 (score: 1)
This is my way. Of course, you could put it into a procedure :-)
SET @median_counter = (SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`);
SET @median = CONCAT('SELECT `val` FROM `data` ORDER BY `val` LIMIT ', @median_counter, ', 1');
PREPARE median FROM @median;
EXECUTE median;
You could avoid the variable @median_counter if you substitute it:
SET @median = CONCAT( 'SELECT `val` FROM `data` ORDER BY `val` LIMIT ',
(SELECT FLOOR(COUNT(*)/2) - 1 AS `median_counter` FROM `data`),
', 1'
);
PREPARE median FROM @median;
EXECUTE median;
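Answer 19 (score: 1)
Since I just needed a median AND percentile solution, I made a simple and quite flexible function based on the findings in this thread. I know that I am happy myself when I find "ready-made" functions that are easy to include in my projects, so I decided to quickly share: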
function mysql_percentile($table, $column, $where, $percentile = 0.5) {
$sql = "
SELECT `t1`.`".$column."` as `percentile` FROM (
SELECT @rownum:=@rownum+1 as `row_number`, `d`.`".$column."`
FROM `".$table."` `d`, (SELECT @rownum:=0) `r`
".$where."
ORDER BY `d`.`".$column."`
) as `t1`,
(
SELECT count(*) as `total_rows`
FROM `".$table."` `d`
".$where."
) as `t2`
WHERE 1
AND `t1`.`row_number`=floor(`total_rows` * ".$percentile.")+1;
";
$result = sql($sql, 1);
if (!empty($result)) {
return $result['percentile'];
} else {
return 0;
}
}
Usage is very easy; an example from my current project:
...
$table = DBPRE."zip_".$slug;
$column = 'seconds';
$where = "WHERE `reached` = '1' AND `time` >= '".$start_time."'";
$reaching['median'] = mysql_percentile($table, $column, $where, 0.5);
$reaching['percentile25'] = mysql_percentile($table, $column, $where, 0.25);
$reaching['percentile75'] = mysql_percentile($table, $column, $where, 0.75);
...
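The row selection that the SQL in this function performs, `floor(total_rows * percentile) + 1`, can be tried out with a short Python sketch. The function name is hypothetical, a list stands in for the filtered table, and returning 0 when no row matches mirrors the PHP fallback above:

```python
from math import floor


def percentile_row(values, percentile=0.5):
    """Pick the row whose 1-based row number is floor(total_rows * percentile) + 1."""
    ordered = sorted(values)  # ORDER BY the column
    total_rows = len(ordered)
    row_number = floor(total_rows * percentile) + 1
    if row_number > total_rows:
        return 0  # no row matches, e.g. an empty table or percentile = 1.0
    return ordered[row_number - 1]


data = [4, 7, 2, 2, 9, 8, 3]
print(percentile_row(data, 0.5), percentile_row(data, 0.25), percentile_row(data, 0.75))
```

Because a single row is selected, an even row count yields the upper of the two middle values for the median instead of their average.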
Answer 20 (score: 1)
A simple way to calculate the median in MySQL:
set @ct := (select count(1) from station);
set @row := 0;
select avg(a.val) as median from
(select * from station order by val) a
where (select @row := @row + 1)
between @ct/2.0 and @ct/2.0 +1;
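Answer 21 (score: 1)
I used a two-query approach:
These are wrapped in a function definition, so all values can be returned from one call.
If your ranges are static and your data does not change often, it might be more efficient to precompute and store these values and use the stored values instead of querying from scratch every time.
Answer 22 (score: 1)
This way seems to cover both even and odd counts without a subquery.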
pivotTable.addReportFilter(13);
Answer 23 (score: 1)
MySQL has supported window functions since version 8.0; you can use ROW_NUMBER or DENSE_RANK (do NOT use RANK, as it assigns the same rank to equal values, like in sports ranking):
SELECT AVG(t1.val) AS median_val
FROM (SELECT val,
ROW_NUMBER() OVER(ORDER BY val) AS row_num
FROM data) t1,
(SELECT COUNT(*) AS num_records FROM data) t2
WHERE t1.row_num IN
(FLOOR((t2.num_records + 1) / 2),
FLOOR((t2.num_records + 2) / 2));
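Answer 24 (score: 1)
Often we need to calculate the median not just for the whole table, but for aggregates with respect to our ID. In other words, calculate the median for each ID in our table, where each ID has many records. (Good performance, works in many SQL dialects, and fixes the even/odd problem; more on the performance of different median methods: https://sqlperformance.com/2012/08/t-sql-queries/median)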
SELECT our_id, AVG(1.0 * our_val) as Median
FROM
( SELECT our_id, our_val,
COUNT(*) OVER (PARTITION BY our_id) AS cnt,
ROW_NUMBER() OVER (PARTITION BY our_id ORDER BY our_val) AS rn
FROM our_table
) AS x
-- FLOOR keeps the positions integral in MySQL, where / is not integer division
WHERE rn IN (FLOOR((cnt + 1)/2), FLOOR((cnt + 2)/2)) GROUP BY our_id;
Hope it helps.
Answer 25 (score: 0)
A simple solution for ORACLE:
SELECT ROUND(MEDIAN(Lat_N), 4) FROM Station;
An easy-to-understand solution for MySQL:
select case MOD(count(lat_n),2)
when 1 then (select round(S.LAT_N,4) from station S where (select count(Lat_N) from station where Lat_N < S.LAT_N ) = (select count(Lat_N) from station where Lat_N > S.LAT_N))
else (select round(AVG(S.LAT_N),4) from station S where 1 = ABS((select count(Lat_N) from station where Lat_N < S.LAT_N ) - (select count(Lat_N) from station where Lat_N > S.LAT_N)))
end from station;
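Explanation
STATION is the table name. LAT_N is the name of the column holding the numeric values.
Suppose there are 101 records (an odd number) in the station table. This means the median is the 51st record when the table is sorted, either ascending or descending.
In the query above, for every S.LAT_N of table S I compute two counts: the number of LAT_N values less than S.LAT_N and the number of LAT_N values greater than S.LAT_N. I then compare the two counts, and if they match, I select that S.LAT_N value. When I check the 51st record, there are 50 values less than it and 50 values greater than it, so both counts are 50 and this is our answer. For every other record the two counts differ, so only the 51st record meets the condition.
Now suppose there are 100 records (an even number) in the station table. This means the median is the average of the 50th and 51st records when the table is sorted, either ascending or descending.
As with the odd-count logic, I compute two counts: the number of LAT_N values less than S.LAT_N and the number of LAT_N values greater than S.LAT_N. I then compare them, and if they differ by exactly 1, I select that S.LAT_N value and take the average. When I check the 50th record, there are 49 values less than it and 50 values greater than it, a difference of 1, so the 50th record is the first value to be averaged. Likewise, when I check the 51st record, there are 50 values less than it and 49 values greater than it, again a difference of 1, so the 51st record is the second value to be averaged. For every other record the two counts differ by more than 1, so only the 50th and 51st records satisfy the condition.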
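The counting logic described in this explanation can be sketched in Python. This is an illustration under two stated assumptions: the values are distinct, and the even case compares the absolute difference of the two counts so that both middle records qualify, as the explanation describes. The function name is made up.

```python
def median_by_counting(values):
    """Select values by comparing how many values are smaller and how many are larger."""
    n = len(values)
    if n == 0:
        return None
    # Odd count: the two counts are equal. Even count: they differ by exactly one.
    target = 0 if n % 2 == 1 else 1
    picked = []
    for v in values:
        less = sum(1 for other in values if other < v)
        greater = sum(1 for other in values if other > v)
        if abs(less - greater) == target:
            picked.append(v)
    # No value qualifies in some inputs with duplicates, which yields None here.
    return sum(picked) / len(picked) if picked else None


print(median_by_counting([1, 2, 3, 4, 5]), median_by_counting([1, 2, 3, 4]))
```

The sketch also shows the limit of the approach: with duplicated middle values, for example 1, 2, 2, 3, no value satisfies the condition.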
Answer 26 (score: 0)
The simplest and fastest way to calculate the median in MySQL:
select x.lat_n
from (select lat_n,
count(1) over (partition by 'A') as total_rows,
row_number() over (order by lat_n asc) as rank_Order
from station ft) x
where x.rank_Order = round(x.total_rows / 2.0, 0)
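Answer 27 (score: 0)
I have not compared the performance of this solution with the other answers posted here, but I found it the most straightforward to understand, and it covers the full extent of the mathematical formula for calculating a median. In other words, this solution is robust enough for both even- and odd-numbered data sets: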
SELECT CASE
-- odd-numbered data sets:
WHEN MOD(COUNT(*), 2) = 1 THEN (SELECT median.<value> AS median
FROM
(SELECT t1.<value>
FROM (SELECT <value>,
ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
FROM <data>) t1,
(SELECT COUNT(*) AS num_records FROM <data>) t2
WHERE t1.rownum = (t2.num_records + 1) / 2) as median)
-- even-numbered data sets:
ELSE (select (low_bound.<value> + up_bound.<value>) / 2 AS median
FROM
(SELECT t1.<value>
FROM (SELECT <value>,
ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
FROM <data>) t1,
(SELECT COUNT(*) AS num_records FROM <data>) t2
WHERE t1.rownum = t2.num_records / 2) as low_bound,
(SELECT t1.<value>
FROM (SELECT <value>,
ROW_NUMBER() OVER(ORDER BY <value>) AS rownum
FROM <data>) t1,
(SELECT COUNT(*) AS num_records FROM <data>) t2
WHERE t1.rownum = t2.num_records / 2 + 1) as up_bound)
END
FROM <data>
Answer 28 (score: 0)
Try something like:
SELECT
CAST(AVG(val) AS DECIMAL(10,4))
FROM
(
SELECT
val,
ROW_NUMBER() OVER( ORDER BY val ) -1 AS rn,
COUNT(1) OVER () -1 AS cnt
FROM STATION
) as tmp
WHERE rn IN (FLOOR(cnt/2),CEILING (cnt/2))
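Note: the -1 is there to make the numbering zero-based, i.e. row numbers now start from 0 instead of 1.
Answer 29 (score: 0)
The query below works for both an even and an odd number of rows. In the subquery, we find the value(s) that have the same number of rows before and after them. For an odd number of rows, the HAVING expression evaluates to 0 (equal numbers of rows before and after cancel out the signs).
Similarly, for an even number of rows, the HAVING expression evaluates to 1 for two rows (the two middle rows), since together they have the same number of rows before and after them.
In the outer query, we average either the single value (odd number of rows) or the two values (even number of rows).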
select avg(val) as median
from
(
select d1.val
from data d1 cross join data d2
group by d1.val
having abs(sum(sign(d1.val-d2.val))) in (0,1)
) sub
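Note: if your table has duplicate values, the HAVING clause above should be changed to the condition below. In that case there can be values outside the original possibilities of 0 and 1. The condition below makes the check dynamic so that it also works with duplicates.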
having sum(case when d1.val=d2.val then 1 else 0 end)>=
abs(sum(sign(d1.val-d2.val)))
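To see why the duplicate-safe condition works, here is a minimal Python sketch of the cross join with GROUP BY. The function names are invented and a list stands in for the table:

```python
def sign(x):
    """Same result as SQL SIGN(): -1, 0, or 1."""
    return (x > 0) - (x < 0)


def median_by_sign_sum(values):
    """Keep each distinct value whose equal-pair count covers its less/greater imbalance."""
    picked = []
    for v in set(values):                        # GROUP BY d1.val
        rows = [d1 for d1 in values if d1 == v]  # every d1 row in this group
        # sum(case when d1.val = d2.val then 1 else 0 end) over the cross join
        equal = sum(1 for d1 in rows for d2 in values if d1 == d2)
        # abs(sum(sign(d1.val - d2.val))) over the cross join
        imbalance = abs(sum(sign(d1 - d2) for d1 in rows for d2 in values))
        if equal >= imbalance:
            picked.append(v)
    return sum(picked) / len(picked) if picked else None  # outer AVG


print(median_by_sign_sum([4, 7, 2, 2, 9, 8, 3]), median_by_sign_sum([1, 2, 2, 3]))
```

For a value that occurs m times, both sums are scaled by m, so the condition reduces to "m is at least the difference between the number of smaller and larger rows", which is what makes it tolerant of duplicates.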
Answer 30 (score: 0)
I found this answer very helpful: https://www.eversql.com/how-to-calculate-median-value-in-mysql-using-a-simple-sql-query/
SET @rowindex := -1;
SELECT
AVG(g.grade)
FROM
(SELECT @rowindex:=@rowindex + 1 AS rowindex,
grades.grade AS grade
FROM grades
ORDER BY grades.grade) AS g
WHERE
g.rowindex IN (FLOOR(@rowindex / 2) , CEIL(@rowindex / 2));
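Answer 31 (score: 0)
If this is MySQL, there are window functions now, and you can just do this (assuming you want to round to the nearest integer; otherwise just replace ROUND with CEIL or FLOOR or what have you). The following solution works for tables irrespective of whether they have an even or odd number of rows:
WITH CTE AS (
SELECT val,
ROW_NUMBER() OVER (ORDER BY val ASC) AS rn,
COUNT(*) OVER () AS total_count
FROM data
)
SELECT ROUND(AVG(val)) AS median
FROM CTE
WHERE
rn BETWEEN
total_count / 2.0 AND
total_count / 2.0 + 1;
I think some of the more recent answers on this thread were already getting at this approach, but it also seemed like people were overthinking it, so consider this an improved version. Regardless of SQL flavor, there is no reason anyone should be writing a large block of code with multiple subqueries just to get the median in 2021. However, please note that the query above only works if you are asked to find the median of a continuous series. Regardless of the number of rows, people do sometimes distinguish between the so-called discrete median and the interpolated median of a continuous series.
If you are asked to find the median of a discrete series and the table has an even number of rows, the solution above will not work for you, and you should revert to one of the other solutions, such as TheJacobTaylor's.
The second solution is a slightly modified version of TheJacobTaylor's, in which I explicitly state CROSS JOIN. This also works for tables with an odd number of rows, regardless of whether you are asked to find the median of a continuous or a discrete series, but I would specifically use it when asked for the median of a discrete series. Otherwise, use the first solution. That way, you will never have to think about whether the data contains an "even" or "odd" number of data points.
Finally, you can do this easily in PostgreSQL using built-in functions. Here is a good explanation, along with a useful summary of discrete vs. interpolated medians.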
https://leafo.net/guides/postgresql-calculating-percentile.html#calculating-the-median
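The difference between a discrete and an interpolated median is easy to see with Python's standard `statistics` module. This is an illustration in Python rather than SQL, and the sample values are made up:

```python
import statistics

even_sample = [2, 2, 3, 4, 7, 8]  # six values, so there are two middle values: 3 and 4

interpolated = statistics.median(even_sample)        # mean of the two middle values
discrete_low = statistics.median_low(even_sample)    # lower of the two middle values
discrete_high = statistics.median_high(even_sample)  # upper of the two middle values

print(interpolated, discrete_low, discrete_high)
```

An interpolated median may be a number that does not occur in the data, while a discrete median is always one of the stored values. For an odd number of values all three functions return the same element.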
Answer 32 (score: 0)
The following SQL code will help you calculate the median in MySQL using user-defined variables.
create table employees(salary int);
insert into employees values(8);
insert into employees values(23);
insert into employees values(45);
insert into employees values(123);
insert into employees values(93);
insert into employees values(2342);
insert into employees values(2238);
select * from employees;
Select salary from employees order by salary;
set @rowid=0;
set @cnt=(select count(*) from employees);
set @middle_no=ceil(@cnt/2);
set @odd_even=null;
select AVG(salary) from
(select salary,@rowid:=@rowid+1 as rid, (CASE WHEN(mod(@cnt,2)=0) THEN @odd_even:=1 ELSE @odd_even:=0 END) as odd_even_status from employees order by salary) as tbl where tbl.rid=@middle_no or tbl.rid=(@middle_no+@odd_even);
If you are looking for a detailed explanation, please refer to this blog.
Answer 33 (score: 0)
create table med(id integer);
insert into med(id) values(1);
insert into med(id) values(2);
insert into med(id) values(3);
insert into med(id) values(4);
insert into med(id) values(5);
insert into med(id) values(6);
select (MIN(count)+MAX(count))/2 from
(select case when (select count(*) from
med A where A.id<B.id)=(select count(*)/2 from med) OR
(select count(*) from med A where A.id>B.id)=(select count(*)/2
from med) then cast(B.id as float)end as count from med B) C;
?column?
----------
3.5
(1 row)
OR
select cast(avg(id) as float) from
(select t1.id from med t1 JOIN med t2 on t1.id!= t2.id
group by t1.id having ABS(SUM(SIGN(t1.id-t2.id)))=1) A;
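Answer 34 (score: 0)
Building on @bob's answer, this generalizes the query so that it can return multiple medians, grouped by some criteria.
Think, for example, of the median sale price of used cars on a car lot, grouped by year and month.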
SELECT
period,
AVG(middle_values) AS 'median'
FROM (
SELECT t1.sale_price AS 'middle_values', t1.row_num, t1.period, t2.count
FROM (
SELECT
@last_period:=@period AS 'last_period',
@period:=DATE_FORMAT(sale_date, '%Y-%m') AS 'period',
IF (@period<>@last_period, @row:=1, @row:=@row+1) as `row_num`,
x.sale_price
FROM listings AS x, (SELECT @row:=0) AS r
WHERE 1
-- where criteria goes here
ORDER BY DATE_FORMAT(sale_date, '%Y%m'), x.sale_price
) AS t1
LEFT JOIN (
SELECT COUNT(*) as 'count', DATE_FORMAT(sale_date, '%Y-%m') AS 'period'
FROM listings x
WHERE 1
-- same where criteria goes here
GROUP BY DATE_FORMAT(sale_date, '%Y%m')
) AS t2
ON t1.period = t2.period
) AS t3
WHERE
row_num >= (count/2)
AND row_num <= ((count/2) + 1)
GROUP BY t3.period
ORDER BY t3.period;
Answer 35 (score: 0)
Medians grouped by dimension:
$query = "update ACCESSUSERS set ACTIVE='111' where UPPER(USERNAME)=UPPER('firstname') and PINNUMBER='7777'";
mysqli_query($conn, $query);
$numrows = mysql_affected_rows();
printf("Records updated: %d\n", $numrows);
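Answer 36 (score: 0)
In some cases the median is calculated as follows:
The "median" is the "middle" value in a list of numbers when they are sorted by value. For sets with an even count, the median is the average of the two middle values. I have written some simple code for that: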
$midValue = 0;
$rowCount = "SELECT count(*) as count {$from} {$where}";
$even = FALSE;
$offset = 1;
$medianRow = floor($rowCount / 2);
if ($rowCount % 2 == 0 && !empty($medianRow)) {
$even = TRUE;
$offset++;
$medianRow--;
}
$medianValue = "SELECT column as median
{$fromClause} {$whereClause}
ORDER BY median
LIMIT {$medianRow},{$offset}";
$medianValDAO = db_query($medianValue);
while ($medianValDAO->fetch()) {
if ($even) {
$midValue = $midValue + $medianValDAO->median;
}
else {
$median = $medianValDAO->median;
}
}
if ($even) {
$median = $midValue / 2;
}
return $median;
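The $median returned will be the required result :-)
Answer 37 (score: 0)
Taken from: http://mdb-blog.blogspot.com/2015/06/mysql-find-median-nth-element-without.html
I would suggest another way, without a join, but working with strings.
I did not check it with tables holding large amounts of data, but it works fine for small and medium tables.
The good thing here is that it also works with GROUP BY, so it can return the median for several items.
Here is the test code for the test table: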
DROP TABLE test.test_median;
CREATE TABLE test.test_median AS
SELECT 'book' AS grp, 4 AS val UNION ALL
SELECT 'book', 7 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 2 UNION ALL
SELECT 'book', 9 UNION ALL
SELECT 'book', 8 UNION ALL
SELECT 'book', 3 UNION ALL
SELECT 'note', 11 UNION ALL
SELECT 'bike', 22 UNION ALL
SELECT 'bike', 26
And the code for finding the median of each group:
SELECT grp,
SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(val ORDER BY val), ',', COUNT(*)/2 ), ',', -1) as the_median,
GROUP_CONCAT(val ORDER BY val) as all_vals_for_debug
FROM test.test_median
GROUP BY grp
Output:
grp | the_median| all_vals_for_debug
bike| 22 | 22,26
book| 4 | 2,2,3,4,7,8,9
note| 11 | 11
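Answer 38 (score: 0)
I have a database containing about 1 billion rows that we need in order to determine the median age in the set. Sorting a billion rows is hard, but if you aggregate the distinct values that can be found (ages range from 0 to 100), you can sort THIS list and use some arithmetic magic to find any percentile you want, as follows: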
with rawData(count_value) as
(
select p.YEAR_OF_BIRTH
from dbo.PERSON p
),
overallStats (avg_value, stdev_value, min_value, max_value, total) as
(
select avg(1.0 * count_value) as avg_value,
stdev(count_value) as stdev_value,
min(count_value) as min_value,
max(count_value) as max_value,
count(*) as total
from rawData
),
aggData (count_value, total, accumulated) as
(
select count_value,
count(*) as total,
SUM(count(*)) OVER (ORDER BY count_value ROWS UNBOUNDED PRECEDING) as accumulated
FROM rawData
group by count_value
)
select o.total as count_value,
o.min_value,
o.max_value,
o.avg_value,
o.stdev_value,
MIN(case when d.accumulated >= .50 * o.total then count_value else o.max_value end) as median_value,
MIN(case when d.accumulated >= .10 * o.total then count_value else o.max_value end) as p10_value,
MIN(case when d.accumulated >= .25 * o.total then count_value else o.max_value end) as p25_value,
MIN(case when d.accumulated >= .75 * o.total then count_value else o.max_value end) as p75_value,
MIN(case when d.accumulated >= .90 * o.total then count_value else o.max_value end) as p90_value
from aggData d
cross apply overallStats o
GROUP BY o.total, o.min_value, o.max_value, o.avg_value, o.stdev_value
;
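This query depends on your database supporting window functions (including ROWS UNBOUNDED PRECEDING), but if you do not have that, it is a simple matter to join the aggData CTE with itself and aggregate all prior totals into the 'accumulated' column, which is used to determine which value contains the specified percentile. The sample above calculates p10, p25, p50 (median), p75, and p90.
-Chris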
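The aggregate-then-accumulate idea can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical and a small list stands in for the billion-row table.

```python
from collections import Counter
from itertools import accumulate


def percentile_from_counts(values, fraction):
    """Return the smallest distinct value whose cumulative count reaches fraction * total."""
    counts = sorted(Counter(values).items())    # (value, count) per distinct value, ordered by value
    total = sum(count for _, count in counts)
    running = accumulate(count for _, count in counts)  # cumulative count up to each value
    for (value, _), accumulated in zip(counts, running):
        if accumulated >= fraction * total:
            return value
    return None  # only reached when there are no rows


ages = [4, 7, 2, 2, 9, 8, 3]
print(percentile_from_counts(ages, 0.5), percentile_from_counts(ages, 0.9))
```

Only the distinct values are sorted, so the cost of the ordering step depends on the number of distinct values (about a hundred ages here) rather than on the number of rows.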
Answer 39 (score: 0)
Knowing the exact row count, you can use this query:
SELECT <value> AS VAL FROM <table> ORDER BY VAL LIMIT 1 OFFSET <half>
<half> = ceiling(<size> / 2.0) - 1
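Answer 40 (score: 0)
After reading all the previous answers, I found that none matched my actual requirement, so I implemented my own, which needs no procedure or complicated statements: I just GROUP_CONCAT all the values of the column whose median I want and, applying COUNT DIV BY 2, extract the value from the middle of the list, as the following query does:
(POS is the name of the column whose median I want)
SELECT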
SUBSTRING_INDEX (
SUBSTRING_INDEX (
GROUP_CONCAT(pos ORDER BY CAST(pos AS SIGNED INTEGER) desc SEPARATOR ';')
, ';', COUNT(*)/2 )
, ';', -1 ) AS `pos_med`
FROM table_name
GROUP BY any_criterial
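I hope this is useful for someone, the way many other comments on this website were for me.
Answer 41 (score: 0)
If MySQL has ROW_NUMBER, then the MEDIAN is (inspired by this SQL Server query):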
WITH Numbered AS
(
SELECT *, COUNT(*) OVER () AS Cnt,
ROW_NUMBER() OVER (ORDER BY val) AS RowNum
FROM yourtable
)
SELECT id, val
FROM Numbered
WHERE RowNum IN ((Cnt+1)/2, (Cnt+2)/2)
;
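The IN is used in case you have an even number of entries.
If you want to find the median per group, then just PARTITION BY group in your OVER clauses.
Rob
Answer 42 (score: -1)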
set @r = 0;
select
case when c = floor(c) then round(sum(lat_N),4)
else round(sum(lat_N)/2,4)
end as Med
from
(select lat_N, @r := @r+1, @r as id from station order by lat_N) A
cross join
(select (count(1)+1)/2 as c from station) B
where id >= floor(c) and id <=ceil(c)