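Calculating the median of multiple columns of the same table in a single query call

Time: 2013-07-03 12:25:12

Tags: mysql duplicates median

StackOverflow to the rescue! I need to find the medians of five columns at once, in a single query call.

The median calculation below works for a single column, but when the queries are combined, the repeated use of "@rownum" throws the query off. How can I update this to work for multiple columns? Thanks. This is for a web tool that lets nonprofits compare their financial metrics against a user-defined peer group.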

SELECT t1_wages.totalwages_pctoftotexp AS median_totalwages_pctoftotexp
FROM (

SELECT @rownum := @rownum +1 AS  `row_number` , d_wages.totalwages_pctoftotexp
FROM data_990_c3 d_wages, (

SELECT @rownum :=0
)r_wages
WHERE totalwages_pctoftotexp >0
ORDER BY d_wages.totalwages_pctoftotexp
) AS t1_wages, (

SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_wages
WHERE totalwages_pctoftotexp >0
) AS t2_wages
WHERE 1 
AND t1_wages.row_number = FLOOR( total_rows /2 ) +1

--- [that was one median, below is another] ---

SELECT t1_solvent.solvent_days AS median_solvent_days
FROM (

SELECT @rownum := @rownum +1 AS  `row_number` , d_solvent.solvent_days
FROM data_990_c3 d_solvent, (

SELECT @rownum :=0
)r_solvent
WHERE solvent_days >0
ORDER BY d_solvent.solvent_days
) AS t1_solvent, (

SELECT COUNT( * ) AS total_rows
FROM data_990_c3 d_solvent
WHERE solvent_days >0
) AS t2_solvent
WHERE 1 
AND t1_solvent.row_number = FLOOR( total_rows /2 ) +1

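[Those are two of them; in total there are five medians I ultimately need to find at once]

2 answers:

Answer 0: (score: 2)

This kind of thing is a colossal pain in MySQL. If you are going to do tons of this statistical ranking work, you might be wise to use the free Oracle Express Edition or PostgreSQL. Both have MEDIAN(value) aggregate functions, either built in or available as extensions. Here is a little SQL Fiddle demonstrating it. http://sqlfiddle.com/#!4/53de8/6/0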
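For comparison, here is a minimal sketch of what a single-query version might look like in PostgreSQL 9.4 or later, assuming the table and column names from the question and numeric columns. It uses the ordered-set aggregate percentile_cont rather than a function literally named MEDIAN, and the FILTER clauses reproduce the question's "> 0" conditions. Note that percentile_cont interpolates between the two middle values when the row count is even, whereas the question's query picks a single row.

```sql
-- Sketch for PostgreSQL 9.4+; table and column names are taken from the question.
-- Each aggregate sorts and filters its own column, so no row counters are needed.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY totalwages_pctoftotexp)
           FILTER (WHERE totalwages_pctoftotexp > 0) AS median_totalwages_pctoftotexp,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY solvent_days)
           FILTER (WHERE solvent_days > 0)           AS median_solvent_days
FROM data_990_c3;
```

The remaining three columns would follow the same pattern, one aggregate per column.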

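But that is not what you asked.

In MySQL, your basic problem is the scope of variables like @rownum. You also have a pivot problem: that is, you need to turn the rows of your query into columns.

Let's tackle the pivot problem first. What you are going to do is create a union of several big fat queries. For example: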

SELECT 'median_wages' AS tag, wages AS value
  FROM (big fat query making median wages) A
 UNION
SELECT 'median_volunteer_hours' AS tag, hours AS value
  FROM (big fat query making median volunteer hours) B
 UNION
SELECT 'median_solvent_days' AS tag, days AS value
  FROM (big fat query making median solvency days) C

So that gives you your result as a table of tag/value pairs. You can pivot that table like this to get a single row with one value in each column.

SELECT SUM( CASE tag WHEN 'median_wages' THEN value ELSE 0 END 
          ) AS median_wages, 
       SUM( CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END
          ) AS median_volunteer_hours, 
       SUM( CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END 
          ) AS median_solvent_days
FROM (
    /* the above gigantic UNION query */
 ) Q

That is how you pivot rows (from the UNION query in this case) into columns. Here is a tutorial on the topic. http://www.artfulsoftware.com/infotree/qrytip.php?id=523
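To make the pivot step concrete, here is a small self-contained sketch in which the "big fat queries" are replaced by literal rows. The tag names follow the example above; the values 0.42 and 90 are made up purely for illustration.

```sql
-- Stand-in for the UNION of median subqueries: two literal tag/value rows.
-- The values are invented; in practice each row comes from a median subquery.
SELECT SUM(CASE tag WHEN 'median_wages'        THEN value ELSE 0 END) AS median_wages,
       SUM(CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END) AS median_solvent_days
FROM (
    SELECT 'median_wages' AS tag, 0.42 AS value
    UNION ALL
    SELECT 'median_solvent_days' AS tag, 90 AS value
) Q;
```

Because every tag occurs exactly once, each SUM adds one real value and zeros for the other rows, so the result is a single row with one column per tag.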

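Now we need to deal with the median-computing subqueries. The code in your question looks pretty good. I don't have your data, so it is hard for me to evaluate it.

But you need to avoid reusing the @rownum variable. Call it @rownum1 in one of your queries, @rownum2 in the next one, and so on. Here is a simple SQL Fiddle doing just one of them. http://sqlfiddle.com/#!2/2f770/1/0

Now let's build on that and do two different medians. Here is the fiddle http://sqlfiddle.com/#!2/2f770/2/0, and here is the UNION query. Notice that the second half of the union query uses @rownum2 instead of @rownum.

Finally, here is the full query with the pivoting. http://sqlfiddle.com/#!2/2f770/13/0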

 SELECT SUM( CASE tag WHEN 'Boston' THEN value ELSE 0 END ) AS Boston,
           SUM( CASE tag WHEN 'Bronx' THEN value ELSE 0 END ) AS Bronx   
   FROM (
 SELECT 'Boston' AS tag, pop AS VALUE
  FROM (
        SELECT @rownum := @rownum +1 AS  `row_number` , pop
          FROM pops, 
        (SELECT @rownum :=0)r
          WHERE pop >0 AND city = 'Boston'
          ORDER BY pop
        ) AS ordered_rows, 
        ( 
         SELECT COUNT( * ) AS total_rows
           FROM pops
          WHERE pop >0 AND city = 'Boston'
        ) AS rowcount
  WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
  UNION ALL
 SELECT 'Bronx' AS tag, pop AS VALUE
  FROM (
        SELECT @rownum2 := @rownum2 +1 AS  `row_number` , pop
          FROM pops, 
        (SELECT @rownum2 :=0)r
          WHERE pop >0 AND city = 'Bronx'
          ORDER BY pop
        ) AS ordered_rows, 
        ( 
         SELECT COUNT( * ) AS total_rows
           FROM pops
          WHERE pop >0 AND city = 'Bronx'
        ) AS rowcount
  WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1
) D

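That is just two medians. You need five. I think it is easy to make the case that this kind of median computation is absurdly difficult to do in MySQL in a single query.

Answer 1: (score: 0)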
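Applied to the table in the question, the separate-variable idea might look like the sketch below. It assumes the table and column names from the question; the variable names @rn_wages and @rn_solvent and the aliases w, s, t1, t2 and init are hypothetical. Since each median subquery yields one row, this version cross joins them instead of pivoting a UNION. MySQL documents the evaluation order of user-variable expressions inside a SELECT as undefined, so treat this as a sketch to verify against your own data and server version.

```sql
-- One user variable per median, so the row counters cannot interfere.
SELECT w.median_totalwages_pctoftotexp,
       s.median_solvent_days
FROM (
        SELECT t1.totalwages_pctoftotexp AS median_totalwages_pctoftotexp
          FROM (
                SELECT @rn_wages := @rn_wages + 1 AS rn, d.totalwages_pctoftotexp
                  FROM data_990_c3 d, (SELECT @rn_wages := 0) init
                 WHERE d.totalwages_pctoftotexp > 0
                 ORDER BY d.totalwages_pctoftotexp
               ) AS t1,
               (
                SELECT COUNT(*) AS total_rows
                  FROM data_990_c3
                 WHERE totalwages_pctoftotexp > 0
               ) AS t2
         WHERE t1.rn = FLOOR(t2.total_rows / 2) + 1
     ) AS w
CROSS JOIN (
        SELECT t1.solvent_days AS median_solvent_days
          FROM (
                SELECT @rn_solvent := @rn_solvent + 1 AS rn, d.solvent_days
                  FROM data_990_c3 d, (SELECT @rn_solvent := 0) init
                 WHERE d.solvent_days > 0
                 ORDER BY d.solvent_days
               ) AS t1,
               (
                SELECT COUNT(*) AS total_rows
                  FROM data_990_c3
                 WHERE solvent_days > 0
               ) AS t2
         WHERE t1.rn = FLOOR(t2.total_rows / 2) + 1
     ) AS s;
```

The other three columns would each need one more CROSS JOIN block with its own variable. One design caveat: if any column has no rows greater than zero, its block is empty and the cross join returns no row at all, which the SUM-based pivot avoids.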

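Suppose you have a table with three columns, e.g. my_table(key, value1, value2).

This query (PostgreSQL syntax) gives you the median of both value columns for each key:

SELECT key,
 -- average the two middle elements of the sorted array; they coincide when the count is odd
 ((array_agg(value1 order by value1 asc) )[floor( (count(*)+1)::float/2)::int] + (array_agg(value1 order by value1 asc) )[ceiling( (count(*)+1)::float/2)::int ] )/2.0 AS median_value1,
 ((array_agg(value2 order by value2 asc) )[floor( (count(*)+1)::float/2)::int] + (array_agg(value2 order by value2 asc) )[ceiling( (count(*)+1)::float/2)::int ] )/2.0 AS median_value2
FROM my_table 
GROUP BY key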
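
To see how the floor/ceiling indexing behaves, here is a small self-contained check, assuming PostgreSQL and a made-up inline data set (the names t, k and v are hypothetical). Group 'a' has four rows, so array positions 2 and 3 are averaged; group 'b' has three rows, so position 2 is used twice. Dividing by 2.0 rather than 2 keeps integer inputs from being truncated.

```sql
-- Made-up data: group 'a' has an even row count, group 'b' an odd one.
SELECT k,
       ((array_agg(v ORDER BY v))[floor((count(*) + 1) / 2.0)::int]
      + (array_agg(v ORDER BY v))[ceiling((count(*) + 1) / 2.0)::int]) / 2.0 AS median_v
FROM (VALUES ('a', 1), ('a', 3), ('a', 10), ('a', 20),
             ('b', 5), ('b', 7), ('b', 9)) AS t(k, v)
GROUP BY k
ORDER BY k;
-- 'a': (3 + 10) / 2.0 = 6.5
-- 'b': (7 + 7)  / 2.0 = 7
```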