How to find the percentage of each distinct value across multiple columns

Time: 2019-11-13 05:39:50

Tags: hive

Below is a sample of the input data, which has 200,000 rows.

Input

I am using the following query to find the average, and I expect the output to be: M 50%, F 50%

select avg(sum(case when col1='M' then 1 end)+
       sum(case when col2='M' then 1 end)+
       sum(case when col3='M' then 1 end)+
       sum(case when col4='M' then 1 end)+
       sum(case when col5='M' then 1 end)) as M,

   avg(sum(case when col1='F' then 1 end)+
       sum(case when col2='F' then 1 end)+
       sum(case when col3='F' then 1 end)+
       sum(case when col4='F' then 1 end)+
       sum(case when col5='F' then 1 end)) as F
       from household;

But it shows an error:

Error

1 answer:

Answer 0: (score: 2)

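Try this query in Hive. It works fine. The innermost subquery counts the 'M' and 'F' values in each column, the middle one adds those counts up, and the outer query turns the two totals into percentages.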

SELECT 
    y.M1/(y.M1 + y.F1) * 100 AS M,
    y.F1/(y.M1 + y.F1) * 100 AS F
FROM (
    SELECT 
        (x.SumMCol1 + x.SumMCol2 + x.SumMCol3 + x.SumMCol4 + x.SumMCol5) AS M1,
        (x.SumFCol1 + x.SumFCol2 + x.SumFCol3 + x.SumFCol4 + x.SumFCol5) AS F1
    FROM (
        SELECT 
            SUM(IF(col1 = 'M', 1, 0)) AS SumMCol1,
            SUM(IF(col2 = 'M', 1, 0)) AS SumMCol2,
            SUM(IF(col3 = 'M', 1, 0)) AS SumMCol3,
            SUM(IF(col4 = 'M', 1, 0)) AS SumMCol4,
            SUM(IF(col5 = 'M', 1, 0)) AS SumMCol5,
            SUM(IF(col1 = 'F', 1, 0)) AS SumFCol1,
            SUM(IF(col2 = 'F', 1, 0)) AS SumFCol2,
            SUM(IF(col3 = 'F', 1, 0)) AS SumFCol3,
            SUM(IF(col4 = 'F', 1, 0)) AS SumFCol4,
            SUM(IF(col5 = 'F', 1, 0)) AS SumFCol5,
            COUNT(*) AS TotalRows
        FROM 
            household
    ) x
) y;

Here is a link to an SQL Fiddle where you can try it: http://sqlfiddle.com/#!9/e9cf85/2
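
A note on why the original query fails: it wraps SUM inside AVG, and Hive, like most SQL engines, does not accept an aggregate nested directly inside another aggregate in the same SELECT. The expected result is also a ratio of counts, not an average. If you prefer something shorter than the three-level query, one possible alternative is to unpivot the five columns into a single column and aggregate once. This is only a sketch: it assumes col1 through col5 all have the same string type, and the alias names t and g are made up for illustration.

```sql
-- Sketch only: assumes col1..col5 share one string type.
-- t and g are illustrative aliases, not names from the original table.
SELECT
    SUM(IF(g = 'M', 1, 0)) / COUNT(*) * 100 AS M,  -- share of 'M' among all M/F cells
    SUM(IF(g = 'F', 1, 0)) / COUNT(*) * 100 AS F   -- share of 'F' among all M/F cells
FROM household
LATERAL VIEW explode(array(col1, col2, col3, col4, col5)) t AS g  -- one output row per cell
WHERE g IN ('M', 'F');  -- drop NULLs and any other values so they do not affect the denominator
```

Because the WHERE clause keeps only 'M' and 'F' cells, COUNT(*) here equals M1 + F1 from the query above, so both versions should return the same percentages.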
