Calculating a percentage in Hive

Time: 2015-01-14 12:05:14

Tags: command hive percentage

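I am trying to write a simple query that computes each row's value as a percentage of the total across the table. Can I do it in one go?

Below is my query, which gives me an error.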

select 100 * total_sum/sum(total_sum) from jav_test;

1 answer:

Answer 0: (score: 2)

When I have had to do something similar in the past, this is the approach I took:

SELECT
  jav_test.total_sum AS total_sum,
  withsum.total_sum AS sum_of_all_total_sum,
  100 * (jav_test.total_sum / withsum.total_sum) AS percentage
FROM
  jav_test,
  (SELECT sum(total_sum) AS total_sum FROM jav_test) withsum    -- This computes sum(total_sum) here as a single-row single-column table aliased as "withsum"
;

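The total_sum and sum_of_all_total_sum columns are in the output only so you can convince yourself that the math is right; the number you are interested in is percentage, based on the query you posted in the question.

After populating a small dummy table, the results look like this: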

hive> describe jav_test;
OK
total_sum                   int                                 
Time taken: 1.777 seconds, Fetched: 1 row(s)
hive> select * from jav_test;
OK
28
28
90113
90113
323694
323694
Time taken: 0.797 seconds, Fetched: 6 row(s)
hive> SELECT
    >   jav_test.total_sum AS total_sum,
    >   withsum.total_sum AS sum_of_all_total_sum,
    >   100 * (jav_test.total_sum / withsum.total_sum) AS percentage
    > FROM jav_test, (SELECT sum(total_sum) AS total_sum FROM jav_test) withsum;
...
... lots of mapreduce-related spam here
...
Total MapReduce CPU Time Spent: 3 seconds 370 msec
OK  
28  827670  0.003382990805514275
28  827670  0.003382990805514275
90113       827670  10.887551802046708
90113       827670  10.887551802046708
323694      827670  39.10906520714777
323694      827670  39.10906520714777
Time taken: 41.257 seconds, Fetched: 6 row(s)
hive>
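
As for doing it in one go: the query in the question fails because it mixes the plain column total_sum with the aggregate sum(total_sum) without a GROUP BY. A possible single-statement alternative to the join is a window aggregate, sketched below. This assumes a Hive version with windowing functions (0.11 or later) and that an empty OVER () clause is accepted on your installation; I have not run it against the dummy table above, so treat it as a starting point rather than a tested answer.

```sql
-- Sketch only: assumes windowing support (Hive 0.11 or later).
-- SUM(total_sum) OVER () computes the table-wide total and repeats it
-- on every row, so no self-join against a one-row subquery is needed.
SELECT
  total_sum,
  SUM(total_sum) OVER () AS sum_of_all_total_sum,
  100 * (total_sum / SUM(total_sum) OVER ()) AS percentage
FROM jav_test;
```

The columns mirror those of the join version, so the two outputs can be compared directly. Hive's / operator returns a double even for integer operands, which is why neither query needs an explicit cast.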