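I'm trying to write a simple query that calculates what percentage each value in a table contributes to the total. Can I do it in a single query?
Below is my code, which gives me an error.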
select 100 * total_sum/sum(total_sum) from jav_test;
Answer
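The error most likely comes from mixing an aggregate with a per-row column. Without a GROUP BY, sum() collapses the whole table into a single row, so it can't appear in the same SELECT list as the plain column total_sum. When I've had to do something similar in the past, I computed the sum once in a subquery and joined it back onto every row: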
SELECT
jav_test.total_sum AS total_sum,
withsum.total_sum AS sum_of_all_total_sum,
100 * (jav_test.total_sum / withsum.total_sum) AS percentage
FROM
jav_test,
(SELECT sum(total_sum) AS total_sum FROM jav_test) withsum -- This computes sum(total_sum) here as a single-row single-column table aliased as "withsum"
;
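For comparison, here is a minimal sketch that runs the same cross-join query next to a window-function variant. It uses Python's built-in sqlite3 module as a stand-in for Hive. That is an assumption made only so the example is self-contained and runnable, and it needs SQLite 3.25 or later for OVER (). In Hive itself, which has supported windowing functions since 0.11, the window variant would be along the lines of SELECT total_sum, 100 * total_sum / sum(total_sum) OVER () AS percentage FROM jav_test; and needs no join. One difference to watch: Hive's / does not truncate when both operands are integers, which is why the percentages in the output below are fractional. SQLite does truncate integer division, so the sketch multiplies by 100.0 instead of 100.

```python
import sqlite3

# SQLite stands in for Hive here (assumption); the values are the ones
# from the Hive session in this answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jav_test (total_sum INTEGER)")
conn.executemany(
    "INSERT INTO jav_test (total_sum) VALUES (?)",
    [(v,) for v in (28, 28, 90113, 90113, 323694, 323694)],
)

# Approach from the answer: cross join against a one-row subquery holding the total.
# 100.0 rather than 100 because SQLite, unlike Hive, truncates INTEGER / INTEGER.
cross_join_sql = """
SELECT jav_test.total_sum,
       100.0 * jav_test.total_sum / withsum.total_sum AS percentage
FROM jav_test,
     (SELECT sum(total_sum) AS total_sum FROM jav_test) withsum
ORDER BY jav_test.total_sum
"""

# Alternative: an empty OVER () turns sum() into a window aggregate over all rows,
# so every row sees the grand total without an explicit join (SQLite 3.25+).
window_sql = """
SELECT total_sum,
       100.0 * total_sum / sum(total_sum) OVER () AS percentage
FROM jav_test
ORDER BY total_sum
"""

cross_join_rows = conn.execute(cross_join_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()

for value, pct in window_rows:
    print(value, round(pct, 6))
```

Both queries return the same rows; the window form simply avoids writing the join by hand.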
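The total_sum and sum_of_all_total_sum columns in the output are there only so I could convince myself the math is right. The number you're interested in is percentage, which corresponds to the query you posted in your question.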
After populating a small dummy table, the results look like this:
hive> describe jav_test;
OK
total_sum int
Time taken: 1.777 seconds, Fetched: 1 row(s)
hive> select * from jav_test;
OK
28
28
90113
90113
323694
323694
Time taken: 0.797 seconds, Fetched: 6 row(s)
hive> SELECT
> jav_test.total_sum AS total_sum,
> withsum.total_sum AS sum_of_all_total_sum,
> 100 * (jav_test.total_sum / withsum.total_sum) AS percentage
> FROM jav_test, (SELECT sum(total_sum) AS total_sum FROM jav_test) withsum;
...
... lots of mapreduce-related spam here
...
Total MapReduce CPU Time Spent: 3 seconds 370 msec
OK
28 827670 0.003382990805514275
28 827670 0.003382990805514275
90113 827670 10.887551802046708
90113 827670 10.887551802046708
323694 827670 39.10906520714777
323694 827670 39.10906520714777
Time taken: 41.257 seconds, Fetched: 6 row(s)
hive>
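As a quick cross-check of the arithmetic, the short Python sketch below recomputes the output from the six values listed by select * from jav_test. It applies the same formula as the query, 100 * (total_sum / sum_of_all_total_sum), using floating-point division as Hive does. It also confirms the percentages add up to 100, which is exactly what the extra columns are there to sanity-check.

```python
# Values copied from the "select * from jav_test" output above.
values = [28, 28, 90113, 90113, 323694, 323694]

total = sum(values)  # corresponds to the sum_of_all_total_sum column
percentages = [100 * (v / total) for v in values]  # same formula as the query

for v, p in zip(values, percentages):
    print(v, total, p)
```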