I have a table tab in Hive, as shown below:
word | occurrences
---- | -----------
by | 10
hi | 1
same | 3
love | 6
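I want to use a Hive query to compute and display each word's frequency (its occurrences divided by the sum of the whole column). For example, the frequency of the word 'by' is 10 / (10 + 1 + 3 + 6) = 0.5.
I tried: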
SELECT word, occurrences, occurrences/SUM(occurrences) AS frequency
FROM tab
GROUP BY word, occurrences
ORDER BY frequency;
But it gives this:
word | occurrences | frequency
---- | ----------- | ---------
by | 10 | 1
hi | 1 | 1
same | 3 | 1
love | 6 | 1
I'm not sure what I'm doing wrong. My SQL isn't very good. Thanks in advance.
Answer 0 (score: 0)
Try the SQL below, which uses SUM() OVER() so the total is computed across all rows without grouping:
SELECT word, occurrences, occurrences/SUM(occurrences) OVER() AS frequency
FROM tab
ORDER BY frequency;
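For context on why the original query returned 1 on every row: with GROUP BY word, occurrences, each group in the sample data holds exactly one row, so SUM(occurrences) inside a group equals that row's own occurrences. A minimal sketch that makes this visible, assuming the four-row sample table from the question (the alias group_sum is illustrative and not part of the original post):

```sql
-- Diagnostic sketch: show the per-group sum the original query divided by.
-- Assumes every (word, occurrences) pair in tab is unique, as in the sample data.
SELECT word,
       occurrences,
       SUM(occurrences) AS group_sum   -- equals occurrences when each group has one row
FROM tab
GROUP BY word, occurrences;
```

Since group_sum matches occurrences for each row, occurrences / group_sum is always 1. The empty OVER() clause avoids this by summing over the whole table instead of per group.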
Answer 1 (score: 0)
You don't need to GROUP BY any column, because the denominator should be the total of all occurrences.
SELECT a.word, a.occurrences, a.occurrences/b.total_freq AS frequency
FROM
tab a CROSS JOIN (SELECT SUM(occurrences) AS total_freq FROM tab) b
ORDER BY frequency;
The cross join attaches total_freq to every row of tab, and the outer query then uses it as the denominator.
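To see what the cross join contributes before the division happens, the intermediate result can be selected directly. This is only a sketch under the assumption that tab holds the sample rows from the question; it reuses the aliases a, b, and total_freq from the query above:

```sql
-- Sketch: expose the joined total next to each row, without dividing yet.
-- The subquery yields a single row, so the cross join keeps tab's row count unchanged.
SELECT a.word,
       a.occurrences,
       b.total_freq          -- 20 on every row for the sample data (10 + 1 + 3 + 6)
FROM tab a
CROSS JOIN (SELECT SUM(occurrences) AS total_freq FROM tab) b;
```

Because the subquery returns exactly one row, every row of tab is paired with the same total, which is what lets the outer SELECT divide by it.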
Answer 2 (score: 0)
WITH a1 AS
(
SELECT word, occurrences, occurrences/SUM(occurrences) OVER() AS frequency
FROM tab
)
-- Sort in the outer query: an ORDER BY inside the CTE is not guaranteed to be preserved
SELECT * FROM a1
ORDER BY frequency;