Question

I have a table like the following:

select * from tablename;

ID                   sentence
1              This is a sentence
2              This might be a test
3                     America
4                    This this

I want to write a query that splits the sentences into words and returns the word counts in descending order. I would like output like this:

word     count    Unique(ids)

This       4          3
a          2          2
might      1          1
.
.
.

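where count is the number of times the word appears in the column, and Unique(ids) is the number of distinct ids (users) whose sentence contains that word.

How should I write a query to do this?

Can anyone help me do this in Hive?

Thanks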

Answer 1

Use a lateral view:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView

select id, word
from tablename tn lateral view explode( split( tn.sentence, ' ' ) ) tb as word

The result will be:

1 This
1 is
1 a
1 sentence
2 This
2 might
2 be
2 a
2 test
3 America
4 This
4 this

Then aggregate the results.
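The aggregation step could be sketched roughly as follows. This is only one way to do it, assuming the table and column names from the question (tablename, id, sentence). The aliases w, cnt and unique_ids are made up for illustration, and lower() is an assumption based on the expected output, where "This" and "this" appear to be counted together.

```sql
-- Sketch: one row per word, with total occurrences and number of distinct ids.
select
    w.word,
    count(*)             as cnt,        -- how many times the word occurs overall
    count(distinct w.id) as unique_ids  -- how many ids contain the word at least once
from (
    -- Explode each sentence into one row per word.
    -- lower() folds case so "This" and "this" group together (assumption).
    select tn.id, lower(tb.word) as word
    from tablename tn
    lateral view explode(split(tn.sentence, ' ')) tb as word
) w
group by w.word
order by cnt desc;
```

With the sample data, "this" should come out with a count of 4 across 3 ids, and "a" with a count of 2 across 2 ids. Note that split() takes a regular expression, so if sentences may contain repeated spaces, a pattern such as '\\s+' is probably a safer separator than ' '.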
