Suppose I have a table with the columns id and content:
id | content
________________________
1 | abc abr abc as abs
2 | abc arc cre arc
3 | agr ann agd agd agd
What I want is output like this:
{"abc":2,"abr":1,"as":1, "abs":1} # for id 1
{"abc":1,"arc":2,"cre":1} # for id 2
{"agr":1,"agd":3,"ann":1} # for id 3
How can I do this in Hive?
Answer:
You need the Brickhouse library. It is very simple to build.
Query:
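ADD JAR /path/to/jar/brickhouse-0.7.1.jar;
CREATE TEMPORARY FUNCTION COLLECT AS 'brickhouse.udf.collect.CollectUDAF';

-- 3) Collect each id's (word, count) pairs into a single map
SELECT id
     , COLLECT(words, c) AS count_map
FROM (
    -- 2) Count the occurrences of each word per id
    SELECT id
         , words
         , COUNT(*) AS c
    FROM (
        -- 1) Split content on spaces and emit one row per word
        SELECT id, words
        FROM db.tbl
        LATERAL VIEW EXPLODE(SPLIT(content, ' ')) exptbl AS words
    ) x
    GROUP BY id, words
) y
GROUP BY id;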
Output:
+----+---------------------------------+
|id |count_map |
+----+---------------------------------+
|1 |{"as":1,"abs":1,"abc":2,"abr":1} |
+----+---------------------------------+
|2 |{"cre":1,"arc":2,"abc":1} |
+----+---------------------------------+
|3 |{"ann":1,"agr":1,"agd":3} |
+----+---------------------------------+
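If adding a third-party jar is not an option, a similar result can be sketched with Hive built-ins only. This is a minimal sketch, assuming Hive 0.13 or later (for `collect_list`) and the same `db.tbl` table. The per-id word counts are concatenated into one `word:count,word:count` string and parsed back into a map with `str_to_map`. Two caveats: `str_to_map` returns `map<string,string>`, so the counts come back as strings rather than integers, and the approach breaks if a word itself contains `,` or `:`.

```sql
-- Sketch using only built-in Hive functions (assumes Hive 0.13+ for collect_list).
-- Result type is map<string,string>: counts are strings, not ints.
SELECT id
     , str_to_map(
         -- build "word:count" pairs, then join them with commas
         concat_ws(',', collect_list(concat(words, ':', CAST(c AS STRING)))),
         ',', ':'
       ) AS count_map
FROM (
    -- Count the occurrences of each word per id
    SELECT id, words, COUNT(*) AS c
    FROM db.tbl
    LATERAL VIEW EXPLODE(SPLIT(content, ' ')) exptbl AS words
    GROUP BY id, words
) y
GROUP BY id;
```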