Question

Hello, I have loaded a document into a Hive table named Data, which contains the following sample rows:

He is a good boy and but his brother is a bad boy.
He is a naughty boy.

The table schema is:

create table Data(
    document_data STRING)
row format delimited
fields terminated by '\n'
stored as textfile;

I want to write a query that counts only the occurrences of the words boy and naughty, and outputs them:

 boy 3
 naughty 1

Answer 1

Here we use LATERAL VIEW together with explode to turn a single row into multiple rows, one per word.

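SELECT
    word,
    COUNT(*)
FROM Data
-- LATERAL VIEW belongs to the FROM clause, so it must come before WHERE
LATERAL VIEW
    explode(split(document_data, ' ')) lateralTable AS word
-- Keep only the two words of interest.
-- Note: splitting on ' ' leaves punctuation attached, so "boy." is not "boy".
WHERE
    word = "boy" OR
    word = "naughty"
GROUP BY word;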

I adapted this from the version I found in "Word Count program in Hive".
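
In the sample rows, two of the three occurrences of boy are followed by a period, so splitting on a single space produces the token "boy." and an exact comparison with "boy" misses it. A minimal sketch of one way around this, assuming the words of interest consist only of ASCII letters and that matching should be case-insensitive (both are assumptions, not stated in the question): split on runs of non-letter characters instead. The alias occurrences is introduced here only for illustration.

```sql
SELECT
    word,
    COUNT(*) AS occurrences
FROM Data
-- split() takes a regular expression; '[^a-z]+' treats any run of
-- non-letters (spaces, periods, commas, ...) as a separator.
-- lower() makes the match case-insensitive.
LATERAL VIEW
    explode(split(lower(document_data), '[^a-z]+')) lateralTable AS word
-- Empty tokens produced at the ends of a line are dropped by this filter too.
WHERE word IN ('boy', 'naughty')
GROUP BY word;
```

On the two sample rows this should return boy with a count of 3 and naughty with a count of 1, matching the output requested in the question.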