Using Hive

Time: 2015-10-23 12:26:10

Tags: hadoop mapreduce hive hiveql

Hello, I have loaded a document into a Hive table named Data, which contains the following sample rows:

He is a good boy and but his brother is a bad boy.
He is a naughty boy.

The table schema is:

create table Data(
    document_data STRING)
row format delimited
fields terminated by '\n'
stored as textfile;

I want to write a query that counts only the occurrences of the words boy and naughty, and outputs them:

 boy 3
 naughty 1 

1 answer:

Answer 0: (score: 0)

Here we use LATERAL VIEW together with explode, which turns a single row into multiple rows, one per word.

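SELECT
    word,
    COUNT(*) AS occurrences
FROM Data
-- LATERAL VIEW must follow FROM and come before WHERE
-- splitting on '\\W+' strips punctuation, so "boy." is counted as "boy"
LATERAL VIEW
    explode(split(document_data, '\\W+')) lateralTable AS word
WHERE
    word = "boy" OR
    word = "naughty"
GROUP BY word;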

I adapted this from a version I found at Word Count program in Hive.
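
To check what the lateral view actually produces before counting, it can help to list the tokens for each row. The query below is only a sketch: it assumes the Data table and document_data column from the question, and the lower() call and the filter on empty strings are additions for illustration, not part of the original answer. Note that splitting on a single space would leave punctuation attached (for example "boy." in the sample rows), which would not match "boy" in a WHERE clause; splitting on the regular expression '\\W+' avoids that.

```sql
-- Sketch only: assumes the Data table and document_data column from the question.
-- Lists every token per input row so the effect of the split pattern is visible.
SELECT
    document_data,
    word
FROM Data
LATERAL VIEW
    explode(split(lower(document_data), '\\W+')) lateralTable AS word
-- drop any empty token, for example one left after a trailing period
WHERE
    word <> '';
```

Once the token list looks right, adding the filter on the two target words plus GROUP BY word and COUNT(*) gives the per-word totals.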