Hi, I have uploaded a document to a Hive table named Data, which contains the following sample rows:
He is a good boy and but his brother is a bad boy.
He is a naughty boy.
The table schema is:
create table Data(
document_data STRING)
row format delimited
fields terminated by '\n'
stored as textfile;
I want to write a query that counts only the occurrences of the words boy and naughty and outputs them as:
boy 3
naughty 1
Answer 0 (score: 0)
Here we use LATERAL VIEW with explode() to turn each single row into multiple rows, one per word.
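SELECT
  word,
  COUNT(*) AS cnt
FROM Data
-- Split each line on runs of non-word characters so that "boy." also yields "boy",
-- then turn the resulting array into one row per word.
LATERAL VIEW explode(split(document_data, '\\W+')) lateralTable AS word
-- WHERE must come after LATERAL VIEW, because it filters on the exploded column.
WHERE word = 'boy' OR word = 'naughty'
GROUP BY word;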
I adapted this from the version I found in Word Count program in Hive.
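To see why the split pattern matters, here is a small Python sketch. It is not Hive; it only mimics the query's explode/WHERE/GROUP BY logic on the two sample rows from the question. Splitting on a single space leaves tokens such as "boy." with the trailing period, so only one "boy" matches, while splitting on runs of non-word characters (\W+) gives the expected boy 3, naughty 1. The names count_words, ROWS and TARGETS are illustrative only, and Python's re module is used as a stand-in for the Java regex that Hive's split() applies.

```python
import re
from collections import Counter

# The two sample rows from the Data table in the question.
ROWS = [
    "He is a good boy and but his brother is a bad boy.",
    "He is a naughty boy.",
]
# Stand-in for: WHERE word = 'boy' OR word = 'naughty'
TARGETS = {"boy", "naughty"}


def count_words(rows, pattern):
    """Mimic explode(split(document_data, pattern)) + WHERE + GROUP BY word."""
    counts = Counter()
    for row in rows:
        # Each token plays the role of one exploded row from LATERAL VIEW.
        for word in re.split(pattern, row):
            if word in TARGETS:      # the WHERE filter
                counts[word] += 1    # COUNT(*) per GROUP BY word
    return dict(counts)


print(count_words(ROWS, r" "))    # → {'boy': 1, 'naughty': 1}
print(count_words(ROWS, r"\W+"))  # → {'boy': 3, 'naughty': 1}
```

The first call shows the result you would get with split(document_data, ' '); the second matches the output the question asks for.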