I have a problem running a word count in Hive.
My Hive query looks like this:
select word, count(1) as count
from (select explode(split(word, ' ' )) as word from note) w
group by word
order by count desc
limit 5
;
Result:
the 20583
of 10388
9479
and 7611
in 5226
9479 is the number of lines. How do I get rid of it?
Answer 0: (score: 1)
Change the split call to:
split(word,'\\s+')
(instead of a single space, this matches one or more whitespace characters: [ \t\n\x0B\f\r])
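For reference, here is a sketch of the full query with that change applied. The extra `where` filter is my own addition, not part of the answer above: lines that are blank or that start with whitespace may still produce an empty token even with `'\\s+'`, so filtering out `''` is a cheap safeguard. The alias `cnt` is also just an illustrative rename of `count`.

```sql
-- Sketch only: split each line on runs of whitespace, then count words.
select w.word, count(1) as cnt
from (
  -- '\\s+' matches one or more whitespace characters (space, tab, newline, ...)
  select explode(split(word, '\\s+')) as word
  from note
) w
-- Assumption: drop any empty tokens that survive the split
where w.word <> ''
group by w.word
order by cnt desc
limit 5;
```

If the blank entry disappears with the regex change alone, the `where` clause is harmless and can be dropped.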