Question

I am running this Hive query to do a word-wise count on unstructured data.

select a, count(*) from (select(EXPLODE(SPLIT(regexp_replace(upper(word,'[-!@#$%&*]',''))) AND EXPLODE(SPLIT(regexp_replace(UPPER(word,'[^A-Za-z0-9 ]','')))) as A from file)q group by a;

But I get the error below and have not been able to find a solution.

FAILED: SemanticException [Error 10014]: Line 1:46 Wrong arguments '''': No matching method for class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with (string). Possible choices: _FUNC_(string, string, string)

Answer 1

regexp_replace is used for replacement, and it takes 3 arguments:

(org.apache.hadoop.io.Text s, org.apache.hadoop.io.Text regex, org.apache.hadoop.io.Text replacement)

You should use regexp_extract instead:

evaluate(String s, String regex)
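
For comparison, here is a minimal sketch of how the query from the question might be repaired while keeping regexp_replace. The error message says regexp_replace received a single string: in the original, the pattern and the replacement sit inside the parentheses of upper(...) instead of being passed to regexp_replace. The sketch rests on assumptions not confirmed in the question: that file is a table with one string column named word, that words are separated by single spaces, and that the second pattern alone is enough (it already strips everything the first one does). The alias cnt and the empty-token filter are additions for illustration.

```sql
-- Sketch only: assumes a table `file` with a string column `word`.
SELECT a, count(*) AS cnt
FROM (
  -- upper() takes one argument; regexp_replace() takes three:
  -- (input string, regex pattern, replacement).
  -- split() needs a delimiter pattern as its second argument.
  SELECT explode(split(regexp_replace(upper(word), '[^A-Za-z0-9 ]', ''), ' ')) AS a
  FROM file
) q
WHERE a <> ''   -- drop empty tokens produced by consecutive spaces
GROUP BY a;
```

The two explode calls joined by AND in the original are collapsed into one here, since explode is a table-generating function and, as far as I know, cannot be combined with AND inside a SELECT expression.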

Error when implementing word count on unstructured data in Hive
