Question

I am trying to run a custom map-reduce job through Hive. I created sample mapper and reducer classes for wordcount, following the steps in this article: http://www.lichun.cc/blog/2012/06/wordcount-mapreduce-example-using-hive-on-local-and-emr/

create external table if not exists raw_lines(line string)
    ROW FORMAT DELIMITED
    stored as textfile
    location '/user/new_user/hive_mr_input';

I added sample lines for the wordcount to the /user/new_user/hive_mr_input directory.

create external table if not exists word_count(word string, count int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
lines terminated by '\n' 
STORED AS TEXTFILE LOCATION '/user/new_user/hive_mr_output';

hive>
add file /home/new_user/hive/WordCountReducer.java;
add file /home/new_user/hive/WordCountMapper.java;

    from (
            from raw_lines
            map raw_lines.line        
            using '/user/new_user/hive/WordCountMapper.java'
            as word, count
            cluster by word) map_output
    insert overwrite table word_count
    reduce map_output.word, map_output.count
    using '/user/new_user/hive/WordCountReducer.java'
    as word,count;

When I run the commands above, I get this error:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.

I thought this might be caused by the "\t" delimiter I used when creating the table, so I made some changes in the Mapper class and tried a comma-separated file:

String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line,",");

I also changed the word_count table definition to use "," as the delimiter (fields terminated by ','), but I still get the same error.

What is wrong with the code above?

Answer 1

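The reason it is not working is that you are trying to use Java source files. In the example you point to, the author uses Python. See the Hive documentation on transform scripts.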

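The script you supply as a custom transform must be an executable, and it must read its input from standard input and write its output to standard output. So you can use practically any language, even bash. Other popular choices are Python, as in the article you linked, or Ruby.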
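To make the stdin/stdout contract concrete, here is a minimal mapper sketch in Python. The file name `word_count_mapper.py` and the function name `map_lines` are made up for illustration, and the sketch assumes Hive's default transform behaviour of passing each row to the script as a tab-separated text line and splitting the script's output on tabs.

```python
#!/usr/bin/env python3
"""Hypothetical word_count_mapper.py: one input line in, one 'word<TAB>1' row out per word."""
import sys


def map_lines(lines):
    """Yield a 'word\\t1' record for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # Hive feeds the rows of raw_lines on stdin and reads the rows we print on stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

Assuming the script lives under /home/new_user/hive, you would register it with `add file /home/new_user/hive/word_count_mapper.py;` and reference it in the query as `using 'python3 word_count_mapper.py'` in place of the `.java` path.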

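Whichever language you choose, make sure the interpreter and every library the script needs are available on all nodes; otherwise the script will fail.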
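One way to sidestep the library problem is to stick to the standard library. As a sketch, a wordcount reducer needs nothing beyond `sys` and `itertools`. The file name `word_count_reducer.py` and the function name `reduce_lines` are hypothetical, and the code assumes well-formed `word<TAB>count` input in which rows for the same word arrive next to each other, which is what `cluster by word` in the query provides.

```python
#!/usr/bin/env python3
"""Hypothetical word_count_reducer.py: sums the counts of consecutive rows sharing a word."""
import sys
from itertools import groupby


def reduce_lines(lines):
    """Yield 'word\\ttotal' for each run of rows with the same word."""
    # Skip blank lines and split every row into [word, count].
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    # groupby only merges adjacent rows, so the input must already be grouped by word.
    for word, group in groupby(rows, key=lambda row: row[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    for record in reduce_lines(sys.stdin):
        print(record)
```

The interpreter itself (here `python3`) still has to be installed on every node that runs the script.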

You are supplying Java source code, which will not work: Hive will not compile it for you.

You can use Java, but you have to build an executable jar. See this other post on how to do that.
