Prevent Hadoop from sending JobClient output to the command line?

Date: 2012-08-15 23:34:51

Tags: shell command-line hadoop terminal mapreduce

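I'm trying to write a shell script that runs a Hadoop MapReduce job on a pseudo-distributed cluster but omits all output that is not preceded by a !. I tried piping the output to awk and filtering it there, which works for most of the output, but I still get output from JobClient on the terminal. Is there a way to prevent this?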

My code currently looks like this:

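#!/bin/sh

runtimes=$1

# {0..$runtimes} is not expanded when the bound is a variable (and plain sh
# has no brace expansion), so generate the sequence with seq instead.
for i in $(seq 0 "$runtimes")
do
  cd ~/Documents/hadoop-1.0.3
  bin/hadoop dfs -rmr /SC_out | awk "{}"
  bin/hadoop jar ../MapReduceTests/SyntaxCounter.jar mrt.SyntaxCounter /WC_in/ /SC_out/ | awk "{}"
  bin/hadoop dfs -cat /SC_out/part* | awk "\$0~/!Map/ {print \$0}"
done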

Edit: this is the kind of output I want to suppress:

12/08/15 16:45:17 INFO mapred.JobClient: Running job: job_201208151042_0128
12/08/15 16:45:18 INFO mapred.JobClient:  map 0% reduce 0%
12/08/15 16:45:31 INFO mapred.JobClient:  map 100% reduce 0%
12/08/15 16:45:43 INFO mapred.JobClient:  map 100% reduce 100%

1 answer:

Answer 0 (score: 1)

This output is written to stderr, not stdout, so the pipe to awk never sees it. Redirect stderr before the pipe:

bin/hadoop jar ../MapReduceTests/SyntaxCounter.jar mrt.SyntaxCounter \
    /WC_in/ /SC_out/  2>/dev/null | awk "{}"
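To see why the redirect is needed, here is a minimal sketch that does not require a Hadoop installation. The function fake_job is a hypothetical stand-in for the bin/hadoop call, written on the assumption that progress lines go to stderr and results go to stdout; the sample line it prints is made up for illustration.

```shell
#!/bin/sh

# Hypothetical stand-in for `bin/hadoop jar ...` (not a real Hadoop command):
# the progress line goes to stderr, the result line goes to stdout.
fake_job() {
  echo "INFO mapred.JobClient:  map 100% reduce 100%" >&2
  echo "!Map 42"
}

# A pipe carries only stdout, so without the redirect the INFO line would
# bypass awk and land on the terminal. Dropping stderr first leaves only
# the result line for awk to filter.
filtered=$(fake_job 2>/dev/null | awk '/^!/ { print }')
echo "$filtered"
```

If the progress log might be needed later, the same pattern works with a file in place of /dev/null, for example 2>job.log, which keeps the terminal clean without losing the messages.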

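Or, more simply, submit the job from your driver code with the verbose parameter set to false, so the client does not print progress in the first place: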

job.waitForCompletion(false);