Hadoop MapReduce wordcount intermittent error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

Time: 2012-07-03 10:18:35

Tags: java hadoop

I installed and configured Hadoop as a single node following the manual at the site below.

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#running-a-mapreduce-job

I compiled the wordcount example and ran it, but it takes a very long time and fails with Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

hduser@aptunix0043:/usr/local/hadoop/src$ hadoop jar WordCount.jar org/apache/hadoop/examples/WordCount input  ot

hdfs://localhost:54310/user/hduser/input
12/07/03 02:52:35 INFO input.FileInputFormat: Total input paths to process : 1
12/07/03 02:52:36 INFO mapred.JobClient: Running job: job_201207030248_0002
12/07/03 02:52:37 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 02:52:52 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 03:21:26 INFO mapred.JobClient: Task Id :attempt_201207030248_0002_r_000000_0, Status : FAILED 
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

12/07/03 03:21:47 WARN mapred.JobClient: Error reading task outputConnection timed out
12/07/03 03:22:08 WARN mapred.JobClient: Error reading task outputConnection timed out
12/07/03 03:50:01 INFO mapred.JobClient: Task Id : attempt_201207030248_0002_r_000000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/07/03 03:50:22 WARN mapred.JobClient: Error reading task outputConnection timed out
12/07/03 03:50:43 WARN mapred.JobClient: Error reading task outputConnection timed out
12/07/03 04:18:35 INFO mapred.JobClient: Task Id :  attempt_201207030248_0002_r_000000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/07/03 04:18:56 WARN mapred.JobClient: Error reading task outputConnection timed out
12/07/03 04:19:17 WARN mapred.JobClient: Error reading task outputConnection timed out
12/07/03 04:47:15 INFO mapred.JobClient: Job complete: job_201207030248_0002
12/07/03 04:47:15 INFO mapred.JobClient: Counters: 23
12/07/03 04:47:15 INFO mapred.JobClient:   Job Counters
12/07/03 04:47:15 INFO mapred.JobClient:     Launched reduce tasks=4
12/07/03 04:47:15 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12948
12/07/03 04:47:15 INFO mapred.JobClient:     Total time spent by all reduces waiting  after reserving slots (ms)=0
12/07/03 04:47:15 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 04:47:15 INFO mapred.JobClient:     Launched map tasks=1
12/07/03 04:47:15 INFO mapred.JobClient:     Data-local map tasks=1
12/07/03 04:47:15 INFO mapred.JobClient:     Failed reduce tasks=1
12/07/03 04:47:15 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=16469
12/07/03 04:47:15 INFO mapred.JobClient:   FileSystemCounters
12/07/03 04:47:15 INFO mapred.JobClient:     HDFS_BYTES_READ=661744
12/07/03 04:47:15 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=288616
12/07/03 04:47:15 INFO mapred.JobClient:   File Input Format Counters
12/07/03 04:47:15 INFO mapred.JobClient:     Bytes Read=661630
12/07/03 04:47:15 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 04:47:15 INFO mapred.JobClient:     Map output materialized bytes=267085
12/07/03 04:47:15 INFO mapred.JobClient:     Combine output records=18040
12/07/03 04:47:15 INFO mapred.JobClient:     Map input records=12761
12/07/03 04:47:15 INFO mapred.JobClient:     Physical memory (bytes) snapshot=183209984
12/07/03 04:47:15 INFO mapred.JobClient:     Spilled Records=18040
12/07/03 04:47:15 INFO mapred.JobClient:     Map output bytes=1086716
12/07/03 04:47:15 INFO mapred.JobClient:     CPU time spent (ms)=1940
12/07/03 04:47:15 INFO mapred.JobClient:     Total committed heap usage  (bytes)=162856960
12/07/03 04:47:15 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=393482240
12/07/03 04:47:15 INFO mapred.JobClient:     Combine input records=109844
12/07/03 04:47:15 INFO mapred.JobClient:     Map output records=109844
12/07/03 04:47:15 INFO mapred.JobClient:     SPLIT_RAW_BYTES=114

Any clues?

2 Answers:

Answer 0 (score: 1)

To help anyone who searched the internet and landed on this page like I did: you are most likely hitting one of two problems

  1. DNS resolution - make sure you use a fully qualified domain name for every host when setting up Hadoop

  2. Firewall - a firewall may be blocking ports 50060, 50030, and more depending on your Hadoop distribution (7182 and 7180 for Cloudera)
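As a quick sanity check for the firewall point above, you can probe the daemon web ports from each node. This is only a sketch: the hostnames are placeholders, the ports are the Hadoop 1.x defaults mentioned above, and the iptables rule assumes a default INPUT-chain setup on your distribution.

```shell
# Probe the default JobTracker/TaskTracker web ports (hostnames are examples;
# substitute the actual names of your cluster nodes).
nc -zv master.example.com 50030   # JobTracker web UI
nc -zv slave1.example.com 50060   # TaskTracker web UI (reducers fetch map output here)

# If a port turns out to be blocked, one way to open it (iptables example;
# adjust to however your distribution manages its firewall):
sudo iptables -I INPUT -p tcp --dport 50060 -j ACCEPT
```

The shuffle phase is exactly where blocked TaskTracker ports bite: reducers fetch map output over HTTP from port 50060, so a blocked port shows up as repeated fetch failures rather than an immediate error.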

Answer 1 (score: 0)

I ran into this error before, and it was caused by a DNS problem. Are you running on a Linux-based distribution? If so, make sure all the /etc/hosts files are in sync. In my case I had given each node an alias, e.g. "slave1 192.168.1.23", etc., but that did not match the machine's actual hostname, so I had to change it; alternatively, you can change your machine's hostname to match the corresponding entry in the "slaves" file in the Hadoop conf directory.
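To make the /etc/hosts point concrete, here is a sketch of what a consistent file might look like. The IPs and names are invented examples, not taken from the question; the key property is that the name after each IP matches what `hostname` prints on that machine and what the conf/masters and conf/slaves files list.

```shell
# /etc/hosts — identical on every node (example values only)
127.0.0.1      localhost
192.168.1.22   master.example.com   master
192.168.1.23   slave1.example.com   slave1
```

On Ubuntu single-node setups it is also worth checking for a `127.0.1.1 <hostname>` line, which can cause daemons to bind to the loopback address and make shuffle fetches between tasks time out.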