I'm running one of the Apache Mahout 0.9 examples using the PartialBuilder implementation on Hadoop (org.apache.mahout.classifier.df.mapreduce.BuildForest), but no matter what I try I get an error.
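For context, the job is launched with a command roughly like the one below (the jar path, input/output paths, and the -sl/-t values are placeholders, not my exact arguments):

hadoop jar mahout-core-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest \
    -d /user/me/input/data.csv -ds /user/me/input/data.info \
    -sl 5 -p -t 100 -o /user/me/forest-output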
The error is:
14/12/10 10:58:36 INFO mapred.JobClient: Running job: job_201412091528_0004
14/12/10 10:58:37 INFO mapred.JobClient: map 0% reduce 0%
14/12/10 10:58:50 INFO mapred.JobClient: map 10% reduce 0%
14/12/10 10:59:44 INFO mapred.JobClient: map 20% reduce 0%
14/12/10 11:32:23 INFO mapred.JobClient: Task Id : attempt_201412091528_0004_m_000000_0, Status : FAILED
java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3290)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2987)
There is no obvious error in the datanode log file:
2014-12-10 11:32:19,157 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:62662, bytes: 549024, op: HDFS_READ, cliID: DFSClient_attempt_201412091528_0004_m_000000_0_1249767243_1, offset: 0, srvID: DS-957555695-10.0.1.9-50010-1418164764736, blockid: blk_-7548306051547444322_12804, duration: 2012297405000
2014-12-10 11:32:25,511 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:64109, bytes: 1329, op: HDFS_READ, cliID: DFSClient_attempt_201412091528_0004_m_000000_1_-1362169490_1, offset: 0, srvID: DS-957555695-10.0.1.9-50010-1418164764736, blockid: blk_4126779969492093101_12817, duration: 285000
2014-12-10 11:32:29,496 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:64110, bytes: 67633152, op: HDFS_READ, cliID: DFSClient_attempt_201412091528_0004_m_000000_1_-1362169490_1, offset: 0, srvID: DS-9575556
...nor in the namenode logs. The jobtracker just repeats the errors from the datanode log. The one other error, a few minutes before the failure, is an EOF error, which may or may not be related to PartialBuilder:
2014-12-10 12:12:22,060 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-957555695-10.0.1.9-50010-1418164764736, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:296)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:340)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:404)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:582)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:404)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
at java.lang.Thread.run(Thread.java:695)
I can read and write files to DFS directly. I can even run this job on a small subset of the data, but I can't get the full Map/Reduce job to work. Any idea what I'm doing wrong?
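(By "read and write directly" I mean that plain HDFS shell access works fine, along the lines of the commands below; the paths are just examples:)

hadoop fs -put /local/path/sample.csv /tmp/sample.csv
hadoop fs -cat /tmp/sample.csv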
Notes about my setup:
hdfs-site.xml settings:
Answer 0 (score: 2):
After fiddling with a million settings, none of which helped, I finally got past this by drastically reducing the split size:
-Dmapred.max.split.size=16777216
That took the number of mappers for this dataset from 10 to 40, and with that they were able to complete correctly. Now that the problem is solved, I'll steadily increase the split size to find the right value. (For random forests, you want to find the largest split size that works in order to get the best results.)
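For anyone checking the arithmetic: 16777216 bytes is 16 MiB, a quarter of the old 64 MB (67108864-byte) default block/split size, so assuming the splits were previously at that default, quadrupling the number of splits lines up with the mapper count going from 10 to 40.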
Unfortunately, I don't know why the split size was triggering the "All datanodes are bad. Aborting" error, since that isn't the error I would have expected.