Error executing MapReduce from Eclipse when Hadoop is not on the same host

Time: 2015-08-28 13:24:01

Tags: eclipse hadoop mapreduce cloudera

I have seen several posts on this topic on various sites, including Stack Overflow, but I still don't understand what the real problem is or what I can do to solve it. In fact, I am quite new to Hadoop.

So, I am using the Cloudera Quickstart 5.4.2 virtual machine. I developed the typical WordCount example with the embedded Eclipse, and everything runs fine.

Now I am trying to execute the same code from another Eclipse outside the VM, and I get a ConnectException.

The connection to the VM works, and my "output" directory does get created, but the job fails before the map()/reduce() tasks execute.

To pin down my situation:

Host:

  • CentOS 6.6 x64 with Oracle JDK 1.7.0_67
  • Hadoop 2.6.0 (from Apache), downloaded and unpacked to /opt/hadoop; $HADOOP_HOME is set and $HADOOP_HOME/bin is added to $PATH
  • Eclipse Luna and Maven 3.0.4
  • The command hdfs dfs -ls hdfs://quickstart:8020/user/cloudera lists all the files I have put under that path

VM:

  • Cloudera Quickstart 5.4.2 (Hadoop 2.6.0-cdh5.4.2)
  • HDFS is up and running (checked in Cloudera Manager)

What I keep finding on the internet is some "Hadoop client configuration" in the file mapred-site.xml. This file does not exist on my host; if I have to create it, where should it go?
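From what I have read, those client settings could also be set directly in the job driver instead of in an XML file on the classpath. A minimal sketch of what I mean (the hostname quickstart and the ports 8020/8032 are my assumptions based on the Quickstart defaults, not settings I have confirmed):

    import org.apache.hadoop.conf.Configuration;

    // Point the client at the remote cluster instead of the local defaults.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://quickstart:8020");          // NameNode of the VM
    conf.set("mapreduce.framework.name", "yarn");                // submit to YARN, not the LocalJobRunner
    conf.set("yarn.resourcemanager.address", "quickstart:8032"); // ResourceManager of the VM

Without the last two properties the job runs in the LocalJobRunner, which matches the job_local... IDs in the log below.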

My error is:

15/08/28 15:21:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/28 15:21:10 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/08/28 15:21:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/08/28 15:21:10 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/08/28 15:21:10 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
15/08/28 15:21:10 INFO input.FileInputFormat: Total input paths to process : 1
15/08/28 15:21:10 INFO mapreduce.JobSubmitter: number of splits:1
15/08/28 15:21:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local13835919_0001
15/08/28 15:21:11 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/08/28 15:21:11 INFO mapreduce.Job: Running job: job_local13835919_0001
15/08/28 15:21:11 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/08/28 15:21:11 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/08/28 15:21:11 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/08/28 15:21:11 INFO mapred.LocalJobRunner: Waiting for map tasks
15/08/28 15:21:11 INFO mapred.LocalJobRunner: Starting task: attempt_local13835919_0001_m_000000_0
15/08/28 15:21:11 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
15/08/28 15:21:11 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/08/28 15:21:11 INFO mapred.MapTask: Processing split: hdfs://192.168.111.128:8020/user/cloudera/input/download/pg4363.txt:0+408781
15/08/28 15:21:11 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/08/28 15:21:11 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/08/28 15:21:11 INFO mapred.MapTask: soft limit at 83886080
15/08/28 15:21:11 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/08/28 15:21:11 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/08/28 15:21:11 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/08/28 15:21:11 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3108)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:621)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:847)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:143)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:183)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/08/28 15:21:11 WARN hdfs.DFSClient: Failed to connect to /127.0.0.1:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3108)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:621)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:847)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:143)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:183)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/08/28 15:21:11 INFO hdfs.DFSClient: Could not obtain BP-286282631-127.0.0.1-1433865208026:blk_1073742256_1432 from any node: java.io.IOException: No live nodes contain block BP-286282631-127.0.0.1-1433865208026:blk_1073742256_1432 after checking nodes = [DatanodeInfoWithStorage[127.0.0.1:50010,DS-3299869f-57b5-40b1-9917-9d69cd32f1d2,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:50010,DS-3299869f-57b5-40b1-9917-9d69cd32f1d2,DISK] Dead nodes:  DatanodeInfoWithStorage[127.0.0.1:50010,DS-3299869f-57b5-40b1-9917-9d69cd32f1d2,DISK]. Will get new block locations from namenode and retry...
15/08/28 15:21:11 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1838.582183063173 msec.
15/08/28 15:21:12 INFO mapreduce.Job: Job job_local13835919_0001 running in uber mode : false
15/08/28 15:21:12 INFO mapreduce.Job:  map 0% reduce 0%
15/08/28 15:21:13 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3108)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:621)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:847)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:143)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:183)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/08/28 15:21:13 WARN hdfs.DFSClient: Failed to connect to /127.0.0.1:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3108)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:778)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:621)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:847)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:143)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:183)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/08/28 15:21:13 INFO hdfs.DFSClient: Could not obtain BP-286282631-127.0.0.1-1433865208026:blk_1073742256_1432 from any node: java.io.IOException: No live nodes contain block BP-286282631-127.0.0.1-1433865208026:blk_1073742256_1432 after checking nodes = [DatanodeInfoWithStorage[127.0.0.1:50010,DS-3299869f-57b5-40b1-9917-9d69cd32f1d2,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:50010,DS-3299869f-57b5-40b1-9917-9d69cd32f1d2,DISK] Dead nodes:  DatanodeInfoWithStorage[127.0.0.1:50010,DS-3299869f-57b5-40b1-9917-9d69cd32f1d2,DISK]. Will get new block locations from namenode and retry...
15/08/28 15:21:13 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 4651.463132320157 msec.

I don't understand why anything tries to connect to 127.0.0.1:50010, since my Hadoop cluster is not on this host but on the Quickstart VM (192.168.x.y; all the /etc/hosts entries are in place so far). I suppose I have to configure something, but I don't know what, or where...
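As far as I can tell, a plain HDFS read should reproduce this outside of MapReduce; a minimal sketch of such a test (the IP and file path are taken from the log above, the rest is my assumption). Listing only talks to the namenode, but open()/read() has to connect to the datanode address the namenode hands back, and that address is the 127.0.0.1:50010 from the log:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.111.128:8020");
            FileSystem fs = FileSystem.get(conf);
            // Metadata calls (like hdfs dfs -ls) go to the namenode only and succeed.
            // Reading requires a connection to the datanode the namenode returns.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    fs.open(new Path("/user/cloudera/input/download/pg4363.txt"))))) {
                System.out.println(in.readLine());
            }
        }
    }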

I tried the same thing on Windows 7 x64, configuring winutils and so on, and I get the same exception.

Thanks!

1 Answer:

Answer 0 (score: 0)

Well, it seems to be a configuration issue of the Cloudera Quickstart VM: the same code works against the Hortonworks Sandbox.

The difference is:

  • the Hortonworks Sandbox is configured to use "real" IPs to locate its nodes;
  • the Cloudera VM is configured to use "localhost" to locate its nodes (here, the datanodes).

So when I tried to execute MR against the Cloudera virtual cluster from outside, the datanode locations the cluster handed back pointed to 127.0.0.1 instead of the real IP, and of course there is no datanode on my local machine.
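A client-side workaround that might help here (my assumption; I have not verified it against the Quickstart VM) is to tell the DFS client to connect to datanodes by the hostname they report instead of the IP they registered with the namenode, and to map that hostname to the VM's address in the client's /etc/hosts (e.g. 192.168.111.128 quickstart). A minimal sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://quickstart:8020");
    // Resolve datanodes by hostname on the client side; assumes "quickstart"
    // resolves to the VM's IP on the client (e.g. via an /etc/hosts entry).
    conf.setBoolean("dfs.client.use.datanode.hostname", true);
    FileSystem fs = FileSystem.get(conf);

The same flag can also go into the client's hdfs-site.xml; either way it only changes how the client resolves the datanode, not the VM's own configuration.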