Which directory should I use for hadoop.tmp.dir when running Hadoop in pseudo-distributed mode?

Asked: 2012-10-05 15:13:49

Tags: linux ubuntu configuration hadoop hbase

By default, Hadoop sets hadoop.tmp.dir to the /tmp folder. This is a problem, because /tmp gets wiped out by Linux on reboot, leading to this lovely error from the JobTracker:

2012-10-05 07:41:13,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).    
...    
2012-10-05 07:41:22,636 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
2012-10-05 07:41:22,643 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)    
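For reference, the shipped default in core-default.xml looks roughly like this (a sketch from memory of the 0.20.x defaults, not copied from my install):

    <property>
        <!-- Default base for Hadoop's temporary directories; it lives
             under /tmp and so disappears on reboot. -->
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-${user.name}</value>
    </property>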

The only way I've found to fix this is to reformat the namenode, which rebuilds the /tmp/hadoop-root folder, which of course gets wiped out again on the next reboot.

So I went ahead and created a folder called /hadoop_temp and gave all users read/write access to it. I then set this property in core-site.xml:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///hadoop_temp</value>
    </property>

Hadoop seemed happy when I reformatted my namenode, giving me this message:

12/10/05 07:58:54 INFO common.Storage: Storage directory file:/hadoop_temp/dfs/name has been successfully formatted.

However, when I looked at /hadoop_temp, I noticed that the folder was empty. And then when I restarted Hadoop and checked my JobTracker log, I saw this:

2012-10-05 08:02:41,988 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
...
2012-10-05 08:02:51,010 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
2012-10-05 08:02:51,011 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused

And when I checked my namenode log, I saw this:

2012-10-05 08:00:31,206 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name does not exist.
2012-10-05 08:00:31,212 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

So, clearly I haven't configured this correctly. Hadoop still expects to see its files in the /tmp folder, even though I set hadoop.tmp.dir to /hadoop_temp in core-site.xml. What did I do wrong? What's the accepted "right" value for hadoop.tmp.dir?

Bonus question: what should I use for hbase.tmp.dir?

System info:

Ubuntu 12.04, Apache Hadoop 0.20.2, Apache HBase 0.92.1

Thanks for looking!

2 Answers:

Answer 0 (score: 3):

Thanks to a helpful member of the Hadoop mailing list for helping me work through this issue. Quoting him:

"Don't use the file:/// prefix for hadoop.tmp.dir in 0.20.x- or 1.x-based releases."

I took out the file:// prefix and it worked.
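For anyone else hitting this, here's a sketch of what the working core-site.xml entry looks like (assuming the same /hadoop_temp directory as above):

    <property>
        <!-- Plain filesystem path, no file:/// prefix, on a partition
             that survives reboots (unlike /tmp). -->
        <name>hadoop.tmp.dir</name>
        <value>/hadoop_temp</value>
    </property>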

Answer 1 (score: 0):

Also, with HBase 0.94.*, you have to specify:

    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
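As for the bonus question: by the same logic as the accepted answer, one reasonable approach (my assumption, not confirmed in this thread) is to point hbase.tmp.dir at a reboot-safe directory in hbase-site.xml as well:

    <property>
        <!-- Hypothetical directory; any path outside /tmp that survives
             reboots and is writable by the HBase user should do. -->
        <name>hbase.tmp.dir</name>
        <value>/hbase_temp</value>
    </property>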