Adding a data node to a Hadoop cluster

Date: 2012-05-28 06:25:43

Tags: hadoop

When I start hadoopnode1 with start-all.sh, it successfully starts the services on both the master and the slave (see the jps output from the slave). However, when I look at the admin screen, the slave node does not show up among the live nodes. Even the hadoop fs -ls / command runs perfectly from the master, but from the slave it shows this error message:

@hadoopnode2:~/hadoop-0.20.2/conf$ hadoop fs -ls /
12/05/28 01:14:20 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 0 time(s).
12/05/28 01:14:21 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 1 time(s).
12/05/28 01:14:22 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 2 time(s).
12/05/28 01:14:23 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 3 time(s).
.
.
.
12/05/28 01:14:29 INFO ipc.Client: Retrying connect to server: hadoopnode1/192.168.1.120:8020. Already tried 10 time(s).

It looks like the slave (hadoopnode2) cannot find/connect to the master node (hadoopnode1).

Can you please point out what I am missing?

Below are the settings of the master and slave nodes. PS: the master and slave run the same versions of Linux and Hadoop, and SSH works fine, since I can start the slave from the master node.

core-site.xml, hdfs-site.xml, and mapred-site.xml have the same settings on both the master (hadoopnode1) and the slave (hadoopnode2).

OS - Ubuntu 10; Hadoop version -

oop@hadoopnode1:~/hadoop-0.20.2/conf$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

- Master (hadoopnode1)

hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode1 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux

hadoop@hadoopnode1:~/hadoop-0.20.2/conf$ jps
9923 Jps
7555 NameNode
8133 TaskTracker
7897 SecondaryNameNode
7728 DataNode
7971 JobTracker

masters -> hadoopnode1
slaves -> hadoopnode1
hadoopnode2

- Slave (hadoopnode2)

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ uname -a
Linux hadoopnode2 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:35:26 UTC 2012 i686 GNU/Linux

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ jps
1959 DataNode
2631 Jps
2108 TaskTracker

masters - hadoopnode1

core-site.xml
hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/var/tmp/hadoop/hadoop-${user.name}</value>
                <description>A base for other temp directories</description>
        </property>

        <property>
                <name>fs.default.name</name>
                <value>hdfs://hadoopnode1:8020</value>
                <description>The name of the default file system</description>
        </property>

</configuration>

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>hadoopnode1:8021</value>
                <description>The host and port that the MapReduce job tracker runs at.If "local", then jobs are run in process as a single map</description>
        </property>
</configuration>

hadoop@hadoopnode2:~/hadoop-0.20.2/conf$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
                <description>Default block replication</description>
        </property>
</configuration>

7 Answers:

Answer 0 (score: 2)

It looks like the problem is not only with the slave but also with the master node (hadoopnode1). When I check the logs on the master I see the same error: it is not able to connect to hadoopnode1.

Log from the master node (hadoopnode1); note that hadoopnode1 resolves to the loopback address 127.0.0.1:

2012-05-30 20:54:31,760 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopnode1/127.0.0.1:8020. Already tried 0 time(s).
2012-05-30 20:54:32,761 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopnode1/127.0.0.1:8020. Already tried 1 time(s).
2012-05-30 20:54:33,764 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopnode1/127.0.0.1:8020. Already tried 2 time(s).
2012-05-30 20:54:34,764 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopnode1/127.0.0.1:8020. Already tried 3 time(s).
.
.
hadoopnode1/127.0.0.1:8020. Already tried 8 time(s).
2012-05-30 20:54:40,782 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoopnode1/127.0.0.1:8020. Already tried 9 time(s).
2012-05-30 20:54:40,784 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null
java.net.ConnectException: Call to hadoopnode1/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)

Here is my /etc/hosts file:

192.168.1.120   hadoopnode1     # Added by NetworkManager
127.0.0.1       localhost.localdomain   localhost hadoopnode1
::1     hadoopnode1     localhost6.localdomain6 localhost6
192.168.1.121   hadoopnode2
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

I am really confused about how this is supposed to work. I have been trying to set up this cluster for the past 15 days. Any help is appreciated.

@Raze2dust - I deleted all the tmp files, but now the problem looks like something else. My guess: more of a name-resolution problem.

@William Yao - curl is not installed, but I am able to ping the servers from each other and am also able to connect over SSH.

Answer 1 (score: 1)

In the web GUI you can see the number of nodes the cluster has. If you see fewer than you expected, make sure that the /etc/hosts file on the master contains only the cluster hosts (for a 2-node cluster):

192.168.0.1 master
192.168.0.2 slave

If you see any 127.0.x.x IPs mapped to your hostnames, comment them out, because Hadoop will see them first as the hosts. I had the above problem and solved it this way. Hope this helps.
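
Applied to the poster's own cluster, a cleaned-up /etc/hosts on both nodes might look like the following (a sketch only; the IPs and hostnames are taken from the question, with hadoopnode1 removed from the loopback entries):

192.168.1.120   hadoopnode1
192.168.1.121   hadoopnode2
127.0.0.1       localhost.localdomain   localhost
::1             localhost6.localdomain6 localhost6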

Answer 2 (score: 1)

Check your services with sudo jps. If the master is not showing the processes it should, here is what you need to do (a shell sketch follows the list):

1. Restart Hadoop
2. Go to /app/hadoop/tmp/dfs/name/current
3. Open VERSION (i.e. by vim VERSION)
4. Record namespaceID
5. Go to /app/hadoop/tmp/dfs/data/current
6. Open VERSION (i.e. by vim VERSION)
7. Replace the namespaceID with the namespaceID you recorded in step 4.
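
A rough shell sketch of the same steps (assuming the directory layout used in this answer; with the poster's hadoop.tmp.dir setting, the directories would live under /var/tmp/hadoop/hadoop-hadoop instead of /app/hadoop/tmp):

stop-all.sh
# note the namespaceID the namenode expects
grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION
# edit the datanode's VERSION so its namespaceID matches the value found above
vim /app/hadoop/tmp/dfs/data/current/VERSION
start-all.sh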

This should work. Good luck!

Answer 3 (score: 0)

Check the namenode and datanode logs (they should be under $HADOOP_HOME/logs/). The most likely problem is a mismatch between the namenode and datanode namespace IDs. Delete hadoop.tmp.dir on all nodes, format the namenode again ($HADOOP_HOME/bin/hadoop namenode -format), and then try again.
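
A hedged sketch of those commands, assuming the hadoop.tmp.dir from the question (/var/tmp/hadoop/hadoop-${user.name}, i.e. /var/tmp/hadoop/hadoop-hadoop for the hadoop user); note that reformatting the namenode erases all existing HDFS data:

$HADOOP_HOME/bin/stop-all.sh                 # on the master
rm -rf /var/tmp/hadoop/hadoop-hadoop         # on every node (the hadoop.tmp.dir path)
$HADOOP_HOME/bin/hadoop namenode -format     # on the master only
$HADOOP_HOME/bin/start-all.sh                # on the master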

Answer 4 (score: 0)

I think the problem is on slave 2: slave 2 should be listening on the same port, 8020, not on 8021.

Answer 5 (score: 0)

Add the new node's hostname to the slaves file and start the datanode & tasktracker on the new node.
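
A minimal sketch of that procedure (assuming the stock 0.20.2 scripts and hadoopnode2 as the node being added):

# on the master: add the new host to conf/slaves
echo "hadoopnode2" >> $HADOOP_HOME/conf/slaves

# on the new node: start the datanode and tasktracker daemons
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker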

Answer 6 (score: 0)

There are really two errors in your case.

can't connect to hadoop master node from slave

That is a network problem. Test it with: curl 192.168.1.120:8020.

Normal response: curl: (52) Empty reply from server

In my case I got a host-not-found error, so just take a look at your firewall settings.

data node down:

This is a Hadoop problem. Raze2dust's method is fine. If you see an Incompatible namespaceIDs error in the log, here is another way:

Stop Hadoop, edit the value of namespaceID in /current/VERSION to match the current namenode's value, and then start Hadoop.

You can always check the available data nodes with: hadoop fsck /