Question

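I'm new to the Hadoop ecosystem.

I recently tried Hadoop (2.7.1) on a single-node cluster without any problems, and then decided to move to a multi-node cluster with 1 namenode and 2 datanodes.

However, I'm facing a strange problem. Whatever job I try to run, it gets stuck with the following message:

On the web interface:

YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register

and on the CLI:

16/01/05 17:52:53 INFO mapreduce.Job: Running job: job_1451083949804_0001

The jobs don't even start, and at this point I'm not sure what I need to change to get it working.

Here is what I have tried so far:

- Disabled the firewall on all nodes
- Set lower resource limits
- Set it up on different machines, routers and distributions

I would really appreciate any help (even a small hint) in the right direction.

I have followed these instructions (configuration):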

Answer 1

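I finally solved the problem. I'm posting the detailed steps for future reference. (For test environments only.)

Hadoop (2.7.1) multi-node cluster configuration

Make sure you have a reliable network without host isolation. Static IP assignment is preferable, or at least very long DHCP leases. Additionally, all nodes (Namenode/master & Datanodes/slaves) should have a common user account with the same password; if they don't, create such a user account on all nodes. Using the same username and password on all nodes makes things less complicated.
[on all machines] First configure every node as a single-node cluster. You can use the script I posted here.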

Execute these commands in a new terminal:

[on all machines] ↴

```
stop-dfs.sh;stop-yarn.sh;jps
rm -rf /tmp/hadoop-$USER
```

[on Namenode/master only] ↴

```
rm -rf ~/hadoop_store/hdfs/datanode
```

[on Datanodes/slaves only] ↴

```
rm -rf ~/hadoop_store/hdfs/namenode
```

[on all machines] Add the IP addresses and corresponding hostnames of all nodes in the cluster.

```
sudo nano /etc/hosts
```

hosts

```
xxx.xxx.xxx.xxx master
xxx.xxx.xxx.xxy slave1
xxx.xxx.xxx.xxz slave2
# Additionally you may need to remove lines like "xxx.xxx.xxx.xxx localhost", "xxx.xxx.xxx.xxy localhost", "xxx.xxx.xxx.xxz localhost" etc. if they exist.
# However, it's okay to keep lines like "127.0.0.1 localhost" and others.
```

[on all machines] Configure iptables

Allow the default or custom ports you plan to use for the various Hadoop daemons through the firewall

OR

much easier, disable iptables

On RedHat-like distributions (Fedora, CentOS):

```
sudo systemctl disable firewalld
sudo systemctl stop firewalld
```

On Debian-like distributions (Ubuntu):

```
sudo ufw disable
```

[on Namenode/master only] Set up ssh access from the Namenode (master) to all Datanodes (slaves).

```
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@slave2
```

Confirm things by running ping slave1, ssh slave1, ping slave2, ssh slave2, etc. You should get a proper response. (Remember to leave each ssh session by typing the command below or by closing the terminal. To be on the safe side, I also made sure that all nodes could reach each other, not just the Namenode/master.)

```
exit
```

[on all machines] Edit the core-site.xml file

```
nano /usr/local/hadoop/etc/hadoop/core-site.xml
```

core-site.xml

```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>master:9000</value>
        <description>NameNode URI</description>
    </property>
</configuration>
```

[on all machines] Edit the yarn-site.xml file

```
nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
```

yarn-site.xml

```
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>The hostname of the RM.</description>
    </property>
    <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
    </property>
    <property>
         <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
```

[on all machines] Modify the slaves file: remove the text "localhost" and add the slave hostnames

```
nano /usr/local/hadoop/etc/hadoop/slaves
```

slaves

```
slave1
slave2
```

(I guess having this only on the Namenode/master would also work, but I did it on all machines anyway. Also note that in this configuration the master acts only as the resource manager, which is what I intended.)

[on all machines] Modify the hdfs-site.xml file and change the value of the dfs.replication property to something > 1 (but not more than the number of slaves in the cluster; here I have two slaves, so I set it to 2)

[on Namenode/master only] (Re)format HDFS through the namenode

```
hdfs namenode -format
```

[optional]
- Remove the dfs.datanode.data.dir property from the master's hdfs-site.xml file.
- Remove the dfs.namenode.name.dir property from every slave's hdfs-site.xml file.

Testing (execute only on the Namenode/master)
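A minimal smoke test for the finished cluster could look like the sketch below. It uploads a small text file, runs the bundled wordcount example (the same example Answer 2 runs), and compares the job output with a count computed locally. The install path /usr/local/hadoop matches the paths used above; the 2.7.1 jar name, the /wc-test scratch directory, and both function names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wordcount smoke test, meant to run on the Namenode/master.
# Assumptions: Hadoop in /usr/local/hadoop, examples jar version 2.7.1,
# and /wc-test as a scratch HDFS directory (override via the variables below).
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
EXAMPLES_JAR="${EXAMPLES_JAR:-$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar}"
TEST_DIR="${TEST_DIR:-/wc-test}"

# Count words on stdin and print "word<TAB>count" lines in byte order,
# the layout the wordcount example writes with a single reducer.
local_wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Upload the file, run the example job, and diff its output with local_wordcount.
run_wordcount_smoke_test() {
    local input_file="$1" expected status=0
    expected="$(mktemp)" || return 1
    local_wordcount < "$input_file" > "$expected"

    hadoop fs -mkdir -p "$TEST_DIR/input"
    hadoop fs -put -f "$input_file" "$TEST_DIR/input/"
    hadoop fs -rm -r -f "$TEST_DIR/output"   # the job refuses an existing output dir
    if ! hadoop jar "$EXAMPLES_JAR" wordcount "$TEST_DIR/input" "$TEST_DIR/output"; then
        rm -f "$expected"
        return 1
    fi

    if hadoop fs -cat "$TEST_DIR/output/part-r-00000" | diff - "$expected"; then
        echo "wordcount OK"
    else
        echo "wordcount output differs from the local count" >&2
        status=1
    fi
    rm -f "$expected"
    return "$status"
}
```

For example, echo "hello world hello Hello" > /tmp/wc.txt followed by run_wordcount_smoke_test /tmp/wc.txt. With the default single reducer the example writes its result to part-r-00000, which is why the sketch reads only that file.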

Wait a few seconds and the mappers and reducers should start.
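If the job instead stays at "ACCEPTED: waiting for AM container to be allocated, launched and register", one common cause is that no NodeManager has registered with the ResourceManager, so there is nowhere to start the ApplicationMaster. A quick check, sketched under the assumption that "yarn node -list" prints a "Total Nodes:N" summary line as Hadoop 2.x does; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check: how many NodeManagers are RUNNING and registered?
# Assumes `yarn node -list` output contains a "Total Nodes:N" line.

# Read `yarn node -list` output on stdin and print N from "Total Nodes:N".
registered_nodemanagers() {
    sed -n 's/^Total Nodes:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

check_nodemanagers() {
    local count
    count="$(yarn node -list 2>/dev/null | registered_nodemanagers)"
    if [ -z "$count" ]; then
        echo "could not read 'yarn node -list' output" >&2
        return 2
    fi
    if [ "$count" -eq 0 ]; then
        echo "no NodeManagers registered: the ApplicationMaster cannot be placed" >&2
        return 1
    fi
    echo "$count NodeManager(s) registered"
}
```

With the configuration above you would expect 2 (slave1 and slave2). A count of 0 usually means the slaves cannot reach master on the ResourceManager ports, which points back at the /etc/hosts and firewall steps.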

These links helped me solve the problem:

Answer 2

I ran into the same problem when I ran

"hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /calculateCount/ /output"

The command hung at that point.

I tracked the job and found "there are 15 missing blocks, and they are all corrupted".

Then I did the following:
1) ran "hdfs fsck /"
2) ran "hdfs fsck / -delete"
3) added "-A INPUT -p tcp -j ACCEPT" to /etc/sysconfig/iptables on the two datanodes
4) ran "stop-all.sh" and then "start-all.sh"
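Because iptables evaluates rules in order, the line from step 3 only helps if it sits inside the *filter table before its COMMIT line and ahead of any catch-all REJECT rule in the INPUT chain (default CentOS rules files typically contain one). A sketch of a helper that inserts it there, assuming an iptables-save style rules file; the function name is hypothetical, and the iptables service has to be restarted afterwards so the file is reloaded:

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert an INPUT ACCEPT rule into an iptables-save style
# file (e.g. /etc/sysconfig/iptables) inside the *filter table, before the
# first INPUT REJECT rule or, if there is none, before COMMIT.
# Back up the real file before running this against it.
add_input_accept_rule() {
    local rules_file="$1"
    local rule="${2:--A INPUT -p tcp -j ACCEPT}"
    # Nothing to do if the exact rule line is already present.
    grep -qxF -- "$rule" "$rules_file" && return 0
    local tmp
    tmp="$(mktemp)" || return 1
    awk -v rule="$rule" '
        /^\*filter/ { in_filter = 1 }
        in_filter && !done && ($0 ~ /^-A INPUT .*-j REJECT/ || $0 == "COMMIT") {
            print rule
            done = 1
        }
        { print }
        $0 == "COMMIT" { in_filter = 0 }
    ' "$rules_file" > "$tmp" && cat "$tmp" > "$rules_file"   # cat keeps owner/permissions
    rm -f "$tmp"
}
```

Running it twice is safe: the second call finds the rule and leaves the file unchanged. Note that this rule accepts all TCP traffic, which matches the "test environment only" spirit of Answer 1.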

After that, everything worked.

I think the firewall is the key point.
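To confirm that the firewall is the culprit, you can test from each slave whether the master accepts TCP connections on the ports the slaves need. A sketch using bash's /dev/tcp redirection and coreutils timeout; the port list is an assumption: 9000 is the fs.defaultFS port from core-site.xml in Answer 1, and 8030 to 8032 are the default ResourceManager scheduler, resource-tracker and client ports. Function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical reachability check, run on each Datanode/slave.
# Ports assumed: 9000 (NameNode RPC from fs.defaultFS), 8030-8032 (YARN RM defaults).

# Succeed if a TCP connection to host:port opens within 3 seconds.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

check_master_ports() {
    local host="${1:-master}" port failed=0
    for port in 9000 8030 8031 8032; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port unreachable" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Run check_master_ports master on slave1 and slave2 while the daemons are up; any "unreachable" line points at a port blocked by the firewall or at a wrong /etc/hosts entry.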
