Hadoop MapReduce wordcount fails on the first run?

Date: 2016-07-16 11:23:04

Tags: hadoop mapreduce

Running the Hadoop wordcount example fails on the first run. Here is what I am doing:

  1. Format the NameNode: $HADOOP_HOME/bin/hdfs namenode -format

  2. Start HDFS / YARN:

    $HADOOP_HOME/sbin/start-dfs.sh
    $HADOOP_HOME/sbin/start-yarn.sh
    $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
    
  3. Run wordcount: hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount input output

  4. (Let's assume the input folder is already in HDFS; I won't list every command here.)
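    (For completeness, the elided upload could look like the sketch below; the /user/hadoop/input path is an assumption, not from the post, and just has to match the job's input argument.)

```shell
# Hypothetical sketch of the elided upload steps (the HDFS path and the
# choice of sample files are assumptions): create an input directory in
# HDFS and copy some text files into it.
prepare_input() {
  hdfs dfs -mkdir -p /user/hadoop/input
  hdfs dfs -put "$HADOOP_HOME"/etc/hadoop/*.xml /user/hadoop/input
}
```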

    Output:

    16/07/17 01:04:34 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.2:8032
    16/07/17 01:04:35 INFO input.FileInputFormat: Total input paths to process : 2
    16/07/17 01:04:35 INFO mapreduce.JobSubmitter: number of splits:2
    16/07/17 01:04:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468688654488_0001
    16/07/17 01:04:36 INFO impl.YarnClientImpl: Submitted application application_1468688654488_0001
    16/07/17 01:04:36 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468688654488_0001/
    16/07/17 01:04:36 INFO mapreduce.Job: Running job: job_1468688654488_0001
    16/07/17 01:04:46 INFO mapreduce.Job: Job job_1468688654488_0001 running in uber mode : false
    16/07/17 01:04:46 INFO mapreduce.Job:  map 0% reduce 0%
    Terminated
    

    Then HDFS crashes, so I cannot access http://localhost:50070/

    Then I restart everything (repeating step 2), rerun the example, and everything works fine.

    How can I make it work on the first run? My HDFS obviously contains no data the first time; maybe that is the problem?
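    One check worth doing before the first submission (a sketch, assuming the standard Hadoop 2.x CLI): wait until the NameNode leaves safe mode and confirm the core daemons are up, since a freshly formatted HDFS may not be ready when the job is submitted.

```shell
# Sketch: block until HDFS is out of safe mode, then verify that the
# NameNode and DataNode JVMs are actually running before submitting a job.
# 'hdfs dfsadmin -safemode wait' and 'jps' are stock Hadoop/JDK commands;
# the wrapper function is our scaffolding.
wait_for_hdfs() {
  hdfs dfsadmin -safemode wait || return 1  # blocks until the NameNode is ready
  jps | grep -q NameNode || return 1
  jps | grep -q DataNode || return 1
}
```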

    Update

    Running a simpler example also fails:

    hadoop@8f98bf86ceba:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar pi 3 3
    
    Number of Maps  = 3
    Samples per Map = 3
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Starting Job
    16/07/17 03:21:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.3:8032
    16/07/17 03:21:29 INFO input.FileInputFormat: Total input paths to process : 3
    16/07/17 03:21:29 INFO mapreduce.JobSubmitter: number of splits:3
    16/07/17 03:21:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468696855031_0001
    16/07/17 03:21:31 INFO impl.YarnClientImpl: Submitted application application_1468696855031_0001
    16/07/17 03:21:31 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468696855031_0001/
    16/07/17 03:21:31 INFO mapreduce.Job: Running job: job_1468696855031_0001
    16/07/17 03:21:43 INFO mapreduce.Job: Job job_1468696855031_0001 running in uber mode : false
    16/07/17 03:21:43 INFO mapreduce.Job:  map 0% reduce 0%
    

    Same problem: HDFS terminates.

2 Answers:

Answer 0 (score: 0)

Your post looks incomplete; the actual error cannot be inferred from what is here. My guess is that hadoop-mapreduce-examples-2.7.2-sources.jar is not what you want. You more likely need hadoop-mapreduce-examples-2.7.2.jar, which contains the .class files rather than the source files.

Answer 1 (score: 0)

HDFS has to be restarted once before a MapReduce job will run successfully the first time. This is because HDFS creates some data on its first run, and stopping it cleans up that state, so MapReduce jobs can then run through YARN afterwards.

So my solution was:

  1. Start Hadoop: $HADOOP_HOME/sbin/start-dfs.sh
  2. Stop Hadoop: $HADOOP_HOME/sbin/stop-dfs.sh
  3. Start Hadoop again: $HADOOP_HOME/sbin/start-dfs.sh
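The three steps above can be sketched as a single function (assuming $HADOOP_HOME points at the Hadoop install; the dfs scripts are the stock Hadoop 2.x entry points):

```shell
# Sketch of the workaround: start HDFS once, stop it so it cleans up its
# first-run state, then start it again before submitting any job.
restart_hdfs() {
  "$HADOOP_HOME/sbin/start-dfs.sh"
  "$HADOOP_HOME/sbin/stop-dfs.sh"
  "$HADOOP_HOME/sbin/start-dfs.sh"
}
```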