File system access exception in Hadoop MapReduce

Time: 2015-08-23 07:11:25

Tags: hadoop mapreduce hdfs

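I have configured Hadoop 2.6.0 on Ubuntu 14.04. I started by running the wordcount MapReduce program to get familiar with MapReduce jobs, and I ran into some problems accessing the file system. My Hadoop home directory is /opt/hadoop2.6.0.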

  1. Driver program

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // configuration should contain reference to your namenode
        FileSystem hdfs = FileSystem.get(new Configuration());
        Path workingDir = hdfs.getWorkingDirectory();

        Path newFolderPath = new Path("/output");
        newFolderPath = Path.mergePaths(workingDir, newFolderPath);

        if (hdfs.exists(newFolderPath)) {
            hdfs.delete(newFolderPath, true); // Delete existing directory
        }
        hdfs.mkdirs(newFolderPath);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job,newFolderPath );
        System.exit(job.waitForCompletion(true) ? 0 : 1); //line no. 76
        // job.submit();
    
  2. core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name> 
            <value>/app/hadoop/tmp</value> 
        </property>
    </configuration>
    
  3. hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/opt/hadoop-2.6.0/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/opt/hadoop-2.6.0/dfs/data</value>
        </property>
        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.http.address</name>
            <value>localhost:50070</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    </configuration>
    
  4. yarn-site.xml

    <configuration>
    
        <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.application.classpath</name>
            <value>
                %HADOOP_HOME%\etc\hadoop,
                %HADOOP_HOME%\share\hadoop\common\*,
                %HADOOP_HOME%\share\hadoop\common\lib\*,
                %HADOOP_HOME%\share\hadoop\hdfs\*,
                %HADOOP_HOME%\share\hadoop\hdfs\lib\*,
                %HADOOP_HOME%\share\hadoop\mapreduce\*,
                %HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
                %HADOOP_HOME%\share\hadoop\yarn\*,
                %HADOOP_HOME%\share\hadoop\yarn\lib\*
            </value>
        </property>
    </configuration>
    

  5. Running the MapReduce jar:

     hadoop jar /home/ifs-admin/wordcount.jar  WordCount /user/ifs/input 
    

    Exception on execution:

        15/08/23 12:12:25 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
        15/08/23 12:12:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
        hdfs://localhost:9000/user/ifs-admin/output already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:562)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at WordCount.main(WordCount.java:76)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    
  6. If I delete the output directory, the following error is shown instead:

       Exception in thread "main" ENOENT: No such file or directory
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:652)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:490)
        at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:599)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:182)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at WordCount.main(WordCount.java:68)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    
  7. How can I resolve this?

2 answers:

Answer 0 (score: 0)

Try this code. Note: do not create the output directory yourself; Hadoop creates it automatically.

    // Requires: import java.net.URI;
    FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
    Path workingDir = hdfs.getWorkingDirectory();

    Path newFolderPath = new Path("/output");
    newFolderPath = Path.mergePaths(workingDir, newFolderPath);
    if (hdfs.exists(newFolderPath)) {
        // Recursively delete the existing output directory; do not recreate it
        hdfs.delete(newFolderPath, true);
    }
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, newFolderPath);
    System.exit(job.waitForCompletion(true) ? 0 : 1); //line no. 76
    // job.submit();
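
The first stack trace in the question comes from `FileOutputFormat.checkOutputSpecs`, which rejects a job whose output directory already exists. That is why calling `mkdirs` right after the delete fails. The "delete if present, never recreate" pattern can be tried without a cluster. The sketch below is a local-filesystem analogue using `java.nio.file` rather than the Hadoop `FileSystem` API; the class name `OutputDirCleaner` and the temporary directory it creates are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OutputDirCleaner {

    /**
     * Deletes dir and everything under it if it exists, and does NOT recreate it.
     * Returns true if something was deleted, false if dir was already absent.
     */
    public static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parents, so each
            // directory is already empty by the time it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a leftover job output directory (hypothetical name).
        Path out = Files.createTempDirectory("wordcount-output");
        Files.createFile(out.resolve("part-r-00000"));

        System.out.println("deleted: " + deleteIfExists(out));
        System.out.println("exists afterwards: " + Files.exists(out));
    }
}
```

With HDFS the equivalent calls are `hdfs.exists(path)` and `hdfs.delete(path, true)` as in the snippet above. The point in both cases is that the directory is left absent so the job can create it itself.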

Answer 1 (score: 0)

When hadoop.tmp.dir, the NameNode directory, or the DataNode directory is set under the /opt folder, Hadoop cannot create directories in HDFS. See: https://unix.stackexchange.com/questions/11544/what-is-the-difference-between-opt-and-usr-local

I changed hadoop.tmp.dir in core-site.xml to /usr/local/hadoop/dfs/data.
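
The second stack trace in the question goes through `RawLocalFileSystem`, which suggests the failing `mkdirs` happened on the local file system rather than in HDFS. If the cause is a local permission problem, as this answer implies, it may help to check whether the user running the job can create and write to the configured directory before pointing Hadoop at it. The following is a minimal sketch under that assumption. The class name `LocalDirCheck` is hypothetical, and the default path is simply the `hadoop.tmp.dir` value from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalDirCheck {

    /**
     * Returns true if dir exists (or could be created, including missing
     * parents) and the current user can write to it.
     * Side effect: creates the directory when it is missing and creatable.
     */
    public static boolean isUsable(Path dir) {
        try {
            // No-op when dir already exists as a directory.
            Files.createDirectories(dir);
        } catch (IOException | SecurityException e) {
            // Covers permission denied, a regular file in the way, and similar failures.
            return false;
        }
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        // Pass the directory to test as the first argument; the default is
        // the hadoop.tmp.dir value shown in the question's core-site.xml.
        String candidate = args.length > 0 ? args[0] : "/app/hadoop/tmp";
        System.out.println(candidate + " usable: " + isUsable(Paths.get(candidate)));
    }
}
```

Run it as the same OS user that submits the job. A `false` result for the configured directory would be consistent with the permission explanation given here, though it does not rule out other causes.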