Map 100% reduce 0% when running Hadoop wordcount

Posted: 2013-10-24 05:25:35

Tags: java hadoop mapreduce

I am trying to run the Hadoop wordcount job against a cluster from Eclipse, but I get an error. I changed the output directory, but the program's behaviour did not change. Can you help me fix this error:

     2013-10-23 23:06:13,783 WARN  [main] conf.Configuration  
(Configuration.java:warnOnceIfDeprecated(816)) - session.id is deprecated. Instead, use 
dfs.metrics.session-id
2013-10-23 23:06:13,794 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) -   
Initializing JVM Metrics with processName=JobTracker, sessionId=
2013-10-23 23:06:13,829 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(71)) - Cannot 
initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-10-23 23:06:13,915 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:
<clinit>(62)) - Unable to load native-hadoop library for your platform... using   
builtin-java classes where applicable
2013-10-23 23:06:13,947 WARN  [main] mapreduce.JobSubmitter 
(JobSubmitter.java:copyAndConfigureFiles(138)) - Hadoop command-line option parsing not 
performed. Implement the Tool interface and execute your application with ToolRunner to 
remedy this.
2013-10-23 23:06:13,962 WARN  [main] mapreduce.JobSubmitter 
(JobSubmitter.java:copyAndConfigureFiles(247)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2013-10-23 23:06:13,978 WARN  [main] snappy.LoadSnappy (LoadSnappy.java:<clinit>(46)) - 
Snappy native library not loaded
2013-10-23 23:06:13,985 INFO  [main] mapred.FileInputFormat 
(FileInputFormat.java:listStatus(233)) - Total input paths to process : 1
2013-10-23 23:06:14,107 INFO  [main] mapreduce.JobSubmitter 
(JobSubmitter.java:submitJobInternal(368)) - number of splits:1
2013-10-23 23:06:14,167 WARN  [main] conf.Configuration 
(Configuration.java:warnOnceIfDeprecated(816)) - mapred.output.value.class is   
deprecated. Instead, use mapreduce.job.output.value.class
2013-10-23 23:06:14,168 WARN  [main] conf.Configuration 
(Configuration.java:warnOnceIfDeprecated(816)) - mapred.job.name is deprecated. 
Instead, use mapreduce.job.name
2013-10-23 23:06:14,169 WARN  [main] conf.Configuration 
(Configuration.java:warnOnceIfDeprecated(816)) - mapred.input.dir is deprecated. 
Instead, use mapreduce.input.fileinputformat.inputdir
2013-10-23 23:06:14,169 WARN  [main] conf.Configuration 
(Configuration.java:warnOnceIfDeprecated(816)) - mapred.output.dir is deprecated. 
 Instead, use mapreduce.output.fileoutputformat.outputdir
2013-10-23 23:06:14,169 WARN  [main] conf.Configuration 
(Configuration.java:warnOnceIfDeprecated(816)) - mapred.map.tasks is deprecated. 
Instead, use mapreduce.job.maps
2013-10-23 23:06:14,170 WARN  [main] conf.Configuration 
(Configuration.java:warnOnceIfDeprecated(816)) - mapred

And MyHadoopDriver is:

package org.orzota.bookx.mappers;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class MyHadoopDriver {

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(
                org.orzota.bookx.mappers.MyHadoopDriver.class);
        conf.setJobName("BookCrossing1.0");

        // TODO: specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // TODO: specify a mapper
        conf.setMapperClass(org.orzota.bookx.mappers.MyHadoopMapper.class);

        // TODO: specify a reducer
        conf.setReducerClass(org.orzota.bookx.mappers.MyHadoopReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

2 Answers:

Answer 0 (score: 1)

There are two FileOutputFormat classes in the Hadoop codebase, and you have picked the deprecated one.

You are using "org.apache.hadoop.mapred" everywhere; you should be using "org.apache.hadoop.mapreduce". I bet that will get you much closer to a working job.
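
For reference, here is a minimal sketch of what the driver could look like against the new API (this assumes Hadoop 2.x and that MyHadoopMapper and MyHadoopReducer have been rewritten to extend the new Mapper/Reducer base classes; that rewrite is not shown, and the driver class name is made up):

package org.orzota.bookx.mappers;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical new-API counterpart of MyHadoopDriver.
public class MyHadoopDriverNewApi {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "BookCrossing1.0");

        // Ships the containing jar to the cluster and avoids the
        // "No job jar file set" warning from your log.
        job.setJarByClass(MyHadoopDriverNewApi.class);

        // These must extend org.apache.hadoop.mapreduce.Mapper/Reducer.
        job.setMapperClass(MyHadoopMapper.class);
        job.setReducerClass(MyHadoopReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Blocks until the job finishes and prints progress and counters.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}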

Answer 1 (score: 0)

You could try this instead; it uses the new API and is very easy to follow.

Use the new API

Wordcount example
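
If it helps, a bare-bones word-count Mapper and Reducer against the new API could look roughly like this (a sketch only, not the linked example verbatim; the class names are made up):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical class holding a new-API mapper and reducer for word count.
public class WordCountNewApi {

    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token in the line.
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all counts that arrived for this word.
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}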

If you really want to stick with the old API, look here. What really matters is that you launch your application with ToolRunner.run(..); my guess is that you do not, because your code sits in the main function.
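
A rough sketch of that ToolRunner pattern with your existing old-API classes could look like this (it reuses the names from your question, the driver class name is made up; treat it as a starting point, not a drop-in fix):

package org.orzota.bookx.mappers;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical name for a Tool-based version of your driver.
public class MyHadoopToolDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration that ToolRunner has already
        // populated from the generic options (-conf, -D, -fs, -jt, ...).
        JobConf conf = new JobConf(getConf(), MyHadoopToolDriver.class);
        conf.setJobName("BookCrossing1.0");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(MyHadoopMapper.class);
        conf.setReducerClass(MyHadoopReducer.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic Hadoop options before calling run().
        System.exit(ToolRunner.run(new MyHadoopToolDriver(), args));
    }
}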