Map-reduce job throws a ClassNotFoundException for the mapper when run with YARN, even though the class exists?

Date: 2016-03-24 01:52:07

Tags: hadoop mapreduce

I am running a hadoop job that works fine in pseudo-distributed mode without YARN, but when I run it with YARN it gives me a ClassNotFoundException:

16/03/24 01:43:40 INFO mapreduce.Job: Task Id : attempt_1458775953882_0002_m_000003_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.hadoop.keyword.count.ItemMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.keyword.count.ItemMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 8 more

Here is the source code of the job:
Configuration conf = new Configuration();
conf.set("keywords", args[2]);

Job job = Job.getInstance(conf, "item count");
job.setJarByClass(ItemImpl.class);
job.setMapperClass(ItemMapper.class);
job.setReducerClass(ItemReducer.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);

Here is the command I am running:

hadoop jar ~/itemcount.jar /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky

Edit: the code after the suggestions below:

package com.hadoop.keyword.count;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class ItemImpl {

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        conf.set("keywords", args[2]);

        Job job = Job.getInstance(conf, "item count");
        job.setJarByClass(ItemImpl.class);
        job.setMapperClass(ItemMapper.class);
        job.setReducerClass(ItemReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }


    public static class ItemMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        JSONParser parser = new JSONParser();

        @Override
        public void map(Object key, Text value, Context output) throws IOException,
                InterruptedException {

            JSONObject tweetObject = null;

            String[] keywords = this.getKeyWords(output);

            try {
                tweetObject = (JSONObject) parser.parse(value.toString());
            } catch (ParseException e) {
                e.printStackTrace();
            }
            if (tweetObject != null) {
                String tweetText = (String) tweetObject.get("text");

                if(tweetText == null){
                    return;
                }

                tweetText = tweetText.toLowerCase();
    /*          StringTokenizer st = new StringTokenizer(tweetText);

                ArrayList<String> tokens = new ArrayList<String>();

                while (st.hasMoreTokens()) {
                    tokens.add(st.nextToken());
                }*/

                for (String keyword : keywords) {
                    keyword = keyword.toLowerCase();
                    if (tweetText.contains(keyword)) {
                        output.write(new Text(keyword), one);
                    }
                }
                output.write(new Text("count"), one);
            }

        }

        String[] getKeyWords(Mapper<Object, Text, Text, IntWritable>.Context context) {

            Configuration conf = (Configuration) context.getConfiguration();
            String param = conf.get("keywords");

            return param.split(",");

        }
    }

    public static class ItemReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context output)
                throws IOException, InterruptedException {

            int wordCount = 0;

            for (IntWritable value : values) {
                wordCount += value.get();
            }

            output.write(key, new IntWritable(wordCount));
        }
    }
}

3 Answers:

Answer 0 (score: 3)

When running in fully distributed mode, the TaskTracker/NodeManager (the thing that runs your mapper) runs in a separate JVM, and it sounds like your class is not making it onto that JVM's classpath.

Try using the -libjars <csv,list,of,jars> command-line argument when invoking the job. That will have Hadoop distribute the jar(s) to the TaskTracker JVMs and load your classes from there. (Note that this copies the jars out to every node in the cluster and makes them available only for that specific job. If you have common libraries that lots of jobs need to invoke, you will want to look into the Hadoop distributed cache instead.)
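
As a rough sketch, the invocation could then look like the following (the json-simple jar path is illustrative, and -libjars is only honored when the driver parses generic options, e.g. via ToolRunner as shown in answer 2):

hadoop jar ~/itemcount.jar com.hadoop.keyword.count.ItemImpl -libjars /path/to/json-simple-1.1.jar /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky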

You may also want to try yarn jar ... instead of hadoop jar ... when launching the job, since that is the new/preferred way to launch YARN jobs.

Answer 1 (score: 0)

Can you check the contents of itemcount.jar (jar -tvf itemcount.jar)? I ran into this problem once and found that the .class file was simply missing from the jar.
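
For example, a quick check might look like this (keeping in mind that a nested class compiles to an OuterClass$InnerClass entry):

jar -tvf ~/itemcount.jar | grep ItemMapper

A correctly packaged jar should list an entry such as com/hadoop/keyword/count/ItemImpl$ItemMapper.class.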

Answer 2 (score: 0)

I ran into the same error a few days ago.

  • Changing the map and reduce classes to static fixed my problem.
  • Make the map and reduce classes nested inner classes.
  • Check the constructors of your map and reduce classes (i/o values and override statements).
  • Check your jar command.

Old:

hadoop jar ~/itemcount.jar /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky

New:

hadoop jar ~/itemcount.jar com.hadoop.keyword.count.ItemImpl /user/rohit/tweets /home/rohit/outputs/23mar-yarn13 vodka,wine,whisky

  • Add packageName.mainclass after specifying the .jar file.

try-catch

try {
    tweetObject = (JSONObject) parser.parse(value.toString());
} catch (Exception e) { // change ParseException to Exception if you don't only expect parse errors
    e.printStackTrace();
    return; // return from the method in case of any error
}

Extend Configured and implement Tool

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ItemImpl extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new ItemImpl(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {

        Job job = Job.getInstance(getConf(), "ItemImpl");
        job.setJarByClass(ItemImpl.class);
        job.setMapperClass(ItemMapper.class);
        job.setReducerClass(ItemReducer.class);
        job.setMapOutputKeyClass(Text.class);          // probably not essential, but makes it explicit
        job.setMapOutputValueClass(IntWritable.class); // probably not essential, but makes it explicit
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1; // run() must return an int rather than call System.exit
    }

    // add the public static ItemMapper class here
    // add the public static ItemReducer class here
}
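
A note on this design choice: going through ToolRunner wires in Hadoop's GenericOptionsParser, which consumes generic flags such as -libjars and -D before the remaining arguments reach run(). This is also what makes the -libjars suggestion from answer 0 actually take effect.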
I'm not an expert on this topic, but this implementation comes from one of my working projects. Try it; if it doesn't work for you, I would suggest checking the libraries you have added to your project.

Probably the first step will solve it, but if these steps don't work, please share your code with us.