Question

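In my mapper code I use a third-party library, JTS.jar. I need to put it in Hadoop's distributed cache so that all nodes can access it. At this link I found that -libjars can be used to do this.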

I now run my code with:

hadoop jar -libjars JTS.jar my_jar.jar classname inputFiles outputFiles

But this does not work. Any suggestions on how to fix it?

Answer 1

Try putting the command-line arguments in the correct order. I think the error message is quite instructive.

hadoop jar my_jar.jar classname -libjars JTS.jar inputFiles outputFiles
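To illustrate why the order matters: the `hadoop jar` launcher expects the jar path as its first argument, and generic options such as -libjars are expected to come after the class name and before the application's own arguments. They are also only interpreted if the driver hands its arguments to Hadoop's `GenericOptionsParser` (for example via `ToolRunner`). The sketch below is a simplified, stdlib-only model of that "options first, then application arguments" rule, not Hadoop's actual parser; the class `GenericOptionOrderSketch` and its `parse` helper are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenericOptionOrderSketch {

    /** Holds the jars named by -libjars and the arguments left for the application. */
    static final class Parsed {
        final List<String> libJars = new ArrayList<>();
        final List<String> appArgs = new ArrayList<>();
    }

    /**
     * Simplified model: consume leading "-libjars <comma-separated jars>" pairs,
     * then treat everything from the first other token onward as application arguments.
     */
    static Parsed parse(String[] args) {
        Parsed p = new Parsed();
        int i = 0;
        while (i + 1 < args.length && args[i].equals("-libjars")) {
            p.libJars.addAll(Arrays.asList(args[i + 1].split(",")));
            i += 2;
        }
        p.appArgs.addAll(Arrays.asList(args).subList(i, args.length));
        return p;
    }

    public static void main(String[] args) {
        // What the driver class receives with the corrected command,
        // i.e. everything after "hadoop jar my_jar.jar classname".
        Parsed ok = parse(new String[] {"-libjars", "JTS.jar", "inputFiles", "outputFiles"});
        System.out.println(ok.libJars + " " + ok.appArgs);

        // If the option trails the application arguments, this model never sees it as an option.
        Parsed late = parse(new String[] {"inputFiles", "outputFiles", "-libjars", "JTS.jar"});
        System.out.println(late.libJars + " " + late.appArgs);
    }
}
```

In this model the first call yields the jar list plus the two path arguments, while the second leaves -libjars mixed in with the application arguments, which is roughly what a job driver would then misread as input and output paths.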

Answer 2

In a separate attempt, I tried following this link.

1) I copied the jar library to Hadoop using:

hadoop fs -copyFromLocal JTS.jar /someHadoopFolder/JTS.jar

2) Then I modified my configuration as follows:

    Configuration conf = new Configuration();

    Job job = new Job(conf);
    job.setJobName("TEST JOB");

    List<String> other_args = parseArguments(args, job);

    DistributedCache.addFileToClassPath(new Path("/someHadoopFolder/JTS.jar"), conf);

    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    job.setMapperClass(myMapper.class);
    //job.setCombinerClass(myReducer.class);
    //job.setReducerClass(myReducer.class);

    job.setInputFormatClass(TextInputFormat.class);   
    job.setOutputFormatClass(TextOutputFormat.class);


    String inPath = other_args.get(0);
    String outPath = other_args.get(1);     
    TextInputFormat.setInputPaths(job, inPath);
    TextOutputFormat.setOutputPath(job, new Path(outPath));

    TextInputFormat.setMinInputSplitSize(job, 32 * MEGABYTES);
    TextInputFormat.setMaxInputSplitSize(job, 32 * MEGABYTES);

    job.setJarByClass(myFile.class);

    job.waitForCompletion(true);

3) The tutorial then says "use the cached files in the mapper", so my mapper looks like this:

    public static class myMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        private Path[] localArchives;
        private Path[] localFiles;

        public void configure(Configuration conf) throws IOException {
            localArchives = DistributedCache.getLocalCacheArchives(conf);
            localFiles = DistributedCache.getLocalCacheFiles(conf);
        }

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Envelope is from the JTS.jar library
            Envelope e1 = new Envelope(-180, 85, 180, -85);
            context.write(key, value);
        }
    }

Despite doing all of this, the code still fails by throwing "Class not found". Any help?
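One possible cause, offered as a guess and not a confirmed diagnosis: in the driver the jar is registered on `conf` after `new Job(conf)` has already been created. As far as I know the `Job` constructor works on its own copy of the `Configuration`, so later changes to the original `conf` object would not reach the submitted job. A minimal sketch of the adjusted driver, assuming that behaviour holds for the Hadoop version in use (the class name `ClasspathJarSketch` is made up for illustration, and the rest of the job setup is elided):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ClasspathJarSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJobName("TEST JOB");

        // Assumption: the Job holds its own copy of conf, so the jar is registered
        // on job.getConfiguration() instead of on the original conf object.
        DistributedCache.addFileToClassPath(
                new Path("/someHadoopFolder/JTS.jar"), job.getConfiguration());

        // ... remaining job setup (mapper, formats, paths) as in the driver above ...
    }
}
```

Separately, `configure(Configuration)` is not a lifecycle method of the new-API `Mapper`, so it is probably never called; `setup(Context)` is the hook there. For a jar that only needs to be on the task classpath, reading the cache files in the mapper should not be necessary at all.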

Answer 3

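I guess I'm late, but one way to do this is to copy the jar files into Hadoop's installation folder. In my case, I placed the XXX.jars (third-party jars) in /usr/local/hadoop/share/hadoop/common and then added those files as external jar files.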

This solved my problem. If you don't want to do that, another way is to include the directory/file path of the external jar files in export HADOOP_CLASSPATH=/XXX/example.jar:...

Including a third-party library in my Map-Reduce job (using the distributed cache)