Question

I am trying to implement "Finding Similar Items" using Hadoop MapReduce.

Here is my driver code (for the shingling step):

    Path inputPath = new Path(args[0]);
    Path outputPath = new Path(args[1]);
    Configuration shinglingConf = new Configuration();
    Job jobShingling = new Job(shinglingConf, "Shingling");
    jobShingling.setJarByClass(FindingSimilarItems.class);
    //jobShingling.setMapperClass(Map.class);
    jobShingling.setReducerClass(Reduce.class);
    jobShingling.setOutputKeyClass(TextInputFormat.class);
    jobShingling.setOutputValueClass(TextInputFormat.class);

    // Input files are named 001.txt through 101.txt (zero-padded to three digits).
    for (int i = 1; i <= 101; i++) {
        Path file = new Path(args[0] + "/" + String.format("%03d", i) + ".txt");
        MultipleInputs.addInputPath(jobShingling, file, TextInputFormat.class, Map.class);
    }
    FileOutputFormat.setOutputPath(jobShingling, outputPath);
    jobShingling.waitForCompletion(true);

I get the following error:

Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class org.apache.hadoop.mapreduce.lib.input.TextInputFormat
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:415)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassCastException: class org.apache.hadoop.mapreduce.lib.input.TextInputFormat
at java.lang.Class.asSubclass(Class.java:3404)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:881)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
... 9 more
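Reading the `Caused by` frames, the cast seems to fail inside `JobConf.getOutputKeyComparator`, which (as far as I understand it) calls `Class.asSubclass` on the configured output key class and expects a `WritableComparable`. The sketch below reproduces that kind of check without Hadoop. `java.lang.Comparable` stands in for `WritableComparable`, and the class and method names (`KeyClassCheck`, `isSortableKey`) are hypothetical, not part of any Hadoop API.

```java
// Hadoop-free sketch: Comparable is an assumed stand-in for WritableComparable.
class KeyClassCheck {

    /** Returns true if keyClass passes an asSubclass check against Comparable. */
    static boolean isSortableKey(Class<?> keyClass) {
        try {
            // Class.asSubclass throws ClassCastException when keyClass
            // neither implements nor extends the requested type.
            keyClass.asSubclass(Comparable.class);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSortableKey(String.class)); // String implements Comparable
        System.out.println(isSortableKey(Thread.class)); // Thread does not
    }
}
```

If that reading is right, `TextInputFormat.class` would fail the check because it is an input format rather than a key type, so passing a key type such as `Text.class` to `setOutputKeyClass` may be the direction to look in. I have not confirmed this.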

Any ideas on how to solve this?

Thanks!

Hadoop MapReduce error: java.io.IOException: Initialization of all the collectors failed

0 answers: