Mapper does not emit anything

Time: 2015-02-19 22:17:16

Tags: hadoop mapreduce tesseract hadoop2

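I have a MapReduce job that takes a sequence file I built earlier. The sequence file has image file names as keys and the byte representation of each image as values. My mapper should take each image and process it with Tesseract, an image-to-text library, via Tess4J. The mapper runs and does not throw any exceptions, but surprisingly the output folder is empty and no files are generated. Here is my mapper code: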

import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.*;
import org.apache.commons.io.output.ByteArrayOutputStream;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class testM extends Mapper<Text, BytesWritable, Text, Text> {

    public void map(Text ikey, BytesWritable ivalue, Context context) throws IOException, InterruptedException {

        //Read Current Image from File.
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(ivalue.getBytes()));
        Tesseract instance = Tesseract.getInstance();

        try {
            String text = instance.doOCR(img);
            context.write(new Text("fff"), new Text("fff"));
        } catch (TesseractException e) {
            context.write(new Text("fff"), new Text("fff"));
            e.printStackTrace();
        }
        //String result = instance.doOCR(img);
    }
}
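A side note on the image decoding step, offered as a sketch rather than a diagnosis: BytesWritable.getBytes() returns the backing array, which can be longer than the valid data (getLength() gives the valid size), and ImageIO.read returns null when no registered reader can decode the stream. The helper below uses only the JDK and shows one way to trim the buffer and fail loudly on a null image. ImageDecodeSketch, decode, and paddedPng are hypothetical names that do not appear in the original code.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the original mapper.
final class ImageDecodeSketch {

    // Decodes only the first validLength bytes of buffer and rejects undecodable data.
    static BufferedImage decode(byte[] buffer, int validLength) {
        byte[] exact = Arrays.copyOf(buffer, validLength);
        try {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(exact));
            if (img == null) {
                // ImageIO.read signals "no suitable reader" with null, not an exception.
                throw new IllegalArgumentException(
                        "No ImageIO reader could decode " + validLength + " bytes");
            }
            return img;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a PNG followed by zero padding, imitating an over-sized backing array.
    static byte[] paddedPng(int width, int height, int padding) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(img, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(out.toByteArray(), out.size() + padding);
    }

    public static void main(String[] args) {
        int padding = 64;
        byte[] buffer = paddedPng(4, 3, padding);
        BufferedImage img = decode(buffer, buffer.length - padding);
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```

Inside the mapper, the equivalent call would be decode(ivalue.getBytes(), ivalue.getLength()). This only rules out a bad or null image as the cause; it does not by itself explain the empty output.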

Here is the driver code:

 public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Image2Text");
    job.setJarByClass(driver.class);
    job.setMapperClass(testM.class);

    // TODO: specify a reducer
    job.setReducerClass(Reducer.class);

    // TODO: specify output types
    //job.setOutputKeyClass(Text.class);
    //job.setOutputValueClass(Text.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    //input format
    job.setInputFormatClass(SequenceFileInputFormat.class);

    // TODO: specify input and output DIRECTORIES (not files)
    FileInputFormat.setInputPaths(job, new Path("inSeq"));
    FileOutputFormat.setOutputPath(job, new Path("out"));

    if (!job.waitForCompletion(true))
        return;
}

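I tried emitting "fff" just to make sure the mapper is working, but as I said, it does not output anything. If I remove the line String text = instance.doOCR(img); everything works fine. I checked the contents of my sequence file and looked at the value of img, and both look fine. Does anyone know what the problem is?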
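One way to narrow this down, assuming the OCR call fails with something other than a TesseractException: native-library problems in Java typically surface as UnsatisfiedLinkError or NoClassDefFoundError, which extend Error, so a catch block for TesseractException (or even for Exception) never sees them. Whether that is what happens here is not confirmed by the post. The sketch below uses only the JDK; OcrCall, OcrException, and runGuarded are hypothetical stand-ins for instance.doOCR(img) and TesseractException.

```java
// Hypothetical stand-ins; none of these names come from Tess4J or Hadoop.
final class OcrProbeSketch {

    // Plays the role of TesseractException in this sketch.
    static class OcrException extends Exception {
        OcrException(String message) {
            super(message);
        }
    }

    // Plays the role of the instance.doOCR(img) call.
    interface OcrCall {
        String run() throws OcrException;
    }

    // Mirrors the original try/catch and adds a catch for every other Throwable,
    // so that Errors are reported instead of escaping unnoticed.
    static String runGuarded(OcrCall call) {
        try {
            return "OK: " + call.run();
        } catch (OcrException e) {
            return "OCR_FAILED: " + e.getMessage();
        } catch (Throwable t) {
            return "UNEXPECTED: " + t.getClass().getName() + ": " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runGuarded(() -> "some text"));
        System.out.println(runGuarded(() -> { throw new OcrException("bad image"); }));
        System.out.println(runGuarded(() -> { throw new UnsatisfiedLinkError("no native lib"); }));
    }
}
```

In the mapper, the returned string could be written with context.write or logged before rethrowing. Note also that output from e.printStackTrace() inside a task goes to that task's stderr log on the cluster, not to the console of the client that submitted the job, so the task logs are worth checking.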

0 answers:

No answers