How to change the reducer's output file name from part-00000 to the input file name

Date: 2014-12-15 16:30:35

Tags: java hadoop2

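Currently I can change the output name from part-00000 to a custom file name in the mapper, which I do through the InputSplit. I tried to rename the file in the reducer as well, but the FileSplit approach does not work there. Is there a good way to name the reducer's output after the input file? Here is how I implemented it in the mapper: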

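// Inside the Mapper class (fields: String fileName; Text outputName;)
// Uses org.apache.hadoop.mapreduce.lib.input.FileSplit and
// org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
@Override
public void setup(Context con) throws IOException, InterruptedException {
    // Name of the file behind this mapper's input split
    fileName = ((FileSplit) con.getInputSplit()).getPath().getName();
    // Keep at most the first 36 characters (avoids an exception on shorter names)
    fileName = fileName.substring(0, Math.min(36, fileName.length()));
    outputName = new Text(fileName);

    final Path baseOutputPath = FileOutputFormat.getOutputPath(con);
    final Path outputFilePath = new Path(baseOutputPath, fileName);
    // Every file this format writes goes straight to outputFilePath
    // (note: this bypasses the output committer's temporary work directory)
    TextOutputFormat<IntWritable, Text> write = new TextOutputFormat<IntWritable, Text>() {
        @Override
        public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
            return outputFilePath;
        }
    };
    // ... (rest of the original snippet not shown)
}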

1 answer

Answer (score: 1)

This is what the Hadoop wiki says:

You can subclass the OutputFormat.java class and write your own. You can locate and browse the code of TextOutputFormat, MultipleOutputFormat.java, etc. for reference. It might be the case that you only need to do minor changes to any of the existing Output Format classes. To do that you can just subclass that class and override the methods you need to change. 
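Following the wiki's suggestion, here is a minimal sketch for the hadoop2 `org.apache.hadoop.mapreduce` API: subclass TextOutputFormat and override getDefaultWorkFile, as the question already does in the mapper, but keep the file inside the output committer's work directory so task commit, abort and speculative execution still behave. The class name and the property `example.output.basename` are hypothetical placeholders, and the cast to FileOutputCommitter assumes the job uses the default committer.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Sketch: a TextOutputFormat whose files use a configured base name instead of "part".
public class NamedTextOutputFormat<K, V> extends TextOutputFormat<K, V> {

    // Hypothetical configuration key; set it in the job driver.
    public static final String BASENAME_KEY = "example.output.basename";

    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
            throws IOException {
        // Fall back to the usual "part" prefix if no name was configured.
        String base = context.getConfiguration().get(BASENAME_KEY, "part");
        // Write into the committer's temporary work directory rather than the
        // final output directory; the committer moves the file on task commit.
        FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
        // getUniqueFile appends "-r-00000"-style suffixes, so parallel tasks never clash.
        return new Path(committer.getWorkPath(), getUniqueFile(context, base, extension));
    }
}
```

In the driver, `job.setOutputFormatClass(NamedTextOutputFormat.class)` together with `job.getConfiguration().set(NamedTextOutputFormat.BASENAME_KEY, "myinput")` should produce files such as `myinput-r-00000`. This gives one name per job; on its own it cannot choose a different name per input file, because the reducer does not know which file a record came from.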

If you need to derive the file name from the key and the input file, you can create a subclass of MultipleOutputFormat to control the output file name.
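A sketch of the MultipleOutputFormat route. MultipleTextOutputFormat lives in the old `org.apache.hadoop.mapred` API, so the whole job, mapper included, would have to use that API. Because the reducer cannot see the FileSplit, this assumes the mapper still reads the input file name (as in the question) and prefixes it to every key. The tab-separated `inputFileName<TAB>realKey` convention, the class name and the helper methods are hypothetical.

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Sketch (old mapred API): write each record to a file named after its input file.
// Assumes keys were tagged by the mapper as "inputFileName" + '\t' + "realKey".
public class InputFileNameOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    private static final char SEP = '\t';

    // Mapper side (hypothetical helper): prefix the real key with the input file name.
    public static String tag(String inputFileName, String realKey) {
        return inputFileName + SEP + realKey;
    }

    // File-name part of a tagged key (the whole key if it has no separator).
    public static String fileNameOf(String taggedKey) {
        int i = taggedKey.indexOf(SEP);
        return i < 0 ? taggedKey : taggedKey.substring(0, i);
    }

    // Real-key part of a tagged key (the whole key if it has no separator).
    public static String realKeyOf(String taggedKey) {
        int i = taggedKey.indexOf(SEP);
        return i < 0 ? taggedKey : taggedKey.substring(i + 1);
    }

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // 'name' is the default leaf name (e.g. part-00000). Using the bare input file
        // name is only safe if all keys of one file reach the same reducer (one reducer,
        // or a partitioner on the file name); otherwise append "-" + name.
        return fileNameOf(key.toString());
    }

    @Override
    protected Text generateActualKey(Text key, Text value) {
        // Strip the file-name prefix so only the real key is written to the output.
        return new Text(realKeyOf(key.toString()));
    }
}
```

It would be registered with `conf.setOutputFormat(InputFileNameOutputFormat.class)` on the JobConf. In the new API the closest counterpart is `MultipleOutputs`, whose `write(key, value, baseOutputPath)` method takes a base file name per record and appends the usual `-r-NNNNN` suffix.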