Using a custom Hive OutputFormat to process log files

Time: 2012-07-23 09:08:45

Tags: hadoop hive output-formatting

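I want to process log files with Hive 0.7.0, so I set up a custom InputFormat and OutputFormat. In the InputFormat I replace "\n" with "@#@", and in the OutputFormat I want to change it back to "\n". In testing, my InputFormat works fine, but my OutputFormat has no effect, and I would like to know why. Here is the code. Thanks!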

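import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter;
import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Progressable;

public class ErrlogOutputFormat<K extends WritableComparable, V extends Writable>
    extends HiveIgnoreKeyTextOutputFormat<K, V> {

    // Wraps Hive's text RecordWriter and turns the "@#@" markers written by
    // the custom InputFormat back into real line breaks.
    public static class CustomRecordWriter implements RecordWriter {

        private final RecordWriter writer;
        private final BytesWritable bytesWritable;

        public CustomRecordWriter(RecordWriter writer) {
            this.writer = writer;
            bytesWritable = new BytesWritable();
        }

        @Override
        public void write(Writable w) throws IOException {
            // Literal replacement: unlike split("@#@") + append("\n"), this keeps
            // empty segments and adds no extra trailing newline (the wrapped
            // writer already terminates every row with its own separator).
            String restored = ((Text) w).toString().replace("@#@", "\n");
            Text txtReplace = new Text(restored);

            // Debug output: in a MapReduce task this goes to the task's stdout
            // log, not to the Hive CLI console.
            System.out.println("------------------------");
            System.out.println(txtReplace.toString());
            System.out.println("------------------------");

            // Text.getBytes() returns the backing array, which can be longer
            // than the encoded data; only the first getLength() bytes are valid.
            bytesWritable.set(txtReplace.getBytes(), 0, txtReplace.getLength());

            writer.write(bytesWritable);
        }

        @Override
        public void close(boolean abort) throws IOException {
            writer.close(abort);
        }
    }

    @Override
    public RecordWriter getHiveRecordWriter(JobConf jc, Path finalOutPath,
            Class<? extends Writable> valueClass, boolean isCompressed,
            Properties tableProperties, Progressable progress)
            throws IOException {

        // Let HiveIgnoreKeyTextOutputFormat create the plain text writer,
        // then wrap it so every row is un-escaped before it is written.
        return new CustomRecordWriter(super.getHiveRecordWriter(jc,
                finalOutPath, BytesWritable.class, isCompressed,
                tableProperties, progress));
    }
}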
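The string round trip that the InputFormat and OutputFormat are meant to perform can be checked on its own, without Hadoop or Hive on the classpath. The sketch below is a hypothetical helper (`MarkerRoundTrip`, `encode` and `decode` are illustrative names, not Hive APIs); it assumes the marker is the literal "@#@" from the question and that the marker never occurs in, and cannot be formed from, the raw log text.

```java
// Hypothetical helper, not part of Hive: it mimics the two string
// transformations described in the question, with no Hadoop dependency.
final class MarkerRoundTrip {

    // Marker the custom InputFormat uses in place of "\n" (from the question).
    static final String MARKER = "@#@";

    private MarkerRoundTrip() {}

    // What the InputFormat does: fold a multi-line log entry into one row.
    static String encode(String multiLine) {
        return multiLine.replace("\n", MARKER);
    }

    // What the OutputFormat should do: restore the original line breaks.
    // String.replace treats MARKER literally, so no regex escaping is needed,
    // and it keeps trailing empty segments that split() would drop.
    static String decode(String singleLine) {
        return singleLine.replace(MARKER, "\n");
    }

    public static void main(String[] args) {
        // Assumed sample entry: a stack trace spanning several lines.
        String entry = "ERROR something failed\n\tat Foo.bar(Foo.java:10)\n\tat Foo.main(Foo.java:3)";
        String row = encode(entry);
        System.out.println(row);
        System.out.println(decode(row).equals(entry));
    }
}
```

Running `main` prints the single-line row and then `true`, since the sample entry contains no '@' or '#' characters that could collide with the marker. Using `String.replace` rather than `split` plus a loop keeps the decoded text byte-for-byte equal to the original, which matters because Hive's text writer appends its own row separator after each record.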

0 answers