Question

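I am trying to compress my Spark output using the code below, but the output is not compressed. Any idea why? I don't have the complete code here, but I have included the parts relevant to the question.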

    try (final JavaSparkContext context = new JavaSparkContext(sc)) {
        context.hadoopConfiguration().set("mapreduce.output.basename", prefix);
        context.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.codec",
                "com.hadoop.compression.lzo.LzopCodec");
        uncompressed.coalesce(count).saveAsNewAPIHadoopFile(
                output,
                NullWritable.class,
                Text.class,
                TextOutputFormat.class,
                context.hadoopConfiguration());
    }

What am I doing wrong here? As far as I know,

    context.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress.codec",
            "com.hadoop.compression.lzo.LzopCodec");

should do the trick.

Answer 1

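I was missing this line:

    context.hadoopConfiguration().set("mapreduce.output.fileoutputformat.compress", "true");

Setting the codec alone only selects which codec to use; compression itself has to be switched on with the separate compress flag. It works now.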
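For reference, here is a minimal sketch that keeps the two properties together so neither gets forgotten. The class name OutputCompressionSettings and the method forCodec are hypothetical helpers made up for illustration, not part of Spark or Hadoop; only the two property keys and the codec class name come from the question and answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Spark/Hadoop API): collects the two properties
// that must both be set for FileOutputFormat-based output to be compressed.
final class OutputCompressionSettings {

    private OutputCompressionSettings() {
    }

    // Returns the settings for the given codec class name, in insertion order.
    static Map<String, String> forCodec(String codecClassName) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Turns compression on; without this flag the codec setting is ignored.
        settings.put("mapreduce.output.fileoutputformat.compress", "true");
        // Selects which codec is used once compression is on.
        settings.put("mapreduce.output.fileoutputformat.compress.codec", codecClassName);
        return settings;
    }
}
```

Assuming context is the JavaSparkContext from the question, the settings could then be applied before calling saveAsNewAPIHadoopFile:

    OutputCompressionSettings.forCodec("com.hadoop.compression.lzo.LzopCodec")
            .forEach((key, value) -> context.hadoopConfiguration().set(key, value));

Note that com.hadoop.compression.lzo.LzopCodec is not bundled with Hadoop itself, so this presumably also requires the hadoop-lzo library to be available to the job.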
