mrjob: limiting the file size of output files

Date: 2016-07-11 22:15:24

Tags: python mapreduce emr mrjob

Does anyone know how to limit the maximum size of the S3 output files (part-r-00000, part-r-00001, etc.) produced by mrjob?

In case it makes a difference, I'm compressing the output with the following settings in my .mrjob.conf file:

jobconf:
 mapred.output.compress: 'true'
 mapred.output.compression.codec: org.apache.hadoop.io.compress.GzipCodec

Thanks in advance,
Conor

0 answers:

No answers