Does anyone know how to limit the maximum size of the S3 output files (part-r-00000, part-r-00001, etc.) that mrjob produces?
I'm compressing the output, in case that makes any difference, using the following in my .mrjob.conf file:
jobconf:
  mapred.output.compress: 'true'
  mapred.output.compression.codec: org.apache.hadoop.io.compress.GzipCodec
Thanks in advance, Conor