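I read somewhere that it is better to define the output Writable once, when the Mapper/Reducer is created, and then only set its value inside the Mapper/Reducer, rather than creating a new Writable for every output record.
For example (pseudocode):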
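IntWritable idWritable = new IntWritable();
map(){
    idWritable.set(outputValue);
    emit(idWritable);
}
is better than:
map(){
    IntWritable idWritable = new IntWritable(outputValue);
    emit(idWritable);
}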
Is this true? Is it really good practice to define the output Writable once, when the Mapper/Reducer is created, and reuse it for all output records?
Answer 0 (score: 1)
Yes, this is true. In your second example you're creating a brand new IntWritable every time you process a record. This incurs overhead for the new memory allocation, and it also means that the old IntWritable has to be garbage collected at some point. If you're processing millions of records and using a complex Writable (say, one with several ints and Strings), the heap can fill up very quickly.
Alternatively, by just resetting the value within the same object, no new memory needs to be allocated and no extra garbage is left for the collector to reclaim. It's much faster, but I'd recommend running your own experiments to confirm this.
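If you want to see the difference without setting up a cluster, here is a minimal sketch in plain Java. It does not use Hadoop at all: SimpleIntWritable is a hypothetical stand-in for IntWritable, and emitWithReuse / emitWithNewObjects are made-up names for the two patterns. The sketch assumes that emitting a record serializes the value right away, which is what makes reusing one object safe.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableReuseDemo {

    // Hypothetical stand-in for Hadoop's IntWritable (not the real class):
    // a mutable int holder that counts how many instances were constructed.
    static final class SimpleIntWritable {
        static long instancesCreated = 0;
        private int value;

        SimpleIntWritable() { instancesCreated++; }
        SimpleIntWritable(int value) { this(); this.value = value; }

        void set(int value) { this.value = value; }
        int get() { return value; }

        // Serialize the current value; after this call the holder can be overwritten.
        void write(DataOutputStream out) {
            try {
                out.writeInt(value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Pattern 1: allocate one holder up front and reset it for every record.
    static byte[] emitWithReuse(int[] records) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        SimpleIntWritable idWritable = new SimpleIntWritable();
        for (int record : records) {
            idWritable.set(record);
            idWritable.write(out); // assumed "emit": value is copied out immediately
        }
        return buffer.toByteArray();
    }

    // Pattern 2: allocate a fresh holder for every record.
    static byte[] emitWithNewObjects(int[] records) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (int record : records) {
            SimpleIntWritable idWritable = new SimpleIntWritable(record);
            idWritable.write(out);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        int[] records = {7, 42, 1000};

        SimpleIntWritable.instancesCreated = 0;
        emitWithReuse(records);
        System.out.println("reuse pattern allocations: " + SimpleIntWritable.instancesCreated);

        SimpleIntWritable.instancesCreated = 0;
        emitWithNewObjects(records);
        System.out.println("per-record pattern allocations: " + SimpleIntWritable.instancesCreated);
    }
}
```

Both methods produce the same serialized bytes, but the reuse pattern constructs a single holder while the per-record pattern constructs one per input record. This only counts allocations; for actual timings you would still need to benchmark a real job on your own data.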