I have a mapper, a combiner and a reducer. As far as I know, the combiner runs before the shuffle and sort phase. In my case, however, the mapper output arrives at the combiner already sorted.
hadoop jar hadoop_streeaming.jar \
-input some_folder \
-output some_folder \
-mapper mapper.py \
-combiner combine.py \
-file mapper.py \
-file combine.py
I want the mapper output to reach the combiner unsorted.
For example:
I have this text:
mary
has
a
big
cat
This text arrives at the combiner in this form:
a
big
cat
has
mary
But I don't want the output to be sorted before the combiner.
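
To check the order in which records actually reach the combiner, a pass-through script along these lines could be used. This is only a sketch, not the real combine.py: the helper name `tag_lines` and the log message format are assumptions made for illustration. It forwards every record unchanged on stdout and writes the arrival position to stderr, which should end up in the task logs.

```python
#!/usr/bin/env python
# Hypothetical pass-through combiner for debugging record order.
# It does not combine anything; it only reports what it receives.
import sys


def tag_lines(lines):
    """Yield (arrival_position, record) for each input line, newline stripped."""
    for position, line in enumerate(lines):
        yield position, line.rstrip("\n")


if __name__ == "__main__":
    for position, record in tag_lines(sys.stdin):
        # Log the arrival order to stderr so stdout stays untouched.
        sys.stderr.write("combiner got #%d: %s\n" % (position, record))
        # Forward the record unchanged to the next stage.
        print(record)
```

With the five words from the example above, the stderr log would list them in whatever order the framework hands them to the combiner, which makes it easy to confirm whether they were sorted beforehand.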