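I am trying to run a mapper-only job with hadoop-streaming. I have a CSV input file with 6 fields, and the 6th field is the salary. I want to sort the file by salary in descending order. The shell script "task1.1_map.sh" (tr -d '$' | sort -t, -n -r -k6) works fine.
Input file: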
DPT ,NAME ,ADDRESS ,TTL # ,PC ,SAL-RATE
868,B J SANDIFORD,DEPARTMENT OF CITYWIDE ADM,12702,X,$5.00
868,C A WIGFALL,DEPARTMENT OF CITYWIDE ADM,12702,X,$5.00
69,A E A-AWOSOGBA,HRA/DEPARTMENT OF SOCIAL S,52311,A,$51955.00
868,K D AABY,DEPARTMENT OF CITYWIDE ADM,10209,X,$12.00
56,I D AADIL,POLICE DEPARTMENT,71012,A,$46953.00
69,M AAKIRI,HRA/DEPARTMENT OF SOCIAL S,56056,A,$33000.00
464,A AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,$73.53
998,A V AALEVIK,N.Y.C. TRANSIT AUTHORITY,402,2,$33280.00
998,M AAMIR,N.Y.C. TRANSIT AUTHORITY,00T07,4,$60878.00
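For reference, the pipeline from task1.1_map.sh can be checked locally on a few of the rows above. This is only a sketch: sample.csv and sorted.csv are hypothetical scratch files, not part of the actual job.

```shell
#!/bin/sh
# Write a few rows from the input above into a scratch file.
# The quoted 'EOF' keeps the '$' characters literal.
cat > sample.csv <<'EOF'
868,B J SANDIFORD,DEPARTMENT OF CITYWIDE ADM,12702,X,$5.00
69,A E A-AWOSOGBA,HRA/DEPARTMENT OF SOCIAL S,52311,A,$51955.00
868,K D AABY,DEPARTMENT OF CITYWIDE ADM,10209,X,$12.00
464,A AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,$73.53
EOF

# Same pipeline as the script: strip the '$' signs, then sort
# numerically (-n), descending (-r), on field 6 (-k6), using ','
# as the field separator (-t,).
tr -d '$' < sample.csv | sort -t, -n -r -k6 > sorted.csv
cat sorted.csv
```

Run this way, the rows come out with the highest salary first, which is the ordering I expect from the job.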
Output file:
notroot@ubuntu:~/lab/hdfs/datan1/current/subdir60$ hadoop fs -cat /output/Task1.3/part-00000
Warning: $HADOOP_HOME is deprecated.
464,A AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,73.53
56,I D AADIL,POLICE DEPARTMENT,71012,A,46953.00
69,A E A-AWOSOGBA,HRA/DEPARTMENT OF SOCIAL S,52311,A,51955.00
69,M AAKIRI,HRA/DEPARTMENT OF SOCIAL S,56056,A,33000.00
868,B J SANDIFORD,DEPARTMENT OF CITYWIDE ADM,12702,X,5.00
868,C A WIGFALL,DEPARTMENT OF CITYWIDE ADM,12702,X,5.00
868,K D AABY,DEPARTMENT OF CITYWIDE ADM,10209,X,12.00
998,A V AALEVIK,N.Y.C. TRANSIT AUTHORITY,402,2,33280.00
998,M AAMIR,N.Y.C. TRANSIT AUTHORITY,00T07,4,60878.00
DPT ,NAME ,ADDRESS ,TTL # ,PC ,SAL-RATE
Why is the output not sorted by the mapper when I use the following command?
hadoop jar /home/notroot/lab/software/hadoop-1.0.3/contrib/streaming/hadoop-streaming-1.0.3.jar \
-D stream.map.input.field.separator=, \
-input /input/Civil_List_2016.csv -output /output/Task1.3 \
-mapper /home/notroot/lab/data/mapreduce_data_examples/jpmc_exercise/task1.1_map.sh
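One thing that may be worth checking (an assumption, not something confirmed here): the command does not disable the reduce phase, so the mapper output may be re-sorted by key, as text, during the shuffle. Lines without a tab character would then be treated as whole-line keys. The sketch below mimics that locally with a plain byte-wise sort; mapper_out.txt and shuffled.txt are hypothetical scratch files.

```shell
#!/bin/sh
# Step 1: run the mapper pipeline on a few rows from the input file.
printf '%s\n' \
  '868,K D AABY,DEPARTMENT OF CITYWIDE ADM,10209,X,$12.00' \
  '56,I D AADIL,POLICE DEPARTMENT,71012,A,$46953.00' \
  '464,A AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,$73.53' \
  '998,M AAMIR,N.Y.C. TRANSIT AUTHORITY,00T07,4,$60878.00' \
  | tr -d '$' | sort -t, -n -r -k6 > mapper_out.txt

# Step 2: byte-wise sort of whole lines, standing in for the
# shuffle's sort by key (assumed behaviour, simulated locally).
LC_ALL=C sort mapper_out.txt > shuffled.txt

echo "mapper output:"; cat mapper_out.txt
echo "after simulated shuffle:"; cat shuffled.txt
```

The simulated shuffle puts the rows in the same relative order as the part-00000 listing above (464, 56, 868, 998), which is consistent with this explanation. If that is the cause, adding -D mapred.reduce.tasks=0 next to the other -D option should make the job map-only. Even then, each mapper would only sort its own input split.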