Key-value pairs in Hadoop streaming

Asked: 2015-10-26 05:26:02

Tags: hadoop hadoop-streaming

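I am trying to run a mapper-only job with hadoop-streaming. I have a CSV input file with six fields, the sixth of which is the salary, and I want to sort the file by salary in descending order. The shell script "task1.1_map.sh" (tr -d '$' | sort -t, -n -r -k6) works fine.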
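For reference, the one-liner can be exercised locally without Hadoop. This is only a minimal sketch: the function name map_sort is illustrative and not part of the actual script, and the three rows are taken from the sample input in this question.

```shell
#!/bin/sh
# Sketch of the pipeline in task1.1_map.sh (map_sort is a hypothetical name):
# delete every '$', then sort on the 6th comma-separated field,
# numerically (-n) and in descending order (-r).
map_sort() {
    tr -d '$' | sort -t, -n -r -k6
}

# Feed three sample rows through the pipeline on the local machine.
printf '%s\n' \
    '868,K D AABY,DEPARTMENT OF CITYWIDE ADM,10209,X,$12.00' \
    '69,A E A-AWOSOGBA,HRA/DEPARTMENT OF SOCIAL S,52311,A,$51955.00' \
    '464,A   AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,$73.53' |
    map_sort
```

Run this way, the rows come out with the highest salary first, which is the order the job is expected to produce.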

Input file:

DPT     ,NAME    ,ADDRESS ,TTL #   ,PC      ,SAL-RATE  
868,B J  SANDIFORD,DEPARTMENT OF CITYWIDE ADM,12702,X,$5.00  
868,C A  WIGFALL,DEPARTMENT OF CITYWIDE ADM,12702,X,$5.00  
69,A E A-AWOSOGBA,HRA/DEPARTMENT OF SOCIAL S,52311,A,$51955.00  
868,K D AABY,DEPARTMENT OF CITYWIDE ADM,10209,X,$12.00  
56,I D AADIL,POLICE DEPARTMENT,71012,A,$46953.00  
69,M   AAKIRI,HRA/DEPARTMENT OF SOCIAL S,56056,A,$33000.00  
464,A   AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,$73.53  
998,A V AALEVIK,N.Y.C. TRANSIT AUTHORITY,402,2,$33280.00  
998,M   AAMIR,N.Y.C. TRANSIT AUTHORITY,00T07,4,$60878.00  

Output file:

notroot@ubuntu:~/lab/hdfs/datan1/current/subdir60$ hadoop fs -cat /output/Task1.3/part-00000
Warning: $HADOOP_HOME is deprecated.

464,A   AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,73.53  
56,I D AADIL,POLICE DEPARTMENT,71012,A,46953.00  
69,A E A-AWOSOGBA,HRA/DEPARTMENT OF SOCIAL S,52311,A,51955.00  
69,M   AAKIRI,HRA/DEPARTMENT OF SOCIAL S,56056,A,33000.00  
868,B J  SANDIFORD,DEPARTMENT OF CITYWIDE ADM,12702,X,5.00  
868,C A  WIGFALL,DEPARTMENT OF CITYWIDE ADM,12702,X,5.00  
868,K D AABY,DEPARTMENT OF CITYWIDE ADM,10209,X,12.00  
998,A V AALEVIK,N.Y.C. TRANSIT AUTHORITY,402,2,33280.00  
998,M   AAMIR,N.Y.C. TRANSIT AUTHORITY,00T07,4,60878.00  
DPT     ,NAME    ,ADDRESS ,TTL #   ,PC      ,SAL-RATE  

Why is the output not sorted by the mapper when I use the following command?

hadoop jar /home/notroot/lab/software/hadoop-1.0.3/contrib/streaming/hadoop-streaming-1.0.3.jar \
-D stream.map.input.field.separator=, \
-input /input/Civil_List_2016.csv -output /output/Task1.3 \
-mapper /home/notroot/lab/data/mapreduce_data_examples/jpmc_exercise/task1.1_map.sh
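One possible explanation, offered as an assumption rather than a confirmed diagnosis: the command does not set the number of reduce tasks to zero, so the job may still run a shuffle and a default identity reduce. The shuffle sorts mapper output by key as plain text, and because the lines contain no tab, each whole line would be treated as the key. The sketch below imitates that locally. The functions mapper and shuffle are hypothetical stand-ins, and the byte-wise `LC_ALL=C sort` only approximates Hadoop's key comparison.

```shell
#!/bin/sh
# Local simulation only; nothing here calls Hadoop.

# Stand-in for task1.1_map.sh: strip '$', sort by salary descending.
mapper() {
    tr -d '$' | sort -t, -n -r -k6
}

# Stand-in for the shuffle phase (an assumption): whole lines act as
# keys and are compared byte by byte, like a plain text sort.
shuffle() {
    LC_ALL=C sort
}

# Four rows from the sample input, passed through both stages.
printf '%s\n' \
    '868,B J  SANDIFORD,DEPARTMENT OF CITYWIDE ADM,12702,X,$5.00' \
    '69,A E A-AWOSOGBA,HRA/DEPARTMENT OF SOCIAL S,52311,A,$51955.00' \
    '464,A   AALAI,CUNY QUEENSBOROUGH COMMUNI,4607,N,$73.53' \
    '56,I D AADIL,POLICE DEPARTMENT,71012,A,$46953.00' |
    mapper | shuffle
```

The simulated result is ordered by the leading characters of each line (464, 56, 69, 868), the same pattern as the output file shown above, and the salary order from the mapper is lost. If this assumption holds, adding -D mapred.reduce.tasks=0 to the streaming command should make the job map-only, so the mapper's output is written unchanged. Even then, each map task would sort only its own input split.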

0 answers:

No answers yet.