Running a Perl script on Hadoop

Time: 2015-04-13 08:07:02

Tags: perl hadoop mapreduce hadoop-streaming

I have a Perl script that generates data.

#!/usr/bin/perl
use strict;
use warnings;

# Characters to draw from when building each random field.
my @chars = ('a'..'z', 'A'..'Z', '_');

# Emit 3 records: a row index followed by 9 random 10-character fields.
for my $i (0 .. 2) {
    my @fields;
    for (1 .. 9) {
        # Build one field from 10 randomly chosen characters.
        push @fields, join '', map { $chars[rand @chars] } 1 .. 10;
    }
    print join(':', $i, @fields), "\n";
}

When I run the script with Hadoop Streaming, like this:

#!/bin/bash
# Remove any output left over from a previous run.
hadoop fs -rm -R /user/oracle/output
echo "Start time : $(date)" >> run_time_perl_hadoop.log
# Submit the streaming job with the Perl script as the mapper.
hadoop jar /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar \
-D mapreduce.map.tasks=1 \
-input /user/oracle/perl_test/data_generator_hadoop_tarun.pl \
-output /user/oracle/output \
-mapper data_generator_hadoop_tarun.pl \
-file data_generator_hadoop_tarun.pl

echo "End time : $(date)" >> run_time_perl_hadoop.log

it produces 6 lines of output instead of 3.

Any idea why?
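
One plausible explanation, offered as an assumption rather than something confirmed for this cluster: Hadoop Streaming starts the mapper once per map task, and the script never reads its standard input, so each task prints its own 3 lines. Two map tasks would then account for 6 lines. The sketch below only simulates that behavior locally; `mapper` and `run_job` are hypothetical stand-ins written for illustration, not Hadoop commands.

```shell
#!/bin/sh
# Hypothetical stand-in for the Perl mapper: it discards whatever arrives
# on stdin and always prints 3 records.
mapper() {
    cat > /dev/null
    i=0
    while [ "$i" -lt 3 ]; do
        echo "$i:generated"
        i=$((i + 1))
    done
}

# Hypothetical stand-in for the streaming job: run the mapper once per
# map task, feeding each task one (ignored) input split.
run_job() {
    tasks=$1
    t=0
    while [ "$t" -lt "$tasks" ]; do
        echo "split $t" | mapper
        t=$((t + 1))
    done
}

# Count the lines produced for one and for two map tasks.
echo "1 map task:  $(run_job 1 | grep -c '') lines"
echo "2 map tasks: $(run_job 2 | grep -c '') lines"
```

If this is what is happening, the job counters printed at the end of the run should report 2 launched map tasks. It may also be worth checking the property name: as far as I know, `mapreduce.map.tasks` is not a setting Hadoop reads. The map-count hint is `mapreduce.job.maps` (formerly `mapred.map.tasks`), and its default is 2.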

0 answers:

No answers yet