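So I wrote two Perl scripts to practice MapReduce. The program is supposed to count all the words in a bunch of text files that I put in a directory.
This is my mapper.pl: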
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
while(my $line = <>) {
    my @words = split(' ', $line);
    foreach my $word(@words) {
        print "$word \t 1\n";
    }
}
This is my reducer.pl:
#!/bin/usr/perl
use 5.010;
use warnings;
my $currentWord = "";
my $currentCount = 0;
##Use this block for testing the reduce script with some test data.
#Open the test file
#open(my $fh, "<", "testdata.txt");
#while(!eof $fh) {}
while(my $line = <>) {
    #Remove the \n
    chomp $line;
    #Index 0 is the word, index 1 is the count value
    my @lineData = split('\t', $line);
    my $word = $lineData[0];
    my $count = $lineData[1];
    if($currentWord eq $word) {
        $currentCount = $currentCount + $count;
    } else {
        if($currentWord ne "") {
            #Output the key we're finished working with
            print "$currentWord \t $currentCount \n";
        }
        #Switch the current variables over to the next key
        $currentCount = $count;
        $currentWord = $word;
    }
}
#deal with the last loop
print "$currentWord \t $currentCount \n";
So when I run them with the hadoop streaming command:
bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countWords/mapper.pl -mapper /home/hduser/countWords/mapper.pl -file /home/hduser/countWords/reducer.pl -reducer /home/hduser/countWords/reducer.pl -input /user/hduser/testData/* -output /user/hduser/testData/output/*
I get the following error:
13/07/19 11:36:33 INFO streaming.StreamJob: map 0% reduce 0%
13/07/19 11:36:39 INFO streaming.StreamJob: map 9% reduce 0%
13/07/19 11:36:40 INFO streaming.StreamJob: map 64% reduce 0%
13/07/19 11:36:41 INFO streaming.StreamJob: map 73% reduce 0%
13/07/19 11:36:44 INFO streaming.StreamJob: map 82% reduce 0%
13/07/19 11:36:45 INFO streaming.StreamJob: map 100% reduce 0%
13/07/19 11:36:49 INFO streaming.StreamJob: map 100% reduce 11%
13/07/19 11:36:53 INFO streaming.StreamJob: map 100% reduce 0%
13/07/19 11:37:02 INFO streaming.StreamJob: map 100% reduce 17%
13/07/19 11:37:03 INFO streaming.StreamJob: map 100% reduce 33%
13/07/19 11:37:06 INFO streaming.StreamJob: map 100% reduce 17%
13/07/19 11:37:08 INFO streaming.StreamJob: map 100% reduce 0%
13/07/19 11:37:16 INFO streaming.StreamJob: map 100% reduce 33%
13/07/19 11:37:21 INFO streaming.StreamJob: map 100% reduce 0%
13/07/19 11:37:31 INFO streaming.StreamJob: map 100% reduce 33%
13/07/19 11:37:35 INFO streaming.StreamJob: map 100% reduce 17%
13/07/19 11:37:38 INFO streaming.StreamJob: map 100% reduce 100%
13/07/19 11:37:38 INFO streaming.StreamJob: To kill this job, run:
13/07/19 11:37:38 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=shiv0:54311 -kill job_201307031312_0065
13/07/19 11:37:38 INFO streaming.StreamJob: Tracking URL: http://shiv0:50030/jobdetails.jsp?jobid=job_201307031312_0065
13/07/19 11:37:38 ERROR streaming.StreamJob: Job not successful. Error: # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307031312_0065_r_000001
13/07/19 11:37:38 INFO streaming.StreamJob: killJob... Streaming Command Failed!
I've been trying to figure out what I'm doing wrong for a while now, and I'm stumped. Does anyone have any suggestions on how I can diagnose this?
Answer 0 (score: 0)
bin/hadoop jar contrib/streaming/hadoop-streaming-1.1.2.jar -file /home/hduser/countWords/mapper.py -mapper /home/hduser/countWords/mapper.py -file /home/hduser/countWords/reducer.py -reducer /home/hduser/countWords/reducer.py -input /user/hduser/testData/* -output /user/hduser/testData/output/*
Why are you calling .py files? Shouldn't you be calling the Perl files, i.e. reducer.pl rather than reducer.py?
Answer 1 (score: 0)
An extremely silly mistake on my part: the shebang line of reducer.pl was wrong. I had
#!/bin/usr/perl
instead of
#!/usr/bin/perl
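For anyone who hits the same thing: Hadoop streaming runs the mapper and reducer as executables, so a shebang that points at a nonexistent interpreter makes the task fail before any Perl code runs. Below is a rough sketch of a check I could have run first. Note that check_shebang is a hypothetical helper name of my own, not something that ships with Hadoop or Perl, and it only handles the simple case of an absolute interpreter path on the first line.

```shell
# Hypothetical helper (not part of Hadoop): report whether the interpreter
# named on a script's shebang line exists and is executable.
check_shebang() {
    script=$1
    if [ ! -r "$script" ]; then
        echo "$script: cannot read file"
        return 1
    fi
    # Only the first line of the script matters for the shebang.
    first_line=$(head -n 1 "$script")
    case $first_line in
        '#!'*) ;;
        *) echo "$script: no shebang line"; return 1 ;;
    esac
    # Strip the leading "#!", then any spaces that follow it.
    interpreter=${first_line#'#!'}
    interpreter=${interpreter#"${interpreter%%[! ]*}"}
    # Keep only the interpreter path, dropping any arguments after it.
    interpreter=${interpreter%% *}
    if [ -x "$interpreter" ]; then
        echo "$script: ok ($interpreter)"
    else
        echo "$script: interpreter not found: $interpreter"
        return 1
    fi
}
```

Calling check_shebang /home/hduser/countWords/reducer.pl with the broken script would have printed "interpreter not found: /bin/usr/perl". Running the script directly on the local machine (./reducer.pl rather than perl reducer.pl) should surface the same problem, since that also goes through the shebang line.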