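When I run my MapReduce job on my local setup, I get the expected output from the reducer, but the same code on EMR produces no output at all. My cluster has 1 master node and 10 core nodes.
Here is the job output. No errors are shown: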
Map-Reduce Framework
    Map input records=3000
    Map output records=378
    Map output bytes=36054
    Map output materialized bytes=40448
    Input split bytes=1420
    Combine input records=0
    Combine output records=0
    Reduce input groups=179
    Reduce shuffle bytes=40448
    Reduce input records=378
    Reduce output records=0
    Spilled Records=756
    Shuffled Maps =380
    Failed Shuffles=0
    Merged Map outputs=380
    GC time elapsed (ms)=23484
    CPU time spent (ms)=125780
    Physical memory (bytes) snapshot=9989242880
    Virtual memory (bytes) snapshot=52768247808
    Total committed heap usage (bytes)=6517702656
Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
File Input Format Counters
    Bytes Read=711180681
File Output Format Counters
    Bytes Written=0
Here is the reducer code:
    def reducer(self, key, val):
        best = -60
        best_name = None
        lat = 0
        longi = 0
        yr = 0
        genre = None
        for hot, name, lat, longi, yr, genre in val:
            if hot > best:
                best = hot
                best_name = name
                lat = lat
                longi = longi
                yr = yr
                genre = genre
        yield (key, (best, best_name, lat, longi, yr, genre))
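
A side note on the reducer itself, separate from the missing output on EMR: the loop variables reuse the names `lat`, `longi`, `yr` and `genre`, so assignments such as `lat = lat` do nothing, and the yielded tuple carries those fields from the last record seen rather than from the record with the highest `hot`. Below is a minimal sketch of one way to keep the whole best record together. The helper name `pick_hottest` and its `floor` parameter are hypothetical, and the sketch assumes every value is a 6-element sequence in the order shown above. It is not expected to explain `Reduce output records=0` on its own.

```python
def pick_hottest(records, floor=-60):
    """Return the record with the highest hotness as a 6-tuple.

    Each record is assumed to be (hot, name, lat, longi, yr, genre).
    If no record beats `floor`, a placeholder tuple is returned.
    """
    # Placeholder mirrors the initial values used in the reducer above.
    best = (floor, None, 0, 0, 0, None)
    for record in records:
        # record[0] is the hotness score; keep the whole record when it wins,
        # so the name, coordinates, year and genre stay in sync with it.
        if record[0] > best[0]:
            best = tuple(record)  # values may arrive as lists after serialization
    return best


if __name__ == "__main__":
    sample = [
        [0.1, "a", 1.0, 2.0, 2000, "rock"],
        [0.9, "b", 3.0, 4.0, 2001, "jazz"],
        [0.5, "c", 5.0, 6.0, 2002, "pop"],
    ]
    print(pick_hottest(sample))
```

Inside the job, the reducer body could then shrink to `yield key, pick_hottest(val)`, assuming the helper is defined in the same module.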