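I'm running a Hadoop MapReduce job on Amazon EMR 5.5.2, which uses Hadoop 2.7.3.
I recently upgraded EMR to 5.12.1, which uses Hadoop 2.8.0.
For the same input load, the new cluster runs considerably slower.
I can't figure out why. Maybe I need to tune some performance parameters.
Below are the MapReduce job counters. Looking at them, can anyone tell which performance parameters might be misconfigured?
Job counters: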
File System Counters
FILE: Number of bytes read=1087
FILE: Number of bytes written=24787084
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=15840
HDFS: Number of bytes written=0
HDFS: Number of read operations=132
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
S3N: Number of bytes read=0
S3N: Number of bytes written=4315
S3N: Number of read operations=0
S3N: Number of large read operations=0
S3N: Number of write operations=0
Job Counters
Launched map tasks=132
Launched reduce tasks=7
Other local map tasks=132
Total time spent by all maps in occupied slots (ms)=1576936320
Total time spent by all reduces in occupied slots (ms)=26894720
Total time spent by all map tasks (ms)=2463963
Total time spent by all reduce tasks (ms)=42023
Total vcore-milliseconds taken by all map tasks=2463963
Total vcore-milliseconds taken by all reduce tasks=42023
Total megabyte-milliseconds taken by all map tasks=50461962240
Total megabyte-milliseconds taken by all reduce tasks=860631040
Map-Reduce Framework
Map input records=12523
Map output records=2
Map output bytes=3236
Map output materialized bytes=15935
Input split bytes=15840
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=15935
Reduce input records=2
Reduce output records=8
Spilled Records=4
Shuffled Maps =924
Failed Shuffles=0
Merged Map outputs=924
GC time elapsed (ms)=64327
CPU time spent (ms)=2737480
Physical memory (bytes) snapshot=166237839360
Virtual memory (bytes) snapshot=2760473792512
Total committed heap usage (bytes)=187218526208
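
To make the counters easier to compare against the old cluster, here is a rough Python sketch that derives a few per-task figures from them. The dictionary keys and the `derive` function are names made up for this illustration. Reading megabyte-milliseconds divided by task milliseconds as the memory allocated to each container, and slot-milliseconds divided by task milliseconds as the container size in units of the scheduler's minimum allocation, is my understanding of how these counters are computed, not something the output itself confirms.

```python
# Counter values copied from the job output above.
# The key names are hypothetical labels chosen for this sketch.
counters = {
    "launched_maps": 132,
    "launched_reduces": 7,
    "map_task_ms": 2463963,        # Total time spent by all map tasks (ms)
    "map_mb_ms": 50461962240,      # Total megabyte-milliseconds, map tasks
    "map_slot_ms": 1576936320,     # Time spent by maps in occupied slots (ms)
    "reduce_task_ms": 42023,       # Total time spent by all reduce tasks (ms)
    "reduce_mb_ms": 860631040,     # Total megabyte-milliseconds, reduce tasks
    "map_input_records": 12523,
    "shuffled_maps": 924,
    "gc_ms": 64327,
    "cpu_ms": 2737480,
}


def derive(c):
    """Derive per-task figures from the raw counters (assumed interpretation)."""
    return {
        # Average wall-clock time of one map task.
        "avg_map_task_ms": c["map_task_ms"] / c["launched_maps"],
        # Assumed: MB-ms / task-ms = memory allocated per container.
        "map_container_mb": c["map_mb_ms"] / c["map_task_ms"],
        "reduce_container_mb": c["reduce_mb_ms"] / c["reduce_task_ms"],
        # Assumed: slot-ms / task-ms = container size in minimum-allocation units.
        "slots_per_map": c["map_slot_ms"] / c["map_task_ms"],
        # Input records handled by each map task on average.
        "records_per_map": c["map_input_records"] / c["launched_maps"],
        # Every reducer fetches from every map, so this should be maps * reduces.
        "expected_shuffled_maps": c["launched_maps"] * c["launched_reduces"],
        # Share of CPU time spent in garbage collection.
        "gc_fraction": c["gc_ms"] / c["cpu_ms"],
    }


derived = derive(counters)

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name}: {value:.4g}")
```

Running the same script on the counters from the EMR 5.5.2 cluster and comparing the two outputs side by side should show whether the container size, the number of map tasks, or the average time per map task changed after the upgrade.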