We are seeing a very slow copy phase:
reduce > copy task(attempt_1559832449421_0209_m_000006_0 succeeded at 0.03 MB/s) Aggregated copy rate(21 of 22 at 0.54 MB/s)
The reducer log contains:
2019-04-01 19:14:46,919 WARN [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle for fetcher#10
org.apache.hadoop.fs.ChecksumException: Checksum Error
    at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:212)
    at org.apache.hadoop.mapred.IFileInputStream.readWithChecksum(IFileInputStream.java:189)
    at org.apache.hadoop.mapreduce.task.reduce.OnDiskMapOutput.shuffle(OnDiskMapOutput.java:103)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:562)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:348)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
2019-04-01 19:14:46,920 WARN [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle output of attempt_1559832449421_0209_m_000010_0 from phdp100.g:13562
java.io.IOException: org.apache.hadoop.fs.ChecksumException: Checksum Error
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:566)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:348)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
Caused by: org.apache.hadoop.fs.ChecksumException: Checksum Error
    at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:212)
    at org.apache.hadoop.mapred.IFileInputStream.readWithChecksum(IFileInputStream.java:189)
    at org.apache.hadoop.mapreduce.task.reduce.OnDiskMapOutput.shuffle(OnDiskMapOutput.java:103)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:562)
    ... 2 more
2019-04-01 19:14:46,920 WARN [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.Fetcher: copyMapOutput failed for tasks [attempt_1559832449421_0209_m_000010_0]
2019-04-01 19:14:46,921 INFO [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Reporting fetch failure for attempt_1559832449421_0209_m_000010_0 to MRAppMaster.
2019-04-01 19:14:46,921 INFO [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: phdp100.g:13562 freed by fetcher#10 in 118716ms
2019-04-01 19:16:36,779 INFO [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: attempt_1559832449421_0209_m_000010_0: Shuffling to disk since 367993615 is greater than maxSingleShuffleLimit (58274612)
2019-04-01 19:16:36,830 INFO [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#10 about to shuffle output of map attempt_1559832449421_0209_m_000010_0 decomp: 367993615 len: 27373924 to DISK
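To check whether the failed fetches all come from one host, the reducer logs could be tallied with something like the sketch below. It is only a rough illustration: the function name `tally_fetch_failures` is made up here, the regular expression assumes the "Failed to shuffle output of ... from ..." wording shown in the log above, and the sample string is a single line copied from that log.

```python
import re
from collections import Counter

# Assumes the WARN message format seen in the reducer log above.
FAILED_FETCH = re.compile(r"Failed to shuffle output of (attempt_\S+) from (\S+)")


def tally_fetch_failures(log_text):
    """Count failed shuffle fetches per source host and per map attempt.

    Hypothetical helper, not part of Hadoop.
    """
    by_host = Counter()
    by_attempt = Counter()
    for match in FAILED_FETCH.finditer(log_text):
        attempt, host = match.groups()
        by_host[host] += 1
        by_attempt[attempt] += 1
    return by_host, by_attempt


# One line taken from the log above; in practice, read the full reducer log files.
sample = (
    "2019-04-01 19:14:46,920 WARN [fetcher#10] "
    "org.apache.hadoop.mapreduce.task.reduce.Fetcher: "
    "Failed to shuffle output of attempt_1559832449421_0209_m_000010_0 "
    "from phdp100.g:13562"
)

by_host, by_attempt = tally_fetch_failures(sample)
for host, count in by_host.most_common():
    print(host, count)
```

If nearly all failures point at the same host, that would suggest looking at that node first; if they are spread across hosts, the cause is more likely elsewhere.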
We are running CDH 5.16. What could be causing this?