为什么此MapReduce中的最终减少步骤会非常缓慢? (HiveQL,HDFS MapReduce)

时间:2018-06-27 08:40:20

标签: performance hive mapreduce hdfs reduce

一些背景信息:

我正在使用Dataiku DSS,HDFS和分区数据集。我有一个特定的作业正在运行(Hive查询),它有两个输入数据集-一个是非常大的分区数据集,另一个是小的(〜250行,2列)非分区数据集。我们叫分区表A和未分区表B。

问题:

查询具有以下格式,

SELECT a.f1, f2, ..., fn
  FROM A as a
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}

这是MapReduce作业的当前输出(请记住该作业仍在运行):

[09:05:53] [INFO] [dku.utils]  - INFO  : Total jobs = 4
[09:05:53] [INFO] [dku.utils]  - INFO  : Starting task [Stage-10:CONDITIONAL] in serial mode
[09:05:53] [INFO] [dku.utils]  - INFO  : Stage-11 is filtered out by condition resolver.
[09:05:53] [INFO] [dku.utils]  - INFO  : Stage-1 is selected by condition resolver.
[09:05:53] [INFO] [dku.utils]  - INFO  : Launching Job 1 out of 4
[09:05:53] [INFO] [dku.utils]  - INFO  : Starting task [Stage-1:MAPRED] in serial mode
[09:05:53] [INFO] [dku.utils]  - INFO  : Number of reduce tasks not specified. Estimated from input data size: 307
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to change the average load for a reducer (in bytes):
[09:05:53] [INFO] [dku.utils]  - INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to limit the maximum number of reducers:
[09:05:53] [INFO] [dku.utils]  - INFO  :   set hive.exec.reducers.max=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to set a constant number of reducers:
[09:05:53] [INFO] [dku.utils]  - INFO  :   set mapreduce.job.reduces=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : number of splits:75
[09:05:53] [INFO] [dku.utils]  - INFO  : Submitting tokens for job: job_1529508387832_10211
[09:05:53] [INFO] [dip.hiveserver2.log.sniffer]  - sniffed applicationId => application_1529508387832_10211/
[09:05:53] [INFO] [dku.utils]  - INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/bin/hadoop job  -kill job_1529508387832_10211
[09:05:53] [INFO] [dku.utils]  - INFO  : Hadoop job information for Stage-1: number of mappers: 75; number of reducers: 307
[09:05:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:05:47,749 Stage-1 map = 0%,  reduce = 0%
[09:06:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:06:48,444 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 278.18 sec
[09:06:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:06:51,682 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 373.35 sec
[09:07:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:00,159 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 501.46 sec
[09:07:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:02,235 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 539.6 sec
[09:07:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:07,472 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 1389.81 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:10,605 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 2172.52 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:12,574 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 2577.81 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:13,604 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 2865.44 sec
[09:07:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:15,747 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 3110.21 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:19,898 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 4080.2 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:21,988 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 4522.48 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:23,015 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 4755.96 sec
[09:07:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:29,335 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 5710.85 sec
[09:07:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:31,407 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 5948.34 sec
[09:07:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:34,555 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 6399.6 sec
[09:07:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:37,663 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 6811.22 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:38,695 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 7087.68 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:39,729 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 7288.22 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:40,769 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 7520.54 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:41,903 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 7771.37 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:42,930 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 7936.9 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:45,035 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 8254.78 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:46,075 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 8428.35 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:47,111 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 8661.23 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:48,153 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 8834.37 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:49,193 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 8983.68 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:50,227 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 9149.94 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:51,263 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 9268.9 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:52,301 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 9415.86 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:53,352 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 9540.63 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:54,381 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 9711.54 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:55,421 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 9823.52 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:56,453 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 10010.83 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:57,492 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 10081.9 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:58,532 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 10230.13 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:59,576 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 10392.61 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:00,607 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 10483.38 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:01,649 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 10618.16 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:02,672 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 10684.82 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:03,695 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 10701.95 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:04,720 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 10767.21 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:05,750 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 10849.92 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:06,780 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 10924.45 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:07,902 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 11000.21 sec
[09:08:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:09,965 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 11013.58 sec
[09:08:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:10,991 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 11057.76 sec
[09:08:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:14,216 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 11157.51 sec
[09:08:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:17,362 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 11392.85 sec
[09:08:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:20,460 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 11610.7 sec
[09:08:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:26,106 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 11781.65 sec
[09:08:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:28,163 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 11788.58 sec
[09:08:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:35,410 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 12167.24 sec
[09:08:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:38,540 Stage-1 map = 86%,  reduce = 1%, Cumulative CPU 12317.09 sec
[09:08:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:39,583 Stage-1 map = 86%,  reduce = 2%, Cumulative CPU 12329.8 sec
[09:08:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:40,631 Stage-1 map = 86%,  reduce = 3%, Cumulative CPU 12333.61 sec
[09:08:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:47,927 Stage-1 map = 87%,  reduce = 3%, Cumulative CPU 12651.77 sec
[09:08:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:48,970 Stage-1 map = 88%,  reduce = 4%, Cumulative CPU 12826.37 sec
[09:08:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:51,092 Stage-1 map = 88%,  reduce = 5%, Cumulative CPU 12857.19 sec
[09:08:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:54,214 Stage-1 map = 90%,  reduce = 5%, Cumulative CPU 13037.63 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:59,396 Stage-1 map = 91%,  reduce = 5%, Cumulative CPU 13117.71 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:00,440 Stage-1 map = 92%,  reduce = 5%, Cumulative CPU 13238.06 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:01,485 Stage-1 map = 93%,  reduce = 5%, Cumulative CPU 13249.8 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:05,660 Stage-1 map = 94%,  reduce = 5%, Cumulative CPU 13306.0 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:06,706 Stage-1 map = 97%,  reduce = 5%, Cumulative CPU 13393.5 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:07,751 Stage-1 map = 97%,  reduce = 6%, Cumulative CPU 13409.12 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:08,795 Stage-1 map = 98%,  reduce = 6%, Cumulative CPU 13433.07 sec
[09:09:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:09,835 Stage-1 map = 98%,  reduce = 8%, Cumulative CPU 13474.03 sec
[09:09:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:10,874 Stage-1 map = 98%,  reduce = 9%, Cumulative CPU 13484.64 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:14,004 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 13580.71 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:15,118 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 13619.15 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:16,160 Stage-1 map = 100%,  reduce = 16%, Cumulative CPU 13707.2 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:17,210 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 14456.75 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:18,258 Stage-1 map = 100%,  reduce = 39%, Cumulative CPU 14708.07 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:19,291 Stage-1 map = 100%,  reduce = 40%, Cumulative CPU 14768.29 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:20,329 Stage-1 map = 100%,  reduce = 41%, Cumulative CPU 14834.88 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:21,360 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 14902.4 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:22,399 Stage-1 map = 100%,  reduce = 45%, Cumulative CPU 15040.16 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:23,433 Stage-1 map = 100%,  reduce = 47%, Cumulative CPU 15165.58 sec
[09:09:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:28,627 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 15792.29 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:30,711 Stage-1 map = 100%,  reduce = 64%, Cumulative CPU 15889.21 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:31,753 Stage-1 map = 100%,  reduce = 65%, Cumulative CPU 15898.95 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:32,789 Stage-1 map = 100%,  reduce = 66%, Cumulative CPU 15927.26 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:33,822 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 16086.93 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:34,866 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU 16711.13 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:35,907 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 16795.34 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:36,952 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 16881.47 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:37,995 Stage-1 map = 100%,  reduce = 96%, Cumulative CPU 16891.18 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:44,249 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 16958.21 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:45,292 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 17011.88 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:47,378 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17055.07 sec
[09:10:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:10:47,421 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17545.89 sec
[09:11:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:11:47,872 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17764.45 sec
[09:12:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:12:48,287 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 18330.86 sec
[09:13:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:13:48,855 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 19232.58 sec
...
...
...
...
...
[10:22:20] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:22:17,545 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 26857.28 sec
[10:23:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:23:17,654 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 26974.96 sec
[10:24:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:24:18,112 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27081.0 sec
[10:25:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:25:18,964 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27187.52 sec
[10:26:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:26:19,404 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27287.58 sec

如您所见,还原阶段的最后1%花费了一段时间(到目前为止,总共1h32min过去了1h27min)。

我在查找资源时遇到了麻烦,这些资源可以清楚,简洁地说明为什么会这样。值得一提的是,我对MapReduce有一个非常基本的了解,但是目前正在研究HDFS上下文中的MapReduce。根据我的阅读,由于执行JOIN的性质,大部分的reduce任务可能固定在一个reducer上。​​

如果任何人都可以对正在发生的事情进行高层次的解释,请向我指出一些好的资源,并提供一些技巧,告诉我如何在将来的工作中通过更改Hive查询来避免这种情况,我们将不胜感激

1 个答案:

答案 0 :(得分:2)

如果最终的reducer是一个join,那么它看起来像是join key中的歪斜。 首先检查两件事:

检查b.f1连接密钥是否重复:

select b.f1, count(*) cnt from B b 
 group by b.f1 
having count(*)>1 order by cnt desc;

检查a.f1的分布:

select a.f1, count(*) cnt from A a
 group by a.f1  
order by cnt desc
limit 10;

此查询将显示倾斜的键。

如果存在偏斜(具有相同值的太多行),则分别连接偏斜键,请使用并集所有:

SELECT a.f1, f2, ..., fn
  FROM ( select * from A where f1 = skewed_value) as a --skewed
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}
UNION ALL
SELECT a.f1, f2, ..., fn
  FROM ( select * from A where f1 != skewed_value) as a --all other
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}

最后,如果不存在歪斜和重复的问题,请尝试增加reducer的并行性: 获取每个reducer配置的当前字节数

设置hive.exec.reducers.bytes.per.reducer;

通常,这将返回大约1G的值。尝试除以二,在查询之前设置新值,并检查它将启动多少个减速器和性能。成功的标准是更多的减速器已经开始使用并且性能得到了提高。

set hive.exec.reducers.bytes.per.reducer=67108864;

每个减速器的字节数越少,启动的减速器越多,从而增加了并行度;

更新:尝试启用map-join ,您的第二张表足够小以适合内存,mapjoin将完全不使用reducer,并且在reducer上倾斜也没有问题。

如何启用mapjoin:https://stackoverflow.com/a/49154414/2700344