How can I make Hive run MapReduce jobs concurrently?

Time: 2012-01-15 07:24:04

Tags: hadoop mapreduce hive

I'm new to Hive and I've run into a problem.

I have a table in Hive defined like this:

create table td(id int, time string, ip string, v1 bigint, v2 int, v3 int,
v4 int, v5 bigint, v6 int)  PARTITIONED BY(dt STRING)
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ','  lines TERMINATED BY '\n' ;  

I ran a query like this:

from td
INSERT OVERWRITE  DIRECTORY '/tmp/total.out' select count(v1)
INSERT OVERWRITE  DIRECTORY '/tmp/totaldistinct.out' select count(distinct v1)
INSERT OVERWRITE  DIRECTORY '/tmp/distinctuin.out' select distinct v1

INSERT OVERWRITE  DIRECTORY '/tmp/v4.out' select v4 , count(v1), count(distinct v1) group by v4
INSERT OVERWRITE  DIRECTORY '/tmp/v3v4.out' select v3, v4 , count(v1), count(distinct v1) group by v3, v4

INSERT OVERWRITE  DIRECTORY '/tmp/v426.out' select count(v1), count(distinct v1)  where v4=2 or v4=6
INSERT OVERWRITE  DIRECTORY '/tmp/v3v426.out' select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3

INSERT OVERWRITE  DIRECTORY '/tmp/v415.out' select count(v1), count(distinct v1)  where v4=1 or v4=5
INSERT OVERWRITE  DIRECTORY '/tmp/v3v415.out' select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3

It works, and the output is exactly what I want.

But there is one problem: Hive generates 9 MapReduce jobs and runs them one by one.

I ran EXPLAIN on this query and got the following output:

STAGE DEPENDENCIES:
  Stage-9 is a root stage
  Stage-0 depends on stages: Stage-9
  Stage-10 depends on stages: Stage-9
  Stage-1 depends on stages: Stage-10
  Stage-11 depends on stages: Stage-9
  Stage-2 depends on stages: Stage-11
  Stage-12 depends on stages: Stage-9
  Stage-3 depends on stages: Stage-12
  Stage-13 depends on stages: Stage-9
  Stage-4 depends on stages: Stage-13
  Stage-14 depends on stages: Stage-9
  Stage-5 depends on stages: Stage-14
  Stage-15 depends on stages: Stage-9
  Stage-6 depends on stages: Stage-15
  Stage-16 depends on stages: Stage-9
  Stage-7 depends on stages: Stage-16
  Stage-17 depends on stages: Stage-9
  Stage-8 depends on stages: Stage-17
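It seems that stages 9-17 correspond to MapReduce jobs 0-8.
But according to the EXPLAIN output above, stages 10-17 depend only on stage 9,
so my question is: why can't jobs 1-8 run concurrently?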

Or, how can I make jobs 1-8 run concurrently?

Thanks a lot for your help!

1 answer:

Answer 0: (score: 5)

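In hive-default.xml there is a property named "hive.exec.parallel" that lets Hive execute jobs in parallel. Its default value is "false"; change it to "true" to enable this behavior. A second property, "hive.exec.parallel.thread.number", controls the maximum number of jobs that can be executed in parallel.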
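
As a minimal sketch of the settings described above, the two properties can also be set for a single session with SET before running the multi-insert query, instead of editing the configuration file. The thread count of 8 is only an illustrative value, and the shortened two-insert query stands in for the full query from the question. Whether the jobs actually overlap also depends on how much free capacity the cluster has.

```sql
-- Allow independent stages of one query to be submitted concurrently
SET hive.exec.parallel=true;
-- Upper bound on the number of jobs run in parallel (8 is an illustrative value)
SET hive.exec.parallel.thread.number=8;

-- Then run the multi-insert query as before, for example:
FROM td
INSERT OVERWRITE DIRECTORY '/tmp/total.out' SELECT count(v1)
INSERT OVERWRITE DIRECTORY '/tmp/totaldistinct.out' SELECT count(DISTINCT v1);
```

Settings made with SET apply only to the current session, so this is a convenient way to try parallel execution on one query before changing the cluster-wide configuration.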

For details, see: https://issues.apache.org/jira/browse/HIVE-549