How do you specify "uberization" for a Hive query in Hadoop 2?

Date: 2014-06-06 23:54:47

Tags: java hadoop

Hadoop 2 has a new feature called uberization. For example, this reference says:

  Uberization is the possibility to run all tasks of a MapReduce job in the ApplicationMaster's JVM if the job is small enough. This way, you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to start (supposedly small) tasks.

What I can't tell is whether this just happens magically behind the scenes, or whether something has to be done to make it happen. For example, is there a setting (or hint) when issuing a Hive query to achieve this? Can you specify the threshold for "small enough"?

Also, I'm having a hard time finding anything on this concept. Does it go by another name?

2 answers:

Answer 0 (score: 4):

I found the details on "Uber Jobs" in Arun Murthy's YARN book:

  An Uber Job occurs when multiple mappers and reducers are combined to use a single container. There are four core settings around the configuration of Uber Jobs, found in the mapred-site.xml options and shown in Table 9.3.

Here is Table 9.3:

|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.enable     | Whether to enable the small-jobs "ubertask" optimization,  |
|                                   | which runs "sufficiently small" jobs sequentially within a |
|                                   | single JVM. "Small" is defined by the maxmaps, maxreduces, |
|                                   | and maxbytes settings. Users may override this value.      |
|                                   | Default = false.                                           |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps    | Threshold for the number of maps beyond which the job is   |
|                                   | considered too big for the ubertasking optimization.       |
|                                   | Users may override this value, but only downward.          |
|                                   | Default = 9.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which           |
|                                   | the job is considered too big for the ubertasking          |
|                                   | optimization. Currently the code cannot support more       |
|                                   | than one reduce and will ignore larger values. (Zero is    |
|                                   | a valid maximum, however.) Users may override this         |
|                                   | value, but only downward.                                  |
|                                   | Default = 1.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes   | Threshold for the number of input bytes beyond             |
|                                   | which the job is considered too big for the uber-          |
|                                   | tasking optimization. If no value is specified,            |
|                                   | `dfs.block.size` is used as a default. Be sure to          |
|                                   | specify a default value in `mapred-site.xml` if the        |
|                                   | underlying file system is not HDFS. Users may override     |
|                                   | this value, but only downward.                             |
|                                   | Default = HDFS block size.                                 |
|-----------------------------------+------------------------------------------------------------|
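
For concreteness, here is a minimal mapred-site.xml sketch based on Table 9.3. The property names come from the table; the threshold values are illustrative examples only (remember that users can only override the defaults downward):

<configuration>
  <!-- Turn on the small-jobs "ubertask" optimization -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- Jobs with more than 4 maps will not be uberized (default is 9) -->
  <property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>4</value>
  </property>
  <!-- At most one reduce is supported by the current code -->
  <property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>
  </property>
  <!-- 134217728 bytes = 128 MB, e.g. one HDFS block -->
  <property>
    <name>mapreduce.job.ubertask.maxbytes</name>
    <value>134217728</value>
  </property>
</configuration>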

I don't yet know whether there is a Hive-specific way to set this, or whether you just use the above in conjunction with Hive.
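
That said, if Hive simply forwards these job properties to the MapReduce jobs it launches (its SET command generally passes Hadoop configuration through), a session-level sketch to experiment with would be the following; again, the threshold values are only examples:

SET mapreduce.job.ubertask.enable = TRUE;
SET mapreduce.job.ubertask.maxmaps = 4;
SET mapreduce.job.ubertask.maxbytes = 134217728;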

Answer 1 (score: 1):

An Uber Job occurs when multiple mappers and reducers are combined for execution inside the Application Master. Say the job to be executed has MAX Mappers <= 9 and MAX Reducers <= 1; then the Resource Manager (RM) creates one Application Master, which executes the whole job nicely within its own JVM.

SET mapreduce.job.ubertask.enable = TRUE;

Hence the advantage of an uberized job: it eliminates the round-trip overhead of the Application Master requesting containers from the Resource Manager (RM) and the RM allocating containers back to the Application Master.