Hadoop 2 has a new feature called uberization. For example, this reference says:
Uberization is the possibility to run all tasks of a MapReduce job in the ApplicationMaster's JVM if the job is small enough. This way, you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to start (supposedly small) tasks.
What I can't tell is whether this just happens magically behind the scenes, or whether something has to be done to make it happen. For example, is there a setting (or hint) to trigger it when running a Hive query? Can you specify the threshold for "small enough"?
Also, I'm having a hard time finding material on this concept. Is it known by another name?
Answer 0 (score: 4)
I found details on "uber jobs" in Arun Murthy's YARN book:
An uber job occurs when multiple mappers and reducers are combined to use a single container. There are four core settings for uber jobs, configured through the mapred-site.xml options shown in Table 9.3.
Here is Table 9.3:
|-----------------------------------+------------------------------------------------------------|
| Property                          | Description                                                |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.enable | Whether to enable the small-jobs "ubertask" optimization, |
| | which runs "sufficiently small" jobs sequentially within a |
| | single JVM. "Small" is defined by the maxmaps, maxreduces, |
| | and maxbytes settings. Users may override this value. |
| | Default = false. |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps | Threshold for the number of maps beyond which the job is |
| | considered too big for the ubertasking optimization. |
| | Users may override this value, but only downward. |
| | Default = 9. |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which |
| | the job is considered too big for the ubertasking |
| | optimization. Currently the code cannot support more |
| | than one reduce and will ignore larger values. (Zero is |
| | a valid maximum, however.) Users may override this |
| | value, but only downward. |
| | Default = 1. |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes | Threshold for the number of input bytes beyond |
| | which the job is considered too big for the uber- |
| | tasking optimization. If no value is specified, |
| | `dfs.block.size` is used as a default. Be sure to |
| | specify a default value in `mapred-site.xml` if the |
| | underlying file system is not HDFS. Users may override |
| | this value, but only downward. |
| | Default = HDFS block size. |
|-----------------------------------+------------------------------------------------------------|
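Putting the four options together, here is a minimal mapred-site.xml sketch. The property names are the ones listed in Table 9.3; the specific threshold values are illustrative assumptions only (and, per the table, maxmaps, maxreduces, and maxbytes may only be lowered from their defaults):

    <configuration>
      <!-- Enable the small-jobs "ubertask" optimization. -->
      <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
      </property>
      <!-- Illustrative: treat jobs with more than 4 map tasks as too big to uberize. -->
      <property>
        <name>mapreduce.job.ubertask.maxmaps</name>
        <value>4</value>
      </property>
      <!-- At most one reduce is supported; zero is also a valid maximum. -->
      <property>
        <name>mapreduce.job.ubertask.maxreduces</name>
        <value>1</value>
      </property>
      <!-- Illustrative: cap input at 64 MB instead of the default HDFS block size. -->
      <property>
        <name>mapreduce.job.ubertask.maxbytes</name>
        <value>67108864</value>
      </property>
    </configuration>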
I don't yet know whether there is a Hive-specific way to set this, or whether you simply use the settings above with Hive.
Answer 1 (score: 1)
An uber job occurs when multiple mappers and reducers are combined to execute inside the Application Master. Suppose the job to be executed has max mappers <= 9 and max reducers <= 1; then the Resource Manager (RM) creates an Application Master, and the job executes entirely within the Application Master, using its own JVM. Uberization is enabled with:
SET mapreduce.job.ubertask.enable = TRUE;
So the advantage of an uberized job is that the round-trip overhead of the Application Master requesting containers from the Resource Manager (RM), and the RM allocating containers back to the Application Master, is eliminated.
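To answer the Hive part of the question: when Hive uses the MapReduce execution engine, MapReduce job properties can generally be set per session with SET, so a sketch like the following should apply (the threshold values and the table name small_table are hypothetical, chosen only for illustration):

    -- Enable uberized execution for sufficiently small jobs.
    SET mapreduce.job.ubertask.enable=true;
    -- Optionally tighten the "small enough" thresholds (downward only).
    SET mapreduce.job.ubertask.maxmaps=4;
    SET mapreduce.job.ubertask.maxreduces=1;
    -- A small query such as this may then run inside the ApplicationMaster's JVM:
    SELECT COUNT(*) FROM small_table;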