Question

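I have a simple workflow that runs a mapreduce job as a shell action. After I submit the job, its status becomes RUNNING and stays that way; it never finishes. The mapreduce cluster shows two running jobs: one belongs to the shell application launcher and the other to the actual mapreduce job. However, the mapreduce job is shown as UNASSIGNED with zero progress (which means it has not been started).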

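Interestingly, when I kill the oozie job, the mapreduce job actually starts running and completes successfully. It looks like the shell launcher is blocking it.

P.S. It is a simple workflow, with no start or end dates that could cause it to wait.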

Answer 1

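When a job is in the UNASSIGNED state, it usually means the ResourceManager (RM) cannot allocate a container for the job. Check the capacity configuration for the user and the queue. Giving them more capacity should help.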

With Hadoop 2.7 and the Capacity Scheduler, the following properties need checking:

yarn.scheduler.capacity.<queue-path>.capacity
yarn.scheduler.capacity.<queue-path>.maximum-capacity
yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent
yarn.scheduler.capacity.<queue-path>.user-limit-factor
yarn.scheduler.capacity.maximum-applications
yarn.scheduler.capacity.maximum-am-resource-percent
Read more about these properties in Hadoop: Capacity Scheduler - Queue Properties.
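
The symptom in the question (the child job only starts once the launcher is killed) is consistent with the queue having room for just one ApplicationMaster (AM) at a time. Here is a rough Python sketch of that arithmetic. It is a simplified model, not the scheduler's actual code: the function names are made up for illustration, the "at least one AM is always admitted" rule is an assumption of the model, and 0.1 is used because it is the documented default of yarn.scheduler.capacity.maximum-am-resource-percent.

```python
import math


def max_concurrent_ams(queue_memory_mb, am_resource_percent, am_container_mb):
    """Estimate how many ApplicationMasters fit in a queue's AM budget.

    Simplified model (assumption): the AM budget is a flat fraction of the
    queue memory, and at least one AM is always admitted.
    """
    am_budget_mb = queue_memory_mb * am_resource_percent
    return max(1, math.floor(am_budget_mb / am_container_mb))


def can_run_launcher_and_child(queue_memory_mb, am_resource_percent, am_container_mb):
    """An Oozie shell action needs two AMs at once: launcher job + child MR job."""
    return max_concurrent_ams(queue_memory_mb, am_resource_percent, am_container_mb) >= 2


# Hypothetical 8 GB queue with 512 MB AM containers.
print(can_run_launcher_and_child(8192, 0.1, 512))  # default AM share
print(can_run_launcher_and_child(8192, 0.5, 512))  # raised AM share
```

If the first call reports that only one AM fits, the launcher holds that slot and the child job waits until the launcher goes away, which is what killing the oozie job does.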

Answer 2

Please consider the following, depending on your memory resources.

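The number of containers depends on the number of blocks. If you have 2 GB of data with a 512 MB block size, YARN creates 4 maps and 1 reduce. When running mapreduce, there are some rules to follow when submitting the job. (This should be applicable to small clusters.)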
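
As a quick sanity check of that arithmetic, here is a small Python sketch. It assumes one map task per block (the usual case when the input split size equals the block size), one reduce, and one extra container for the job's ApplicationMaster; the function names are hypothetical.

```python
import math


def estimated_map_tasks(input_mb, block_mb):
    """One map task per block, assuming split size == block size."""
    return math.ceil(input_mb / block_mb)


def estimated_containers(input_mb, block_mb, reduces=1):
    """Maps + reduces + one container for the job's ApplicationMaster."""
    return estimated_map_tasks(input_mb, block_mb) + reduces + 1


print(estimated_map_tasks(2048, 512))   # 2 GB of data in 512 MB blocks
print(estimated_containers(2048, 512))  # total containers the job will ask for
```

Not all of these containers have to run at the same time, but on a small cluster the total gives an idea of how long the job will queue for resources.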

You should configure the following properties according to your RAM, DISK, and CORES.

<property>
  <description>The minimum allocation for every container request at the RM,
  in MBs. Memory requests lower than this won't take effect,
  and the specified value will get allocated at minimum.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>

<property>
  <description>The maximum allocation for every container request at the RM,
  in MBs. Memory requests higher than this won't take effect,
  and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

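Set the Java heap size according to your memory resources. With the properties above set in yarn-site.xml, the mapreduce job should be able to run and complete successfully.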
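
To see what those three values mean in practice, here is a rough Python sketch. It assumes requests are rounded up to a multiple of the minimum allocation and capped at the maximum (as the property descriptions say; some Hadoop versions may reject oversized requests instead). The 0.8 heap fraction is a common rule of thumb, not something taken from this configuration, and the function names are hypothetical.

```python
import math

MIN_ALLOC_MB = 512     # yarn.scheduler.minimum-allocation-mb
MAX_ALLOC_MB = 2048    # yarn.scheduler.maximum-allocation-mb
NODE_MEMORY_MB = 2048  # yarn.nodemanager.resource.memory-mb


def normalize_request(request_mb, min_mb=MIN_ALLOC_MB, max_mb=MAX_ALLOC_MB):
    """Round a request up to a multiple of the minimum, then cap at the maximum."""
    rounded = math.ceil(request_mb / min_mb) * min_mb
    return min(max(rounded, min_mb), max_mb)


def containers_per_node(container_mb, node_mb=NODE_MEMORY_MB):
    """How many containers of the (normalized) size fit on one NodeManager."""
    return node_mb // normalize_request(container_mb)


def suggested_heap_mb(container_mb, heap_fraction=0.8):
    """Rule-of-thumb JVM heap: leave headroom below the container size (assumption)."""
    return int(container_mb * heap_fraction)


print(normalize_request(300))     # small request is bumped up to the minimum
print(containers_per_node(512))   # containers that fit on a 2048 MB node
print(containers_per_node(1024))
print(suggested_heap_mb(1024))    # candidate -Xmx value in MB
```

With a 2048 MB node, a single-node cluster has room for only a handful of 512 MB containers, so a launcher job plus a child job's ApplicationMaster and tasks can easily use all of them.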

Oozie job stuck in the RUNNING state
