如何在oozie中为并行作业单独处理错误清理

时间:2014-10-14 19:38:51

标签: parallel-processing oozie

我必须在oozie中运行一组并行作业,我可以使用oozie中的fork选项运行。 现在我面临的问题是,如果一个作业失败,其余的作业也会失败,因为我在每个作业都在调用kill控制节点时出错。 我在网上搜索了很多,但我无法找到如何为每一项工作单独处理错误清理。

任何帮助都将不胜感激。

我的workflow.xml如下:

<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
<start to="copyfroms3tohdfs"/>
<action name="copyfroms3tohdfs">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${s3tohdfsscript}</command>
<capture-output/>
</ssh>
<ok to="createhivetables"/>
<error to="killAction"/>
</action>


<action name="createhivetables">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${createhivetablesscript}</command>
<capture-output/>
</ssh>
<ok to="gold__pos_denorm_trn_itm_offr"/>
<error to="killAction"/>
</action>
<action name="gold__pos_denorm_trn_itm_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${denormalizationscript}</command>
<capture-output/>
</ssh>
<ok to="forknode"/>
<error to="killAction"/>
</action>
<fork name="forknode">
        <path start="gold__dypt_pos_trn_offr"/>
        <path start="gold__hr_pos_trn_offr"/>
                <path start="approach3"/>
                <path start="aproach11"/>
                <path start="aproach12"/>
                <path start="aproach13"/>
                <path start="aproach14"/>
                <path start="aproach15"/>
                <path start="aproach16"/>
                <path start="aproach17"/>

</fork>
<action name="gold__dypt_pos_trn_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${daypartscript}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="gold__hr_pos_trn_offr">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${hourscript}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="approach3">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach3script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach11">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach11script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach12">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach12script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach13">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach13script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach14">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach14script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach15">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach15script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach16">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach16script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<action name="aproach17">
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${CMNodeLogin}</host>
<command>${approach17script}</command>
<capture-output/>
</ssh>
<ok to="joinnode"/>
<error to="killAction"/>
</action>
<join name="joinnode" to="end"/>
<kill name="killAction">
<message>"Killed job due to error"</message>
</kill>
<end name="end"/>
</workflow-app>

1 个答案:

答案 0 :(得分:2)

创建一个新节点(主要是java),它将为您执行清理活动。还将所有“错误到”操作路由到此新节点。您将能够使用EL函数识别实际导致错误的节点 - $ {wf:lastErrorNode()} 。将此作为清理处理节点的参数传递,以便在java内部,您可以执行任何您希望清理的逻辑(使用java hdfs API)。

新节点如下:

<action name="myCleanUpAction">
<java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>com.foo.CleanUpMain</main-class>
        <arg>${wf:lastErrorNode()}</arg>
        <arg>any useful argument1</arg>
        <arg>any useful argument2</arg>
    </java>
    <ok to="fail"/>
    <error to="fail"/>
</action>