Get the logs of oozie sqoop actions from the command line

Time: 2017-08-19 11:26:14

Tags: hadoop command-line-interface cloudera sqoop oozie

I have an oozie workflow with sqoop actions, and I want to get the logs (stdout) of those sqoop actions from the command line.

Here is what I thought would work:

oozie job -info 0000002-170819062150496-oozie-oozi-W

which prints the job_id of each action:

------------------------------------------------------------------------------------------------------------------------------------
0000002-170819062150496-oozie-oozi-W@sqoop-import-shard_1                     OK        job_1503124774831_0013 SUCCEEDED  -         
------------------------------------------------------------------------------------------------------------------------------------
0000002-170819062150496-oozie-oozi-W@sqoop-import-shard_2                     OK        job_1503124774831_0014 SUCCEEDED  -         
------------------------------------------------------------------------------------------------------------------------------------

Then I used:

mapred job -logs job_1503124774831_0013

But this only gives the MapReduce logs. The Sqoop stdout output that I can see in Hue, such as the number of records processed and the --last-value (in the incremental case), is not available there.

Is there a way to get it?

Edit: adding the sqoop config and workflow files.

sqoop_import_config.txt

import
--connect
${connect}
--username
${username}
--password
${pwd}
--hive-delims-replacement
\001
--fields-terminated-by
\003
--null-string
\\N
--null-non-string
\\N
--target-dir
/data/${table}/${shard}
--query
SELECT ${columns} from ${table} WHERE $CONDITIONS
--split-by
id
--boundary-query
select min(id), max(id) from ${table}
--m
${numMappers}
--incremental
lastmodified
--last-value
${lastValue}
--check-column
updated_at
--merge-key
id

workflow.xml

<workflow-app name="${tableName}_${type}_Sqoop" xmlns="uri:oozie:workflow:0.5">
    <credentials>
        <credential name="hive2" type="hive2">
            <property>
                <name>hive2.jdbc.url</name>
                <value>${hive2JdbcUrl}</value>
            </property>
            <property>
                <name>hive2.server.principal</name>
                <value>${hive2MetastorePrincipal}</value>
            </property>
        </credential>
    </credentials>
    <start to="sqoop-import-fork"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <fork name="sqoop-import-fork">
        <path start="sqoop-import-shard_1"/>
        <path start="sqoop-import-shard_2"/>
    </fork>

    <action name="sqoop-import-shard_1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <arg>--options-file</arg>
            <arg>${tableName}_shard_1_import.txt</arg>
        </sqoop>
        <ok to="sqoop-import-join"/>
        <error to="email-b1f2"/>
    </action>
    <action name="sqoop-import-shard_2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <arg>--options-file</arg>
            <arg>${tableName}_shard_2_import.txt</arg>
        </sqoop>
        <ok to="sqoop-import-join"/>
        <error to="email-b1f2"/>
    </action>
    <action name="move-data" cred="hive2">
    . . .

1 answer:

Answer 0 (score: 0)

Found the way to get the stdout logs after looking at the output of mapred job help.

After receiving the job_id from oozie using:

oozie job -info 0000002-170819062150496-oozie-oozi-W

get the list of attempts for the job using:

mapred job -list-attempt-ids <job-id> <task-type> <task-state>

  • Valid values for <task-type> are MAP and REDUCE.
  • Valid values for <task-state> are running and completed.

So I ran mapred job -list-attempt-ids job_1503124774831_0022 MAP completed, which gave me attempt_1503124774831_0022_m_000000_0.

Now, I was able to fetch the logs of the attempt that contains the sqoop stdout using:

mapred job -logs job_1503124774831_0022 attempt_1503124774831_0022_m_000000_0
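The steps above can be chained into a small wrapper. A sketch, assuming the oozie and mapred binaries are on the PATH; the helper names and the attempt-ID regular expression are mine, not part of either tool:

```python
import re
import subprocess

def parse_attempt_ids(listing):
    """Pull attempt IDs (attempt_<ts>_<seq>_[mr]_<task>_<n>) out of
    `mapred job -list-attempt-ids` output."""
    return re.findall(r"\battempt_\d+_\d+_[mr]_\d+_\d+\b", listing)

def print_sqoop_action_logs(job_id):
    """For one sqoop launcher job: list its completed map attempts,
    then fetch the logs of each attempt (which hold the sqoop stdout)."""
    listing = subprocess.run(
        ["mapred", "job", "-list-attempt-ids", job_id, "MAP", "completed"],
        capture_output=True, text=True, check=True,
    ).stdout
    for attempt_id in parse_attempt_ids(listing):
        subprocess.run(["mapred", "job", "-logs", job_id, attempt_id], check=True)
```

Calling print_sqoop_action_logs("job_1503124774831_0022") would reproduce the manual sequence shown above for that job.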