Create/run a Sqoop job in a Shell action using Oozie

Date: 2018-12-20 13:15:54

Tags: hadoop hive sqoop oozie hive-metastore

I am trying to run a Sqoop job from a shell script using Oozie. Note that I am using cdh5 on my local machine (a VM with 12 GB of RAM) and HUE to build the workflow.

I created a Sqoop job that imports data from MySQL into HDFS, to be run from a shell script and then executed with Oozie:

sqoop job --create testmetastore --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import --connect jdbc:mysql://localhost:3306/retail_db --table EMPLOYEE --username root --password cloudera --target-dir hdfs://localhost:8020/user/cloudera/EMPLOYEES -m 1

The job was created (it shows up when I run the list command, see below):

sqoop job --list --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop

sqoop job --list

I even executed the job from the terminal, and it worked (see the command below):

sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --exec testmetastore
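
The contents of sqpexec.sh are not shown in this question. As a rough sketch only, assuming the script does nothing more than execute the saved job with the command above, it could look like the following. The helper names `run` and `exec_saved_job` and the `DRY_RUN` variable are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of sqpexec.sh; the actual script is not shown in the question.

META_CONNECT="jdbc:hsqldb:hsql://localhost:16000/sqoop"
JOB_NAME="testmetastore"

# Print the command when DRY_RUN=1 (the default in this sketch),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Execute the saved job against the shared metastore.
exec_saved_job() {
  run sqoop job --meta-connect "$META_CONNECT" --exec "$JOB_NAME"
}
```

Calling `exec_saved_job` prints the command that would be run; calling it with `DRY_RUN=0` set executes it. The dry-run default is a design choice of this sketch, so the exact command line can be inspected in the Oozie launcher log before anything is imported.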

I deleted the folder from the last import and tried to re-execute the job, this time through Oozie. It failed with an error (see below):

18/12/23 10:47:06 INFO mapreduce.ImportJobBase: counters are unavailable. To get this information, 
18/12/23 10:47:06 INFO mapreduce.ImportJobBase: you will need to enable the completed job store on 
18/12/23 10:47:06 INFO mapreduce.ImportJobBase: the jobtracker with:
18/12/23 10:47:06 INFO mapreduce.ImportJobBase: mapreduce.jobtracker.persist.jobstatus.active = true
18/12/23 10:47:06 INFO mapreduce.ImportJobBase: mapreduce.jobtracker.persist.jobstatus.hours = 1
18/12/23 10:47:06 INFO mapreduce.ImportJobBase: A jobtracker restart is required for these settings
18/12/23 10:47:06 INFO mapreduce.ImportJobBase: to take effect.
18/12/23 10:47:06 ERROR tool.ImportTool: Import failed: Import job failed!
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Here is sqoop-site.xml:

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>false</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:hsql://localhost:16000</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
    <description>If true, allow saved passwords in the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
    If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.port</name>
    <value>16000</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>
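
Since autoconnect is disabled in this configuration, every command has to pass `--meta-connect` explicitly, and the metastore has to be reachable from wherever the shell action ends up running. A small sketch for checking that, assuming bash is available: `parse_metastore_url` and `port_open` are hypothetical helper names, and `port_open` relies on bash's `/dev/tcp` feature, which other shells may not provide.

```shell
# Split an HSQLDB metastore URL into "host port".
parse_metastore_url() {
  local url="$1"
  local hostport="${url#jdbc:hsqldb:hsql://}"   # strip the scheme prefix
  hostport="${hostport%%/*}"                    # drop a trailing /database, if any
  echo "${hostport%%:*} ${hostport##*:}"
}

# Return 0 if something accepts TCP connections on host:port.
# Assumption: the shell is bash, which provides the /dev/tcp pseudo-device.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

For example, `parse_metastore_url jdbc:hsqldb:hsql://localhost:16000/sqoop` yields the host and port that `port_open` can then probe from inside the script that Oozie launches.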

Here is workflow.xml:

<workflow-app name="MyWorkflow" xmlns="uri:oozie:workflow:0.5">
<start to="shell-7268"/>
<kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell-7268">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>/user/cloudera/sqpexec.sh</exec>
        <file>/user/cloudera/sqpexec.sh#sqpexec.sh</file>
        <file>/user/cloudera/sqoop-site.xml#sqoop-site.xml</file>
        <capture-output/>
    </shell>
    <ok to="End"/>
    <error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>

Here is the view in HUE: [screenshot: workflow in HUE]

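The reason I am trying to run a Sqoop job from a shell script in Oozie is that I want to create a Sqoop job that keeps track of the last imported value, so that it can import the table incrementally, and then schedule it with Oozie. So this is just a first test step!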
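
To make the end goal concrete, here is a sketch of what the incremental saved job might look like, assuming Sqoop's `--incremental append` mode. The job name `testincremental` and the check column `id` are hypothetical; the question does not say which column would track new rows. The function only builds and prints the command, it does not run it.

```shell
# Build (but do not run) the command that creates an incremental saved job.
# Arguments: job name, check column, initial last value (all illustrative).
build_incremental_job_cmd() {
  echo "sqoop job --create $1" \
    "--meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop" \
    "-- import" \
    "--connect jdbc:mysql://localhost:3306/retail_db" \
    "--table EMPLOYEE --username root --password cloudera" \
    "--target-dir hdfs://localhost:8020/user/cloudera/EMPLOYEES" \
    "--incremental append --check-column $2 --last-value $3" \
    "-m 1"
}
```

For example, `build_incremental_job_cmd testincremental id 0` prints the full command line. When an incremental import is stored as a saved job, Sqoop records the new last value in the metastore after each successful run, which is why the metastore matters for this use case.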

Can anyone help? Thanks,

0 answers:

No answers